Best django questions in November 2011

What is the best HTML approach when form inputs are spread throughout the page?

11 votes

I am building a faceted search system that has inputs in a sidebar (the facets are check boxes), and an input in the header of the page (the main query box). All of these inputs are submitted simultaneously when the user submits a search.

The only way I can think of to make this work is to wrap the entire page in an HTML form tag. Something like the following pseudo-html:

<form>
  <div id='header'>
    <logo/>
    <input id='q'/>
    <!-- a bunch more stuff -->
  </div>
  <div id='sidebar'>
    <div id='sidebar-facets-subsection'>
      <input id='facet1'/>
      <input id='facet2'/>
      <input id='facet3'/>
      <!-- a bunch more stuff -->
    </div>
    <div id='sidebar-form-subsection'>
      <form id='unrelated-form'>
        <input id='unrelated-input-1'/>
        <input id='unrelated-input-2'/>
      </form>
    </div>
  </div>
  <!-- a bunch more stuff -->
</form>

This would work, except for three things:

  1. I need to use other forms in the page, as I've indicated above.
  2. I use different django templates to generate the header and the sidebar, making the templates have dependencies on each other.
  3. It's a real mess since the sidebar is in reality about 100 lines, not three.

Is there a more clever way of doing this that I'm not aware of, or is creating huge HTML forms the norm? In circumstances like this, is it better to use Javascript to somehow generate the input entries in a more normal form? Or is that the only option?

Any creative solutions or ideas?

You can make it work with JavaScript without sacrificing accessibility:

  1. Put all the checkboxes in the header and wrap them in a div.
  2. Set up an empty but clean sidebar.
  3. Using JavaScript, move your checkboxes from the header into the sidebar.
  4. Attach a callback to the form's submit event. When the user submits the form, cancel the event, then take the data from the search field and the checkboxes and send it as an Ajax POST request.

Using a framework like jQuery, it's a 15-minute job.

If the user has JavaScript enabled, the callback will post the request and everything will work. If the user doesn't have JavaScript enabled, the checkboxes stay in the header, inside the form, so they still work, at the price of a slightly less elegant design.

But people with JavaScript disabled are used to design changes, so it's OK.

Simple query working for years, then suddenly very slow

7 votes

I've had a query that has been running fine for about 2 years. The database table has about 50 million rows, and is growing slowly. This last week one of my queries went from returning almost instantly to taking hours to run.

Rank.objects.filter(site=Site.objects.get(profile__client=client, profile__is_active=False)).latest('id')

I have narrowed the slow query down to the Rank model. It seems to have something to do with using the latest() method. If I just ask for a queryset, it returns an empty queryset right away.

#count returns 0 and is fast
Rank.objects.filter(site=Site.objects.get(profile__client=client, profile__is_active=False)).count() == 0
Rank.objects.filter(site=Site.objects.get(profile__client=client, profile__is_active=False)) == [] #also very fast

Here are the results of running EXPLAIN. http://explain.depesz.com/s/wPh

And EXPLAIN ANALYZE: http://explain.depesz.com/s/ggi

I tried vacuuming the table, no change. There is already an index on the "site" field (ForeignKey).

Strangely, if I run this same query for another client that already has Rank objects associated with her account, then the query returns very quickly once again. So it seems that this is only a problem when there are no Rank objects for that client.

Any ideas?

Versions: Postgres 9.1, Django 1.4 svn trunk rev 17047

Well, you've not shown the actual SQL, so that makes it difficult to be sure. But, the explain output suggests it thinks the quickest way to find a match is by scanning an index on "id" backwards until it finds the client in question.

Since you said it has been fast until recently, this is probably not a silly choice. However, there is always the chance that a particular client's record will be right at the far end of this search.

So - try two things first:

  1. Run ANALYZE on the table in question and see if that gives the planner enough info.
  2. If not, increase the stats (ALTER TABLE ... ALTER COLUMN ... SET STATISTICS) on the columns in question and re-analyze. See if that does it.

http://www.postgresql.org/docs/9.1/static/planner-stats.html

If that's still not helping, then consider an index on (client,id), and drop the index on id (if not needed elsewhere). That should give you lightning fast answers.
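The composite index suggested above can be sketched as follows. This is only an illustration under assumptions: it uses Python's built-in sqlite3 as a stand-in for PostgreSQL, the table, column and index names (rank, site_id, rank_site_id_id) are guesses rather than the asker's real schema, and it indexes the site foreign key from the question's filter where the answer wrote client. It shows the shape of the index and of the query that latest('id') roughly corresponds to, not PostgreSQL's planner behaviour.

```python
import sqlite3

# In-memory database standing in for the real PostgreSQL table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rank (id INTEGER PRIMARY KEY, site_id INTEGER NOT NULL)"
)
# Composite index: the equality column first, then the ordering column.
conn.execute("CREATE INDEX rank_site_id_id ON rank (site_id, id)")
conn.executemany(
    "INSERT INTO rank (site_id) VALUES (?)", [(1,), (1,), (2,), (1,)]
)


def latest_rank_id(conn, site_id):
    """Return the highest rank id for a site, or None if it has no rows."""
    row = conn.execute(
        "SELECT id FROM rank WHERE site_id = ? ORDER BY id DESC LIMIT 1",
        (site_id,),
    ).fetchone()
    return row[0] if row else None
```

With an index of this shape, a site without any rows can be ruled out from the index alone instead of by walking the id index backwards through the whole table.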

Django admin search/filter functionality as a page table

6 votes

I was wondering if there is a way to use the power of Django Admin's filtering/ordering/paginating/search capabilities in a regular view.

What I mean is that I have a model with some fields on it. I'd like to have a "search" form, where fields would be defined much like using admin.ModelAdmin. The user would be able to search (using the provided fields), filter by values, paginate through pages of the result table, etc. All that with a minimal amount of work on my part, e.g. just configuring which fields should be used in the form. Something like this:

class SchoolAdmin(ModelAdmin):
    list_display = ('id', 'name', )
    list_display_links = ('name', )
    search_fields = ('name', )
    list_filter = ('type', )

Is there something like this available? Or do I have to code it myself?

Edit:

Features I require from such a plugin/application are:

  1. Display data as a table
  2. Sorting by columns
  3. Filtering (e.g. "show only rows that have X = Y")
  4. Searching by columns
  5. Optionally configuration similar to ModelAdmin style

Alasdair's django-tables2 only matches 1st and 2nd conditions.

The django functionality you mention isn't really reusable in custom views as of Django 1.3. There was recently some discussion on the django-developers group about splitting out admin functionality to make it reusable.

I have come across two projects that might be useful to you, django-tables2 and django-filter. They both offer slightly different things, I think you're looking for a mixture of the two.

django-tables2

django-tables2 simplifies the task of turning sets of data into HTML tables. It has native support for pagination and sorting. It does for HTML tables what django.forms does for HTML forms

django-filter

Django-filter is a reusable Django application for allowing users to filter queryset dynamically. It requires Python 2.4 or higher. For usage and installation instructions consult the docs directory.

Django-filter can be used for generating interfaces similar to the Django admin's list_filter interface. It has an API very similar to Django's ModelForms.

Django Debug Toolbar: understanding the time panel

6 votes

I'm running the Django Debug Toolbar to profile my site and try to figure out why certain views are taking so long. It's been immensely valuable with regards to seeing what queries I'm running and how much they're costing me, but I can't understand how to read the time panel.

I've looked around everywhere for some documentation on this but can't seem to find anything. I should mention that I'm a self-taught, relatively new programmer, so these may be terms that are assumed to be familiar to the experienced programmer.

Here is the output:

Resource         Value
User CPU time    3760.000 msec
System CPU time  340.000 msec
Total CPU time   4100.000 msec
Elapsed time     4625.453 msec
Context switches 248 voluntary, 467 involuntary

Can anyone help me figure out how to read this, and what each of the values represents?

Thanks.

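All of these figures are measured on the server, for the process that handled the request. None of them include the browser.

User CPU time: The CPU time the server process spent running code in user mode (your Python code, Django, libraries) while handling the request.

System CPU time: The CPU time the kernel spent working on behalf of the process (system calls such as file and network I/O).

Total CPU time: User CPU time plus system CPU time.

Elapsed time: Wall-clock time taken to handle the request, including time spent waiting (for example on the database) when no CPU was being used. In your output about 4.1 of the 4.6 seconds elapsed were CPU time, so the view is mostly busy computing rather than waiting.

Context switches: This has to do with scheduling. Voluntary switches are times when the process gave up the CPU of its own accord (usually to wait for some other processing or I/O that it needs in order to continue), whereas involuntary switches are times when the system forced it off the CPU in order to run something else. It's actually pretty low-level system stuff that I couldn't do justice to here. If you're interested in learning more, just search for "context switching".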
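The rows in the panel line up with the per-process figures the operating system reports through Python's standard resource module (Unix only). Here is a minimal sketch that collects the same kind of numbers around a function call; I believe the toolbar gathers its values in a similar way, but treat that as an assumption. The measure helper and its dictionary keys are made up for this example.

```python
import resource
import time


def measure(func):
    """Run func() and return figures shaped like the time panel's rows."""
    start_wall = time.time()
    start = resource.getrusage(resource.RUSAGE_SELF)
    func()
    end = resource.getrusage(resource.RUSAGE_SELF)
    elapsed = time.time() - start_wall

    user = end.ru_utime - start.ru_utime    # seconds in user-mode code
    system = end.ru_stime - start.ru_stime  # seconds in the kernel
    return {
        "user_cpu_ms": user * 1000,
        "system_cpu_ms": system * 1000,
        "total_cpu_ms": (user + system) * 1000,
        "elapsed_ms": elapsed * 1000,
        "voluntary_switches": end.ru_nvcsw - start.ru_nvcsw,
        "involuntary_switches": end.ru_nivcsw - start.ru_nivcsw,
    }


# A CPU-bound stand-in for a view; expect elapsed time close to total CPU time.
stats = measure(lambda: sum(i * i for i in range(200000)))
```

A function that mostly sleeps or waits on I/O would instead show a large elapsed time with little CPU time, which is the pattern to look for when a view is slow because of the database.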

Django Internationalization

5 votes

I have an issue similar to the one described in "Why doesn't Django produce locale files from template files in another directory?"

However I don't understand the solution. My structure:

Project
   App1
      locale
      templates
   App2
      locale
      templates
   templates
      somefilethatneedstranslation.html

Now when I run this command from App1:

python ../manage.py App1 -l nl

It nicely creates a po file for the App1 templates in the App1 locale folder.

However, I want my global templates to be translated as well. Note: I do NOT want a locale folder in my project root, so I tried adding a symlink to the templates folder from App1, but it does not append the translation results to the App1/locale/po file.

from the App1 folder

ln -s ../templates/locale/* translations
python ../manage.py App1 -l nl --symlinks

What am I missing?

note:

from the templates folder

python ../manage.py templates -l nl

could work, but it won't because obviously templates is not an installed app, it seems I am missing the obvious...

The full deprecation message (which is also explained in the translation docs) is:

Translations in the project directory aren't supported anymore. Use the LOCALE_PATHS setting instead.

This message is perhaps a bit unclear. While automatic discovery of translations in the project directory is deprecated, the use of LOCALE_PATHS to reference a project-level locale folder is totally acceptable.

If you have project-level templates, it doesn't make sense to have these templates translated in an app-specific locale location: keep a project-level locale directory, reference it in LOCALE_PATHS.
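Referencing a project-level locale directory from settings could look like the sketch below. The helper name project_locale_paths is invented for this example, and the layout (a locale folder next to settings.py) is an assumption based on the structure shown in the question.

```python
import os


def project_locale_paths(settings_file):
    """Build a LOCALE_PATHS tuple pointing at <project root>/locale."""
    project_root = os.path.dirname(os.path.abspath(settings_file))
    return (os.path.join(project_root, "locale"),)


# In settings.py this would be used as:
#     LOCALE_PATHS = project_locale_paths(__file__)
```

As far as I know, the locale directory has to exist before the message files can be generated from the project root.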

How to run own daemon processes with Django?

5 votes

In my Django project I have to repeatedly do some processing in the background. This processing needs access to Django internals, so I put it into Django management commands and run them as cron jobs. Now I realize that I have to run some of them more frequently (cron can invoke a command at most once a minute). Another problem is that I don't have enough control to prevent the same command from running twice at the same time, which happens when one run takes longer than a minute. I think I should run them as daemons, but I am looking for a clean way to do it with Django. Have you ever faced this problem, or do you know a clean solution for it?

We do a lot of background processing for django using Celery http://celeryproject.org/. It requires some effort to set up and there is a bit of a learning curve, but once it's up and running it's just awesome.
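If you stay with cron for now, the overlap problem from the question can be handled separately with a lock file. This is a minimal sketch, not part of Celery or Django: it relies on the Unix-only fcntl module, and the lock file name and the run_exclusively helper are made up for illustration.

```python
import fcntl
import os
import tempfile

# Hypothetical lock file; use one path per command you want to protect.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "process_clicks.lock")


def run_exclusively(job, lock_path=LOCK_PATH):
    """Run job() unless another run already holds the lock.

    Returns True if the job ran and False if it was skipped.
    """
    with open(lock_path, "w") as lock_file:
        try:
            # Non-blocking exclusive lock: fails at once if already held.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        try:
            job()
            return True
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Calling run_exclusively from the handle method of a management command means a run that starts while the previous one is still busy exits immediately instead of piling up.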

How can I tell Django templates not to parse a block containing code that looks like template tags?

5 votes

I've got some html files that include templates to be used by jQuery.tmpl. Some tmpl tags (like {{if...}}) look like Django template tags and cause a TemplateSyntaxError. Is there a way I can specify the Django template system should ignore a few lines and output them exactly as they are?

The built-in way would be to manually escape each template item with the templatetag template tag ( https://docs.djangoproject.com/en/1.3/ref/templates/builtins/#templatetag ), but I suspect that that's not what you want to do.

What you really want is a way to mark a whole block as raw (rather than interpretable) text, which requires a new custom tag. You might want to check out the raw tag here: http://www.holovaty.com/writing/django-two-phased-rendering/

Django: sorl-thumbnail and easy-thumbnail in same project

5 votes

I'm working on a project that uses two separate modular Django apps. However, one app requires easy-thumbnails and the other requires sorl-thumbnail. Unfortunately, both thumbnail libraries use the same template tag syntax, {% load thumbnail %}, so they clash and break when a template using them tries to render.

Are there any approaches to solve this type of clash? (For example, a template option to the effect of {% load thumbnail as easy_thumbnail %}.) Am I going to have to fork one of the apps and replace one of the thumbnail libraries with the other? If so, which should I choose to go with?

Thank you for considering my question, Joe

Sure, just write your own stub easy_thumbnail wrapper...

  1. Create a templatetags package in one of your django apps (Django only looks for tag libraries in packages with that name)...
  2. ...making sure it's got an empty __init__.py
  3. In templatetags/easy_thumbnail.py do something like:

    from django.template import Library
    # Import the tag function itself, not the module that contains it.
    from easy_thumbnails.templatetags.thumbnail import thumbnail
    
    register = Library()
    
    def easy_thumbnail(parser, token):
        # Delegate to easy-thumbnails' own tag under a non-clashing name.
        return thumbnail(parser, token)
    
    register.tag(easy_thumbnail)
    
  4. Use {% load easy_thumbnail %}

Note:

You might also be able to do 'import thumbnail as easy_thumbnail' and skip the def easy_thumbnail bit, though I've not tried that.

Pros and cons of celery vs disco vs hadoop vs other distributed computing packages

5 votes

I'm working on a web application (built in django) intended for medium-to-large-scale data analysis. I'm imagining using a task queue feeding to a bank of servers, or maybe on-demand EC2 instances to handle the load.

Before getting into development, I'm trying to decide which package(s) to use for distributed computing. I've looked into celery, disco, and mapreduce on hadoop -- and they all look pretty good.

What advice can you offer on the pros and cons of different systems?

Which ones...?

  • ...are easiest to work with in python/django?
  • ...play nice with each other?
  • ...tend to work best for which tasks?
  • ...impose restrictions on other aspects of the system architecture (e.g. database design)?
  • ...have the largest user bases and best documentation?
  • ...have the steepest learning curves?

BTW, I've built multi-core and client-server applications using python's multiprocessing, but that's the extent of my practical experience with distributed computing. Most of the theory is familiar, but I've never used any of the packages mentioned here.

First of all, I have no experience with disco and little experience with hadoop. Then, to answer your questions one by one:

  • are easiest to work with in python/django?

    Celery is the winner. It has straightforward integration with django via django-celery, while being feature-rich and simple to use. I assume that disco comes second (you write python code) and hadoop comes last (you can write python code, but in obscure ways).

  • play nice with each other?

    Everybody can play nice with others, provided that there exists a common layer on which they can communicate ( XML, JSON, whatever...).

  • tend to work best for which tasks?

    Disco and hadoop use the mapReduce paradigm, and the phrase "big data" comes to mind. If you have lots of data and you want to perform some processing on all of it, then mapReduce is an optimal solution. Celery is a distributed task queue, which is more "open and agile" in the ways you can implement distributed processing.

  • impose restrictions on other aspects of the system architecture (e.g. database design)?

    I don't believe that there exists any (serious) restriction for any of the contestants (please correct me if I am wrong).

  • have the largest user bases and best documentation?

    Here hadoop is probably the winner. Celery has a decent community and lots of stackoverflow questions :). I don't know about Disco.

  • have the steepest learning curves?

    I believe that Celery is the easiest to learn, seriously. Hadoop is a bit tricky. I don't know about Disco, but I suspect it's in the middle.

    To sum up, if you want a great pythonic tool for general distributed processing, easy to use and fast to learn, with full django-integration, go with Celery. On the other hand, if your data "cry" for mapReduce, then follow your heart..

Profiling on live Django server?

4 votes

I've never done code coverage in Python, but I'm looking for something like GCC's gcov, which tells me how many times each line executes, or Apple's Shark, which gives a hierarchical breakdown of how long each function is taking.

My problem is that I have a live server which is experiencing high load, and I can't tell from the logs what's causing it. I would like to attach something to my Django instance to monitor which lines are the hottest and/or which functions are taking the longest time.

This is something like, but not exactly, code coverage. I would like to introduce it to a live running server, preferably without modifying too much.

Ideas?

cProfile + RunSnakeRun: http://www.vrplumber.com/programming/runsnakerun/
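A minimal sketch of the cProfile half, using only the standard library. The slow_view function and the output path are placeholders; on a real server you would wrap the code path you suspect instead.

```python
import cProfile
import os
import pstats
import tempfile


def slow_view():
    """Stand-in for the code path you suspect is slow."""
    return sum(i * i for i in range(100000))


profile_path = os.path.join(tempfile.gettempdir(), "slow_view.prof")

profiler = cProfile.Profile()
result = profiler.runcall(slow_view)  # run the function under the profiler
profiler.dump_stats(profile_path)     # write the binary stats file

# Quick text summary: the five entries with the highest cumulative time.
stats = pstats.Stats(profile_path)
stats.sort_stats("cumulative").print_stats(5)
```

The dumped stats file is the kind of input a viewer such as RunSnakeRun is meant to open. Profiling adds overhead, so on a live server it is safer to enable it for a sample of requests rather than for all of them.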

Examples of Django and Celery: Periodic Tasks

4 votes

I have been fighting the Django/Celery documentation for a while now and need some help.

I would like to be able to run Periodic Tasks using django-celery. I have seen around the internet (and the documentation) several different formats and schemas for how one should go about achieving this using Celery...

Can someone help with a basic, functioning example of the creation, registration and execution of a django-celery periodic task? In particular, I want to know whether I should write a task that extends the PeriodicTask class and register that, or whether I should use the @periodic_task decorator, or whether I should use the @task decorator and then set up a schedule for the task's execution.

I don't mind if all three ways are possible, but I would like to see an example of at least one way that works. Really appreciate your help.

What's wrong with the example from the docs?

from celery.task import PeriodicTask
from clickmuncher.messaging import process_clicks
from datetime import timedelta


class ProcessClicksTask(PeriodicTask):
    run_every = timedelta(minutes=30)

    def run(self, **kwargs):
        process_clicks()

You could write the same task using a decorator:

from celery.task.schedules import crontab
from celery.task import periodic_task

@periodic_task(run_every=crontab(minute="*/30"))
def process_clicks():
    ....

The decorator syntax simply allows you to turn an existing function/task into a periodic task without modifying them directly.

For the tasks to be executed, celerybeat must be running.

Errors When Installing MySQL-python module for Python 2.7

4 votes

I'm currently trying to build and install the mySQLdb module for Python, but the command

python setup.py build

gives me the following error

running build
running build_py
copying MySQLdb/release.py -> build/lib.macosx-10.3-intel-2.7/MySQLdb
error: could not delete 'build/lib.macosx-10.3-intel-2.7/MySQLdb/release.py': Permission denied

I verified that I'm a root user and when trying to execute the script using sudo, I then get a gcc-4.0 error:

running build
running build_py
copying MySQLdb/release.py -> build/lib.macosx-10.3-fat-2.7/MySQLdb
running build_ext
building '_mysql' extension
gcc-4.0 -fno-strict-aliasing -fno-common -dynamic -g -O2 -DNDEBUG -g -O3 -Dversion_info=(1,2,3,'final',0) -D__version__=1.2.3 -I/usr/local/mysql/include -I/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -c _mysql.c -o build/temp.macosx-10.3-fat-2.7/_mysql.o -Os -g -fno-common -fno-strict-aliasing -arch x86_64
unable to execute gcc-4.0: No such file or directory
error: command 'gcc-4.0' failed with exit status 1

Which is odd, because I'm using XCode 4 with Python 2.7. I've tried the easy_install and pip methods, both of which don't work and give me a permission denied error on release.py. I've chmodded that file to see if that was the problem, but no luck. Thoughts?

Make sure that gcc-4.0 is in your PATH. Also, you can create an alias from gcc to gcc-4.0.

Take care with 32-bit and 64-bit versions. Mac OS X is a 64-bit operating system and you should set the right flags to make sure you're compiling for the 64-bit architecture.

Embedding vs. linking in MongoDB: when to embed and when to link?

4 votes

I read this page but didn't understand when to use embedding and when to use linking. I have a Django project for which I am using MongoDB. In my models.py file I have the following models:

class Projects(models.Model):
    projectName =models.CharField(max_length = 100,unique=True,db_index=True)
    projectManager = EmbeddedModelField('Users')

class Teams(models.Model):
    teamType = models.CharField(max_length =100)
    teamLeader = EmbeddedModelField('Users')
    teamProject = EmbeddedModelField('Projects')
    objects = MongoDBManager()

class Users(models.Model):
    name = models.CharField(max_length = 100,unique=True)
    designation = models.CharField(max_length =100 )
    teams = ListField(EmbeddedModelField('Teams'))



class Tasks(models.Model):
    title = models.CharField(max_length = 150)
    description = models.CharField(max_length=1000)
    priority = models.CharField(max_length=20)
    Status = models.CharField(max_length=20)
    assigned_to = EmbeddedModelField('Users')
    assigned_by = EmbeddedModelField('Users')
    child_tasks = ListField()
    parent_task = models.CharField(max_length = 150)

My question is: if we use embedding, do we have to update the object in all models? For example, if I want to update the name of a user, would I have to run an update for the Projects, Teams, Users and Tasks models, or would linking be better in my case?

First, conceptually, name your model classes as singular objects.

Users should be User, Teams should be Team...

Think of the model as the mold from which multiple objects will be made. The User model will produce users, stored in a table called Users where each document/row is a User object.

Now, regarding your question, hymloth is exactly right. The way to make it a reference to a document instead of an embedded one is to change those particular fields to reference the id of a user in the user's collection. That way you are just storing an id to lookup instead of a copy of the user document. When you change the reference document, it will be changed in all of the places it is referenced as well. (Typical relational association)

I didn't see a field for that in Django-mongoDB either but maybe you can use the traditional django ForeignKey field for this purpose. I don't know if you can mix and match so give it a shot.

for example, your Teams class would have a field like this:

teamLeader = ForeignKey(User)

Let me know if that works.
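To make the trade-off concrete, here is a small sketch that uses plain Python dictionaries as stand-ins for MongoDB documents. It does not use the django-mongodb-engine API, and all names and values in it are made up for illustration.

```python
# Embedding: each task document carries its own copy of the user.
embedded_tasks = [
    {"title": "Write docs", "assigned_to": {"name": "alice", "designation": "dev"}},
    {"title": "Fix bug", "assigned_to": {"name": "alice", "designation": "dev"}},
]

# Linking: tasks store only the user's id; the user lives in one place.
users = {1: {"name": "alice", "designation": "dev"}}
linked_tasks = [
    {"title": "Write docs", "assigned_to_id": 1},
    {"title": "Fix bug", "assigned_to_id": 1},
]


def rename_embedded(tasks, old_name, new_name):
    """Rename an embedded user in every copy; return how many were touched."""
    updated = 0
    for task in tasks:
        if task["assigned_to"]["name"] == old_name:
            task["assigned_to"]["name"] = new_name
            updated += 1
    return updated


def rename_linked(users, user_id, new_name):
    """Rename a linked user with a single update."""
    users[user_id]["name"] = new_name


def assignee_name(task, users):
    """Resolve a linked task's user, at the cost of one extra lookup."""
    return users[task["assigned_to_id"]]["name"]


copies_touched = rename_embedded(embedded_tasks, "alice", "alicia")
rename_linked(users, 1, "alicia")
```

Embedding saves the extra lookup when reading but multiplies the work when a user changes, which is why data that is edited often and referenced from many places is usually a better fit for linking.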

How would you mock a web app in Python (for testing a Django project)

4 votes

My Django app scrapes and imports data from another application's HTML. I tested each parsing function and would like to test the crawler that will go through the other application, too. After this, I'd like to write some integration tests. To make the tests as easy to run as possible, I want to mock the imported web application by creating a little web app that serves some hardcoded HTML and has all the paths I am going to go through.

EDIT: Also, my mock has to have some little dynamic behaviors - for example, for testing both failed and successful logins. So I cannot provide only static files.

How would you create such a mock application? Would you subclass BaseHTTPServer? CGI? Use some framework (as twill does, using Quixote)? Or would it be reasonable to use Django for it? That is the solution I am considering, but Django seems too complex for such a problem; OTOH, another framework would be too heavy a dependency for such a small need, and BaseHTTPServer is just too raw to use.

2nd EDIT: I am not interested in mocking classes, requests etc. That is not the approach I want to use, and a suggestion to use such an approach is not an answer to me (although I am grateful to the nice people who have kindly suggested it so far). If it is too hard to think about my question, just forget that I talked about tests: how would you crudely simulate a web application using Python in general?

I tried to follow @Gagandeep Singh's solution. It seemed to be the best one, and is probably a good solution in other situations, but it did not work for me.

The problem is that I had a Django app inside the test directory of another Django app. When I ran the tests of my app with manage.py test myapp, the settings.py used was the one from the whole project, not the one for my mock app. I was starting Django through the management API and using multiprocessing, so I bet part of my problem came from such a complex interaction. Maybe I could have solved it, but I decided on another strategy.

I decided to build on BaseHTTPServer instead and got acceptable results. It was not an easy task, but I succeeded in starting my mock app.
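For reference, a mock of this kind can be sketched as below. This is not my actual code but a minimal illustration under assumptions: it uses Python 3's http.server (the successor of the BaseHTTPServer module), and the paths, the hardcoded password and all helper names are invented for the example.

```python
import threading
import urllib.error
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class MockAppHandler(BaseHTTPRequestHandler):
    """Serves hardcoded HTML plus one dynamic path, /login."""

    PAGES = {"/": b"<html><body><a href='/login'>Log in</a></body></html>"}

    def _send(self, status, body):
        self.send_response(status)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self):
        body = self.PAGES.get(self.path)
        if body is None:
            self._send(404, b"<html><body>Not found</body></html>")
        else:
            self._send(200, body)

    def do_POST(self):
        # Read the form body and decide between a good and a bad login.
        length = int(self.headers.get("Content-Length", 0))
        form = urllib.parse.parse_qs(self.rfile.read(length).decode("utf-8"))
        if self.path == "/login" and form.get("password") == ["secret"]:
            self._send(200, b"<html><body>Welcome</body></html>")
        else:
            self._send(403, b"<html><body>Login failed</body></html>")

    def log_message(self, format, *args):
        pass  # keep test output quiet


def start_mock_server():
    """Start the mock on a free local port; return (server, base_url)."""
    server = HTTPServer(("127.0.0.1", 0), MockAppHandler)
    thread = threading.Thread(target=server.serve_forever)
    thread.daemon = True
    thread.start()
    return server, "http://127.0.0.1:%d" % server.server_port


def fetch_status(url, data=None):
    """Return the HTTP status of a GET, or of a POST when data is given."""
    try:
        return urllib.request.urlopen(url, data=data).getcode()
    except urllib.error.HTTPError as err:
        return err.code
```

A test would call start_mock_server() in its setup, point the crawler at the returned base URL, and call server.shutdown() in its teardown.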