Best django questions in March 2011

What is a good Django workflow?

13 votes

I'm a beginner to Python and Django.

When starting a new project what do you do first before diving into the code?

For example, one could take the following steps:

  1. Configure the settings.py file first
  2. Configure models.py to lay out data structure
  3. Create template files
  4. Define the views/pages
  5. Syncdb
  6. etc

So my question is, what is a good workflow to get through the required steps for a Django application? This also serves as a checklist of things to do. In The Definitive Guide to Django, the author talks about approaching a project top down or bottom up. Can anyone expand further on this and perhaps share their process?

Thanks.

Follow the Agile approach. Finish one small case, from the start to the end. From the models to the tests to user experience. Then build on it. Iterate.

That's the right way to do software development.

To do it efficiently, you will eventually need the following (don't bother with it right away, but you will need it):

Automated schema migration, an automated build system, and automated updating and deployment. Django itself has nothing to do with any of these. Use pip, fabric, hudson, twill and south as appropriate.

Take care not to overburden yourself with all of these right away, particularly since you say you are just beginning.

Best Practice for Maintaining Consistent Third Party Libraries Amongst all Developers

9 votes

We have some "semi-technical" people doing design, art, CSS/HTML work etc. on a Django project--while we are simultaneously doing active backend development work. Our project is dependent on many different 3rd party libraries--e.g., PyYAML, django-registration.

We use Git for version control and it's reasonable for these semi-technical folks to do git pulls and git pushes. But when we make changes to 3rd party libraries is when we run into trouble. Having to help these users diagnose and maintain their library problems is a hassle.

e.g., we started using "Django Model Utils", and one of our CSS/HTML frontend guys does a git pull, tries to start his development environment and sees an error like ImportError: No module named model_utils.models and is at a loss for what to do. My options from that point are to either explain to him what library he needs, have him download, untar, install himself and hope everything works fine, or prepare and email him the exact files he needs.

What is the best way to deal with the "keeping all developers' third party libraries consistent" problem?

Ideally there would be some script like python update_dependencies.py which would live in our git repo. All we would have to do would be to maintain a list of the needed libraries and version numbers. Then when the script was run it would auto-magically go grab those libraries. Does that exist?


Just as an FYI this is normally how I keep my project organized...

My Django project lives in a directory...

./my_django_project/  # <-- The stuff in here is under our version control.

And I like to put my 3rd party libraries here...

./my_django_project_lib/

With a symlink at ./my_django_project/lib/ pointing to ./my_django_project_lib/. Then in my ./my_django_project/settings.py file I'll add the lib directory to my Python sys.path.

Then whenever I need to add a library or update a version I will usually build the library with python setup.py build and manually move the built library into the ./my_django_project_lib. This helps me keep track of exactly what dependencies a given project has. Is there, in general, a better way to do this?

I recommend using pip with its requirements files; it sounds like exactly what you need. I use it in every project. You can specify the required packages and their respective sources (git/svn/PyPI), even the exact commit you need.

Then, to update your local libraries to the versions required by a new commit, just pull and run "pip install -r requirements.txt" against the requirements.txt file living in the root of your git repo.

You can read about it here.

I'm not sure whether it can put the built packages in a preset directory, but it works incredibly well with Python's virtualenv, which I also recommend.
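
The update_dependencies.py script the question asks for can be a thin wrapper around that pip command. A minimal sketch, assuming requirements.txt sits in the repo root and pip is installed for the running interpreter; the function names are illustrative and not part of pip:

```python
import os
import subprocess
import sys


def build_pip_command(requirements_path):
    """Return the pip invocation that installs everything listed in the file."""
    # Running pip through the current interpreter keeps the install inside
    # whichever virtualenv is active.
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def update_dependencies(repo_root="."):
    """Install the pinned libraries and return pip's exit code."""
    requirements_path = os.path.join(repo_root, "requirements.txt")
    return subprocess.call(build_pip_command(requirements_path))
```

Calling update_dependencies() after each git pull would bring a developer's environment in line with the one pinned requirement per line (name==version) kept in requirements.txt.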

Django Setup Default Logging

8 votes

I can't seem to figure out how to set up a "default" logger for my Django installation. I would like to use Django 1.3's new LOGGING setting in settings.py.

I've looked at the Django logging docs' example, but it looks to me like they only set up handlers which will do logging for particular loggers. In their example they set up handlers for the loggers named 'django', 'django.request', and 'myproject.custom'.

All I want to do is set up a default logging.handlers.RotatingFileHandler which will handle all loggers by default, i.e., if I make a new module somewhere in my project and it is denoted by something like my_app_name.my_new_module, I should be able to do this and have all logging go to the rotating file logs.

# In file './my_app_name/my_new_module.py'
import logging
logger = logging.getLogger('my_app_name.my_new_module')
logger.debug('Hello logs!') # <-- This should get logged to the RotatingFileHandler that I set up in `settings.py`!

Figured it out...

You set the 'catch all' logger by referencing it with the empty string: ''.

As an example, in the following setup all log events are saved to logs/mylog.log, with the exception of django.request log events, which are saved to logs/django_request.log. Because 'propagate' is set to False for my django.request logger, those log events never reach the 'catch all' logger.

LOGGING = {
    'version': 1,
    'disable_existing_loggers': True,
    'formatters': {
        'standard': {
            'format': '%(asctime)s [%(levelname)s] %(name)s: %(message)s'
        },
    },
    'handlers': {
        'default': {
            'level':'DEBUG',
            'class':'logging.handlers.RotatingFileHandler',
            'filename': 'logs/mylog.log',
            'maxBytes': 1024*1024*5, # 5 MB
            'backupCount': 5,
            'formatter':'standard',
        },
        'request_handler': {
            'level':'DEBUG',
            'class':'logging.handlers.RotatingFileHandler',
            'filename': 'logs/django_request.log',
            'maxBytes': 1024*1024*5, # 5 MB
            'backupCount': 5,
            'formatter':'standard',
        },
    },
    'loggers': {

        '': {
            'handlers': ['default'],
            'level': 'DEBUG',
            'propagate': True
        },
        'django.request': { # Keep request log events out of the main log
            'handlers': ['request_handler'],
            'level': 'DEBUG',
            'propagate': False
        },
    }
}
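
The catch-all and propagate behaviour can be checked outside Django with the standard library alone. The sketch below reuses the same logger layout but, as an assumption made purely for the demo, swaps the RotatingFileHandler for StreamHandlers writing to in-memory buffers so nothing touches the disk:

```python
import io
import logging
import logging.config

# In-memory stand-ins for logs/mylog.log and logs/django_request.log
default_buf = io.StringIO()
request_buf = io.StringIO()

logging.config.dictConfig({
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'standard': {'format': '[%(levelname)s] %(name)s: %(message)s'},
    },
    'handlers': {
        'default': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'stream': default_buf,
            'formatter': 'standard',
        },
        'request_handler': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'stream': request_buf,
            'formatter': 'standard',
        },
    },
    'loggers': {
        # The empty string names the root logger, which every logger reaches
        '': {'handlers': ['default'], 'level': 'DEBUG'},
        # propagate=False stops these events before they reach the root logger
        'django.request': {
            'handlers': ['request_handler'],
            'level': 'DEBUG',
            'propagate': False,
        },
    },
})

logging.getLogger('my_app_name.my_new_module').debug('Hello logs!')
logging.getLogger('django.request').debug('GET /')
```

After this runs, default_buf holds only the 'Hello logs!' line and request_buf holds only the 'GET /' line.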

Any way to make {% extends '...' %} conditional? - Django

7 votes

Hi,

I would like to share a template between AJAX and regular HTTP calls. The only difference is that one needs to be served with the base.html HTML and the other without it.

Any idea?

Use a variable.

{% extends base_template %}

and in your view, set it to "base.html", or to a new "ajax.html" file which just provides the block and nothing else.
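
One way to pick the value in the view is to look at the X-Requested-With header, which jQuery and many other AJAX libraries set to XMLHttpRequest. A sketch under that assumption; pick_base_template is a hypothetical helper, and request_meta stands in for Django's request.META dictionary:

```python
def pick_base_template(request_meta):
    """Return the template name to pass to the template as base_template."""
    # Assumption: the client marks AJAX calls with this header
    is_ajax = request_meta.get('HTTP_X_REQUESTED_WITH') == 'XMLHttpRequest'
    return 'ajax.html' if is_ajax else 'base.html'
```

The view would then put the result into the template context under the name base_template.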

Why doesn't memory get released to system after large queries (or series of queries) in django?

7 votes

First off, DEBUG = False in settings.py, so no, connections['default'].queries is not growing and growing until it uses up all of memory.

Let's start off with the fact that I've loaded the User table from django.contrib.auth.models.User with 10000 users (each named 'test#' where # is a number between 1 and 10000).

Here is the view:

from django.contrib.auth.models import User
from django.http import HttpResponse

import time

def leak(request):
    print "loading users"

    users = []
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())
    users += list(User.objects.all())

    print "sleeping"
    time.sleep(10)

    return HttpResponse('')

I've attached the view above to the /leak/ url and started the development server (with DEBUG=False; I've tested and it has nothing to do with running a development server vs other instances).

After running:

% curl http://localhost:8000/leak/

The runserver process' memory grows to around the size seen from ps aux output below and then stays at that level.

USER       PID %CPU %MEM    VSZ    RSS TTY      STAT START   TIME COMMAND
dlamotte 25694 11.5 34.8 861384 705668 pts/3    Sl+  19:11   2:52 /home/dlamotte/tmp/django-mem-leak/env/bin/python ./manage.py runserver

Running the curl command again does not seem to grow the instance's memory usage (which I would have expected from a true memory leak), so it must be re-using the memory. However, I feel that something is wrong here, since the memory does not get released to the system (although I understand that it may be better for performance that Python does NOT release the memory).

Following this, I naively attempted to see if Python would release large chunks of memory that it allocated. So I tried the following from a Python session:

>>> a = ''
>>> a += 'aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa' * 10000000
>>> del a

The memory is allocated on the a += ... line as expected, but when del a happens, the memory is released. Why is the behavior different for django query sets? Is it something that django is intending to do? Is there a way to change this behavior?

I've literally spent 2 days debugging this behavior with no idea where to go next (I've learned to use guppy AND objgraph which seem to not point to anything interesting that I can figure out).

UPDATE: This could be simply python memory management at work and have nothing to do with Django (suggested on django-users mailing list), but I'd like confirmation by somehow replicating this in python outside of Django.

UPDATE: Using python version 2.6.5

I decided to move my comments into an answer to make things clearer.

Since Python 2.5, CPython tracks the memory used by its small object allocator and attempts to return completely free arenas to the underlying OS. This works most of the time, but the fact that objects can't be moved around in memory means that fragmentation can be a serious problem.

Try the following experiment (I used 3.2, but 2.5+ should be similar if you use xrange):

# Create the big lists in advance to avoid skewing the memory counts
seq1 = [None] * 10**6 # Big list of references to None
seq2 = seq1[::10]

# Create and reference a lot of smaller lists
seq1[:] = [[] for x in range(10**6)] # References all the new lists
seq2[:] = seq1[::10] # Grab a second reference to 10% of the new lists

# Memory fragmentation in action
seq1[:] = [None] * 10**6 # 90% of the lists are no longer referenced here
seq2[:] = seq1[::10] # But memory freed only after last 10% are dropped

Note, even if you drop the references to seq1 and seq2, the above sequence will likely leave your Python process holding a lot of extra memory.

When people talk about PyPy using less memory than CPython, this is a major part of what they're talking about. Because PyPy doesn't use direct pointer references under the hood, it is able to use a compacting GC, thus avoiding much of the fragmentation problem and more reliably returning memory to the OS.

Django or web.py, which is better to build a large website with Python?

6 votes

I'd like to use Python to build a website with more than 100,000 page views (PV) each day. My concern now is which web framework to choose. I know lots of people use Django, and some people use web.py. Django seems powerful, and I also like the simplicity of web.py. Which framework should I use? (Please describe the performance and the maintenance complexity, thanks!) Can web.py build complicated applications? Are there other frameworks better than these two?

Django makes building complicated sites really simple. Before Django, I was messing around with PHP, and I was doing a really terrible job putting it together. Django leads you in the right direction with some good practices which makes your site really easy to maintain and update. I really like the ORM and how you can easily work with data from the database without having to write a single line of SQL. It makes development less of a slog.

I don't have any experience with web.py, and I can't compare the performance of the two. But you can't go wrong with Django at least.

Organizing Django unit tests

5 votes

Right now I have my Django unit tests living at mcif/tests.py. I would prefer to have something more like mcif/tests/foo_test.py, mcif/tests/bar_test.py, etc., but if I organize my tests that way, Django flips out.

Is there a way to do what I'm trying to do or do I have to have all my tests in one file?

Make a package: myapp/tests/

Within the package, put as many different testing modules as you want. In the __init__.py within tests, import the tests from those modules. (Or some variation on this theme.)

edit: Wow, didn't notice you already mentioned a tests package.

The important thing is to get everything available from the package. Django will get the tests from the package, so they have to be visible in __init__.py.
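
The effect of those imports can be seen with plain unittest, without Django. The sketch below builds a throwaway tests package in a temporary directory (the package name demo_app_tests and the test classes are invented for the demo) and loads tests from the package the way a module-based loader would. It uses Python 3's explicit relative imports in __init__.py:

```python
import importlib
import os
import sys
import tempfile
import textwrap
import unittest

# Build a throwaway package: demo_app_tests/{__init__,foo_test,bar_test}.py
pkg_root = tempfile.mkdtemp()
tests_dir = os.path.join(pkg_root, "demo_app_tests")
os.mkdir(tests_dir)


def write(name, source):
    with open(os.path.join(tests_dir, name), "w") as fh:
        fh.write(textwrap.dedent(source))


write("foo_test.py", """
    import unittest

    class FooTest(unittest.TestCase):
        def test_foo(self):
            self.assertEqual(1 + 1, 2)
""")
write("bar_test.py", """
    import unittest

    class BarTest(unittest.TestCase):
        def test_bar(self):
            self.assertTrue("bar".startswith("b"))
""")
# The key step: re-export the test classes from the package itself
write("__init__.py", """
    from .foo_test import FooTest
    from .bar_test import BarTest
""")

sys.path.insert(0, pkg_root)
package = importlib.import_module("demo_app_tests")

# Only names visible in the package namespace are collected
suite = unittest.defaultTestLoader.loadTestsFromModule(package)
found = suite.countTestCases()
```

With both imports in place, found is 2. Dropping one of the import lines from __init__.py would make that module's tests invisible to the loader.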

Fetching inherited model objects in django

5 votes

Hi, I have a django application with the following model:

Object A is a simple object extending from Model with a few fields, and let's say, a particular one is a char field called "NAME" and an Integer field called "ORDER". A is abstract, meaning there are no A objects in the database, but instead...

Objects B and C are specializations of A, meaning they inherit from A and they add some other fields.

Now suppose I need all the objects whose field NAME start with the letter "Z", ordered by the ORDER field, but I want all the B and C-specific fields too for those objects. Now I see 2 approaches:

a) Do the queries individually for B and C objects and fetch two lists, merge them, order manually and work with that.

b) Query A objects for names starting with "Z" ordered by "ORDER" and with the result query the B and C objects to bring all the remaining data.

Both approaches sound highly inefficient, in the first one I have to order them myself, in the second one I have to query the database multiple times.

Is there a magical way I'm missing to fetch all B and C objects, ordered, in one single method? Or at least a more efficient way to do this than the two mentioned?

Thanks in Advance!

Bruno

If A can be concrete, you can do this all in one query using select_related.

from django.db import connection
q = A.objects.filter(NAME__istartswith='z').order_by('ORDER').select_related('b', 'c')
for obj in q:
   obj = obj.b or obj.c or obj
   print repr(obj), obj.__dict__ # (to prove the subclass-specific attributes exist)
print "query count:", len(connection.queries)
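
If A has to stay abstract, approach (a) from the question is the fallback: run one ordered query per subclass and merge the results in Python. A sketch with plain tuples standing in for the B and C query results (the Row type and the sample data are made up for illustration). Because each input is already ordered, heapq.merge can combine them in a single pass without re-sorting:

```python
import heapq
from collections import namedtuple
from operator import attrgetter

# Stand-in for model instances; "kind" marks which subclass a row came from
Row = namedtuple("Row", ["kind", "NAME", "ORDER"])

# Each list plays the role of one queryset already ordered by ORDER
b_rows = [Row("B", "Zed", 1), Row("B", "Zoe", 4)]
c_rows = [Row("C", "Zack", 2), Row("C", "Zara", 3)]

# Merge the two pre-sorted sequences into one sorted sequence
merged = list(heapq.merge(b_rows, c_rows, key=attrgetter("ORDER")))
```

This still costs one query per subclass, but the ordering work in Python is linear in the number of rows.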

Explicitly set MySQL table storage engine using South and Django

5 votes

I'm running into an issue that South creates the DB table for a new model as INNODB when I migrate but creates the table as MYISAM when another developer runs their own migration.

The problem with this is that all my other tables are MYISAM so using the new tables leads to many foreign key constraint errors.

How can I explicitly make sure the table is created using MYISAM?

What could be causing the table to be created using a different storage engine in different environments?

To be sure that all migrations are always done using INNODB, you should set the storage engine as INNODB in the database definition directly, like this:

DATABASES = {
    'default': {
        ...
        'OPTIONS'  : { 'init_command' : 'SET storage_engine=INNODB', },
    }
}

But you should know that it can have a performance hit. So you may want to set this option only when running migrations.
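
One way to apply the option only during migrations is to have settings.py inspect the management command being run. A sketch, assuming the command name appears in sys.argv as it does for manage.py invocations; database_options is a hypothetical helper and the list of command names is an assumption:

```python
import sys

# Management commands that create or alter tables (assumed list)
SCHEMA_COMMANDS = ('migrate', 'syncdb')


def database_options(argv):
    """Return the OPTIONS dict for DATABASES['default']."""
    if any(command in argv for command in SCHEMA_COMMANDS):
        # Same init_command as in the settings above
        return {'init_command': 'SET storage_engine=INNODB'}
    return {}


OPTIONS = database_options(sys.argv)
```

In settings.py the result would be used as the 'OPTIONS' value, so ordinary requests skip the extra statement on every new connection.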

Naming variable, best convention

5 votes

What is the most used convention for naming variables in Python / Django? ex: pub_date or pubdate

What about for classes and methods?

Django's coding style
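
In short, Django's coding style follows PEP 8 for naming: lowercase words separated by underscores for variables, functions and methods, and capitalized words for class names. So pub_date rather than pubdate. A small illustration (the BlogEntry class is invented for the example):

```python
import datetime


class BlogEntry(object):            # class names: InitialCaps
    def __init__(self, pub_date):
        self.pub_date = pub_date    # attributes: lowercase_with_underscores

    def was_published_in(self, year):   # methods: lowercase_with_underscores
        return self.pub_date.year == year


first_entry = BlogEntry(datetime.date(2011, 3, 1))
```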

Python: Save dynamically created object types

5 votes

Hi,

I'm creating some object types dynamically using type function. Ex

return type('DynamicType', (object,), dict)

The dict depends on user input. Now I want that I should be able to save this returned class type and use the same one over different sessions. One possible method is to save the dict as text(or into database) and creating this object type again from that dict. But is there any other way in which I can save the "type" directly?

How about creating a Factory class with methods to create, pickle, and unpickle dynamically created type objects? The following is a rough start. To use, simply replace calls to pickle.dump(type, fh) with TypeFactory.pickle(type, fh), and replace calls to pickle.load(fh) with TypeFactory.unpickle(fh).

import pickle

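class TypeFactory(object):
    """Create, pickle, and unpickle dynamically created types."""

    @staticmethod
    def create_type(name='DynamicType', attrs=None):
        # Avoid a mutable default argument; copy so the caller's dict is untouched
        return type(name, (object,), dict(attrs or {}))

    @staticmethod
    def pickle(t, fh):
        # Drop dunder entries (__dict__, __weakref__, ...): type() recreates
        # them, and some of them cannot be pickled
        attrs = dict((key, value) for key, value in t.__dict__.items()
                     if not (key.startswith('__') and key.endswith('__')))
        pickle.dump((t.__name__, attrs), fh)

    @classmethod
    def unpickle(cls, fh):
        name, attrs = pickle.load(fh)
        return cls.create_type(name, attrs)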
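
The alternative the question mentions, storing the dict as text and rebuilding the type from it, can be sketched with json. This assumes every attribute value is JSON-serializable (so no methods); dump_type and load_type are hypothetical names:

```python
import json


def dump_type(t):
    """Serialize a dynamically created type to a JSON string."""
    # Skip dunder entries; type() recreates them
    attrs = {key: value for key, value in vars(t).items()
             if not (key.startswith('__') and key.endswith('__'))}
    return json.dumps({'name': t.__name__, 'attrs': attrs})


def load_type(text):
    """Rebuild an equivalent type from the JSON string."""
    spec = json.loads(text)
    return type(str(spec['name']), (object,), spec['attrs'])


Original = type('DynamicType', (object,), {'color': 'red', 'size': 3})
Restored = load_type(dump_type(Original))
```

The text form can go into a database column or a file. Restored is a new class object with the same name and attributes, not the same object as Original.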

Django: the role of the project name

5 votes

Hello!

I am thinking of starting a new Django project and I have to choose a project name now, so I can type:

django-admin.py startproject <something>

This raises doubts: I'm not sure of the name, and I think that I might want to change it in the future. So, I have two questions:

  • What role does project name play in the project code and deployment?
  • What steps do I need to take to change my project's name?

Thank you!

The main project name is used as the base of your namespace.

By default, settings.py will contain the line: "ROOT_URLCONF = 'something.urls'".

To change a project name, you need to change every single import that refers to it.

Of course you can always import modules without the 'something' prefix, but then you must ensure that there is no name/namespace conflict between modules. I use this option because it lets me keep several copies of the same code without additional hassle.

Handling PostgreSQL serial field type in South

4 votes

I am using a legacy database which does a couple things that make sense in a db way, but not sure how to represent them in Django so that South and Django itself can deal with them.

I have a Parts table with PartCode as the key. I have a Vendor table with VendorCode as the key.

I have a PartsVendor table with FKs to Parts and Vendor, as well as additional information about the relationship. I am using the "through" parameter so it stands on its own, but it uses PartCode+VendorCode as a composite key, something not supported in Django. I only run into an issue when using South or functions like dumpdata, which want to see a primary key. However, those are pretty big issues.

My temporary solution was to add an _id field as an AutoField and a serial field in Postgres, which works fine, but South then chokes on the fact that the field has default=False and is NOT NULL.

I've gone down the path of trying to write a custom field, but this seemed like a dead end since I am not actually changing anything about the field type.

In PostgreSQL, serial isn't actually a regular type.

What serial does is set up an integer field whose default value is the next number in a sequence. The sequence is stored elsewhere in the database (and can be created manually as well).

I've not tested this, but it would seem logical for serial fields to be represented as integers in Django. Apply the default attribute to the field and leave it out of your inserts.

I hope that helps. :)

Single installation, multiple domains and apps?

4 votes

I am planning to use Django for a multi-site project, where each site is mostly independent, but would share a few models across all the sites.

I am wondering if there is a way to make each Django 'app' its own site, complete with a unique domain name, and still allow each site to access a common app that contains some models for user accounts, profiles, etc.

The plan is to allow single sign on to each site, sharing the account information via the common app, and creating cookies for each site once the user logs in.

I know Django has a 'sites' feature, but I'm not sure if it is robust enough for my needs.

Can anyone recommend a way to do this or point me towards any articles that might help?

UPDATE

Just wondering, would it be possible via Apache, and possibly some modification to urls.py, to point a certain domain to a URL structure?

For example, let's say the main site is mainsite.com, and I want one of the other domains to point to mainsite.com/secondarysite, where secondarysite is a Django app within the same instance, and have Apache mask the fact that the secondarysite.com domain is actually pointing to a different location.

Re. your update: You don't need to fake the URL structure like that, unless you also want URLs like mainsite.com/secondarysite to be directly available to the user.

I'll assume you're using name-based virtual hosting. One simple but very flexible approach would be to have each <VirtualHost> directive invoke a different wsgi script via the usual mod_wsgi configuration directives, and then each wsgi script can set os.environ['DJANGO_SETTINGS_MODULE'] pointing to a different settings file.

Each of those settings files can have a different ROOT_URLCONF, so you can configure views at different paths if you need to. Leverage the include mechanism.

If you want both domains to use the same database for everything, just have both settings files load the database config from a third file, eg. settings_shared.py. Or if you want to route some models to a shared database, and others not, that's possible too. It's easy to imagine how that could be configured for each domain:

from settings_shared import DATABASES_SHARED
DATABASES = DATABASES_SHARED.copy()
DATABASES.update({ ... })
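
Filled in with placeholder values, the pattern might look like the sketch below. The alias site_local, the file names in the comments, and the SQLite databases are assumptions for illustration only:

```python
# settings_shared.py (hypothetical contents)
DATABASES_SHARED = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'shared.db',
    },
}

# settings_secondarysite.py (hypothetical): start from the shared config...
DATABASES = DATABASES_SHARED.copy()
# ...then add a database that only this domain uses
DATABASES.update({
    'site_local': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'secondarysite.db',
    },
})
```

Note that copy() is shallow: adding aliases leaves DATABASES_SHARED untouched, but changing a key inside DATABASES['default'] would also change the shared dictionary.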

This approach takes care of configuring URLs and databases for each domain. But it doesn't take care of single sign-on.

For the SSO portion, it really depends what sort of user experience you want and how much time you have ;)

Try searching for "django sso". There are a lot of relevant questions here, eg: Implementing Single Sign On (SSO) using Django, How to build a secure Django single signon between different sites?, Django + Google SSO openid, Integrating Django and .Net applications using Single Sign On (SSO), (Django) Sharing authentication across two sites that are on different domains

What are the advantages of using metaclasses in django-like form implementations?

4 votes

First a little background ... I was going over the Django source code for forms to understand the implementation of forms in Django (and to learn some Python along the way). Django implements forms using the DeclarativeFieldsMetaclass metaclass.

Here is a very crude class diagram of a Django-like form implementation (link to sample code in gist).

Django-form-like implementation of ContactForm - Class Diagram

And here is an instance diagram.

Django-form-like implementation of ContactForm - Instance Diagram

Here is a very crude implementation of the same class without resorting to metaclasses (link to sample code in gist).

A simple and crude implementation of ContactForm

I understand the metaclass concepts etc. and understand how the Django code works. Now for the questions.

  1. Other than the obvious benefits such as syntactical elegance etc. are there any other benefits for the meta-class implementation?
  2. Is the meta-class like implementation possible without resorting to an intermediate object like BoundField?

Well, syntactical benefits matter a lot. After all, even classes in OOP languages are just a syntactical benefit of the language.

In your very crude metaclass-less form implementation you describe the fields in a dict. You might have overlooked that it is actually a SortedDict, because the ordering of fields matters. So I'd need to define a fields_order list as well.

The next big thing is ModelForm. The metaclass approach lets me simply say which model and which fields to use in the Meta attribute, and it automatically creates the fields and maps them to the model. Without the metaclass approach I would probably have to use something like CreateFieldsFromModel and MapFieldsToModel. Or you might do that for me in the __init__ method. But wait, the __init__ method is already complex enough, with lots of arguments like data, initial, files, and more.

class MyForm(Form):
    fields = {
        'one': ...
        ...
    }
    fields_order = [...]
    model_fields = ???

class MyForm2(MyForm):
    fields = MyForm.fields + {...}

# ... hey this API sucks! I think I'll go with another framework.

And there are many more things which can be configured in forms, and everything is documented.

So for me, because of the huge amount of configuration logic, it looks like Form is just asking to be implemented through a definition object plus factory logic. And here comes Python with its metaclasses to hide the factory from the user. And it is cool because it lets beginners think less.

And yes, it is syntactic sugar all around, and it is all about making an easy-to-use API.

And yes, it is possible not to use metaclasses, BoundField, or anything else. In the end it is possible to write the whole forms implementation in one function and have all the definitions in one big dict (or maybe let's make some use of overused XML?). But will that be easy to use, easy to understand, easy to extend?
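
To make the "hidden factory" concrete, here is a minimal sketch of a declarative metaclass. It is not Django's implementation: Field, DeclarativeFieldsMeta and base_fields are simplified stand-ins, and field order relies on class bodies preserving definition order, which holds on Python 3.7+:

```python
class Field:
    def __init__(self, label):
        self.label = label


class DeclarativeFieldsMeta(type):
    """Collect Field attributes from the class body into base_fields."""

    def __new__(mcs, name, bases, attrs):
        fields = {}
        # Inherit fields declared on parent forms first
        for base in reversed(bases):
            fields.update(getattr(base, 'base_fields', {}))
        # Move this class's Field attributes out of the namespace, in order
        for key, value in list(attrs.items()):
            if isinstance(value, Field):
                fields[key] = attrs.pop(key)
        attrs['base_fields'] = fields
        return super().__new__(mcs, name, bases, attrs)


class Form(metaclass=DeclarativeFieldsMeta):
    pass


class ContactForm(Form):
    name = Field('Name')
    email = Field('Email')


class ExtendedContactForm(ContactForm):
    phone = Field('Phone')
```

The user only writes the class bodies at the bottom; ordering and inheritance of fields are handled once, in the metaclass, with no fields_order list and no manual merging of parent dicts.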

customize django runserver output

3 votes

I want to edit the output of Django's runserver. I want to add the address of the current object, like apps.views.index,

and all the queries run in this request.

How can I change the code to do this?

I think the best way is to use logging and add some code to it,

like

from django.db import connection
sql=connection.queries

and

doc = {
    'record_hash': hash,
    'level': record.level,
    'channel': record.channel or u'',
    'location': u'%s:%d' % (record.filename, record.lineno),
    'message': record.msg,
    'module': record.module or u'<unknown>',
    'occurrence_count': 0,
    'solved': False,
    'app_id': app_id,
    'sql': sql,
}

Read more about this at http://docs.djangoproject.com/en/dev/topics/logging/
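
With the standard logging module, a dict like the one above could be built inside a custom handler. A sketch, not a tested runserver integration: it uses the standard LogRecord attribute names (levelname, name), which differ from the record.level and record.channel used above, and get_queries is a hypothetical callable standing in for reading connection.queries:

```python
import logging


class DictCollectingHandler(logging.Handler):
    """Turn each log record into a dict and keep it in memory."""

    def __init__(self, get_queries):
        super().__init__()
        self.get_queries = get_queries  # callable returning the SQL so far
        self.docs = []

    def emit(self, record):
        self.docs.append({
            'level': record.levelname,
            'channel': record.name,
            'location': '%s:%d' % (record.filename, record.lineno),
            'message': record.getMessage(),
            'module': record.module or '<unknown>',
            'sql': list(self.get_queries()),
        })


# Demo wiring; in Django the callable might return connection.queries
handler = DictCollectingHandler(get_queries=lambda: ['SELECT 1'])
logger = logging.getLogger('runserver_demo')
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

logger.info('rendered %s', 'apps.views.index')
```

Printing or formatting handler.docs from there is up to the caller. Note that connection.queries is only populated when DEBUG is True.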