Best django questions in April 2012

Do RESTful service parameters have to be discoverable?

8 votes

Preamble: My understanding of REST is shallow at best, so any corrections or clarifications to my questions are welcome.

I have a situation where I need the user of a RESTful service to submit an arbitrary real positive number. I therefore assume I shouldn't require it in the url, even though the returned object should be the same, and should use a parameter instead (or is this assumption wrong?).

Given that, to conform to REST, does the parameter have to be discoverable somehow? I haven't been able to find anything that makes this clear to me.

If not, I further assume that the parameter needs to be documented in some other way, thus locking (a portion of) your current api, which, as I understand it, is not desirable, as resources should be found by following hypertext links rather than hard coding locations (and parameters in this case).

Assuming that parameters do have to be discoverable, is there some way of doing this in tastypie/django?

The "discoverable" part of REST typically refers to new resources (represented by URIs) returned by the server that weren't present before. Client applications can then choose to interact with those resources at their leisure.

For example, my application might GET a /library URI that returns a representation of what's in my local library. The data returned (encoded as a particular JSON-based media type) might look like this:

{
   "printedBooks" : "/library/books",
   "audioTapes" : "/library/tapes"
}

Now let's say that a few months later I do the same GET on the same URI. I might now be returned this payload:

{
   "printedBooks" : "/library/books",
   "audioTapes" : "/library/tapes",
   "magazines" : "/library/magazines"
}

Now there's a new link to a magazines resource which I can presumably GET to find out what kind of magazines are available. If my client application doesn't care about it, no big deal - it just keeps using the other two resources as before. At some point in the future I might code up support for magazine searches too.

But if I've written some kind of super dynamic fancy-shmancy client application that can automatically adapt to the presence of this new resource, the users of the application get to use it without any effort at all. No upgrade was required: the server offered new functionality and the client made the most of it. Web browsers work in this way (albeit with a human driving a large part of the "dynamism").

To your specific question about parameters: those are typically specified as part of the server's API documentation. I don't know of any API that specifies parameter syntax in an automated fashion that would allow the client to be 100% generic and adaptable.

A well-defined RESTful API stands on the shoulders of the already-specified technologies which it uses (HTTP, URIs, content type negotiation, headers, etc) and leverages rather than redefines them. That's why I know that I can probably do a GET on the /library/magazines URI and request a JSON-based encoding, and it's pretty likely that I'll succeed. I know that if I have the URI to a particular magazine (say, /library/magazines/1234) then I can attempt to remove it by calling DELETE on it, and I'll know if it succeeded based on the returned HTTP status code. I didn't have to read any documentation or use any coding magic to do any of these things because these are actions already specified in HTTP.

However, to create a new magazine by POSTing to /library/magazines, I need to know what the representation of my parameter data should look like, regardless of whether I pass those parameters within the URI or within the body of the POST. That is data that needs to be specified out-of-band in the API documentation.

So, to summarize: RESTful servers can send clients back new information that they hadn't previously seen, and that's the heart of discoverability in REST. But when a client wants to send data to a server, it needs to send data that the server will understand and accept, and that's typically described in documentation.
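The idea that a client keeps working when new links appear can be sketched in a few lines of Python. This is only an illustration: `KNOWN_RELATIONS` and `usable_links` are hypothetical names, and the payload is the second JSON document from the /library example.

```python
import json

# Link names this (hypothetical) client was written to understand.
KNOWN_RELATIONS = {"printedBooks", "audioTapes"}


def usable_links(payload_text, known=KNOWN_RELATIONS):
    """Split the links in a /library payload into understood and new ones."""
    links = json.loads(payload_text)
    understood = {name: uri for name, uri in links.items() if name in known}
    discovered = {name: uri for name, uri in links.items() if name not in known}
    return understood, discovered


later_payload = """
{
   "printedBooks" : "/library/books",
   "audioTapes" : "/library/tapes",
   "magazines" : "/library/magazines"
}
"""

understood, discovered = usable_links(later_payload)
print(sorted(understood))  # links the client already knows how to use
print(discovered)          # links it may ignore, or adopt later
```

An older client simply ignores `discovered`; a more dynamic one could offer those links to the user without needing an upgrade.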

Can I slow down Django

5 votes

Simple question really

./manage.py runserver

Can I slow down localhost:8000 on my development machine so I can simulate file uploads and work on the look and feel of ajax uploading?

Depending on where you want to simulate the slowness, you could simply sleep there:

from time import sleep
sleep(5)  # the argument is in seconds; 500 would block for over eight minutes
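If the delay should apply to every request rather than to one view, one option is a small WSGI wrapper that sleeps before handing the request on. This is a sketch under assumptions: `SlowDown`, `delay_seconds` and `hello_app` are made-up names, and how you wrap the application object that your development server runs depends on your Django version and project layout.

```python
import time


class SlowDown(object):
    """WSGI middleware that pauses before every request (development only)."""

    def __init__(self, app, delay_seconds=2.0):
        self.app = app
        self.delay_seconds = delay_seconds

    def __call__(self, environ, start_response):
        time.sleep(self.delay_seconds)  # sleep() takes seconds, not milliseconds
        return self.app(environ, start_response)


def hello_app(environ, start_response):
    """Tiny stand-in application used only to exercise the wrapper."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


# Wrap the stand-in app with a short delay.
slow_app = SlowDown(hello_app, delay_seconds=0.05)
```

Note that this delays each response as a whole. It does not throttle upload bandwidth, so it is closer to a slow server than to a slow connection.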

Django ORM query for friends of a user

4 votes

I am having trouble getting a Django ORM query to work correctly. I have this Friendship model:

class Friendship(models.Model):
    user1 = models.ForeignKey(User, related_name='friendships1')
    user2 = models.ForeignKey(User, related_name='friendships2')
    class Meta:
        unique_together = ('user1', 'user2',)

To find the friends for a given user, we have to check user1 and user2 because we can never be sure what side of the relationship they will be on. So, to get all friends for a given user, I use the following query:

user = request.user
User.objects.filter(
    Q(friendships1__user2=user, friendships1__status__in=statuses) |
    Q(friendships2__user1=user, friendships2__status__in=statuses)
)

This seems to me like it should work, but it does not. It gives me duplicates. Here is the SQL that it generates:

SELECT auth_user.*
FROM auth_user
LEFT OUTER JOIN profile_friendship ON (auth_user.id = profile_friendship.user1_id)
LEFT OUTER JOIN profile_friendship T4 ON (auth_user.id = T4.user2_id)
WHERE (
    (profile_friendship.status IN ('Accepted') AND profile_friendship.user2_id = 1 )
    OR (T4.user1_id = 1 AND T4.status IN ('Accepted'))
);

Here is the SQL that I want, which produces correct results:

SELECT f1.id as f1id, f2.id AS f2id, u.*
FROM auth_user u
LEFT OUTER JOIN profile_friendship f1 ON (u.id = f1.user1_id AND f1.user2_id = 1 AND f1.status IN ('Accepted'))
LEFT OUTER JOIN profile_friendship f2 ON (u.id = f2.user2_id AND f2.user1_id = 1 AND f2.status IN ('Accepted'))
WHERE f1.id IS NOT NULL OR f2.id IS NOT NULL

I know I can do this in a raw query, but then I don't think I'll be able to chain. Is there a nice clean way to do this without going raw?

Simple solution:

from django.db.models import Q

user = request.user
User.objects.filter(
    Q(friendships1__user2=user, friendships1__status__in=statuses) |
    Q(friendships2__user1=user, friendships2__status__in=statuses)
).distinct()  # collapse the duplicate rows produced by the two joins

Does anyone know of any drawbacks?
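To see where the duplicates come from, the generated SQL can be run against a tiny in-memory SQLite database. This is a sketch with made-up rows, not the asker's data: user 2 is a friend of user 1, and users 3 and 4 are friends of user 2. The second join attaches both of those unrelated friendships to user 2, so the row matching the first condition is repeated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE auth_user (id INTEGER PRIMARY KEY);
    CREATE TABLE profile_friendship (
        user1_id INTEGER, user2_id INTEGER, status TEXT
    );
    INSERT INTO auth_user (id) VALUES (1), (2), (3), (4);
    -- (user1_id, user2_id, status): illustrative rows only
    INSERT INTO profile_friendship VALUES
        (2, 1, 'Accepted'), (3, 2, 'Accepted'), (4, 2, 'Accepted');
""")

# Same shape as the SQL Django generated, with an optional DISTINCT.
QUERY = """
    SELECT {distinct} auth_user.id
    FROM auth_user
    LEFT OUTER JOIN profile_friendship ON (auth_user.id = profile_friendship.user1_id)
    LEFT OUTER JOIN profile_friendship T4 ON (auth_user.id = T4.user2_id)
    WHERE (
        (profile_friendship.status IN ('Accepted') AND profile_friendship.user2_id = 1)
        OR (T4.user1_id = 1 AND T4.status IN ('Accepted'))
    )
"""

plain = [row[0] for row in conn.execute(QUERY.format(distinct=""))]
deduplicated = [row[0] for row in conn.execute(QUERY.format(distinct="DISTINCT"))]
print(plain)         # user 2 appears once per row joined through T4
print(deduplicated)  # user 2 appears once
```

With these rows the plain query returns user 2 twice and the DISTINCT version returns it once. The cost of `distinct()` is that the database still builds the multiplied rows before collapsing them.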

Trouble using South with Django and Heroku

4 votes

I had an existing Django project that I've just added South to.

  • I ran syncdb locally.
  • I ran manage.py schemamigration app_name locally
  • I ran manage.py migrate app_name --fake locally
  • I committed and pushed to heroku master
  • I ran syncdb on heroku
  • I ran manage.py schemamigration app_name on heroku
  • I ran manage.py migrate app_name on heroku

I then receive this:

$ heroku run python notecard/manage.py migrate notecards
Running python notecard/manage.py migrate notecards attached to terminal... up, run.1
Running migrations for notecards:
 - Migrating forwards to 0005_initial.
 > notecards:0003_initial
Traceback (most recent call last):
  File "notecard/manage.py", line 14, in <module>
    execute_manager(settings)
  File "/app/lib/python2.7/site-packages/django/core/management/__init__.py", line 438, in execute_manager
    utility.execute()
  File "/app/lib/python2.7/site-packages/django/core/management/__init__.py", line 379, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/app/lib/python2.7/site-packages/django/core/management/base.py", line 191, in run_from_argv
    self.execute(*args, **options.__dict__)
  File "/app/lib/python2.7/site-packages/django/core/management/base.py", line 220, in execute
    output = self.handle(*args, **options)
  File "/app/lib/python2.7/site-packages/south/management/commands/migrate.py", line 105, in handle
    ignore_ghosts = ignore_ghosts,
  File "/app/lib/python2.7/site-packages/south/migration/__init__.py", line 191, in migrate_app
    success = migrator.migrate_many(target, workplan, database)
  File "/app/lib/python2.7/site-packages/south/migration/migrators.py", line 221, in migrate_many
    result = migrator.__class__.migrate_many(migrator, target, migrations, database)
  File "/app/lib/python2.7/site-packages/south/migration/migrators.py", line 292, in migrate_many
    result = self.migrate(migration, database)
  File "/app/lib/python2.7/site-packages/south/migration/migrators.py", line 125, in migrate
    result = self.run(migration)
  File "/app/lib/python2.7/site-packages/south/migration/migrators.py", line 99, in run
    return self.run_migration(migration)
  File "/app/lib/python2.7/site-packages/south/migration/migrators.py", line 81, in run_migration
    migration_function()
  File "/app/lib/python2.7/site-packages/south/migration/migrators.py", line 57, in <lambda>
    return (lambda: direction(orm))
  File "/app/notecard/notecards/migrations/0003_initial.py", line 15, in forwards
    ('user', self.gf('django.db.models.fields.related.ForeignKey')(to=orm['auth.User'])),
  File "/app/lib/python2.7/site-packages/south/db/generic.py", line 226, in create_table
    ', '.join([col for col in columns if col]),
  File "/app/lib/python2.7/site-packages/south/db/generic.py", line 150, in execute
    cursor.execute(sql, params)
  File "/app/lib/python2.7/site-packages/django/db/backends/util.py", line 34, in execute
    return self.cursor.execute(sql, params)
  File "/app/lib/python2.7/site-packages/django/db/backends/postgresql_psycopg2/base.py", line 44, in execute
    return self.cursor.execute(query, args)
django.db.utils.DatabaseError: relation "notecards_semester" already exists

I have 3 models: Section, Semester, and Notecards. I've added one field to the Notecards model and I cannot get it added on Heroku.

Thank you.

You must fake the migrations that create the tables, then run the other migrations as usual.

manage.py migrate app_name 000X --fake
manage.py migrate app_name 

With 000X being the number of the migration in which you create the table.

Django ORM, group by day

4 votes

I am trying to group products by DAY, however date_created is a datetime field.

Product.objects.values('date_created') \
               .annotate(available=Count('available_quantity'))

returns:

[
    {'date_created': datetime.datetime(2012, 4, 14, 13, 3, 6), 'available': 1},
    {'date_created': datetime.datetime(2012, 4, 14, 17, 12, 9), 'available': 1},
    ...
]

I want:

[
    {'date_created': datetime.datetime(2012, 4, 14), 'available': 2}, ...
]

edit: database backend MYSQL

Inspired by this question, try this for MySQL:

from django.db.models import Count

Product.objects.extra(select={'day': 'date( date_created )'}).values('day') \
               .annotate(available=Count('date_created'))
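If the grouping has to happen in Python instead (for a small result set, say, or a backend without a suitable date function), the same per-day count can be sketched with the standard library. `count_per_day` is a hypothetical helper, and the third timestamp is invented for illustration. This approach pulls every row into memory, so it is not a substitute for grouping in the database on large tables.

```python
import datetime
from collections import Counter


def count_per_day(timestamps):
    """Count how many datetimes fall on each calendar day."""
    return Counter(ts.date() for ts in timestamps)


created = [
    datetime.datetime(2012, 4, 14, 13, 3, 6),
    datetime.datetime(2012, 4, 14, 17, 12, 9),
    datetime.datetime(2012, 4, 15, 9, 0, 0),  # invented extra row
]

per_day = count_per_day(created)
for day in sorted(per_day):
    print(day, per_day[day])
```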

How to prepare Django for a possible slashdotting?

4 votes

I would like to prepare my website for a possible influx in traffic. This is my first time using Django as a framework, so I'm unsure of the modifications that should be made to assure that I'm ready and won't go down. What are some of the common things one can do to prepare a Django website for production-level traffic?

I'm also wondering what to expect in terms of traffic numbers. I'm currently hosted at Webfaction with 600GB/month of traffic. Will this quickly run out? Are there statistics on how big 'slashdotted' events are?

  1. Use memcache and caching middleware.
  2. Be sure to offload serving statics.
  3. Use CDN for statics. This doesn't directly affect Django, but will reduce your network traffic.
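For the first point, a settings fragment along these lines enables the per-site cache with memcached. The backend path and middleware names are the ones shipped with Django of this era (1.3/1.4). The address, timeout and key prefix are placeholder values, and a real project would list its other middleware between the two cache entries, so check everything against the documentation for the version you run.

```python
# settings.py (fragment)
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",  # placeholder memcached address
    }
}

MIDDLEWARE_CLASSES = (
    "django.middleware.cache.UpdateCacheMiddleware",     # listed first
    "django.middleware.common.CommonMiddleware",
    "django.middleware.cache.FetchFromCacheMiddleware",  # listed last
)

CACHE_MIDDLEWARE_SECONDS = 600          # placeholder: cache pages for 10 minutes
CACHE_MIDDLEWARE_KEY_PREFIX = "mysite"  # placeholder prefix
```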

For anything beyond that, read up on what others are using:

What is the proper way to manually sequence a column in Postgres?

4 votes

I have a SaaS pet project for invoicing. In it, I want my clients to each start with ticket number 1001. Clearly, I can't use a simple auto field in Postgres and just add 1000 to the value, because all my clients will share the same database and the same tickets table. I've tried using an integer column type and querying (pseudo SQL) SELECT LATEST number FROM tickets WHERE client_id = [current client ID] to get the latest number, and then using that number + 1 to get the next number. The problem is that with concurrency, it's easily possible for two tickets to end up with the same number this way. I need to be able to do this within Django or with raw SQL (vs. using Bash or anything else of the sort).

I'm not looking for a way to force my example to work. I'm just looking for a solution to the problem of needing independently incrementing ticket numbers for each client.

I don't think there is a "cheap" solution to this problem. The only solution that is safe (but not necessarily fast) in a multi-user environment is to have a "counter" table with one row for each customer.

Each transaction has to first lock the customer's entry before inserting a new ticket, something like this:

UPDATE cust_numbers
  SET current_number = current_number + 1
WHERE cust_id = 42
RETURNING current_number;

That will do three things in one step:

  1. increase the current "sequential" number for that customer
  2. lock the row so other transactions doing the same will have to wait for a lock
  3. return the new value of that column.

With that new number you can now insert a new ticket. If the transaction is committed, it will also release the lock on the cust_numbers table, thus other transactions "waiting for a number" can proceed.

You could wrap the two steps (update.. returning & the insert) into a single stored function so that the logic behind this is centralized. Your application would only call select insert_ticket(...) without knowing how the ticket number is generated.

You might also want to create a trigger on the customer table to automatically insert a row into the cust_numbers table when a new customer is created.

The disadvantage of this is that you effectively serialize the transactions that are inserting new tickets for the same customer. Depending on the volume of inserts in your system, this might turn out to be a performance problem.

Edit
Another disadvantage is that nothing forces you to insert tickets this way, which might lead to problems if, e.g., a new developer forgets about it.

Removing the mysite prefix from INSTALLED_APPS in "mysite.app" prevents double imports, why?

4 votes

I ran into the issue where post_save was being called twice for no apparent reason. It now seems that the cause was a double import, as described here: Why is post_save being raised twice during the save of a Django model? The accepted answer suggests removing the mysite portion of mysite.foo, which worked, but why does it do a double import?

The issue is caused by mixing import paths in Python. Take the following structure, with proj findable on sys.path, for example.

proj/
   __init__.py
   app/
       __init__.py
       foo.py

# In proj directory, enter Python shell
>>> import sys

>>> before = set(sys.modules)
>>> import app.foo
>>> set(sys.modules) - before
set(['app', 'app.foo'])

>>> before = set(sys.modules)
>>> from proj.app import foo
>>> set(sys.modules) - before
set(['proj.app.foo', 'proj', 'proj.app'])

Python actually treats proj.app.foo and app.foo as different modules. You may find that app/__init__.py and app/foo.py get imported twice, so anything in them gets executed twice. To fix this, use a consistent import path: either from the proj level or from the ../proj level. In the link you posted, 'mysite.blog' is fine AS LONG AS there is no other import such as import blog in the project or Django files.

In Django 1.4, the issue is mostly solved by moving manage.py one directory up, from the project directory to its parent directory, which is not a package; this limits importing to the proj level.

You could also prevent duplicate signal handlers by passing dispatch_uid when connecting the receiver.
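The double import can be reproduced without Django. The sketch below builds a throwaway package tree in a temporary directory and imports the same file under two names. The names `proj_demo` and `app_demo` are stand-ins for `proj` and `app`, chosen only to avoid clashing with real packages.

```python
import importlib
import os
import sys
import tempfile

# Build a throwaway tree: <root>/proj_demo/app_demo/foo.py
root = tempfile.mkdtemp()
proj_dir = os.path.join(root, "proj_demo")
app_dir = os.path.join(proj_dir, "app_demo")
os.makedirs(app_dir)
for package_dir in (proj_dir, app_dir):
    open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(app_dir, "foo.py"), "w") as source:
    source.write("handlers = []  # module-level state\n")

# Make the package reachable both from its parent and from inside itself.
sys.path[:0] = [root, proj_dir]
importlib.invalidate_caches()

short = importlib.import_module("app_demo.foo")
full = importlib.import_module("proj_demo.app_demo.foo")

print(short is full)                                     # two module objects
print(os.path.samefile(short.__file__, full.__file__))  # loaded from one file
print(short.handlers is full.handlers)                   # separate state
```

Because the two module objects do not share state, any registration done at import time (such as connecting a signal handler) happens once per name.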

Same condition as decorator and as normal function?

4 votes

I want to check a certain argument value against a regex-pattern and only continue if they match. This happens in many places within my app, so I decided to let a function do the checking and call that function whenever I need it. Now, in most cases, I need that check to be performed right at the beginning of a view, so I created it as a decorator like so:

def validate(f):
    def _inner(request, argument=None):
        if argument is None:
            return HttpResponse(content="No argument given", status=400)
        elif not re.match('^SOME_REGEX$', argument):
            return HttpResponse(content="Invalid argument", status=400)
        else:
            return f(request, argument)
    return _inner

But there are other cases where I need to call that checker from within a function, as part of nested conditions. It seems I can't call it directly, e.g. validate(argument). Is there any way I can use the same code as a decorator as well as a normal function? Or do I have to type it twice?

You certainly don't have to type it twice. You can just create a validate function which takes a value and validates it:

import re

def validate(argument):
    return re.match('^SOME_REGEX$', argument)

and then write a decorator which calls the validate function as needed:

from functools import wraps

from django.http import HttpResponse

def requires_valid(f):
    @wraps(f)  # keep the wrapped view's name and docstring
    def _inner(request, argument=None):
        if argument is None:
            return HttpResponse(content="No argument given", status=400)
        elif not validate(argument):
            return HttpResponse(content="Invalid argument", status=400)
        else:
            return f(request, argument)
    return _inner

Obviously, I don't know your use case so you might want to move the check for None into validate but the point is, you don't have to repeat the same regex twice.

And if you feel like delving into deeper magic and insist on using the same function both as a decorator and a verifier, you might try something like this:

def validate(f):
    if callable(f):
        def _inner(request, argument=None):
            if argument is None:
                return HttpResponse(content="No argument given", status=400)
            elif not validate(argument):
                return HttpResponse(content="Invalid argument", status=400)
            else:
                return f(request, argument)
        return _inner
    else:
        return re.match('^SOME_REGEX$', f)

But I'd advise against this, since you have one function that does two very different things, depending on the type of the parameter. This results in code that is much more difficult to understand. (“You decorate a view with this function which takes a string and returns bool?!”)
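For comparison, here is the recommended two-function split used both ways, written so that it runs without Django. Everything here is a stand-in: the pattern is a hypothetical replacement for SOME_REGEX, the `(status, body)` tuples take the place of HttpResponse, and `show` and `first_valid` are invented examples.

```python
import re
from functools import wraps

PATTERN = re.compile(r"^[a-z0-9-]+$")  # hypothetical stand-in for SOME_REGEX


def validate(argument):
    """Return True when the argument matches the pattern."""
    return PATTERN.match(argument) is not None


def requires_valid(view):
    """Decorator form: reject the request before the view runs."""
    @wraps(view)
    def _inner(request, argument=None):
        if argument is None:
            return (400, "No argument given")
        if not validate(argument):
            return (400, "Invalid argument")
        return view(request, argument)
    return _inner


@requires_valid
def show(request, argument):
    """Stand-in view that only runs with a valid argument."""
    return (200, "showing " + argument)


def first_valid(candidates):
    """Plain-function form: call validate() inside ordinary logic."""
    for candidate in candidates:
        if validate(candidate):
            return candidate
    return None


print(show(None, "my-page"))
print(show(None, "Bad Value!"))
print(first_valid(["Bad Value!", "ok-1"]))
```

Each name does one thing: `validate` answers a yes/no question and `requires_valid` turns that answer into a response.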

Could Nginx access memcached to check for a certain value to determine where to redirect?

3 votes

I have a middleware in my Django application that redirects mobile clients to a user-configurable mobile domain. It's not a simple m.[current domain], since users define the domain themselves. To save queries, I can store a mapping similar to {'www.example.com': 'mobile-version.example.com'}. However, I'd like to save the wsgi server and full Django stack from being reached on mobile requests, because this simple logic is the only thing that is happening. My thought was, if I could place this logic in Nginx somehow, I'd be able to bypass Django altogether, saving some resources. Is this possible? I've read where people have served entire sites via memcached (seems like a cheaper replacement for simple Varnish usage), but the methodology seems to be a bit different.

The logic would be something like:

$mobile_domain = memcached.get_by_key("mobile_domain_for:" + $current_domain)
IF $mobile_domain:
    redirect $mobile_domain + $path_info + $query_strings

It looks like the third-party memc-nginx-module has the ability to look up specific memcached keys.
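Whichever component ends up doing the lookup, the application has to write the key in exactly the form the reader expects. The lookup logic from the pseudocode can be sketched in Python as follows. A plain dict stands in for the memcached client (both offer a `get` method), `mobile_redirect_url` is a hypothetical helper, and the `http://` scheme is an assumption.

```python
def mobile_redirect_url(cache, current_domain, path_info, query_string=""):
    """Return the mobile URL to redirect to, or None if no mapping exists."""
    mobile_domain = cache.get("mobile_domain_for:" + current_domain)
    if not mobile_domain:
        return None
    url = "http://" + mobile_domain + path_info  # scheme is an assumption
    if query_string:
        url += "?" + query_string
    return url


# A dict standing in for memcached, keyed the same way as the pseudocode.
cache = {"mobile_domain_for:www.example.com": "mobile-version.example.com"}

print(mobile_redirect_url(cache, "www.example.com", "/news/", "page=2"))
print(mobile_redirect_url(cache, "other.example.org", "/"))
```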

Tools to help developers reading class hierarchy faster

3 votes

I mostly spend time on Python/Django and Objective-C/CocoaTouch and js/jQuery in the course of my daily work.

My editor of choice is vim for Python/Django and js/jQuery and xcode for Objective-C/CocoaTouch.

One of the bottlenecks on my development speed is the pace at which I read existing code, particularly open source libraries which I use.

In Python/Django for example, when I encounter some new features introduced by django developers, I get curious and begin exploring the code base manually. For example, when class-based views were introduced from django 1.3 onwards, reference - https://docs.djangoproject.com/en/dev/topics/class-based-views/ - I will check out the example code shown:

from django.views.generic import TemplateView

class AboutView(TemplateView):
    template_name = "about.html"

And try it out on one of my projects. More importantly, I am curious about what goes on behind the scenes, so I will dig into the source code -

# django/views/generic/__init__.py file

from django.views.generic.base import View, TemplateView, RedirectView
from django.views.generic.dates import (ArchiveIndexView, YearArchiveView, MonthArchiveView,
                                     WeekArchiveView, DayArchiveView, TodayArchiveView,
                                     DateDetailView)
from django.views.generic.detail import DetailView
from django.views.generic.edit import FormView, CreateView, UpdateView, DeleteView
from django.views.generic.list import ListView


class GenericViewError(Exception):
    """A problem in a generic view."""
    pass

From here, I will trace it backwards to the django/views/generic/base.py file and find out exactly what TemplateView class does:-

class TemplateView(TemplateResponseMixin, View):
    """
    A view that renders a template.
    """
    def get_context_data(self, **kwargs):
        return {
            'params': kwargs
        }

    def get(self, request, *args, **kwargs):
        context = self.get_context_data(**kwargs)
        return self.render_to_response(context)

And here it shows that the TemplateView class inherits from the TemplateResponseMixin and View classes... and I continue digging further... and so on...

The problem is that this is an extremely inefficient and slow process (following class hierarchies manually and opening up each file along the way).

So the question is: is there an easy way/UI tool (or other visual solution) that parses the Python code in a particular project and visualizes class hierarchies, which I can then inspect easily by "clicking" on a specific class I am interested in reading about?

Note that I am aware of IPython shell but that doesn't seem as user-friendly as a visual display tool.

For example, there's F-Script in the world of Objective-C/iOS/Mac programming, which not only provides a shell (much like the python or IPython shell), but also provides a visual way for developers to introspect class hierarchies.

[Screenshot: F-Script class hierarchy browser]

So is there a class-hierarchy visualization tool (for Python specifically, but even better if it's generic and can be used for different languages)? What are your methods of getting up to speed efficiently when reading open source code?

UPDATED

Per advice below, I tried out ctags and the vim plugin taglist, and I was able to use :TlistOpen to open up a side buffer in vim.

[Screenshot: taglist side buffer opened with :TlistOpen]

This looks really cool as :TlistOpen now essentially shows me all the classes and functions that are available on my currently open buffer.

My problem now is that when I press Ctrl+] while my cursor is on TemplateView, I get an error.

[Screenshot: error message shown after pressing Ctrl+]]

What am I doing wrong? Is it because my django source code is in a virtualenv? Or is there something specific I have to do to make ctags/taglist "aware" of the django source code?

Tags are a very good start indeed. (There's too much stuff all over the place on it, so I'll just provide you with one extra keyword to search with: ctags.)

In Vim, it ends up (in the basic case) with Ctrl+] to go to a class/function definition and Ctrl+T to return.
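As a complement to tags (not a replacement for a visual browser), Python can list a class's ancestors itself through `inspect.getmro`. The sketch below uses stand-in classes that mirror the shape of the Django classes quoted in the question; `describe_hierarchy` is a hypothetical helper.

```python
import inspect


def describe_hierarchy(cls):
    """Return (class name, defining module) pairs in method resolution order."""
    return [(klass.__name__, klass.__module__) for klass in inspect.getmro(cls)]


# Stand-ins that mirror the shape of the Django classes in the question.
class View(object):
    pass


class TemplateResponseMixin(object):
    pass


class TemplateView(TemplateResponseMixin, View):
    pass


for name, module in describe_hierarchy(TemplateView):
    print(name, module)
```

Run against the real class in a Django shell, the module names tell you which files to open, and `inspect.getsourcefile` can give the path for classes defined in Python source.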