Best Python questions in June 2011

Why is foo(*arg, x) not allowed in Python?

25 votes

Look at the following example

point = (1, 2)
size = (2, 3)
color = 'red'

class Rect(object):
    def __init__(self, x, y, width, height, color):
        pass

It would be very tempting to call:

Rect(*point, *size, color)

Possible workarounds would be:

Rect(point[0], point[1], size[0], size[1], color)

Rect(*(point + size), color=color)

Rect(*(point + size + (color,)))

But why is Rect(*point, *size, color) not allowed, is there any semantic ambiguity or general disadvantage you could think of?

EDIT: Specific Questions

Why are multiple *arg expansions not allowed in function calls?

Why are positional arguments not allowed after *arg expansions?

As far as I know, it was a design choice, but there seems to be a logic behind it.

EDIT: the *args notation in a function call was designed so you could pass in a tuple of variables of an arbitrary length that could change between calls. In that case, having something like f(*a, *b, c) doesn't make sense as a call, as if a changes length all the elements of b get assigned to the wrong variables, and c isn't in the right place either.

Keeping the language simple, powerful, and standardized is a good thing. Keeping it in sync with what actually goes on in processing the arguments is also a very good thing.

Think about how the language unpacks your function call. If multiple *arg are allowed in any order like Rect(*point, *size, color), note that all that matters to properly unpack is that point and size have a total of four elements. So point=(), size=(1,2,2,3), and color='red' would allow Rect(*point, *size, color) to work as a proper call. Basically, when the language parses *point and *size, it treats them as one combined *arg tuple, so Rect(*(point + size), color=color) is a more faithful representation.

There never needs to be two tuples of arguments passed in the form *args, you can always represent it as one. Since assignment of parameters is only dependent on the order in this combined *arg list, it makes sense to define it as such.
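As a small sketch of the "one combined tuple" idea, the snippet below builds the same Rect three ways. The args attribute and the itertools.chain variant are additions for illustration only and are not part of the original question.

```python
from itertools import chain


class Rect(object):
    def __init__(self, x, y, width, height, color):
        # Store the arguments so the three calls can be compared.
        self.args = (x, y, width, height, color)


point = (1, 2)
size = (2, 3)
color = 'red'

# Spell out every element by hand.
a = Rect(point[0], point[1], size[0], size[1], color)

# Concatenate the tuples and expand the result once.
b = Rect(*(point + size), color=color)

# chain() accepts any iterables, not only tuples (illustrative variant).
c = Rect(*chain(point, size, [color]))

print(a.args == b.args == c.args)
```

Note that Python 3.5 and later (PEP 448) do accept Rect(*point, *size, color) directly; the workarounds above are what was available at the time of this question.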

If you can make function calls like f(*a, *b), the language almost begs to allow you to define functions with multiple *args in the parameter list, and those couldn't be processed. E.g.,

 def f(*a, *b): 
     return (sum(a), 2*sum(b))

How would f(1,2,3,4) be processed?

I think this is why, for syntactic concreteness, the language forces function calls and definitions into a specific, order-dependent form such as f(a,b,x=1,y=2,*args,**kwargs).

Everything there has a specific meaning in a function definition and function call. a and b are parameters defined without default values, next x and y are parameters defined with default values (that could be skipped; so come after the no default parameters). Next, *args is populated as a tuple with all the args filled with the rest of the parameters from a function call that weren't keyword parameters. This comes after the others, as this could change length, and you don't want something that could change length between calls to affect assignment of variables. At the end **kwargs takes all the keyword arguments that weren't defined elsewhere. With these concrete definitions you never need to have multiple *args or **kwargs.

Logging Uncaught Exceptions in Python

22 votes

How do you cause uncaught exceptions to output via the logging module rather than to stderr?

I realize the best way to do this would be:

try:
    raise Exception, 'Throwing a boring exception'
except Exception, e:
    logging.exception(e)

but my situation is such that it would be really nice if logging.exception(...) were invoked automatically whenever an exception isn't caught.

As Ned pointed out, sys.excepthook is invoked every time an exception is raised and uncaught. The practical implication of this is that in your code you can override the default behavior of sys.excepthook to do whatever you want (including using logging.exception).

As a straw man example:

>>> import sys
>>> def foo(type, value, traceback):
...     print 'My Error Information'
...     print 'Type:', type
...     print 'Value:', value
...     print 'Traceback:', traceback
... 

Override sys.excepthook:

>>> sys.excepthook = foo

Commit obvious syntax error (leave out the colon) and get back custom error information:

>>> def bar(a, b)
My Error Information
Type: <type 'exceptions.SyntaxError'>
Value: invalid syntax (<stdin>, line 1)
Traceback: None

For more information about sys.excepthook: http://docs.python.org/library/sys.html#sys.excepthook
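To tie this back to the question, here is a minimal sketch of a hook that forwards uncaught exceptions to the logging module. It is written for Python 3; the logger name "uncaught" and the function name log_uncaught are placeholders chosen for this example.

```python
import logging
import sys

logger = logging.getLogger("uncaught")  # placeholder logger name


def log_uncaught(exc_type, exc_value, exc_traceback):
    """Send an uncaught exception to the logging module."""
    # Let Ctrl-C keep its usual behaviour instead of logging it.
    if issubclass(exc_type, KeyboardInterrupt):
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    # exc_info attaches the traceback to the log record.
    logger.error("Uncaught exception",
                 exc_info=(exc_type, exc_value, exc_traceback))


sys.excepthook = log_uncaught
```

logging.exception(...) is not used here because it reads the exception currently being handled, and the hook is not necessarily called from inside an except block. Passing exc_info explicitly avoids relying on that.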

Find the largest binary gap in a number?

19 votes

I had to solve a problem earlier on today which was very interesting: finding the largest binary gap in a number. (The longest gap between two 1s in a binary stream) For instance, 9 has a max binary gap of 2, since 9 in binary is 1001, 529 has a max binary gap of 4, since 529 in binary is 1000010001.

The way I solved it was probably not the best nor a mathematical solution, I simply converted the int in Python to a binary string via bin(n)[2:], then found the index of all matches of 1 in the string, then looped through each and counted the difference between each index and returned the greatest result.

There has to be a better, more mathological (yes, I just made that up) solution for this. I'm terrible with math, which is why I reverted to using strings... does anyone have a purely mathematical/extremely performant solution to the task at hand? I'd like to learn from the pros :)

Another string based solution

def max_gap(x):
    return len(max(bin(x)[2:].rstrip('0').split('1')))

For Python 2.6+, you can use format(x, 'b') instead of bin(x)[2:] for readability.
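For comparison, a bit-shifting version that avoids strings altogether could look like the sketch below. It is one possible approach rather than the answer's method, and the name max_gap_bits is made up for this example.

```python
def max_gap_bits(n):
    """Length of the longest run of 0 bits enclosed by 1 bits in n."""
    if n <= 0:
        return 0
    # Trailing zeros are not enclosed by a 1 on the right, so drop them.
    while n & 1 == 0:
        n >>= 1
    best = current = 0
    while n:
        if n & 1:
            # A 1 bit closes the current run of zeros.
            best = max(best, current)
            current = 0
        else:
            current += 1
        n >>= 1
    return best


print(max_gap_bits(9), max_gap_bits(529))
```

Whether this is actually faster than the string version is something to measure; bin() and str.split() run in C, so the one-liner may well win for ordinary integers.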

Implementation of functions with very basic scripting.

18 votes

I've been playing around with python for some time and decided to better my generalized understanding of programming languages by writing a custom script handler in python. I have so far successfully implemented a basic memory handler and hooked a memory address ordinate to printing to the screen. My question can be posed as:

How can functions be implemented here? A goto statement is too easy; I would like to try something more difficult. (edit) Eventually I want to be able to do:

f0(x, y, z):=x^2

...in a shell that runs a script that runs this module (silly, eh?)

# notes: separate addresses from data lest the loop of doom cometh

class Interpreter:

  def __init__(self):
    self.memory = { }
    self.dictionary = {"mov" : self.mov,
                       "put" : self.put,
                       "add" : self.add,
                       "sub" : self.sub,
                       "clr" : self.clr,
                       "cpy" : self.cpy,
                       "ref" : self.ref }
    self.hooks = {self.val("0") : self.out }

  def interpret(self, line):
    x = line.split(" ")
    vals = tuple(self.val(y) for y in x[1:])
    dereferenced = []
    keys_only = tuple(key for key in self.memory)
    for val in vals:
      while val in self.memory: val = self.memory[val]
      dereferenced.append(val)
    vals = tuple(y for y in dereferenced)
    self.dictionary[x[0]](vals)

  def val(self, x):
    return tuple(int(y) for y in str(x).split("."))

  def mov(self, value):
    self.ptr = value[0]

  def put(self, value):
    self.memory[self.ptr] = value[0]

  def clr(self, value):
    if self.ptr in self.hooks and self.ptr in self.memory:
      x = self.hooks[self.ptr]
      y = self.memory[self.ptr]
      for z in y: x(z)
    del self.memory[self.ptr]

  def add(self, values):
    self.put(self.mat(values, lambda x, y: x + y))

  def sub(self, values):
    self.put(self.mat(values, lambda x, y: x - y))

  def mat(self, values, op):
    a, b = self.memory[values[0]], self.memory[values[1]]
    if len(a) > len(b): a, b = b, a
    c = [op(a[x], b[x]) for x in xrange(len(b))] + [x for x in a[len(a):]]
    return [tuple(x for x in c)]

  def cpy(self, value):
    self.put(value)

  def out(self, x):
    print chr(x),

  def ref(self, x):
    self.put(x)

interp = Interpreter()
for x in file(__file__.split('/')[-1].split(".")[-2] + ".why"):
  interp.interpret(x.strip())

a sample script:

mov 1
put 104.101.108.108.111.10
mov 0
ref 1
clr 0

(EDIT) I've made the decision to use this attempt as inspiration and start from scratch on this project. (Hopefully I'll find some real time to sit down and code before classes start up again.) I intend to award the best answer in a few days. I hope that that information fails to dissuade potential contributors from submitting anything they feel to be helpful for this sort of coding problem.

I am struggling a bit to understand what you are asking. Where is your function definition to be given? In the script handler or in the script?

If it is in the script handler, the obvious solution would be to use a lambda expression. The example from the question, f0(x, y, z):=x^2, would translate to:

>>> f0 = lambda x, y, z : x**2
>>> f0(2,3,4)
4

If the function definitions are to be placed in the script itself, you could get away with a combination of lambda and eval expressions. Here's a quick example that I just hammered together to illustrate the idea.

class ScriptParser(object):

    # See 'to_python' to check out what this does
    mapping = {'^':'**', '!':' not ', '&':' and '}

    def to_python(self, calc):
        '''
        Parse the calculation syntax from the script grammar to the python one.
        This could be grown to a more complex parser, if needed. For now it will
        simply assume any operator as defined in the grammar used for the script
        has an equivalent in python.
        '''
        for k, v in self.mapping.items():
            calc = calc.replace(k, v)
        return calc

    def feed(self, lfs):
        '''
        Parse a line of the script containing a function definition
        '''
        signature, calc = lfs.split(':=')
        funcname, variables = [s.strip() for s in signature.split('(')]
        # having stripped the strings, it is now safe to drop the trailing ')'
        variables = variables[:-1]
        setattr(self, funcname,
                eval('lambda ' + variables + ' : ' + self.to_python(calc)))

def main():
    lines = ['f0(x, y, z) := x^2',
             'f1(x) := x**2 + x**3 + x*1000']
    sp = ScriptParser()
    for line in lines:
        sp.feed(line)
        print('Script definition  : %s' % line)
    for i in range(5):
        res0 = sp.f0(i, None, None)
        res1 = sp.f1(i)
        print('f0(%d) = %d' % (i, res0))
        print('f1(%d) = %d' % (i, res1))
        print('--------')

if __name__ == '__main__':
    main()

Running this program outputs:

Script definition  : f0(x, y, z) := x^2
Script definition  : f1(x) := x**2 + x**3 + x*1000
f0(0) = 0
f1(0) = 0
--------
f0(1) = 1
f1(1) = 1002
--------
f0(2) = 4
f1(2) = 2012
--------
f0(3) = 9
f1(3) = 3036
--------
f0(4) = 16
f1(4) = 4080
--------

Keep in mind though that:

  1. Using eval has security implications that you should be aware of.
  2. Writing your own grammar parser is a truly cool learning experience!! :)
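On the first point, one way to avoid eval is to parse the expression with the standard ast module and walk only the node types you choose to allow. The sketch below is an illustration under that assumption (Python 3.8+); compile_function and the set of supported operators are invented for this example and cover only +, -, * and ** (with ^ mapped to ** as in the script grammar above).

```python
import ast
import operator

# Operators the sketch is willing to evaluate; everything else is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Pow: operator.pow,
}


def compile_function(definition):
    """Turn 'name(a, b) := expr' into (name, callable) without eval."""
    signature, body = definition.split(":=")
    name, params = signature.strip().rstrip(")").split("(")
    params = [p.strip() for p in params.split(",") if p.strip()]
    tree = ast.parse(body.strip().replace("^", "**"), mode="eval")

    def evaluate(node, env):
        if isinstance(node, ast.Expression):
            return evaluate(node.body, env)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name) and node.id in env:
            return env[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            left = evaluate(node.left, env)
            right = evaluate(node.right, env)
            return _BIN_OPS[type(node.op)](left, right)
        # Calls, attribute access, unknown names and so on end up here.
        raise ValueError("unsupported expression: %s" % ast.dump(node))

    def function(*args):
        if len(args) != len(params):
            raise TypeError("%s expects %d arguments" % (name.strip(), len(params)))
        return evaluate(tree, dict(zip(params, args)))

    return name.strip(), function


name, f1 = compile_function("f1(x) := x**2 + x**3 + x*1000")
print(name, f1(2))
```

Because only whitelisted node types are evaluated, a script line such as g(x) := __import__('os') is refused with a ValueError instead of being executed.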

HTH, Mac.

Is del called on an object that doesn't complete init?

16 votes

Will __del__ be called if an object's __init__ does not complete (such as by throwing an exception)?

class test():
    def __init__(self):
        raise

    def __del__(self):
        print "__del__ called"

try:
    test()
except:
    pass

Yes.

Explanation: __del__ is called when the last reference to the object is removed. If the exception is not caught, __del__ is not called in this case, because the stack still keeps a reference to the object, and it's kept until the program exits to display the exception. If the exception is caught, the object is deleted as soon as the exception is ignored and all the information relating to the exception is discarded.

Of course, __del__ is not guaranteed to run successfully if the program is about to quit, unless care is taken, and in some circumstances not even then -- see __del__ warning.

Addendum: Cédrik Julien said in his answer (now amended): "If __new__ raised an exception, your object won't be created and __del__ won't be called". This is not always right. Here's an example where __del__ is called even though the exception is raised in __new__:

class test():
    def __new__(cls):
        o = object.__new__(cls)
        raise
        return o

    def __del__(self):
        print "__del__ called"

So some exception occurred while we did some stuff to the test object o before returning it. But since the object was already created, __del__ is called. The lesson is: in the __del__ method, don't assume anything that was supposed to happen after object.__new__() was called has in fact happened. Otherwise you could raise an exception trying to access a non-existing attribute or by relying on some other assumption that is not valid.
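A small Python 3 sketch of that lesson follows. The class name Resource, the handle attribute and the calls list are made up for illustration, and the prompt cleanup relies on CPython's reference counting; other implementations may run __del__ later.

```python
import gc

calls = []  # records what __del__ saw, for illustration


class Resource:
    def __init__(self, fail):
        if fail:
            raise ValueError("init failed")
        self.handle = "open"  # never set when __init__ fails

    def __del__(self):
        # Do not assume __init__ finished: the attribute may be missing.
        calls.append(getattr(self, "handle", None))


try:
    Resource(fail=True)
except ValueError:
    pass

gc.collect()  # not needed on CPython, harmless elsewhere
print(calls)
```

Using getattr with a default keeps __del__ from raising AttributeError on a half-initialized object.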

Python: 'object in list' checks and '__cmp__' overflow

14 votes

this is my first time at stack overflow so I'm sorry if the format doesn't fit quite right with the site. I just recently started learning programming, almost 2 weeks have passed since. I'm learning python from http://openbookproject.net/thinkcs/python/english3e/index.html and everything had been quite nice until now, where I just got stuck for hours. I googled a lot but couldn't find a proper solution to my problem so here I am.

I'm trying to get the OldMaidGame() run without problems as explained on CH17. http://openbookproject.net/thinkcs/python/english3e/ch17.html - Most of the code also comes from the previous chapter.

What I've found out is I can't get the Deck.remove, Hand.remove_matches, or any other kind of remove function to work. After some debugging I found out that the problem occurs when the program checks if the given card is present in the deck/hand/etc. It can't ever make a match. Then after some looking back on the chapter, (in ch16), I found out that 'if card in deck/hand/etc: remove(card)' etc looks up the .cmp() of the object to determine if the card actually exists in the deck/hand/etc. This is my version of the cmp after doing the additions for 'ace's on the given code from the e-book.

def __cmp__(self, other):
    """ Compares cards, returns 1 if greater, -1 if lesser, 0 if equal """
    # check the suits
    if self.suit > other.suit: return 1
    if self.suit < other.suit: return -1
    # suits are the same... check ranks
    # check for aces first.
    if self.rank == 1 and other.rank == 1: return 0
    if self.rank == 1 and other.rank != 1: return 1
    if self.rank != 1 and other.rank == 1: return -1
    # check for non-aces.
    if self.rank > other.rank: return 1
    if self.rank < other.rank: return -1
    # ranks are the same... it's a tie
    return 0

The cmp itself seems fine afaik, ofc I could use some tips on how to make it better (like with ace checks). So I have no idea why the card in deck/hand checks always return false. This was the given remove function:

class Deck:
    ...
    def remove(self, card):
        if card in self.cards:
            self.cards.remove(card)
            return True
        else:
            return False

Desperately trying to get it to work, I came up with this:

class Deck:
    ...
    def remove(self, card):
        """ Removes the card from the deck, returns true if successful """
        for lol in self.cards:
            if lol.__cmp__(card) == 0:
                self.cards.remove(lol)
                return True
        return False

Seemed to work fine, until I moved on to the other non-working remove functions:

class OldMaidHand(Hand):
    def remove_matches(self):
        count = 0
        original_cards = self.cards[:]
        for card in original_cards:
            match = Card(3 - card.suit, card.rank)
            if match in self.cards:
                self.cards.remove(card)
                self.cards.remove(match)
                print("Hand {0}: {1} matches {2}".format(self.name, card, match))
                count = count + 1
        return count

I again made some adjustments:

class OldMaidHand(Hand):
    def remove_matches(self):
        count = 0
        original_cards = self.cards[:]
        for card in original_cards:
            match = Card(3 - card.suit, card.rank)
            for lol in self.cards:
                if lol.__cmp__(match) == 0:
                    self.cards.remove(card)
                    self.cards.remove(match)
                    print("Hand {0}: {1} matches {2}".format(self.name, card, match))
                    count = count + 1
        return count

The removing worked fine for the card, but it would give an error (x not in list) when I tried to remove match. Another hour or so and I might've been able to make that work too, but since it already feels like I'm on the wrong road, as I can't fix the original 'card in deck/hand/etc' check, I came here looking for some answers/tips.

Thanks for reading and I greatly appreciate any help you can give :)

--------------------- EDIT 1 *>

This is my current code: http://pastebin.com/g77Y4Tjr

--------------------- EDIT 2 *>

I've tried every single cmp advised here, and I still can't get it to find a card with 'in'.

>>> a = Card(0, 5)
>>> b = Card(0, 1)
>>> c = Card(3, 1)
>>> hand = Hand('Baris')
>>> hand.add(a)
>>> hand.add(b)
>>> hand.add(c)
>>> d = Card(3, 1)
>>> print(hand)
Hand Baris contains
5 of Clubs
 Ace of Clubs
  Ace of Spades
>>> d in hand.cards
False
>>> 

I've also tried the card.py that @DSM has used successfully, and I get errors there too; for example, the sort function says it can't compare the two card objects.
So I was wondering, maybe it is a problem with Python 3.2, or maybe the syntax has changed somewhere?

"So I was wondering, maybe it is a problem with Python 3.2, or maybe the syntax has changed somewhere?"

Oh, you're running Python 3.2? This'll never work in Python 3: Python 3 doesn't use __cmp__!

See the data model (look for __eq__). Also read the what's new in Python 3 for some other things it's way too easy to miss.

Sorry, this is on us Python programmers here; we should have caught this far earlier. Most of us probably looked at all the code, realized without even thinking about it that the source was obviously Python 2 code, and assumed that's what we were working with. The cmp function doesn't even exist in Python 3.2, but the reason it doesn't blow up with a NameError is that __cmp__ is never called.

If I run the code in Python 3.2, I reproduce your problem exactly:

>>> c = Card(0,2)
>>> str(c)
'2 of Clubs'
>>> c in [c]
True
>>> c in Deck().cards
False

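In Python 3, you either implement all the rich comparison methods yourself, or implement __eq__ plus one of the ordering methods (__lt__, __le__, __gt__ or __ge__) and apply the functools.total_ordering decorator.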

from functools import total_ordering

@total_ordering
class Card(object):
    """Represents a standard playing card."""
    suit_names = ["Clubs", "Diamonds", "Hearts", "Spades"]
    rank_names = [None, "Ace", "2", "3", "4", "5", "6", "7", 
              "8", "9", "10", "Jack", "Queen", "King"]
    def __init__(self, suit=0, rank=2):
        self.suit = suit
        self.rank = rank
    def __str__(self):
        return '%s of %s' % (Card.rank_names[self.rank],
                             Card.suit_names[self.suit])
    def __repr__(self): return str(self)
    def __lt__(self, other):
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 < t2
    def __eq__(self, other):
        t1 = self.suit, self.rank
        t2 = other.suit, other.rank
        return t1 == t2


>>> c = Card(2,3)
>>> c
3 of Hearts
>>> c in Deck().cards
True
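The failing 'in' check can be reproduced without the rest of the card game. In the Python 3 sketch below, PlainCard and EqCard are stripped-down stand-ins invented for this example, not the classes from the book.

```python
class PlainCard:
    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank

    def __cmp__(self, other):
        # Python 3 never calls this, so it has no effect on 'in' or '=='.
        return 0


class EqCard(PlainCard):
    def __eq__(self, other):
        if not isinstance(other, EqCard):
            return NotImplemented
        return (self.suit, self.rank) == (other.suit, other.rank)

    def __hash__(self):
        # Defining __eq__ removes the default __hash__, so restore one.
        return hash((self.suit, self.rank))


# Without __eq__, 'in' falls back to identity, so an equal-looking card is not found.
print(PlainCard(3, 1) in [PlainCard(3, 1)])
# With __eq__, membership compares suit and rank.
print(EqCard(3, 1) in [EqCard(3, 1)])
```

list.remove() uses the same equality test, which is why the original Deck.remove and remove_matches start working once __eq__ is defined.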

Uninstantiable superclass

14 votes

So, I'm writing a module for connecting to external account providers (Twitter, Facebook etc) and I have a superclass that is useless on its own, but contains generic methods that need to be invoked by the subclasses for persisting auth tokens, getting auth tokens and deauthorizing the provider. My question is, is there a way to make it uninstantiable or should I follow the consenting adults rule and just let anyone who uses it make mistakes as they see fit? Other than a docstring is there a good way to denote that someone shouldn't use this superclass on its own?

I'm seconding Sven Marnach's edit: I think you should follow the "consenting adults" rule and mention in the docstring that the class is not meant to be instantiated.

The key phrase in your question is "I have a superclass that is useless on its own." It won't invoke cthulhu when instantiated; it won't cause some kind of catastrophic, hard-to-debug failure somewhere else in your program; it will just be a minor waste of time. That's not worth crippling the class for, I think.
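For completeness: if enforcement is still wanted despite the advice above, the standard abc module is one way to get it. This is an alternative to the docstring approach, not what the answer recommends, and every class and method name below is hypothetical (Python 3.4+).

```python
from abc import ABC, abstractmethod


class AccountProvider(ABC):
    """Base class for external account providers.

    Not meant to be instantiated directly; subclass it per provider.
    """

    def __init__(self):
        self._tokens = {}

    def persist_token(self, user, token):
        # Shared behaviour that subclasses inherit unchanged.
        self._tokens[user] = token

    def get_token(self, user):
        return self._tokens.get(user)

    @abstractmethod
    def provider_name(self):
        """Each concrete provider must say what it is called."""


class TwitterProvider(AccountProvider):
    def provider_name(self):
        return "twitter"


provider = TwitterProvider()
provider.persist_token("alice", "secret-token")
print(provider.provider_name(), provider.get_token("alice"))
```

Calling AccountProvider() directly raises TypeError because provider_name is abstract, while the subclass instantiates normally.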

How to create a thread-safe singleton in python

13 votes

I would like to hold running threads in my Django application. Since I cannot do so in the model or in the session, I thought of holding them in a singleton. I've been checking this out for a while and haven't really found a good how-to for this.

Does anyone know how to create a thread-safe singleton in python?

EDIT:

More specifically, what I want to do is implement some kind of "anytime algorithm", i.e. when a user presses a button, a response is returned and a new computation begins (in a new thread). I want this thread to run until the user presses the button again, and then my app will return the best solution it managed to find. To do that, I need to save the thread object somewhere. I thought of storing it in the session, which apparently I cannot do.

The bottom line is: I have a FAT computation I want to do on the server side, in different threads, while the user is using my site.

Unless you have a very good reason - you should execute the long running threads in a different process altogether, and use Celery to execute them:

Celery is an open source asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well.

The execution units, called tasks, are executed concurrently on one or more worker nodes using multiprocessing, Eventlet or gevent. Tasks can execute asynchronously (in the background) or synchronously (wait until ready).

Celery guide for djangonauts: http://django-celery.readthedocs.org/en/latest/getting-started/first-steps-with-django.html

For singletons and sharing data between tasks/threads, again, unless you have a good reason, you should use the db layer (aka, models) with some caution regarding db locks and refreshing stale data.

Update: regarding your use case, define a Computation model, with a status field. When a user starts a computation, an instance is created, and a task will start to run. The task will monitor the status field (check db once in a while). When a user clicks the button again, a view will change the status to user requested to stop, causing the task to terminate.

How do Python parsers handle indentation?

13 votes

When parsing a freeform language like C, it is easy for the parser to determine when several expressions are related to one another simply by looking at the symbols emitted by the parser. For example, in the code

if (x == 5) {
    a = b;
    c = d;
}

The parser can tell that a = b and c = d are part of a single expression because there is an explicit set of braces around the block. Moreover, the parser knows that the two statements in the block are related because there is a semicolon between them. This could easily be encoded as a CFG using something like this:

STMT        ::=  IF_STMT | EXPR; | BLOCK_STMT | STMT STMT
IF_STMT     ::=  if ( EXPR ) STMT
BLOCK_STMT  ::=  { STMT }

In Python and other whitespace-sensitive languages, though, it's not as easy to do this because the structure of the statements can only be inferred from their absolute position, which I don't think can easily be encoded into a CFG. For example, the above code in Python would look like this:

if x == 5:
    a = b
    c = d

Try as I might, I can't see a way to write a CFG that would accept this, because I can't figure out how to encode "two statements at the same level of nesting" into a CFG.

How do Python parsers group statements as they do? Do they rely on a scanner that automatically inserts extra tokens denoting starts and ends of statements? Do they produce a rough AST for the program, then have an extra pass that assembles statements based on their indentation? Is there a clever CFG for this problem that I'm missing? Or do they use a more powerful parser than a standard LL(1) or LALR(1) parser that's able to take whitespace level into account?

Thanks!

The indentations are handled with two "pseudo tokens" - INDENT and DEDENT. There are some details here. For more information, you should look at the source for the python tokeniser and parser.
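The pseudo tokens can be observed with the standard library's tokenize module, which mirrors the tokens the parser receives. A minimal Python 3 sketch, using the snippet from the question as input:

```python
import io
import tokenize

source = "if x == 5:\n    a = b\n    c = d\n"

# generate_tokens() takes a readline callable and yields TokenInfo tuples.
token_names = [
    tokenize.tok_name[tok.type]
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

print(token_names)
```

The list contains one INDENT right after the first NEWLINE and one DEDENT before ENDMARKER. Since the tokenizer emits these like an opening and closing brace, the grammar itself can stay context-free.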

Assigning a function to an object attribute

13 votes

Based on my understanding of Python's data model, and specifically the subsection "Instance Methods", whenever you read an attribute whose value is of type "user-defined function", some magic kicks in and you get a bound instance method instead of the actual, original function. That magic is why you don't explicitly pass the self parameter when you're calling a method.

But then, I would expect to be able to replace an object's method with a function with the same signature:

class Scriptable:
    def __init__(self, script = None):
        if script is not None:
            self.script = script   # replace the method
    def script(self):
        print("greetings from the default script")

>>> scriptable = Scriptable()
>>> scriptable.script()
greetings from the default script

>>> def my_script(self):
...     print("greetings from my custom script")
...
>>> scriptable = Scriptable(my_script)
>>> scriptable.script()
Traceback (most recent call last):
  ...
TypeError: script() takes exactly 1 positional argument (0 given)

I'm creating an instance of Scriptable, and setting its script attribute to a user-defined function with a single parameter, just like what's defined in the class. So when I read the scriptable.script attribute, I would expect the magic to kick in and give me a bound instance method that takes no parameters (just like I get when I didn't replace script). Instead, it seems to be giving back the exact same function I passed in, self parameter and all. The method-binding magic isn't happening.

Why does the method-binding magic work when I define a method inside the class declaration, but not when I assign the attribute? What makes Python treat these situations differently?

I'm using Python3 if it makes any difference.

Here is how you do it:

import types
class Scriptable:
    def __init__(self, script = None):
        if script is not None:
            self.script = types.MethodType(script, self)   # replace the method
    def script(self):
        print("greetings from the default script")

As ba__friend noted in the comments, methods are stored on the class object. A descriptor on the class object returns functions as bound methods when you access the attribute from an instance.

When you assign a function to an instance, nothing special happens, so you have to wrap the function yourself.
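The difference between the two lookups can be made visible with a short Python 3 sketch. The variable names are illustrative; function.__get__ is the descriptor hook that the class-level lookup calls for you.

```python
import types


class Scriptable:
    def script(self):
        return "default script"


def my_script(self):
    return "custom script for %s" % type(self).__name__


obj = Scriptable()

# Found on the class: the function's __get__ runs and binds obj.
class_lookup = obj.script

# Stored in obj.__dict__: returned as-is, no binding takes place.
obj.script = my_script
instance_lookup = obj.script

# Bind by hand; types.MethodType(my_script, obj) does the same thing.
obj.script = my_script.__get__(obj, Scriptable)
result = obj.script()

print(type(class_lookup).__name__, type(instance_lookup).__name__, result)
```

Assigning the function to the class instead (Scriptable.script = my_script) would also restore the automatic binding, but it changes the method for every instance.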

Python: inconsistency in the way you define the function __setattr__?

12 votes

Consider this code:

class Foo1(dict):
    def __getattr__(self, key): return self[key]
    def __setattr__(self, key, value): self[key] = value

class Foo2(dict):
    __getattr__ = dict.__getitem__
    __setattr__ = dict.__setitem__

o1 = Foo1()
o1.x = 42
print(o1, o1.x)

o2 = Foo2()
o2.x = 42
print(o2, o2.x)

I would expect the same output. However, with CPython 2.5, 2.6 (similarly in 3.2) I get:

({'x': 42}, 42)
({}, 42)

With PyPy 1.5.0, I get the expected output:

({'x': 42}, 42)
({'x': 42}, 42)

Which is the "right" output? (Or what should be the output according to the Python documentation?)

I suspect it has to do with a lookup optimization. From the source code:

 /* speed hack: we could use lookup_maybe, but that would resolve the
       method fully for each attribute lookup for classes with
       __getattr__, even when the attribute is present. So we use
       _PyType_Lookup and create the method only when needed, with
       call_attribute. */
    getattr = _PyType_Lookup(tp, getattr_str);
    if (getattr == NULL) {
        /* No __getattr__ hook: use a simpler dispatcher */
        tp->tp_getattro = slot_tp_getattro;
        return slot_tp_getattro(self, name);
    }

The fast path does not look it up in the class dictionary.

Therefore, the best way to get the desired functionality is to place an override method in the class.

class AttrDict(dict):
    """A dictionary with attribute-style access. It maps attribute access to
    the real dictionary.  """
    def __init__(self, *args, **kwargs):
        dict.__init__(self, *args, **kwargs)

    def __repr__(self):
        return "%s(%s)" % (self.__class__.__name__, dict.__repr__(self))

    def __setitem__(self, key, value):
        return super(AttrDict, self).__setitem__(key, value)

    def __getitem__(self, name):
        return super(AttrDict, self).__getitem__(name)

    def __delitem__(self, name):
        return super(AttrDict, self).__delitem__(name)

    __getattr__ = __getitem__
    __setattr__ = __setitem__

    def copy(self):
        return AttrDict(self)

Which I found works as expected.

To ask permission or apologize?

12 votes

I come from a python background, where it's often said that it's easier to apologize than to ask permission. Specifically given the two snippets:

if type(A) == int:
  do_something(A)
else:
  do_something(int(A))

try:
  do_something(A)
except TypeError:
  do_something(int(A))

Then under most usage scenarios the second one will be faster when A is usually an integer (assuming do_something needs an integer as input and will raise its exception fairly swiftly) as you lose the logical test from every execution loop, at the expense of a more costly exception, but far less frequently.

What I wanted to check was whether this is true in C#, or whether logical tests are fast enough compared to exceptions to make this a small corner case?

Oh and I'm only interested in release performance, not debug.


OK my example was too vague try this one:

Naive solution:

return float(A) % 20 # coerce A to a float so it'll only fail if we actually don't
                     # have anything that can be represented as a real number.

Logic based solution:

if isinstance(A, Number): # This is cheaper because we're not creating a new
    return A % 20         # object unless we really have to.
else:
    return float(A) % 20

Exception based solution:

try: # Now we're not doing any logical tests in the 99% of cases where A is a number
  return A % 20
except TypeError:
  return float(A) % 20

Examples using FSOs, database connections, or stuff over a network are better but a bit long-winded for a question.
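For the Python side of the comparison, the three variants can be timed with the standard timeit module. This is only a sketch: the function names are invented, and the numbers depend entirely on the machine and interpreter, so none are quoted here.

```python
import timeit
from numbers import Number


def mod20_naive(a):
    return float(a) % 20


def mod20_lbyl(a):
    # "Look before you leap": test the type on every call.
    if isinstance(a, Number):
        return a % 20
    return float(a) % 20


def mod20_eafp(a):
    # "Easier to ask forgiveness": pay only when the input is not a number.
    # Caveat: for str inputs, % is string formatting. It raises TypeError for
    # "45" but would succeed for a string that contains a format specifier.
    try:
        return a % 20
    except TypeError:
        return float(a) % 20


if __name__ == "__main__":
    for func in (mod20_naive, mod20_lbyl, mod20_eafp):
        elapsed = timeit.timeit(lambda: func(45), number=100000)
        print("%-12s %.4f s" % (func.__name__, elapsed))
```

Timing the same functions with a string argument such as "45" shows the other side of the trade-off, where the exception path is taken on every call.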

Probably not. .NET exceptions are relatively expensive.

Several .NET functions offer both variants for this reason. (int.TryParse, which returns a success code is often recommended because it is faster than int.Parse which throws an exception on failure)

But the only answer that matters is what your own profiling data tells you. If you need performance, then you need to measure, measure, measure.

Because what was fastest on my computer, with my code, with my version of the .NET framework, at this time may not be the fastest on your computer, with your code, with your version of the .NET framework at the time when you read it.

Python equivalent of C programming techniques (while loops)

12 votes

In the C programming language, I often have done the following:

while ((c = getch()) != EOF) {
 /* do something with c */
}

In Python, I have not found anything similar, since I am not allowed to set variables inside the evaluated expression. I usually end up having to set up the evaluated expression twice!

c = sys.stdin.read(1)
while not (c == EOF):
 # Do something with c
 c = sys.stdin.read(1)

In my attempts to find a better way, I've found one that only requires setting up the evaluated expression once, but this is getting uglier...

while True:
 c = sys.stdin.read(1)
 if (c == EOF): break
 # do stuff with c

So far I've settled on the following method for some of my cases, but this is far from optimal for regular while loops:

class ConditionalFileObjectReader:
 def __init__(self,fobj, filterfunc):
  self.filterfunc = filterfunc
  self.fobj = fobj
 def __iter__(self):
  return self
 def next(self):
  c = self.fobj.read(1)
  if self.filterfunc(c): raise StopIteration
  return c

for c in ConditionalFileObjectReader(sys.stdin,lambda c: c == EOF):
 print c

All my solutions to this simple, basic programming problem have become too complex... Does anyone have a suggestion for how to do this the proper way?

It's possible to write much simpler code in place of your ConditionalFileObjectReader, considering that EOF seems to be what you care about, rather than any arbitrary condition:

def readbytes(file):
    while True:
        c = file.read(1)
        if c == '':
            return
        yield c

for c in readbytes(sys.stdin):
    print c

So you still have 'while True ... break', which seems to be the preferred loop in Python[*], but at least you only have it once to solve the whole class of problem, "how to iterate over the bytes in a file-like object without blocking/buffering each line", and you have it in a short loop that doesn't "do stuff with c" - that's a separate concern.

Inspired by Wallacoloo's example with iter, similar to the above you could produce something more general than iter:

def until(nextvalue, pred):
    while True:
        value = nextvalue()
        if pred(value):
            return
        yield value

for c in until(lambda: sys.stdin.read(1), lambda x: x == ''):
    print c

I'm not sure whether I like this or not, but it might be worth playing with. It tries to solve the general problem "iterate over the return values of some function, until a return value satisfies some condition".

[*] dare I say, the Pythonic equivalent of fancy loop syntax in other languages?
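For reference, the built-in two-argument form of iter, which the until example generalizes, calls a function repeatedly until it returns a given sentinel value. A minimal sketch, written for Python 3 and using an in-memory io.StringIO stream as a stand-in for sys.stdin (an assumption made so the block runs on its own):

```python
import io
from functools import partial

# Stand-in for sys.stdin so the example does not wait for input.
stream = io.StringIO("abc")

# iter(callable, sentinel) keeps calling stream.read(1) and stops
# as soon as it returns the sentinel '' (end of file).
chars = []
for c in iter(partial(stream.read, 1), ''):
    chars.append(c)

print(chars)  # ['a', 'b', 'c']
```

This only covers the case where the stop condition is equality with one value; until covers arbitrary predicates.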

Is there an implementation of Hadley's ddply for python?

12 votes

I find Hadley's plyr package for R extremely helpful; it's a great DSL for transforming data. The problem that it solves is so common that I face it in other use cases, when manipulating data not in R but in other programming languages.

Does anyone know if there exists a module that does a similar thing for Python? Something like:

def ddply(rows, *cols, op=lambda group_rows: group_rows):
    """group rows by cols, then apply the function op to each group
       and return the results aggregating all groups
       rows is a dict or list of values read by csv.reader or csv.DictReader"""
    pass

It shouldn't be too difficult to implement, but it would be great if it already existed. If I implemented it, I'd use itertools.groupby to group by cols, then apply the op function, then use itertools.chain to chain it all up. Is there a better solution?

This is the implementation I drafted up:

import itertools

def ddply(rows, cols, op=lambda group_rows: group_rows):
    """group rows by cols, then apply the function op to each group
    rows is list of values or dict with col names (like read from
    csv.reader or csv.DictReader)"""
    def group_key(row):
        # return a tuple (not a generator) so keys can be sorted and compared
        return tuple(row[col] for col in cols)
    rows = sorted(rows, key=group_key)
    return itertools.chain.from_iterable(
        op(group_rows) for k, group_rows in itertools.groupby(rows, key=group_key))

Another step would be to have a set of predefined functions that could be applied as op, like sum and other utility functions.
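One such predefined op could look like the sketch below. The helper name sum_column and the sample rows are hypothetical, and ddply is restated here (with a tuple as the group key) only so that the block runs on its own.

```python
import itertools

def ddply(rows, cols, op=lambda group_rows: group_rows):
    # Group rows by the values in cols, apply op to each group, chain the results.
    def group_key(row):
        return tuple(row[col] for col in cols)
    rows = sorted(rows, key=group_key)
    return itertools.chain.from_iterable(
        op(group_rows) for _, group_rows in itertools.groupby(rows, key=group_key))

def sum_column(col, group_cols):
    """Hypothetical helper: build an op that sums `col` within each group."""
    def op(group_rows):
        group_rows = list(group_rows)
        # Carry the grouping columns over, then add the total.
        result = dict((c, group_rows[0][c]) for c in group_cols)
        result[col] = sum(float(r[col]) for r in group_rows)
        return [result]  # op must return an iterable of rows
    return op

rows = [
    {"city": "Oslo", "sales": "3"},
    {"city": "Rome", "sales": "5"},
    {"city": "Oslo", "sales": "4"},
]
totals = list(ddply(rows, ["city"], sum_column("sales", ["city"])))
print(totals)
```

Returning a one-element list from op keeps the contract that every op yields rows, so aggregating ops and row-preserving ops can be chained the same way.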

Authorization in social networking website

10 votes

I need to accomplish the following related to privileges:

I have 3 users:

- User A
- User B
- User C

Each of the users has the following documents with associated access settings:

- User A
    - Document A1, only allow contacts to view
    - Document A2, allow everyone to view
    - Document A3, allow no one to view except myself
    - Document A4, allow contacts, and contacts of contacts to view
- User B
    - Documents B1, B2, B3, B4 with similar privileges
- User C
    - Documents C1, C2, C3, C4 with similar privileges

User A has User B as a contact but is not a contact of User C (User B and User C are contacts).

Thus, User A would be able to view the following:

- Document B1 (contacts can view)
- Document B2 (everyone can view) 
- Document B4 (contacts of contacts)
- Document C2 (everyone can view)
- Document C4 (contacts of contacts)

Could someone please explain how these privileges would be handled. And if you could link me to any documentation or articles that would help me hit the ground running. Thank you.

A general answer is to find the distance between the document owner and a given contact. In Computer Science terms, this is a directed graph.

There's a good article with some SQL queries that covers this topic at http://techportal.ibuildings.com/2009/09/07/graphs-in-the-database-sql-meets-social-networks/. Rather than trying to summarize the entire article, here's how to conceptualize the problem:

  • Start with a blank piece of paper.
  • Draw a dot somewhere on the page for each person (in this case, Users A, B, and C). In CS terms, this is a "node".
  • Draw an arrow from a user to all of their contacts. In CS terms, this is a "directed edge", or an "arc".
    • This isn't explicit in the question, but it looks like User C must be a contact of User B, or a contact of another of User A's contacts (since User A can read C2 and C4).
    • So in this case, you would draw from User A -> User B, and User B -> User C.

As an aside, if being a "contact" is mutual, you can draw a line segment (or bidirectional arrow) instead of an arrow. In CS terms, this would be an "undirected" vs. a "directed" graph. Facebook relationships are an undirected relationship; if someone is my friend, then I am also their friend. By contrast, if someone is in my Outlook address book, I'm not necessarily in theirs. So this is a directed relationship.

As more users are added to the drawing, you'll notice that a user's contacts are one step away, and their contacts-of-contacts are two steps away. But you can only travel in the direction of the arrow.

So the problem for contacts is, "How do I find all nodes whose graph distance is one?" And the question for contacts-of-contacts is, "How do I find all nodes whose graph distance is two?". Although "two or less" is probably more appropriate, since you'd expect direct contacts to have access to all of the "contacts-of-contacts" content.
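Outside the database, the "distance of two or less" question can be sketched as a breadth-first search over an adjacency mapping. The function name within_distance and the dictionary layout are assumptions made for illustration; the sample data encodes the arrows A -> B and B -> C described above.

```python
from collections import deque

def within_distance(contacts, start, max_distance):
    """Return {user: distance} for every user reachable from start
    in at most max_distance steps, following arrows only forwards."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        user = queue.popleft()
        if distances[user] == max_distance:
            continue  # do not expand beyond the allowed distance
        for contact in contacts.get(user, ()):
            if contact not in distances:
                distances[contact] = distances[user] + 1
                queue.append(contact)
    del distances[start]  # the starting user is not their own contact
    return distances

# Arrows: A -> B and B -> C.
contacts = {"A": ["B"], "B": ["C"], "C": []}

print(within_distance(contacts, "A", 1))  # {'B': 1}
print(within_distance(contacts, "A", 2))  # {'B': 1, 'C': 2}
```

Because the arrows are directed, starting from C finds nobody; making the relationship mutual would mean storing each arrow in both directions.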

For the general case, there are some SQL queries described in the article that might provide some insight. But for your specific need, I'd consider just using some joins.

Let's consider a Users table, with primary key id along with its other fields, and a Relationships table which has only two columns: userId and contactId. We'll assume that User A has id 1, User B is 2, and User C is 3. Relationships has rows (1, 2) and (2, 3) to represent the relationships described above.

A pretty simple set of SQL joins can produce a list of all friends, or all friends-of-friends.

The following query would return all IDs of a User's contacts:

SELECT contact.id
  FROM Users "user"
    LEFT JOIN Relationships "rel"
      ON user.id = rel.userid
    LEFT JOIN Users "contact"
      ON rel.contactId = contact.id
  WHERE user.id = $id_of_current_user

If you know the user IDs, an authorization query could be quite simple:

SELECT count(*)
  FROM Relationships "rel"
  WHERE rel.userid = $document_owner_user_id
    AND rel.contactid = $id_of_current_user

If the query returns 0, then we know that the current user is not one of the document owner's contacts.

We can update that second query to indicate whether a user is a contact-of-a-contact:

SELECT count(*)
  FROM Relationships "rel_1"
    INNER JOIN Relationships "rel_2"
      ON rel_1.contactId = rel_2.userId
  WHERE rel_1.userid = $document_owner_user_id
    AND rel_2.contactid = $id_of_current_user

This should return nonzero, as long as there are entries in the Relationships table such that ($document_owner_user_id, X) and (X, $id_of_current_user) both exist. Otherwise, it will return zero.
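The two authorization queries can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: an in-memory database, a Relationships table holding only the rows (1, 2) and (2, 3), and two hypothetical helper functions, is_contact and is_contact_of_contact, that wrap the count(*) queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Relationships (userId INTEGER, contactId INTEGER)")
conn.executemany("INSERT INTO Relationships VALUES (?, ?)", [(1, 2), (2, 3)])

def is_contact(owner_id, viewer_id):
    # True if viewer_id is listed as a contact of owner_id.
    row = conn.execute(
        "SELECT count(*) FROM Relationships "
        "WHERE userId = ? AND contactId = ?",
        (owner_id, viewer_id)).fetchone()
    return row[0] > 0

def is_contact_of_contact(owner_id, viewer_id):
    # True if some X exists with rows (owner_id, X) and (X, viewer_id).
    row = conn.execute(
        "SELECT count(*) FROM Relationships rel_1 "
        "INNER JOIN Relationships rel_2 ON rel_1.contactId = rel_2.userId "
        "WHERE rel_1.userId = ? AND rel_2.contactId = ?",
        (owner_id, viewer_id)).fetchone()
    return row[0] > 0

print(is_contact(1, 2))             # True
print(is_contact_of_contact(1, 3))  # True
print(is_contact_of_contact(3, 1))  # False
```

With only these two rows the relationship is one-directional; if contacts are meant to be mutual, each pair would need a row in both directions.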

I know this is a long and somewhat indirect answer, so please comment if you have any questions.

Improving Python/django view code

7 votes

I am very new to Python/Django and programming in general. With the limited tools in my programming bag, I have written three view functions for after a user registers: they allow the user to add information and upload a thumbnail before activating his account.

I have posted the code that I have written so far so that someone with far more experience than I have can show me how to improve the code. No doubt this is crude code with all the marks of a novice, but I learn best from writing code -- seeing ways to improve it and learning new tools -- and rewriting it.

I know that the answer to this question will take a considerable amount of time. Therefore, I will be awarding a 200 point bounty to this question. SO will only allow me to add a bounty two days after a question has been posted, so I will be adding a bounty to this question on Tuesday (as soon as it's available to add). Please note that since I won't be selecting an answer until after I have posted a bounty, answers that are provided before the bounty has been added will still be treated 'as if' there is a bounty on the question.

The following is my self-commented code. In particular, I have a lot of boilerplate code for the first 10-14 lines of each function to redirect a user based upon whether he is logged in, if he has already filled out this info, if he has session info, etc.

# in model.py
choices = ([(x,str(x)) for x in range(1970,2015)])
choices.reverse()

class UserProfile(models.Model):
    """
    Fields are user, network, location, graduation, headline, and position.
    user is a ForeignKey, unique = True (OneToOne). network is a ForeignKey.
    location, graduation, headline, and position are optional.
    """
    user = models.ForeignKey(User, unique=True)
    network = models.ForeignKey(Network)
    location = models.CharField(max_length=100, blank=True)
    graduation = models.CharField(max_length=100, blank=True, choices=choices)
    headline = models.CharField(max_length=100, blank=True)
    positions = models.ManyToManyField(Position, blank=True)
    avatar = models.ImageField(upload_to='images/%Y/%m/%d', blank=True, default='default_profile_picture.jpg')
    # if the user has already filled out the 'getting started info', set boolean=True
    getting_started_boolean = models.BooleanField(default=False) 

General context: after a user has registered, I am giving them two session variables:

    request.session['location'] = get_location_function(request)
    request.session['username'] = new_user   # this is an email address

After a user has registered, they are re-directed to the getting_started pages.

First page:

# in views.py

def getting_started_info(request, positions=[]):
    """
    This is the first of two pages for the user to
    add additional info after they have registered.
    There is no auto log-in after the user registers,
    so the individual is an 'inactive user' until he
    clicks the activation link in his email.
    """
    location = request.session.get('location')
    if request.user.is_authenticated():
        username = request.user.username        # first see if the user is logged in
        user = User.objects.get(email=username) # if so, get the user object
        if user.get_profile().getting_started_boolean: 
             return redirect('/home/')                       # redirect to User home if user has already filled out  page
        else:
            pass
    else:                                                   
        username = request.session.get('username', False)    # if not logged in, see if session info exists from registration
        if not username:
            return redirect('/account/login')                # if no session info, redirect to login page
        else:
            user = User.objects.get(email=username)
    if request.method == 'POST':
          if 'Next Step' in request.POST.values():      # do custom processing on this form
              profile = UserProfile.objects.get(user=user)
              profile.location = request.POST.get('location')
              populate_positions = []
              for position in positions:
                  populate_positions.append(Position.objects.get(label=position))
              profile.positions = request.POST.get('position')
              profile.headline = request.POST.get('headline') 
              profile.graduation = request.POST.get('graduation') 
              profile.save()
              return redirect('/account/gettingstarted/add_pic/')         
    else:
        form = GettingStartedForm(initial={'location': location})
    return render_to_response('registration/getting_started_info1.html', {'form':form, 'positions': positions,}, context_instance=RequestContext(request))

Second page:

def getting_started_pic(request):
    """
    Second page of user entering info before first login.
    This is where a user uploads a photo.
    After this page has been finished, set getting_started_boolean = True,
    so user will be redirected if hits this page again.
    """
    if request.user.is_authenticated():
        username = request.user.username                      
        user = User.objects.get(email=username)            
        if user.get_profile().getting_started_boolean: 
             return redirect('/home/')                      
        else:
            pass
    else:                                                   
        username = request.session.get('username', False)    
        if not username:
            return redirect('/account/login')                
        else:
            user = User.objects.get(email=username)
    try:
        profile = UserProfile.objects.get(user=User.objects.get(email=username)) # get the profile to display the user's picture
    except UserProfile.DoesNotExist:        # if no profile exists, redirect to login 
        return redirect('/account/login')   # this is a repetition of "return redirect('/account/login/')" above
    if request.method == 'POST':
        if 'upload' in request.POST.keys():
            form = ProfilePictureForm(request.POST, request.FILES, instance = profile)
            if form.is_valid():
                if UserProfile.objects.get(user=user).avatar != 'default_profile_picture.jpg': # if the user has an old avatar image
                    UserProfile.objects.get(user=user).avatar.delete()   # delete the image file unless it is the default image
                object = form.save(commit=False)
                try:
                    t = handle_uploaded_image(request.FILES['avatar']) # do processing on the image to make a thumbnail
                    object.avatar.save(t[0],t[1])
                except KeyError:
                    object.save()
                return render_to_response('registration/getting_started_pic.html', {'form': form, 'profile': profile,}, context_instance=RequestContext(request))
        if 'finish' in request.POST.keys():
            UserProfile.objects.filter(user=user).update(getting_started_boolean='True') # now add boolean = True so the user won't hit this page again
            return redirect('/account/gettingstarted/check_email/')       
    else:
        form = ProfilePictureForm()
    return render_to_response('registration/getting_started_pic.html', {'form': form, 'profile': profile,}, context_instance=RequestContext(request))

Third page:

def check_email(request):
    """
    End of getting started. Will redirect to user home
    if activation link has been clicked. Otherwise, will
    allow user to have activation link re-sent.
    """
    if request.user.is_authenticated():    # if the user has already clicked his activation link, redirect to User home
        return redirect('/home/')
    else:                                  # if the user is not logged in, load this page
        resend_msg=''
        user = email = request.session.get('username')
        if not email:
            return redirect('/account/login/')
        if Site._meta.installed:
            site = Site.objects.get_current()
        else:
            site = RequestSite(request)
        if request.method == 'POST':
            RegistrationProfile.objects.resend_activation(email, site)
            resend_msg = 'An activation email has been resent to %s' %(email)
            return render_to_response('registration/getting_started_check_email.html', {'email':email, 'resend_msg':resend_msg}, context_instance=RequestContext(request))
        return render_to_response('registration/getting_started_check_email.html', {'email':email, 'resend_msg':resend_msg}, context_instance=RequestContext(request))

I originally tried to replicate the behaviour of your signup process using django.contrib.formtools.wizard, but it was becoming far too complicated, considering there are only two steps in your process, and one of them is simply selecting an image. I would highly advise looking at a form-wizard solution if you intend to keep the multi-step signup process though. It will mean the infrastructure takes care of carrying state across requests, and all you need to do is define a series of forms.

Anyway, I've opted to simplify your whole process to one step. Using a basic model form, we are able to simply capture ALL of the UserProfile information you need on one page, with very very little code.

I've also gone with class-based-views, introduced in Django 1.3. It makes boilerplate code (such as your check at the top of each function for what process you're up to) much nicer to manage, at the cost of more upfront complexity. Once you understand them though, they are fantastic for a lot of use cases. Ok, so; on to the code.

# in models.py

graduation_choices = ([(x,str(x)) for x in range(1970,2015)])
graduation_choices.reverse()

class UserProfile(models.Model):
    # usually you want null=True if blank=True. blank allows empty forms in admin, but will 
    # get a database error when trying to save the instance, because null is not allowed
    user = models.OneToOneField(User)       # OneToOneField is more explicit
    network = models.ForeignKey(Network)
    location = models.CharField(max_length=100, blank=True, null=True)
    graduation = models.CharField(max_length=100, blank=True, null=True, choices=graduation_choices)
    headline = models.CharField(max_length=100, blank=True, null=True)
    positions = models.ManyToManyField(Position, blank=True)
    avatar = models.ImageField(upload_to='images/%Y/%m/%d', blank=True, null=True)

    def get_avatar_path(self):
        if not self.avatar:  # an empty ImageField is falsy rather than None
            return 'images/default_profile_picture.jpg'
        return self.avatar.name

    def is_complete(self):
        """ Determine if getting started is complete without requiring a field. Change this method appropriately """
        if self.location is None and self.graduation is None and self.headline is None:
            return False
        return True

I stole a piece of this answer for handling the default image location as it was very good advice. Leave the 'which picture to render' up to the template and the model. Also, define a method on the model which can answer the 'completed?' question, rather than defining another field if possible. Makes the process easier.

# forms.py

class UserProfileForm(forms.ModelForm):
    class Meta:
        model = UserProfile
        widgets = {
            'user': forms.HiddenInput() # initial data MUST be used to assign this
        }

A simple ModelForm based on the UserProfile object. This will ensure that all fields of the model are exposed to a form, and everything can be saved atomically. This is how I've mainly deviated from your method. Instead of using several forms, just one will do. I think this is a nicer user experience also, especially since there aren't very many fields at all. You can also reuse this exact form for when a user wants to modify their information.

# in views.py - using class based views available from django 1.3 onward

class SignupMixin(View):
    """ If included within another view, will validate the user has completed 
    the getting started page, and redirects to the profile page if incomplete
    """
    def dispatch(self, request, *args, **kwargs):
        user = request.user
        if user.is_authenticated() and not user.get_profile().is_complete():
            return HttpResponseRedirect('/profile/')
        return super(SignupMixin, self).dispatch(request, *args, **kwargs)

class CheckEmailMixin(View):
    """ If included within another view, will validate the user is active,
    and will redirect to the re-send confirmation email URL if not.

    """
    def dispatch(self, request, *args, **kwargs):
        user = request.user
        if user.is_authenticated() and not user.is_active:
            return HttpResponseRedirect('/confirm/')
        return super(CheckEmailMixin, self).dispatch(request, *args, **kwargs)

class UserProfileFormView(FormView, ModelFormMixin):
    """ Responsible for displaying and validating that the form was 
    saved successfully. Notice that it sets the User automatically within the form """

    form_class = UserProfileForm
    template_name = 'registration/profile.html' # whatever your template is...
    success_url = '/home/'

    def get_initial(self):
        return { 'user': self.request.user }

class HomeView(TemplateView, SignupMixin, CheckEmailMixin):
    """ Simply displays a template, but will redirect to /profile/ or /confirm/
    if the user hasn't completed their profile or confirmed their address """
    template_name = 'home/index.html'

These views will probably be the most complicated part, but I feel are much easier to understand than reams of spaghetti view function code. I've documented the functions briefly inline, so it should make it slightly easier to understand. The only thing left is to wire up your URLs to these view classes.
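The way the two mixins chain their checks through super() can be seen without Django at all. The sketch below uses plain Python classes with hypothetical names (BaseView, SignupCheck, EmailCheck, Home) and a dictionary standing in for the request; it illustrates the cooperative dispatch pattern only and is not Django's actual View API.

```python
class BaseView(object):
    def dispatch(self, request):
        # Stand-in for the real view logic that renders a page.
        return "rendered page"

class SignupCheck(BaseView):
    def dispatch(self, request):
        if request.get("authenticated") and not request.get("profile_complete"):
            return "redirect:/profile/"
        # Hand over to the next class in the method resolution order.
        return super(SignupCheck, self).dispatch(request)

class EmailCheck(BaseView):
    def dispatch(self, request):
        if request.get("authenticated") and not request.get("active"):
            return "redirect:/confirm/"
        return super(EmailCheck, self).dispatch(request)

class Home(SignupCheck, EmailCheck):
    """Runs SignupCheck, then EmailCheck, then BaseView.dispatch."""

incomplete = {"authenticated": True, "profile_complete": False, "active": True}
inactive = {"authenticated": True, "profile_complete": True, "active": False}
ready = {"authenticated": True, "profile_complete": True, "active": True}

print(Home().dispatch(incomplete))  # redirect:/profile/
print(Home().dispatch(inactive))    # redirect:/confirm/
print(Home().dispatch(ready))       # rendered page
```

Each check either short-circuits with a redirect or defers to the next class, which is why the boilerplate only has to be written once per check.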

# urls.py

urlpatterns = patterns('',

    url(r'^home/$', HomeView.as_view(), name='home'),
    url(r'^profile/$', UserProfileFormView.as_view(), name='profile'),
    url(r'^confirm/$', HomeView.as_view(template_name='checkemail.html'), name='checkemail'),
)

Now this is all untested code, so it may need tweaks to get working, and to integrate into your particular site. Also, it completely departs from your multi-step process. The multi-step process would be nice in the case of many many many fields.. but a separate page JUST to do the avatar seems a bit extreme to me. Hopefully, whichever way you go, this helps.

Some links regarding class based views:

API Reference
Topic Introduction

I also wanted to mention a few things about your code in general. For instance you have this:

populate_positions = []
for position in positions:
    populate_positions.append(Position.objects.get(label=position))

Which could be replaced with this:

populate_positions = Position.objects.filter(label__in=positions)

The former will hit the DB for every position. The latter will do a single query when evaluated.

Also;

if request.user.is_authenticated():
    username = request.user.username                      
    user = User.objects.get(email=username)

The above is redundant. You already have access to the user object, yet you fetch it again.

user = request.user

Done.

By the way, if you want to use email addresses as a username, you will have problems. The database will only accept a maximum of 30 characters (it is how the User model is written in contrib.auth). Read some of the comments on this thread that discuss some of the pitfalls.

RegExp match repeated characters

7 votes

For example I have string:

 aacbbbqq

As a result, I want to have the following matches:

 (aa, c, bbb, qq)  

I know that I can write something like this:

 ([a]+)|([b]+)|([c]+)|...  

But I think it's ugly, and I'm looking for a better solution. I'm looking for a regular expression solution, not a self-written finite-state machine.

You can match that with: (\w)\1*
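In Python, that pattern can be applied as in the sketch below. Note that re.findall would return only the captured first character of each run, so re.finditer with group(0) is used to get the whole run.

```python
import re

# (\w) captures one character; \1* matches any further repeats of it.
matches = [m.group(0) for m in re.finditer(r'(\w)\1*', 'aacbbbqq')]
print(matches)  # ['aa', 'c', 'bbb', 'qq']
```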

How to grab numbers in the middle of a string? (Python)

7 votes

random string
this is 34 the string 3 that, i need 234
random string
random string
random string
random string

random string
this is 1 the string 34 that, i need 22
random string
random string
random string
random string

random string
this is 35 the string 55 that, i need 12
random string
random string
random string
random string

Within one string there are multiple lines. One of the lines is repeated but with different numbers each time. I was wondering how can I store the numbers in those lines. The numbers will always be in the same position in the line, but can be any number of digits.

Edit: The random strings could have numbers in them as well.

Use regular expressions:

>>> import re
>>> comp_re = re.compile(r'this is (\d+) the string (\d+) that, i need (\d+)')
>>> s = """random string
this is 34 the string 3 that, i need 234
random string
random string
random string
random string

random string
this is 1 the string 34 that, i need 22
random string
random string
random string
random string

random string
this is 35 the string 55 that, i need 12
random string
random string
random string
random string
"""
>>> comp_re.findall(s)
[('34', '3', '234'), ('1', '34', '22'), ('35', '55', '12')]
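Since the random lines may contain numbers too, it can help to anchor the pattern to whole lines and convert the captured digits to integers before storing them. A minimal sketch, assuming the repeated line always has exactly the wording shown in the question; the sample text is shortened for illustration.

```python
import re

# ^ and $ match at line boundaries because of re.MULTILINE.
LINE_RE = re.compile(
    r'^this is (\d+) the string (\d+) that, i need (\d+)$', re.MULTILINE)

text = (
    "random string 7\n"
    "this is 34 the string 3 that, i need 234\n"
    "random string\n"
    "this is 1 the string 34 that, i need 22\n"
)

# Convert each tuple of digit strings into a tuple of ints.
numbers = [tuple(int(n) for n in groups) for groups in LINE_RE.findall(text)]
print(numbers)  # [(34, 3, 234), (1, 34, 22)]
```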

Why won't LD_PRELOAD work with Python?

7 votes

Using function interposition for open() with Python doesn't seem to work after the first few calls. I suspect Python is doing some kind of initialization, or something is temporarily bypassing my function.

Here the open call is clearly hooked:

$ cat a
hi
$ LD_PRELOAD=./libinterpose_python.so cat a
sandbox_init()
open()
hi

Here it happens once during Python initialization:

$ LD_PRELOAD=./libinterpose_python.so python
sandbox_init()
Python 2.7.2 (default, Jun 12 2011, 20:20:34) 
[GCC 4.6.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
open()
>>> 
sandbox_fini()

Here it doesn't happen at all, and there's no error to indicate the file handle had write privileges removed:

$ LD_PRELOAD=./libinterpose_python.so python3 -c 'b = open("a", "w"); b.write("hi\n"); b.flush()'
sandbox_init()
sandbox_fini()

The code is here. Build with make -f Makefile.interpose_python.

Solution

It turns out there is an open64() function:

$ objdump -T /lib32/libc.so.6  | grep '\bopen'
00064f10 g    DF .text  000000fc  GLIBC_2.4   open_wmemstream
000cc010 g    DF .text  0000007b  GLIBC_2.0   openlog
000bf6d0  w   DF .text  000000b6  GLIBC_2.1   open64
00094460  w   DF .text  00000055  GLIBC_2.0   opendir
0005f9b0 g    DF .text  000000d9  GLIBC_2.0   open_memstream
000bf650  w   DF .text  0000007a  GLIBC_2.0   open
000bf980  w   DF .text  00000081  GLIBC_2.4   openat
000bfb90  w   DF .text  00000081  GLIBC_2.4   openat64

The open64() function is a part of the large file extensions, and is equivalent to calling open() with the O_LARGEFILE flag.

Running the example code with the open64 section uncommented gives:

$ LD_PRELOAD=./libinterpose_python.so python3 -c 'b = open("a", "w"); b.write("hi\n"); b.flush()'
sandbox_init()
open64()
open64()
open64()
Traceback (most recent call last):
  File "<string>", line 1, in <module>
open64()
open64()
open64()
open64()
open64()
open64()
open64()
IOError: [Errno 9] Bad file descriptor
sandbox_fini()

Which clearly shows all of Python's open calls, and several propagated errors due to the write flag being stripped from the calls.

There are both open() and open64() functions; you might need to redefine both.

Celery task that runs more tasks

6 votes

I am using celerybeat to kick off a primary task that kicks off a number of secondary tasks. I have both tasks written already.

Is there a way to easily do this? Does Celery allow for tasks to be run from within tasks?

My example:

@task
def compute(users=None):
    if users is None:
        users = User.objects.all()

    tasks = []
    for user in users:
        tasks.append(compute_for_user.subtask((user.id,)))

    job = TaskSet(tasks)
    job.apply_async() # raises an IOError: Socket closed

@task
def compute_for_user(user_id):
    pass  # do some stuff

compute gets called from celerybeat, but causes an IOError when it tries to run apply_async. Any ideas?

To answer your opening questions: As of version 2.0, Celery provides an easy way to start tasks from other tasks. What you are calling "secondary tasks" are what it calls "subtasks". See the documentation for Sets of tasks, Subtasks and Callbacks, which @Paperino was kind enough to link to.

Your code shows that you are already familiar with this interface. Your actual question seems to be, "Why am I getting a 'Socket Closed' IOError when I try to run my set of subtasks?" I don't think anyone can answer that, because you have not provided enough information about your program. Your excerpt cannot be run as-is, so we cannot examine the problem you're having for ourselves. Please post the stack trace provided with the IOError, and with any luck, someone who can help you with your crash will come along.