Best python questions in November 2011

Why is early return slower than else?

55 votes

This is a follow-up question to an answer I gave a few days back. Edit: it seems that the OP of that question already used the code I posted to him to ask the same question, but I was unaware of it. Apologies. The answers provided are different though!

Substantially I observed that:

>>> def without_else(param=False):
...     if param:
...         return 1
...     return 0
>>> def with_else(param=False):
...     if param:
...         return 1
...     else:
...         return 0
>>> from timeit import Timer as T
>>> T(lambda : without_else()).repeat()
[0.3011460304260254, 0.2866089344024658, 0.2871549129486084]
>>> T(lambda : with_else()).repeat()
[0.27536892890930176, 0.2693932056427002, 0.27011704444885254]
>>> T(lambda : without_else(True)).repeat()
[0.3383951187133789, 0.32756996154785156, 0.3279120922088623]
>>> T(lambda : with_else(True)).repeat()
[0.3305950164794922, 0.32186388969421387, 0.3209099769592285]

...or in other words: having the else clause is faster regardless of the if condition being triggered or not.

I assume it has to do with different bytecode generated by the two, but is anybody able to confirm/explain in detail?

EDIT: Seems not everybody is able to reproduce my timings, so I thought it might be useful to give some info on my system. I'm running Ubuntu 11.10 64 bit with the default python installed. python generates the following version information:

Python 2.7.2+ (default, Oct  4 2011, 20:06:09) 
[GCC 4.6.1] on linux2

Here are the results of the disassembly in Python 2.7:

>>> dis.dis(without_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  4     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
>>> dis.dis(with_else)
  2           0 LOAD_FAST                0 (param)
              3 POP_JUMP_IF_FALSE       10

  3           6 LOAD_CONST               1 (1)
              9 RETURN_VALUE        

  5     >>   10 LOAD_CONST               2 (0)
             13 RETURN_VALUE        
             14 LOAD_CONST               0 (None)
             17 RETURN_VALUE        

This is a pure guess, and I haven't figured out an easy way to check whether it is right, but I have a theory for you.

I tried your code and get the same of results, without_else() is repeatedly slightly slower than with_else():

>>> T(lambda : without_else()).repeat()
[0.42015745017874906, 0.3188967452567226, 0.31984281521812363]
>>> T(lambda : with_else()).repeat()
[0.36009842032996175, 0.28962249392031936, 0.2927151355828528]
>>> T(lambda : without_else(True)).repeat()
[0.31709728471076915, 0.3172671387005721, 0.3285821242644147]
>>> T(lambda : with_else(True)).repeat()
[0.30939889008243426, 0.3035132258429485, 0.3046679117038593]

Considering that the bytecode is identical, the only difference is the name of the function. In particular the timing test does a lookup on the global name. Try renaming without_else() and the difference disappears:

>>> def no_else(param=False):
    if param:
        return 1
    return 0

>>> T(lambda : no_else()).repeat()
[0.3359846013948413, 0.29025818923918223, 0.2921801513879245]
>>> T(lambda : no_else(True)).repeat()
[0.3810395594970828, 0.2969634408842694, 0.2960104566362247]

My guess is that without_else has a hash collision with something else in globals() so the global name lookup is slightly slower.

Edit: A dictionary with 7 or 8 keys probably has 32 slots, so on that basis without_else has a hash collision with __builtins__:

>>> [(k, hash(k) % 32) for k in globals().keys() ]
[('__builtins__', 8), ('with_else', 9), ('__package__', 15), ('without_else', 8), ('T', 21), ('__name__', 25), ('no_else', 28), ('__doc__', 29)]

To clarify how the hashing works:

__builtins__ hashes to -1196389688 which reduced modulo the table size (32) means it is stored in the #8 slot of the table.

without_else hashes to 505688136 which reduced modulo 32 is 8 so there's a collision. To resolve this Python calculates:

Starting with:

j = hash % 32
perturb = hash

Repeat this until we find a free slot:

j = (5*j) + 1 + perturb;
perturb >>= 5;
use j % 2**i as the next table index;

which gives it 17 to use as the next index. Fortunately that's free so the loop only repeats once.

Each probe into the table can find one of these:

  • The slot is empty, in that case the probing stops and we know the value is not in the table.

  • The slot is unused but was used in the past in which case we go try the next value calculated as above.

  • The slot is full but the full hash value stored in the table isn't the same as the hash of the key we are looking for (that's what happens in the case of __builtins__ vs without_else).

  • The slot is full and has exactly the hash value we want, then Python checks to see if the key and the object we are looking up are the same object (which in this case they will be because short strings that could be identifiers are interned so identical identifiers use the exact same string).

  • Finally when the slot is full, the hash matches exactly, but the keys are not the identical object, then and only then will Python try comparing them for equality. This is comparatively slow, but in the case of name lookups shouldn't actually happen.

Python: why are * and ** faster than / and sqrt()?

44 votes

While optimising my code I realised the following:

>>> from timeit import Timer as T
>>> T(lambda : 1234567890 / 4.0).repeat()
[0.22256922721862793, 0.20560789108276367, 0.20530295372009277]
>>> from __future__ import division
>>> T(lambda : 1234567890 / 4).repeat()
[0.14969301223754883, 0.14155197143554688, 0.14141488075256348]
>>> T(lambda : 1234567890 * 0.25).repeat()
[0.13619112968444824, 0.1281130313873291, 0.12830305099487305]

and also:

>>> from math import sqrt
>>> T(lambda : sqrt(1234567890)).repeat()
[0.2597470283508301, 0.2498021125793457, 0.24994492530822754]
>>> T(lambda : 1234567890 ** 0.5).repeat()
[0.15409398078918457, 0.14059877395629883, 0.14049601554870605]

I assume it has to do with the way python is implemented in C, but I wonder if anybody would care to explain why is so?

The (somewhat unexpected) reason for your results is that Python seems to fold constant expressions involving floating-point multiplication and exponentiation, but not division. math.sqrt() is a different beast altogether since there's no bytecode for it and it involves a function call.

On Python 2.6.5, the following code:

x1 = 1234567890.0 / 4.0
x2 = 1234567890.0 * 0.25
x3 = 1234567890.0 ** 0.5
x4 = math.sqrt(1234567890.0)

compiles to the following bytecodes:

  # x1 = 1234567890.0 / 4.0
  4           0 LOAD_CONST               1 (1234567890.0)
              3 LOAD_CONST               2 (4.0)
              6 BINARY_DIVIDE       
              7 STORE_FAST               0 (x1)

  # x2 = 1234567890.0 * 0.25
  5          10 LOAD_CONST               5 (308641972.5)
             13 STORE_FAST               1 (x2)

  # x3 = 1234567890.0 ** 0.5
  6          16 LOAD_CONST               6 (35136.418286444619)
             19 STORE_FAST               2 (x3)

  # x4 = math.sqrt(1234567890.0)
  7          22 LOAD_GLOBAL              0 (math)
             25 LOAD_ATTR                1 (sqrt)
             28 LOAD_CONST               1 (1234567890.0)
             31 CALL_FUNCTION            1
             34 STORE_FAST               3 (x4)

As you can see, multiplication and exponentiation take no time at all since they're done when the code is compiled. Division takes longer since it happens at runtime. Square root is not only the most computationally expensive operation of the four, it also incurs various overheads that the others do not (attribute lookup, function call etc).

If you eliminate the effect of constant folding, there's little to separate multiplication and division:

In [16]: x = 1234567890.0

In [17]: %timeit x / 4.0
10000000 loops, best of 3: 87.8 ns per loop

In [18]: %timeit x * 0.25
10000000 loops, best of 3: 91.6 ns per loop

math.sqrt(x) is actually a little bit faster than x ** 0.5, presumably because it's a special case of the latter and can therefore be done more efficiently, in spite of the overheads:

In [19]: %timeit x ** 0.5
1000000 loops, best of 3: 211 ns per loop

In [20]: %timeit math.sqrt(x)
10000000 loops, best of 3: 181 ns per loop

edit 2011-11-16: Constant expression folding is done by Python's peephole optimizer. The source code (peephole.c) contains the following comment that explains why constant division isn't folded:

    case BINARY_DIVIDE:
        /* Cannot fold this operation statically since
           the result can depend on the run-time presence
           of the -Qnew flag */
        return 0;

The -Qnew flag enables "true division" defined in PEP 238.

defining "boolness" of a class in python

28 votes

Why doesn't this work as one may have naively expected?

class Foo(object):
  def __init__(self):
    self.bar = 3
  def __bool__(self):
    return self.bar > 10

foo = Foo()

if foo:
  print 'x'
else:
  print 'y'

(The output is x)

For Python 2-3 compatibility, just add this to your example:

Foo.__nonzero__ = Foo.__bool__

or expand the original definition of Foo to include:

__nonzero__ = __bool__

You could of course define them in reverse too, where the method name is __nonzero__ and you assign it to __bool__, but I think the name __nonzero__ is just a legacy of the original C-ishness of Python's interpretation of objects as truthy or falsy based on their equivalence with zero. Just add the statement above and your code will work with Python 2.x, and will automatically work when you upgrade to Python 3.x (and eventually you an drop the assignment to __nonzero__).

Why does removing the else slow down my code?

19 votes

Consider the following functions:

def fact1(n):
    if n < 2:
        return 1
    else:
        return n * fact1(n-1)

def fact2(n):
    if n < 2:
        return 1
    return n * fact2(n-1)

They should be equivalent. But there's a performance difference:

>>> T(lambda : fact1(1)).repeat(number=10000000)
[2.5754408836364746, 2.5710129737854004, 2.5678811073303223]
>>> T(lambda : fact2(1)).repeat(number=10000000)
[2.8432059288024902, 2.834425926208496, 2.8364310264587402]

The version without the else is 10% slower. This is pretty significant. Why?

For me, they are virtually the same speed: (Python 2.6.6 on Debian)

In [4]: %timeit fact1(1)
10000000 loops, best of 3: 151 ns per loop

In [5]: %timeit fact2(1)
10000000 loops, best of 3: 154 ns per loop

The byte code is also very similar:

In [6]: dis.dis(fact1)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              9 JUMP_IF_FALSE            5 (to 17)
             12 POP_TOP             

  3          13 LOAD_CONST               2 (1)
             16 RETURN_VALUE        
        >>   17 POP_TOP             

  5          18 LOAD_FAST                0 (n)
             21 LOAD_GLOBAL              0 (fact)
             24 LOAD_FAST                0 (n)
             27 LOAD_CONST               2 (1)
             30 BINARY_SUBTRACT     
             31 CALL_FUNCTION            1
             34 BINARY_MULTIPLY     
             35 RETURN_VALUE        
             36 LOAD_CONST               0 (None)
             39 RETURN_VALUE        

In [7]: dis.dis(fact2)
  2           0 LOAD_FAST                0 (n)
              3 LOAD_CONST               1 (2)
              6 COMPARE_OP               0 (<)
              9 JUMP_IF_FALSE            5 (to 17)
             12 POP_TOP             

  3          13 LOAD_CONST               2 (1)
             16 RETURN_VALUE        
        >>   17 POP_TOP             

  4          18 LOAD_FAST                0 (n)
             21 LOAD_GLOBAL              0 (fact)
             24 LOAD_FAST                0 (n)
             27 LOAD_CONST               2 (1)
             30 BINARY_SUBTRACT     
             31 CALL_FUNCTION            1
             34 BINARY_MULTIPLY     
             35 RETURN_VALUE        

The only difference is that the version with the else includes code to return None in case control reaches the end of the function body.

Why does 4 < '3' return True in Python 2?

18 votes

Why does 4 < '3' return True in Python 2?

Is it because when I place single quotes around a number Python sees it as a string and strings are bigger than numbers?

Yes, any number will be less than any string (including the empty string) in Python 2.

In Python 3, you can't make arbitrary comparisons. You'll get a TypeError.


From the link in eryksun's comment:

if (PyNumber_Check(v))
    vname = "";
else
    vname = v->ob_type->tp_name;
if (PyNumber_Check(w))
    wname = "";
else
    wname = w->ob_type->tp_name;
c = strcmp(vname, wname);

So at least in recent versions of CPython 2.x, type names are compared, with an empty string used instead of the type name for any numeric type.

__new__ and __init__ in Python

15 votes

I am learning Python and so far I can tell the things below about __new__ and __init__:

  1. __new__ is for object creation
  2. __init__ is for object initialization
  3. __new__ is invoked before __init__ as __new__ returns a new instance and __init__ invoked afterwards to initialize inner state.
  4. __new__ is good for immutable object as they cannot be changed once they are assigned. So we can return new instance which has new state.
  5. We can use __new__ and __init__ for both mutable object as its inner state can be changed.

But I have another questions now.

  1. When I create a new instance such as a = MyClass("hello","world"), how these arguments are passed? I mean how I should structure the class using __init__ and __new__ as they are different and both accepts arbitrary arguments besides default first argument.
  2. self keyword is in terms of name can be changed to something else? But I am wondering cls is in terms of name is subject to change to something else as it is just a parameter name?

I made a little experiments as such below:

>>> class MyClass(tuple):
    def __new__(tuple):
        return [1,2,3]

and I did below:

>>> a = MyClass()
>>> a
[1, 2, 3]

Albeit I said I want to return tuple, this code works fine and returned me [1,2,3]. I knew we were passing the first parameters as the type we wanted to receive once the __new__ function is invoked. We are talking about New function right? I don't know other languages return type other than bound type?

And I did anther things as well:

>>> issubclass(MyClass,list)
False
>>> issubclass(MyClass,tuple)
True
>>> isinstance(a,MyClass)
False
>>> isinstance(a,tuple)
False
>>> isinstance(a,list)
True

I didn't do more experiment because the further wasn't bright and I decided to stop there and decided to ask StackOverflow.

The SO posts I read:

  1. Python object creation
  2. Python's use of __new__ and __init__?

how I should structure the class using __init__ and __new__ as they are different and both accepts arbitrary arguments besides default first argument.

Only rarely will you have to worry about __new__. Usually, you'll just define __init__ and let the default __new__ pass the constructor arguments to it.

self keyword is in terms of name can be changed to something else? But I am wondering cls is in terms of name is subject to change to something else as it is just a parameter name?

Both are just parameter names with no special meaning in the language. But their use is a very strong convention in the Python community; most Pythonistas will never change the names self and cls in these contexts and will be confused when someone else does.

Note that your use of def __new__(tuple) re-binds the name tuple inside the constructor function. When actually implementing __new__, you'll want to do it as

def __new__(cls, *args, **kwargs):
    # do allocation to get an object, say, obj
    return obj

Albeit I said I want to return tuple, this code works fine and returned me [1,2,3].

MyClass() will have the value that __new__ returns. There's no implicit type checking in Python; it's the responsibility of the programmer to return the correct type ("we're all consenting adults here"). Being able to return a different type than requested can be useful for implementing factories: you can return a subclass of the type requested.

This also explains the issubclass/isinstance behavior you observe: the subclass relationship follows from your use of class MyClass(tuple), the isinstance reflects that you return the "wrong" type from __new__.

For reference, check out the requirements for __new__ in the Python Language Reference.

Edit: ok, here's an example of potentially useful use of __new__. The class Eel keeps track of how many eels are alive in the process and refuses to allocate if this exceeds some maximum.

class Eel(object):
    MAX_EELS = 20
    n_eels = 0

    def __new__(cls, *args, **kwargs):
        if cls.n_eels == cls.MAX_EELS:
            raise HovercraftFull()

        obj = super(Eel, cls).__new__(cls)
        cls.n_eels += 1
        return obj

    def __init__(self, voltage):
        self.voltage = voltage

    def __del__(self):
        self.n_eels -= 1

    def electric(self):
        """Is this an electric eel?"""
        return self.voltage > 0

Mind you, there are smarter ways to accomplish this behavior.

Inheriting methods' docstrings in Python

14 votes

I have an OO hierarchy with docstrings that take as much maintenance as the code itself. E.g.,

class Swallow(object):
    def airspeed(self):
        """Returns the airspeed (unladen)"""
        raise NotImplementedError

class AfricanSwallow(Swallow):
    def airspeed(self):
        # whatever

Now, the problem is that AfricanSwallow.airspeed does not inherit the superclass method's docstring. I know I can keep the docstring using the template method pattern, i.e.

class Swallow(object):
    def airspeed(self):
        """Returns the airspeed (unladen)"""
        return self._ask_arthur()

and implementing _ask_arthur in each subclass. However, I was wondering whether there's another way to have docstrings be inherited, perhaps some decorator that I hadn't discovered yet?

Write a function in a class-decorator style to do the copying for you. In Python2.5, you can apply it directly after the class is created. In later versions you can make it an @decorator.

Here's a first cut at how to do it:

def fix_docs(cls):
    for name, func in vars(cls).items():
        if not func.__doc__:
            print func, 'needs doc'
            for parent in cls.__bases__:
                parfunc = getattr(parent, name)
                if parfunc and getattr(parfunc, '__doc__', None):
                    func.__doc__ = parfunc.__doc__
                    break
    return cls


class Animal:
    def walk(self):
        'Walk like a duck'

class Dog(Animal):
    def walk(self):
        pass

Dog = fix_docs(Dog)
print Dog.walk.__doc__

How to check if a value exists in a dictionary (python)

14 votes

Hi I have the following dictionary in python:

d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}

I need a way to find if a value such as "one" or "two" exists in this dictionary.

For example, if I wanted to know if the index "1" existed I would simply have to type:

"1" in d

And then python would tell me if that is true or false, however I need to do that same exact thing except to find if a value exists. Thanks for any help.

>>> d = {'1': 'one', '3': 'three', '2': 'two', '5': 'five', '4': 'four'}
>>> 'one' in d.values()
True

Out of curiosity, some comparative timing:

>>> T(lambda : 'one' in d.itervalues()).repeat()
[0.28107285499572754, 0.29107213020324707, 0.27941107749938965]
>>> T(lambda : 'one' in d.values()).repeat()
[0.38303399085998535, 0.37257885932922363, 0.37096405029296875]
>>> T(lambda : 'one' in d.viewvalues()).repeat()
[0.32004380226135254, 0.31716084480285645, 0.3171098232269287]

EDIT: And in case you wonder why... the reason is that each of the above returns a different type of object, which may or may not be well suited for lookup operations:

>>> type(d.viewvalues())
<type 'dict_values'>
>>> type(d.values())
<type 'list'>
>>> type(d.itervalues())
<type 'dictionary-valueiterator'>

EDIT2: As per request in comments...

>>> T(lambda : 'four' in d.itervalues()).repeat()
[0.41178202629089355, 0.3959040641784668, 0.3970959186553955]
>>> T(lambda : 'four' in d.values()).repeat()
[0.4631338119506836, 0.43541407585144043, 0.4359898567199707]
>>> T(lambda : 'four' in d.viewvalues()).repeat()
[0.43414998054504395, 0.4213531017303467, 0.41684913635253906]

Python package without __init__

13 votes

I pip install-ed the flufl.enum Python package and I noticed that it works despite missing a flufl/__init__.py module as regular Python packages. Even stranger is this:

>>> import flufl
>>> flufl
<module 'flufl' (built-in)>

I tried to reproduce this creating foo/bar/__init__.py without foo/__init__.py and (predictably) import foo fails. How does flufl do it?

the magic is done in the flufl.enum-3.2-py2.7-nspkg.pth file, which is put into site-packages by "pip install":

import sys,new,os
p = os.path.join(sys._getframe(1).f_locals['sitedir'], *('flufl',))
ie = os.path.exists(os.path.join(p,'__init__.py'))
m = not ie and sys.modules.setdefault('flufl',new.module('flufl'))
mp = (m or []) and m.__dict__.setdefault('__path__',[])
(p not in mp) and mp.append(p)

pth files are evaluated at startup. In particular, this file creates a new module named "flufl" and puts it into sys.modules. That also explains why you see it as "built-in":

>>> import new
>>> new.module('foo')
<module 'foo' (built-in)>

Why is Clojure 10 times slower than Python for the equivalent solution of Euler 50?

12 votes

I recently started to learn Clojure and decided to practice on Euler problems to get a hang of the available data structures and practice recursion and looping.

I tried various approaches to Problem 50, but no matter what I did, finding the solution for 1000000 never finished. After I looked up what others did, I guessed what I was doing should not take forever either, so I typed in the equivalent algorithm in Python to see if the problem lies in my lack of understanding of some Clojure thing, or Java setting. Python finished in 10 seconds. For primes under 100000, the Python version finished in 0.5 sec, Clojure in 5.

I'm posting the Clojure version which was created specifically to match the Python code. Can you help me understand why there is such a difference in performance? Should I use unchecked-add, type hints, primitives (but where?) or what?

So here's Clojure:

(defn prime? [n]
  (let [r (int (Math/sqrt n))]
    (loop [d 2]
      (cond
        (= n 1) false
        (> d r) true
        (zero? (rem n d)) false
        :other (recur (inc d))))))

(defn primes []
  (filter prime? (iterate inc 2)))


(defn cumulative-sum [s]
  (reduce 
    (fn [v, x] (conj v (+ (last v) x))) 
    [(first s)] 
    (rest s)))


(defn longest-seq-under [n]
  "Longest prime seq with sum under n"
  (let [ps (vec (take-while #(< % n) (primes))) ; prime numbers up to n
        prime-set (set ps)  ; set for testing of inclusion
        cs (cumulative-sum ps)
        cnt (count ps)
        max-len (count (take-while #(< % n) cs)) ; cannot have longer sequences
        sub-sum (fn [i j] ; sum of primes between the i-th and j-th      
                  (- (cs j) (get cs (dec i) 0)))
        seq-with-len (fn [m] ; try m length prime sequences and return the first where the sum is prime
                       (loop [i 0] ; try with the lowest sum
                         (if (> i (- cnt m)) ; there are no more elements for and m length sequence
                           nil ; could not find any
                           (let [j (+ i (dec m)) ; fix length
                                 s (sub-sum i j)]
                             (if (>= s n) ; overshoot
                               nil
                               (if (prime-set s) ; sum is prime
                                 [i (inc j)] ; we just looked for the first
                                 (recur (inc i))))))))] ; shift window
        (loop [m max-len] ; try with the longest sequence
          (if (not (zero? m))
            (let [[i j] (seq-with-len m) ]
              (if j 
                (subvec ps i j)
                (recur (dec m))))))))                    



(assert (= [2 3 5 7 11 13] (longest-seq-under 100)))

(let [s1000  (longest-seq-under 1000)]
  (assert (= 21 (count s1000)))
  (assert (= 953 (reduce + s1000))))

; (time (reduce + (longest-seq-under 100000))) ; "Elapsed time: 5707.784369 msecs"

And here's the same in Python:

from math import sqrt
from itertools import takewhile

def is_prime(n) :
    for i in xrange(2, int(sqrt(n))+1) :
        if n % i == 0 :
            return False
    return True

def next_prime(n):
    while not is_prime(n) :
        n += 1
    return n

def primes() :
    i = 1
    while True :
        i = next_prime(i+1)
        yield i

def cumulative_sum(s):
    cs = []
    css = 0
    for si in s :
        css += si
        cs.append( css )
    return cs


def longest_seq_under(n) :
    ps = list(takewhile( lambda p : p < n, primes()))
    pss = set(ps)
    cs = cumulative_sum(ps)
    cnt = len(ps)
    max_len = len(list(takewhile(lambda s : s < n, cs)))

    def subsum(i, j):
        return cs[j] - (cs[i-1] if i > 0 else 0)

    def interval_with_length(m) :
        for i in xrange(0, cnt-m+1) :
            j = i + m - 1            
            sij = subsum(i,j)
            if sij >= n :
                return None, None
            if sij in pss : # prime
                return i, j+1
        return None, None

    for m in xrange(max_len, 0, -1) :
        f, t = interval_with_length(m)
        if t :
            return ps[f:t]


assert longest_seq_under(100) == [2, 3, 5, 7, 11, 13]
assert sum(longest_seq_under(1000)) == 953

# import timeit
# timeit.Timer("sum(longest_seq_under(100000))", "from __main__ import longest_seq_under").timeit(1) # 0.51235757617223499

Thanks

I will accept my own comment as answer for the question why Python worked and Clojure did not: using last of a vector is a linear operation that prevented the cumulative sum from being computed the way I intended it to be.

Updating the function to use a transient vector like this:

(defn cumulative-sum-2 [s]
  (loop [[x & xs] s
         ss 0
         acc (transient [])]
    (if x      
      (let [ssx (+ ss x)]
        (recur xs ssx (conj! acc ssx)))
      (persistent! acc))))

results in the Clojure version to run only twice as long as Python, consistently. I kind of hoped Clojure would be faster then Python for the same operations, wonder if I still miss something. I'm using 1.2 by the way.

Thanks

Using the python multiprocessing module for IO with pygame on Mac OS 10.7

12 votes

I use pygame for running experiments in cognitive science, and often I have heavy I/O demands so I like to fork off these tasks to separate processes (when using a multi-core machine) to improve performance of my code. However, I encountered a scenario where some code works on my colleague's linux machine (Ubuntu LTS), but not on my mac. Below is code representing a minimal reproducible example. My mac is a 2011 Macbook Air running 10.7.2 and using the default python 2.7.1. I tried both pygame as installed via pre-built binary, and I also then tried after installing both SDL and pygame from source.

import pygame
import multiprocessing
pygame.init()

def f():
    while True:
        pygame.event.pump() #if this is replaced by pass, this code works

p = multiprocessing.Process(target=f)
p.start()

while True:
    pass

As noted in the code, it seems that the culprit is putting pygame.event.pump() in a separate process. When I run this on my mac, I first get the following printed repeatedly in terminal:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.

Then I get a crash report as copied to this gist.

Any suggestions for how to fix this?

Maybe you should initialize the pygame (which initialize SDL-> OpenGL) in each forked (child) process like in sample:

import multiprocessing

def f():
  import pygame
  pygame.init()

  while True:
    pygame.event.pump()

if __module__ == "__main__"
  p = multiprocessing.Process(target=f)
  p.start()

  import pygame
  pygame.init()

  while True:
    pygame.event.pump()

Efficient strings containing each other

12 votes

I have two sets of strings (A and B), and I want to know all pairs of strings a in A and b in B where a is a substring of b.

The first step of coding this was the following:

for a in A:
    for b in B:
        if a in b:
            print (a,b)

However, I wanted to know-- is there a more efficient way to do this with regular expressions (e.g. instead of checking if a in b:, check if the regexp '.*' + a + '.*': matches 'b'. I thought that maybe using something like this would let me cache the Knuth-Morris-Pratt failure function for all a. Also, using a list comprehension for the inner for b in B: loop will likely give a pretty big speedup (and a nested list comprehension may be even better).

I'm not very interested in making a giant leap in the asymptotic runtime of the algorithm (e.g. using a suffix tree or anything else complex and clever). I'm more concerned with the constant (I just need to do this for several pairs of A and B sets, and I don't want it to run all week).

Do you know any tricks or have any generic advice to do this more quickly? Thanks a lot for any insight you can share!


Edit:

Using the advice of @ninjagecko and @Sven Marnach, I built a quick prefix table of 10-mers:

    import collections
    prefix_table = collections.defaultdict(set)
    for k, b in enumerate(B):
        for i in xrange(len(prot_seq)-10):
            j = i+10+1
            prefix_table[b[i:j]].add(k)

    for a in A:
        if len(a) >= 10:
            for k in prefix_table[a[:10]]:
                # check if a is in b
                # (missing_edges is necessary, but not sufficient)
                if a in B[k]:
                    print (a,b)
        else:
            for k in xrange(len(prots_and_seqs)):
                # a is too small to use the table; check if
                # a is in any b
                if a in B[k]:
                    print (a, b)

Of course you can easily write this as a list comprehension:

[(a, b) for a in A for b in B if a in b]

This might slightly speed up the loop, but don't expect too much. I doubt using regular expressions will help in any way with this one.

Edit: Here are some timings:

import itertools
import timeit
import re
import collections

with open("/usr/share/dict/british-english") as f:
    A = [s.strip() for s in itertools.islice(f, 28000, 30000)]
    B = [s.strip() for s in itertools.islice(f, 23000, 25000)]

def f():
    result = []
    for a in A:
        for b in B:
            if a in b:
                result.append((a, b))
    return result

def g():
    return [(a, b) for a in A for b in B if a in b]

def h():
    res = [re.compile(re.escape(a)) for a in A]
    return [(a, b) for a in res for b in B if a.search(b)]

def ninjagecko():
    d = collections.defaultdict(set)
    for k, b in enumerate(B):
        for i, j in itertools.combinations(range(len(b) + 1), 2):
            d[b[i:j]].add(k)
    return [(a, B[k]) for a in A for k in d[a]]

print "Nested loop", timeit.repeat(f, number=1)
print "List comprehension", timeit.repeat(g, number=1)
print "Regular expressions", timeit.repeat(h, number=1)
print "ninjagecko", timeit.repeat(ninjagecko, number=1)

Results:

Nested loop [0.3641810417175293, 0.36279606819152832, 0.36295199394226074]
List comprehension [0.362030029296875, 0.36148500442504883, 0.36158299446105957]
Regular expressions [1.6498990058898926, 1.6494300365447998, 1.6480278968811035]
ninjagecko [0.06402897834777832, 0.063711881637573242, 0.06389307975769043]

Edit 2: Added a variant of the alogrithm suggested by ninjagecko to the timings. You can see it is much better than all the brute force approaches.

Edit 3: Used sets instead of lists to eliminate the duplicates. (I did not update the timings -- they remained essentially unchanged.)

Parallel Coordinates plot in Matplotlib

11 votes

Two and three dimensional data can be viewed relatively straight-forwardly using traditional plot types. Even with four dimensional data, we can often find a way to display the data. Dimensions above four, though, become increasingly difficult to display. Fortunately, parallel coordinates plots provide a mechanism for viewing results with higher dimensions.

Example Parallel Coordinates Plot from Wikipedia

Several plotting packages provide parallel coordinates plots, such as Matlab, R, VTK type 1 and VTK type 2, but I don't see how to create one using Matplotlib.

  1. Is there a built-in parallel coordinates plot in Matplotlib? I certainly don't see one in the gallery.
  2. If there is no built-in-type, is it possible to build a parallel coordinates plot using standard features of Matplotlib?

Edit:

Based on the answer provided by Zhenya below, I developed the following generalization that supports an arbitrary number of axes. Following the plot style of the example I posted in the original question above, each axis gets its own scale. I accomplished this by normalizing the data at each axis point and making the axes have a range of 0 to 1. I then go back and apply labels to each tick-mark that give the correct value at that intercept.

The function works by accepting an iterable of data sets. Each data set is considered a set of points where each point lies on a different axis. The example in __main__ grabs random numbers for each axis in two sets of 30 lines. The lines are random within ranges that cause clustering of lines; a behavior I wanted to verify.

This solution isn't as good as a built-in solution since you have odd mouse behavior and I'm faking the data ranges through labels, but until Matplotlib adds a built-in solution, it's acceptable.

#!/usr/bin/python
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

def parallel_coordinates(data_sets, style=None):

    dims = len(data_sets[0])
    x    = range(dims)
    fig, axes = plt.subplots(1, dims-1, sharey=False)

    if style is None:
        style = ['r-']*len(data_sets)

    # Calculate the limits on the data
    min_max_range = list()
    for m in zip(*data_sets):
        mn = min(m)
        mx = max(m)
        if mn == mx:
            mn -= 0.5
            mx = mn + 1.
        r  = float(mx - mn)
        min_max_range.append((mn, mx, r))

    # Normalize the data sets
    norm_data_sets = list()
    for ds in data_sets:
        nds = [(value - min_max_range[dimension][0]) / 
                min_max_range[dimension][2] 
                for dimension,value in enumerate(ds)]
        norm_data_sets.append(nds)
    data_sets = norm_data_sets

    # Plot the datasets on all the subplots
    for i, ax in enumerate(axes):
        for dsi, d in enumerate(data_sets):
            ax.plot(x, d, style[dsi])
        ax.set_xlim([x[i], x[i+1]])

    # Set the x axis ticks 
    for dimension, (axx,xx) in enumerate(zip(axes, x[:-1])):
        axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
        ticks = len(axx.get_yticklabels())
        labels = list()
        step = min_max_range[dimension][2] / (ticks - 1)
        mn   = min_max_range[dimension][0]
        for i in xrange(ticks):
            v = mn + i*step
            labels.append('%4.2f' % v)
        axx.set_yticklabels(labels)


    # Move the final axis' ticks to the right-hand side
    axx = plt.twinx(axes[-1])
    dimension += 1
    axx.xaxis.set_major_locator(ticker.FixedLocator([x[-2], x[-1]]))
    ticks = len(axx.get_yticklabels())
    step = min_max_range[dimension][2] / (ticks - 1)
    mn   = min_max_range[dimension][0]
    labels = ['%4.2f' % (mn + i*step) for i in xrange(ticks)]
    axx.set_yticklabels(labels)

    # Stack the subplots 
    plt.subplots_adjust(wspace=0)

    return plt


if __name__ == '__main__':
    import random
    base  = [0,   0,  5,   5,  0]
    scale = [1.5, 2., 1.0, 2., 2.]
    data = [[base[x] + random.uniform(0., 1.)*scale[x]
            for x in xrange(5)] for y in xrange(30)]
    colors = ['r'] * 30

    base  = [3,   6,  0,   1,  3]
    scale = [1.5, 2., 2.5, 2., 2.]
    data.extend([[base[x] + random.uniform(0., 1.)*scale[x]
                 for x in xrange(5)] for y in xrange(30)])
    colors.extend(['b'] * 30)

    parallel_coordinates(data, style=colors).show()

I'm sure there is a better way of doing it, but here's a quick-and-dirty one (a really dirty one):

#!/usr/bin/python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

#vectors to plot: 4D for this example
y1=[1,2.3,8.0,2.5]
y2=[1.5,1.7,2.2,2.9]

x=[1,2,3,8] # spines

fig,(ax,ax2,ax3) = plt.subplots(1, 3, sharey=False)

# plot the same on all the subplots
ax.plot(x,y1,'r-', x,y2,'b-')
ax2.plot(x,y1,'r-', x,y2,'b-')
ax3.plot(x,y1,'r-', x,y2,'b-')

# now zoom in each of the subplots 
ax.set_xlim([ x[0],x[1]])
ax2.set_xlim([ x[1],x[2]])
ax3.set_xlim([ x[2],x[3]])

# set the x axis ticks 
for axx,xx in zip([ax,ax2,ax3],x[:-1]):
  axx.xaxis.set_major_locator(ticker.FixedLocator([xx]))
ax3.xaxis.set_major_locator(ticker.FixedLocator([x[-2],x[-1]]))  # the last one

# EDIT: add the labels to the rightmost spine
for tick in ax3.yaxis.get_major_ticks():
  tick.label2On=True

# stack the subplots together
plt.subplots_adjust(wspace=0)

plt.show()

This is essentially based on a (much nicer) one by Joe Kingon, Python/Matplotlib - Is there a way to make a discontinuous axis?. You might also want to have a look at the other answer to the same question.

In this example I don't even attempt at scaling the vertical scales, since it depends on what exactly you are trying to achieve.

EDIT: Here is the resultenter image description here

python - regex search and findall

6 votes

I need to find all matches in a string for a given regex. I've been using findall() to do that until I came across a case where it wasn't doing what I expected. For example:

regex = re.compile('(\d+,?)+')
s = 'There are 9,000,000 bicycles in Beijing.'

print re.search(regex, s).group(0)
> 9,000,000

print re.findall(regex, s)
> ['000']

In this case search() returns what I need (the longest match) but findall() behaves differently, although the docs imply it should be the same:

findall() matches all occurrences of a pattern, not just the first one as search() does.

  • Why is the behaviour different?

  • How can I achieve the result of search() with findall() (or something else)?

Ok, I see what's going on... from the docs:

If one or more groups are present in the pattern, return a list of groups; 
this will be a list of tuples if the pattern has more than one group.

As it turns out, you do have a group, "(\d+,?)"... so, what it's returning is the last occurrence of this group, or 000.

One solution is to surround the entire regex by a group, like this

regex = re.compile('((\d+,?)+)')

then, it will return [('9,000,000', '000')], which is a tuple containing both matched groups. of course, you only care about the first one.

Personally, i would use the following regex

regex = re.compile('((\d+,)*\d+)')

to avoid matching stuff like " this is a bad number 9,123,"

Edit.

Here's a way to avoid having to surround the expression by parenthesis or deal with tuples

s = "..."
regex = re.compile('(\d+,?)+')
it = re.finditer(regex, s)

for match in it:
  print match.group(0)

finditer returns an iterator that you can use to access all the matches found. these match objects are the same that re.search returns, so group(0) returns the result you expect.

Extracting only characters from a string in Python

6 votes

In Python, I want to extract only the characters from a string.

Consider I have the following string,

input = "{('players',): 24, ('year',): 28, ('money',): 19, ('ipod',): 36, ('case',): 23, ('mini',): 46}"

I want the result as,

output =  "players year money ipod case mini"

I tried to split considering only the alphabets,

word1 = st.split("[a-zA-Z]+")

But the split is not happening.

You could do it with re, but the string split method doesnt take a regex, it takes a string.

Heres one way to do it with re:

import re
word1 = " ".join(re.findall("[a-zA-Z]+", st))

Django Internationalization

5 votes

I have a similar issue as found here Why doesn't Django produce locale files from template files in another directory?

However I don't understand the solution. My structure:

Project
   App1
      locale
      templates
   App2
      locale
      templates
   templates
      somefilethatneedstranslation.html

Now when I run this command from App1:

python ../manage.py App1 -l nl

It nicely creates a po file for the App1 templates in the App1 locale folder

However I want my global templates to be translated aswell.. note: I do NOT want a locale folder in my project root, so I tried adding a symlink to the templates folder from App1 but it does not append the translation results to the App1/locale/po file

from the App1 folder

ln -s ../templates/locale/* translations
python ../manage.py App1 -l nl --symlinks

What am I missing?

note:

from the templates folder

python ../manage.py templates -l nl

could work, but it won't because obviously templates is not an installed app, it seems I am missing the obvious...

The full deprecation message (which is also explained in the translation docs) is:

Translations in the project directory aren't supported anymore. LOCALE_PATHS setting instead.

This message is perhaps a bit unclear. While automatic discovery of translations in the project directory is deprecated, the use of LOCALE_PATHS to reference a project-level locale folder is totally acceptable.

If you have project-level templates, it doesn't make sense to have these templates translated in an app-specific locale location: keep a project-level locale directory, reference it in LOCALE_PATHS.

Alternatives to php for in-line web programming?

5 votes

I first learned web programming with php a while back. It has some features that I find very helpful, but the overall language is not something I enjoy, just as a matter of personal preference. I am wondering what alternatives I could use to provide similar functionality using a different underlying programming language (Python? Ruby?).

What I am looking for:

  • general purpose programming capability
  • in-line server-side code embedded in HTML (i.e. I want to be able to make my documents pure HTML if desired, rather than demanding special syntax even where I don't want dynamic content)
  • access to request parameters
  • ability to send headers, set cookies, etc

Preferably:

  • does not require a separate server process
  • easy to connect with Apache

Does anyone have any suggestions?

One thing I tried to do was embedded Ruby (erb) through CGI. This looked like a good fit on paper. Unfortunately, I was not able to get it to work, because I was following a few different guides and the result of combining them did not work out. At any rate, it seems this would not allow me to set arbitrary headers (and more importantly, use sessions and cookies).

Note: I'm not looking for a full web framework at the moment. Just relatively small amounts of dynamic content among otherwise HTML pages.

Thanks!

You've hit on the big reason why PHP is so popular - it has all of those pieces in a server-embeddable package. There aren't really many solutions with its ease of deployment; PHP is written specifically for what you want, which is both its strength and weakness. It's why it's such a weak general-purpose language, and why everyone and their dog knows it. It's everywhere, and the barrier to entry is near zero.

PHP is a language plus templating plus a web framework all baked into one package. To get an equivalent, you're going to need a web framework, even if it's a small one. Something like Sinatra is a super lightweight way to do similar in Ruby, though it requires a separate server process.

You could look at something like Perl with cgi.pm, but it may be a step in the wrong direction if you're wanting something cleaner than PHP.

I don't know Python packages well enough to offer suggestions there, but Twisted makes it easy to bind a Python program to a web interface. That does end up running in its own server process, though.

You'll need to do a little more work than your standard PHP deploy if you want to use something besides PHP, but that's often a choice that people consider to be a reasonable tradeoff for gains in productivity.

How to run own daemon processes with Django?

5 votes

In my Django project I have to do repeatedly some processing in the background. This processing needs access to Django stuff, so I put it into Django's commands and run it as cronjob. Right now I realize, that I have to do some of them more frequently (cronjob has limitation to invoke command at most every 1 minute). Another problem is that I don't have enough control, to protect running the same command in one time. It's happen when one processing takes longer than one minute. I think that I should run them like daemons, but I am looking for pure way to do it with Django. Have you ever faced with this problem or know any clean solution for it?

We do a lot of background processing for django using Celery http://celeryproject.org/. It requires some effort to set up and there is a bit of a learning curve, but once it's up and running it's just awesome.

Pros and cons of celery vs disco vs hadoop vs other distributed computing packages

5 votes

I'm working on a web application (built in django) intended for medium-to-large-scale data analysis. I'm imagining using a task queue feeding to a bank of servers, or maybe on-demand EC2 instances to handle the load.

Before getting into development, I'm trying to decide which package(s) to use for distributed computing. I've looked into celery, disco, and mapreduce on hadoop -- and they all look pretty good.

What advice can you offer on the pros and cons of different systems?

Which ones...?

  • ...are easiest to work with in python/django?
  • ...play nice with each other?
  • ...tend to work best for which tasks?
  • ...impose restrictions on other aspects of the system architecture (e.g. database design)?
  • ...have the largest user bases and best documentation?
  • ...have the steepest learning curves?

BTW, I've built multi-core and client-server applications using python's multiprocessing, but that's the extent of my practical experience with distributed computing. Most of the theory is familiar, but I've never used any of the packages mentioned here.

First of all, I have no experience with disco and little experience with hadoop. Then, to answer your questions one by one:

  • are easiest to work with in python/django?

    Celery is the winner. It has straightforward integration with django via django-celery, while feature-rich and simple to use. I assume that disco comes second ( you write python code ) and hadoop comes last ( you can write python code, but in obscure ways).

  • play nice with each other?

    Everybody can play nice with others, provided that there exists a common layer on which they can communicate ( XML, JSON, whatever...).

  • tend to work best for which tasks?

    Disco and hadoop use the mapReduce paradigm, and the word "big data" comes in mind. If you have lots of data and you want to perform some processing on all of them, then mapReduce is an optimal solution. Celery is a distributed task queue, which is more "open and agile" in the ways you can implement distributed processings/schemas.

  • impose restrictions on other aspects of the system architecture (e.g. database design)?

    I don't believe that there exists any (serious) restriction for any of the contestants (please correct me if I am wrong).

  • have the largest user bases and best documentation?

    Here hadoop is probably the winner. Celery has a decent community and lots of stackoverflow questions :). I don't know for Disco.

  • have the steepest learning curves?

    I believe that Celery has the steepest learning curve, seriously.. Hadoop is a bit tricky.. Don't know for Disco, but I suspect it's in the middle.

    To sum up, if you want a great pythonic tool for general distributed processing, easy to use and fast to learn, with full django-integration, go with Celery. On the other hand, if your data "cry" for mapReduce, then follow your heart..

Profiling on live Django server?

4 votes

I've never done code coverage in Python, but I'm looking for something like GCC's gcov, which tells me how many times each line executes, or Apple's Shark which gives a hierarchial breakdown of how long each function is taking.

My problem is that I have a live server which is experiencing high load, and I can't tell from the logs what's causing it. I would like to attach something to my Django instance to monitor which lines are the hottest and/or which functions are taking the longest time.

This is something like, but not exactly, code coverage. I would like to introduce it to a live running server, preferably without modifying too much.

Ideas?

cProfile + RunSnakeRun: http://www.vrplumber.com/programming/runsnakerun/