Best python questions in July 2011

Should one minify server code when it's in production?

23 votes

When it comes to the frontend code you always minify it (remove white spaces, comments etc) in production.

Should one do the same with server code? I usually have a lot of comments in my server files. But I have never heard about people doing so.

Wouldn't the server run faster if the code was optimized in the same way?

You're not going to have any improvement as the whitespaces and all formatting are lost when your server side code is translated to machine code (or interpreted). It's also not sent over the wire, it's read from the local filesystem, so while having less characters would lead to a faster startup, it would not make any difference on the long run and the startup speed gain would be marginal (or even unnoticeable).

So, no, minifying your server side code is basically useless, worse, it's probably going to make stack traces completely useless, as there's going to be a lot of code in the same line (and not necessarily with the same formatting you used).

Which maximum does python pick in the case of a tie?

20 votes

When using the max() function in python to find the maximum value in a list (or tuple, dict etc.) and there is a tie for maximum value, which one does python pick? Is it random?

This is relevant if, for instance, one has a list of tuples and one maximizes (using a key=) based on the first element of the tuple but there are different second elements. How does python pick which one to pick as the maximum?

I'm working in Python v2.6

This isn't specified in the documentation and isn't in the portable in-Python section of the standard library, so this behaviour may vary between implementations.

In the source to CPython 2.7 this is implemented in ./Python/bltinmodule.c as builtin_max, which wraps the more general function min_max.

min_max will iterate through the values and use PyObject_RichCompareBool to see if they are greater than the current value. If so, the greater value replaces it. Equal values will be skipped over.

This will result in the first maximum in the list being chosen in the case of a tie.

How to handle "duck typing" in Python?

16 votes

I usually want to keep my code as generic as possible. I'm currently writing a simple library and being able to use different types with my library feels extra important this time.

One way to go is to force people to subclass an "interface" class. To me, this feels more like Java than Python and using issubclass in each method doesn't sound very tempting either.

My preferred way is to use the object in good faith, but this will raise some AttributeErrors. I could wrap each dangerous call in a try/except block. This, too, seems kind of cumbersome:

def foo(obj):
    ...
    # it should be able to sleep
    try:
        obj.sleep()
    except AttributeError:
        # handle error
    ...
    # it should be able to wag it's tail
    try:
        obj.wag_tail()
    except AttributeError:
        # handle this error as well

Should I just skip the error handling and expect people to only use objects with the required methods? If I do something stupid like [x**2 for x in 1234] I actually get a TypeError and not a AttributeError (ints are not iterable) so there must be some type checking going on somewhere -- what if I want to do the same?

This question will be kind of open ended, but what is the best way to handle the above problem in a clean way? Are there any established best practices? How is the iterable "type checking" above, for example, implemented?

Edit

While AttributeErrors are fine, the TypeErrors raised by native functions usually give more information about how to solve the errors. Take this for example:

>>> ['a', 1].sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: int() < str()

I'd like my library to be as helpful as possible.

If your code requires a particular interface, and the user passes an object without that interface, then nine times out of ten, it's inappropriate to catch the exception. Most of the time, an AttributeError is not only reasonable but expected when it comes to interface mismatches.

Occasionally, it may be appropriate to catch an AttributeError for one of two reasons. Either you want some aspect of the interface to be optional, or you want to throw a more specific exception, perhaps a package-specific exception subclass. Certainly you should never prevent an exception from being thrown if you haven't honestly handled the error and any aftermath.

So it seems to me that the answer to this question must be problem- and domain- specific. It's fundamentally a question of whether using a Cow object instead of a Duck object ought to work. If so, and you handle any necessary interface fudging, then that's fine. On the other hand, there's no reason to explicitly check whether someone has passed you a Frog object, unless that will cause a disastrous failure (i.e. something much worse than a stack trace).

That said, it's always a good idea to document your interface -- that's what docstrings (among other things) are for. When you think about it, it's much more efficient to throw a general error for most cases and tell users the right way to do it in the docstring, than to try to foresee every possible error a user might make and create a custom error message.

A final caveat -- it's possible that you're thinking about UI here -- I think that's another story. It's good to check the input that an end user gives you to make sure it isn't malicious or horribly malformed, and provide useful feedback instead of a stack trace. But for libraries or things like that, you really have to trust the programmer using your code to use it intelligently and respectfully, and to understand the errors that Python generates.

Fastest way in Python to find a 'startswith' substring in a long sorted list of strings

16 votes

I've done a lot of Googling, but haven't found anything, so I'm really sorry if I'm just searching for the wrong things.

I am writing an implementation of the Ghost for MIT Introduction to Programming, assignment 5.

As part of this, I need to determine whether a string of characters is the start of any valid word. I have a list of valid words ("wordlist").

Update: I could use something that iterated through the list each time, such as Peter's simple suggestion:

def word_exists(wordlist, word_fragment):
return any(w.startswith(word_fragment) for w in wordlist)

I previously had:

wordlist = [w for w in wordlist if w.startswith(word_fragment)]

(from here) to narrow the list down to the list of valid words that start with that fragment and consider it a loss if wordlist is empty. The reason that I took this approach was that I (incorrectly, see below) thought that this would save time, as subsequent lookups would only have to search a smaller list.

It occurred to me that this is going through each item in the original wordlist (38,000-odd words) checking the start of each. This seems silly when wordlist is ordered, and the comprehension could stop once it hits something that is after the word fragment. I tried this:

newlist = []
for w in wordlist:
    if w[:len(word_fragment)] > word_fragment:
        # Take advantage of the fact that the list is sorted
        break
    if w.startswith(word_fragment):
        newlist.append(w)
return newlist

but that is about the same speed, which I thought may be because list comprehensions run as compiled code?

I then thought that more efficient again would be some form of binary search in the list to find the block of matching words. Is this the way to go, or am I missing something really obvious?

Clearly it isn't really a big deal in this case, but I'm just starting out with programming and want to do things properly.

UPDATE:

I have since tested the below suggestions with a simple test script. While Peter's binary search/bisect would clearly be better for a single run, I was interested in whether the narrowing list would win over a series of fragments. In fact, it did not:

The totals for all strings "p", "py", "pyt", "pyth", "pytho" are as follows:
In total, Peter's simple test took 0.175472736359
In total, Peter's bisect left test took 9.36985015869e-05
In total, the list comprehension took 0.0499348640442
In total, Neil G's bisect took 0.000373601913452

The overhead of creating a second list etc clearly took more time than searching the longer list. In hindsight, this was likely the best approach regardless, as the "reducing list" approach increased the time for the first run, which was the worst case scenario.

Thanks all for some excellent suggestions, and well done Peter for the best answer!!!

Generator expressions are evaluated lazily, so if you only need to determine whether or not your word is valid, I would expect the following to be more efficient since it doesn't necessarily force it to build the full list once it finds a match:

def word_exists(wordlist, word_fragment):
    return any(w.startswith(word_fragment) for w in wordlist)

Note that the lack of square brackets is important for this to work.

However this is obviously still linear in the worst case. You're correct that binary search would be more efficient; you can use the built-in bisect module for that. It might look something like this:

from bisect import bisect_left
def word_exists(wordlist, word_fragment):
    try:
        return wordlist[bisect_left(wordlist, word_fragment)].startswith(word_fragment)
    except IndexError:
        return False # word_fragment is greater than all entries in wordlist

bisect_left runs in O(log(n)) so is going to be considerably faster for a large wordlist.

Edit: I would guess that the example you gave loses out if your word_fragment is something really common (like 't'), in which case it probably spends most of its time assembling a large list of valid words, and the gain from only having to do a partial scan of the list is negligible. Hard to say for sure, but it's a little academic since binary search is better anyway.

Creating a singleton in python

16 votes

This question is not for the discussion of whether or not the Singleton design pattern is desirable, is an anti-pattern, or for any religious wars, but to discuss how this pattern is best implemented in python in such a way that is most pythonic. In this instance I define 'most pythonic' to mean that it follows the 'principle of least astonishment'

I have multiple classes which would become singletons my use-case is for a logger, but this is not important. I do not wish to clutter several classes with added gumph when I can simply inherit or decorate. suitable methods best methods:


Method 1: A decorator

def singleton(class_):
  instances = {}
  def getinstance(*args, **kwargs):
    if class_ not in instances:
        instances[class_] = class_(*args, **kwargs)
    return instances[class_]
  return getinstance

@singleton
class MyClass(BaseClass):
  pass

Pros

  • Decorators are additive in a way that is often more intuitive than multiple inheritance.

Cons

  • While objects created using MyClass() would be true singleton objects, MyClass itself is a a function, not a class, so you cannot call class methods from it. Also for m = MyClass(); n = MyClass(); o = type(n)(); then m == n && m != o && n != o

Method 2: A base class

class Singleton(object):
  _instance = None
  def __new__(class_, *args, **kwargs):
    if not isinstance(class_._instance, class_):
        class_._instance = object.__new__(class_, *args, **kwargs)
    return class_._instance

class MyClass(Singleton, BaseClass):
  pass

Pros

  • It's a true class

Cons

  • Multiple inheritance - eugh! __new__ could be overwritten during inheritance from a second base class? Have to think more than is necessary.

Method 3: A metaclass

class Singleton(type):
    _instances = {}
    def __call__(class_, *args, **kwargs):
        if class_ not in class_._instances:
            class_._instances[class_] = super(Singleton, class_).__call__(*args, **kwargs)
        return class_._instances[class_]

class MyClass(BaseClass):
    __metaclass__ = Singleton

Pros

  • It's a true class
  • Auto-magically covers inheritance
  • Uses __metaclass__ for its proper purpose (And made me aware of it)

Cons

  • Are there any?

Method 4: decorator returning a class with the same name

def singleton(class_):
  class class_w(class_):
    _instance = None
    def __new__(class_, *args, **kwargs):
      if class_w._instance is None:
          class_w._instance = super(class_w, 
                                    class_).__new__(class_, 
                                                    *args, 
                                                    **kwargs)
          class_w._instance._sealed = False
      return class_w._instance
def __init__(self, *args, **kwargs):
      if self._sealed:
    return
      super(class_w, self).__init__(*args, **kwargs)
      self._sealed = True
  class_w.__name__ = class_.__name__
    return class_w

@singleton
class MyClass(BaseClass):
    pass

Pros

  • It's a true class
  • Auto-magically covers inheritance

Cons

  • Is there not an overhead for creating each new class? Here we are creating two classes for each class we wish to make a singleton. While this is fine in my case I worry that this might not scale. Of course there is a matter of debate as to whether it aught to be too easy to scale this pattern...
  • What is the point of the _sealed attribute
  • Can't call methods of the same name on base classes using super() because they will recurse. This means you can't customize __new__ and can't subclass a class that needs you to call up to __init__.

Use a Metaclass

I would recommend Method #2, but you're better off using a metaclass than a base class. Here is a sample implementation:

class Singleton(type):
    _instances = {}
    def __call__(cls, *args, **kwargs):
        if cls not in cls._instances:
            cls._instances[cls] = super(Singleton, cls).__call__(*args, **kwargs)
        return cls._instances[cls]

class Logger(object):
    __metaclass__ = Singleton

If you want to run __init__ every time the class is called, add

        else:
            cls._instances[cls].__init__(*args, **kwargs)

to the if statement in Singleton.__call__.

A few words about metaclasses. A metaclass is the class of a class; that is, a class is an instance of its metaclass. You find the metaclass of an object in Python with type(obj). Normal new-style classes are of type type. Logger in the code above will be of type class 'your_module.Singleton', just as the (only) instance of Logger will be of type class 'your_module.Logger'. When you call logger with Logger(), Python first asks the metaclass of Logger, Singleton, what to do, allowing instance creation to be pre-empted. This process is the same as Python asking a class what to do by calling __getattr__ when you reference one of it's attributes by doing myclass.attribute.

A metaclass essentially decides what the definition of a class means and how to implement that definition. See for example http://code.activestate.com/recipes/498149/, which essentially recreates C-style structs in Python using metaclasses. The thread What are your (concrete) use-cases for metaclasses in Python? also provides some examples, they generally seem to be related to declarative programming, especially as used in ORMs.

In this situation, if you use your Method #2, and a subclass defines a __new__ method, it will be executed every time you call SubClassOfSingleton() -- because it is responsible for calling the method that returns the stored instance. With a metaclass, it will only be called once, when the only instance is created. You want to customize what it means to call the class, which is decided by it's type.

In general, it makes sense to use a metaclass to implement a singleton. A singleton is special because is created only once, and a metaclass is the way you customize the creation of a class. Using a metaclass gives you more control in case you need to customize the singleton class definitions in other ways.

Your singletons won't need multiple inheritance (because the metaclass is not a base class), but for subclasses of the created class that use multiple inheritance, you need to make sure the singleton class is the first / leftmost one with a metaclass that redefines __call__ This is very unlikely to be an issue. The instance dict is not in the instance's namespace so it won't accidentally overwrite it.

You will also hear that the singleton pattern violates the "Single Responsibility Principle" -- each class should do only one thing. That way you don't have to worry about messing up one thing the code does if you need to change another, because they are separate and encapsulated. The metaclass implementation passes this test. The metaclass is responsible for enforcing the pattern and the created class and subclasses need not be aware that they are singletons. Method #1 fails this test, as you noted with "MyClass itself is a a function, not a class, so you cannot call class methods from it."

Corrections

On another topic, you've probably already noticed this, but the base class implementation in your original post is wrong. _instances needs to be referenced on the class, you need to use super() or you're recursing, and __new__ is actually a static method that you have to pass the class to, not a class method, as the actual class hasn't been created yet when it is called. All of these things will be true for a metaclass implementation as well.

class Singleton(object):
  _instances = {}
  def __new__(class_, *args, **kwargs):
    if class_ not in class_._instances:
        class_._instances[class_] = super(Singleton, class_).__new__(class_, *args, **kwargs)
    return class_._instances[class_]

class MyClass(Singleton):
  pass

c = MyClass()

Decorator Returning A Class

I originally was writing a comment but it was too long, so I'll add this here. Method #4 is better than the other decorator version, but it's more code than needed for a singleton, and it's not as clear what it does.

The main problems stem from the class being it's own base class. First, isn't it weird to have a class be a subclass of a nearly identical class with the same name that exists only in its __class__ attribute? This also means that you can't define any methods that call the method of the same name on their base class with super() because they will recurse. This means your class can't customize __new__, and can't derive from any classes that need __init__ called on them.

When to use the singleton pattern

Your use case is one of the better examples of wanting to use a singleton. You say in one of the comments "To me logging has always seemed a natural candidate for Singletons." You're absolutely right.

When people say singletons are bad, the most common reason is they are implicit shared state. While with global variables and top-level module imports are explicit shared state, other objects that are passed around are generally instantiated. This is a good point, with two exceptions.

The first, and one that gets mentioned in various places, is when the singletons are constant. Use of global constants, especially enums, is widely accepted, and considered sane because no matter what, none of the users can mess them up for any other user. This is equally true for a constant singleton.

The second exception, which get mentioned less, is the opposite -- when the singleton is only a data sink, not a data source (directly or indirectly). This is why loggers feel like a "natural" use for singletons. As the various users are not changing the loggers in ways other users will care about, there is not really shared state. This negates the primary argument against the singleton pattern, and makes them a reasonable choice because of their ease of use for the task.

Here is a quote from http://googletesting.blogspot.com/2008/08/root-cause-of-singletons.html:

Now, there is one kind of Singleton which is OK. That is a singleton where all of the reachable objects are immutable. If all objects are immutable than Singleton has no global state, as everything is constant. But it is so easy to turn this kind of singleton into mutable one, it is very slippery slope. Therefore, I am against these Singletons too, not because they are bad, but because it is very easy for them to go bad. (As a side note Java enumeration are just these kind of singletons. As long as you don't put state into your enumeration you are OK, so please don't.)

The other kind of Singletons, which are semi-acceptable are those which don't effect the execution of your code, They have no "side effects". Logging is perfect example. It is loaded with Singletons and global state. It is acceptable (as in it will not hurt you) because your application does not behave any different whether or not a given logger is enabled. The information here flows one way: From your application into the logger. Even thought loggers are global state since no information flows from loggers into your application, loggers are acceptable. You should still inject your logger if you want your test to assert that something is getting logged, but in general Loggers are not harmful despite being full of state.

Flask vs webapp2 for Google App Engine

15 votes

I'm starting new Google App Engine application and currently considering two frameworks: Flask and webapp2. I'm rather satisfied with built-in webapp framework that I've used for my previous App Engine application, so I think webapp2 will be even better and I won't have any problems with it.

However, there are a lot of good reviews of Flask, I really like its approach and all the things that I've read so far in the documentation and I want to try it out. But I'm a bit concerned about limitations that I can face down the road with Flask.

So, the question is - do you know any problems, performance issues, limitations (e.g. routing system, built-in authorization mechanism, etc.) that Flask could bring into Google App Engine application? By "problem" I mean something that I can't work around in several lines of code (or any reasonable amount of code and efforts) or something that is completely impossible.

And as a follow-up question: are there any killer-features in Flask that you think can blow my mind and make me use it despite any problems that I can face?

Disclaimer: I'm the author of tipfy and webapp2.

A big advantage of sticking with webapp (or its natural evolution, webapp2) is that you don't have to create your own versions for existing SDK handlers for your framework of your choice.

For example, deferred uses a webapp handler. To use it in a pure Flask view, using werkzeug.Request and werkzeug.Response, you'll need to implement deferred for it (like I did here for tipfy).

The same happens for other handlers: blobstore (Werkzeug still doesn't support range requests, so you'll need to use WebOb even if you create your own handler -- see tipfy.appengine.blobstore), mail, XMPP and so on, or others that are included in the SDK in the future.

And the same happens for libraries created with App Engine in mind, like ProtoRPC, which is based on webapp and would need a port or adapter to work with other frameworks, if you don't want to mix webapp and your-framework-of-choice handlers in the same app.

So, even if you choose a different framework, you'll end a) using webapp in some special cases or b) having to create and maintain your versions for specific SDK handlers or features, if you'll use them.

I much prefer Werkzeug over WebOb, but after over one year porting and maintaining versions of the SDK handlers that work natively with tipfy, I realized that this is a lost cause -- to support GAE for the long term, best is to stay close to webapp/WebOb. It makes support for SDK libraries a breeze, maintenance becomes a lot easier, it is more future-proof as new libraries and SDK features will work out of the box and there's the benefit of a large community working around the same App Engine tools.

A specific webapp2 defense is summarized here. Add to those that webapp2 can be used outside of App Engine and is easy to be customized to look like popular micro-frameworks and you have a good set of compelling reasons to go for it. Also, webapp2 has a big chance to be included in a future SDK release (this is extra-official, don't quote me :-) which will push it forward and bring new developers and contributions.

That said, I'm a big fan of Werkzeug and the Pocoo guys and borrowed a lot from Flask and others (web.py, Tornado), but -- and, you know, I'm biased -- the above webapp2 benefits should be taken into account.

reverse() does not work on a Python literal?

15 votes

Why doesn't this work in Python?

>>> print [0,1,0,1,1,0,1,1,1,0,1,1,1,1,0].reverse() 
None

I expected to get back the list in reverse order.

If you want it to return a new list in reverse order, you can use [::-1]

>>> [0,1,0,1,1,0,1,1,1,0,1,1,1,1,0][::-1]
[0, 1, 1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0]

As I'm still trying to understand the downvote, if it doesn't matter that the original list gets changed, use @taskinoor's answer, it'll be faster.

However if you're required to keep the original list unchanged, I suggest you to use [::-1].

from time import time
lst = [0,1,0,1,1,0,1,1,1,0,1,1,1,1,0]

s = time()
for i in xrange(5000000):
    a = lst[::-1]          # creates a reversed list
print time() -s

s = time()
for i in xrange(5000000):
    b = lst[:]             # copies the list
    b.reverse()            # reverses it
print time() -s

outputs

2.44950699806
3.02573299408

Recent-ish changes to the Python execution model?

14 votes

I just re-read the section on execution models in the 3rd edition of Learning Python (late 2007), and it felt fairly tentative. So, I looked at the same section in the 4th edition (late 2009) and was pretty disappointed that it was completely unchanged.

What is the status for executing Python beyond CPython? It feels like Jython and IronPython are still niche projects; have other similar projects emerged? has Psyco solidified well enough to use without worry? Are many people using ShedSkin? Is there information on when PyPy is generally faster for execution?

The developer of Psyco, Armin Rigo, now works on PyPy along with a lot of other brilliant developers. PyPy is very actively developed and a lot of exciting stuff is planned for it in the future. PyPy compiled with JIT is almost always faster than CPython, frequently by a large margin. They have a collection of benchmarks tracking their progress. It's quickly becoming a very popular implementation. Of note in regards to PyPy:

  1. Very good implementation of Python, currently implements Python 2.7.1, so you can use the latest and greatest language features available outside of 3.x.

  2. The JIT allows for some truly amazing speed-ups and PyPy's ctypes support can be even faster than ctypes under CPython.

  3. The translation toolchain is very flexible. You can target different backends, build with stackless support, swap in and out garbage collectors, build with a JIT, etc.

  4. Fairly complete support for ctypes and partial support of the C API (support for both are being improved rapidly).

  5. You can actually write whatever you want in RPython and translate it, so you could use the translation toolchain similarly to ShedSkin.

ShedSkin is still developed, and I've used it a few times in the last year. It supports a restricted subset of Python, and significant chunk of the standard library. It's worth a look. I wouldn't recommend Jython or IronPython unless you need to run on the JVM or CLR. It sounds like you'd also be interested in Cython.

What is the theory behind mutable and immutable types?

13 votes

One of the things that I admire about Python is its distinction between mutable and immutable types. Having spent a while programming in c before coming to Python, I was astonished at how easily Python does away with all the complexities of pointer dereferencing that drive me mad in c. In Python everything just works the way I expect, and I quickly realized that the mutable/immutable distinction plays an important part in that.

There are still a few wrinkles, of course (mutable function argument defaults being a notable example) but overall, I feel that the mutable/immutable distinction greatly clarifies the question of what variables and their values are and how they ought to behave.

But where does it come from? I have to assume that GvR was not the first person to conceive of this distinction, and that Python was not the first language to use it. I'm interested in hearing about earlier languages that used this concept, as well as any early theoretical discussions of it.

Objective C is loaded with mutable/immutable distinctions (to the point where there are both NSString and NSMutableString, for example); it predates Python by about 8 years. Smalltalk, from which Objective C inherited much of its OO design, uses the concept to a lesser extent (notably, strings are not immutable; the trend these days is towards immutable strings as in Python, Ruby, etc.).

Playing aroud with Devanagari Characters

13 votes

I have something like

a = "बिक्रम मेरो नाम हो"

I want to achieve something like

a[0] = बि a[1] = क्र a[3] = म

but as म takes 4 bit while बि takes 8 bit I am not able to get to that straight. SO what could be done to achieve that. In python.

The algorithm for splitting text into grapheme clusters is given in Unicode Annex 29, section 3.1. I'm not going to implement the full algorithm for you here, but I'll show you roughly how to handle the case of Devanagari, and then you can read the Annex for yourself and see what else you need to implement.

The unicodedata module contains the information you need to detect the grapheme clusters.

>>> import unicodedata
>>> a = u"बिक्रम मेरो नाम हो"
>>> map(unicodedata.name, a)
['DEVANAGARI LETTER BA', 'DEVANAGARI VOWEL SIGN I', 'DEVANAGARI LETTER KA', 
 'DEVANAGARI SIGN VIRAMA', 'DEVANAGARI LETTER RA', 'DEVANAGARI LETTER MA',
 'SPACE', 'DEVANAGARI LETTER MA', 'DEVANAGARI VOWEL SIGN E',
 'DEVANAGARI LETTER RA', 'DEVANAGARI VOWEL SIGN O', 'SPACE',
 'DEVANAGARI LETTER NA', 'DEVANAGARI VOWEL SIGN AA', 'DEVANAGARI LETTER MA',
 'SPACE', 'DEVANAGARI LETTER HA', 'DEVANAGARI VOWEL SIGN O']

In Devanagari, each grapheme cluster consists of an initial letter, optional pairs of virama (vowel killer) and letter, and an optional vowel sign. In regular expression notation that would be LETTER (VIRAMA LETTER)* VOWEL?. You can tell which is which by looking up the Unicode category for each code point:

>>> map(unicodedata.category, a)
['Lo', 'Mc', 'Lo', 'Mn', 'Lo', 'Lo', 'Zs', 'Lo', 'Mn', 'Lo', 'Mc', 'Zs',
 'Lo', 'Mc', 'Lo', 'Zs', 'Lo', 'Mc']

Letters are category Lo (Letter, Other), vowel signs are category Mc (Mark, Spacing Combining), virama is category Mn (Mark, Nonspacing) and spaces are category Zs (Separator, Space).

So here's a rough approach to split out the grapheme clusters:

def splitclusters(s):
    """
    Generate the grapheme clusters for the string s. Not the full
    Unicode text segmentation algorithm, but probably good enough for
    Devanagari.
    """
    cluster = ''
    last = None
    virama = u'\N{DEVANAGARI SIGN VIRAMA}'
    for c in s:
        cat = unicodedata.category(c)[0]
        if cat == 'M' or cat == 'L' and last == virama:
            cluster += c
        else:
            if cluster:
                yield cluster
            cluster = c
        last = c
    if cluster:
        yield cluster

>>> for c in splitclusters(a): print c.encode('utf8'), '-',
... 
बि - क्र - म -   - मे - रो -   - ना - म -   - हो -

Python pattern-matching. Match 'c[any number of consecutive a's, b's, or c's or b's, c's, or a's etc.]t'

12 votes

Sorry about the title, I couldn't come up with a clean way to ask my question.

In Python I would like to match an expression 'c[some stuff]t', where [some stuff] could be any number of consecutive a's, b's, or c's and in any order.

For example, these work: 'ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat'

but these don't: 'cbcbbaat', 'caaccbabbt'

Edit: a's, b's, and c's are just an example but I would really like to be able to extend this to more letters. I'm interested in regex and non-regex solutions.

Not sure how attached you are to regex, but here is a solution using a different method:

from itertools import groupby

words = ['ct', 'cat', 'cbbt', 'caaabbct', 'cbbccaat',  'cbcbbaat', 'caaccbabbt']
for w in words:
    match = False
    if w.startswith('c') and w.endswith('t'):
        temp = w[1:-1]
        s = set(temp)
        match = s <= set('abc') and len(s) == len(list(groupby(temp)))
    print w, "matches" if match else "doesn't match"

The string matches if a set of the middle characters is a subset of set('abc') and the number of groups returned by groupby() is the same as the number of elements in the set.

Evolutionary Algorithm for the Theo Jansen Walking Mechanism

11 votes

There is a Dutch artist/engineer who created a very elaborate walking mechanism. The working principle can be seen here:

http://www.strandbeest.com/beests_leg.php

The curious part is that he used a self-made Evolutionary Algorithm to calculate the ideal link lengths, which are described at the bottom of the page.

I created a Python script to visually analyze the ground-contact part of the cycle, which must have two requisites fulfilled:

  1. Be as straight as possible, so as not to wobble up and down;
  2. Have a speed as constant as possible, so as not to drag one foot against the other;

These two criteria would result in a "wheel like" effect, with the machine going linearly ahead without wasting kinetic energy.

The question is:

"Do you have any suggestion of a simple evolutionary iterative formula to optimize leg lengths (by inserting the correct mutations in the code below) so as to improve the walking path given the two criteria above?"

EDIT: some suggestions about the "fitting rule" for genome candidates:

  • Take the "lower part" (ground contact) of the cycle, given that it corresponds to one third of crank revolution (mind the lower part might have a non-horizontal slope and still be linear);
  • Apply linear regression on the point positions of this "ground contact" part;
  • Calculate vertical variation from the linear regression (least squares?)
  • Calculate speed variation by the difference between one point and the previous one, parallel to the regression line;
  • (optional) plot graphs of these "error functions", possibly allowing to select mutants visually (boooring... ;o).

Here is my code, in Python + GTK, which gives some visual insight into the problem: (EDIT: now with parametrized magic numbers subject to mutation by mut's values)

# coding: utf-8

import pygtk
pygtk.require('2.0')
import gtk, cairo
from math import *

class Mechanism():
    def __init__(s):
        pass

    def assemble(s, angle):

        # magic numbers (unmutated)
        mu = [38, 7.8, 15, 50, 41.5, 39.3, 61.9, 55.8, 40.1, 39.4, 36.7, 65.7, 49]

        # mutations
        mut = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

        # mutated
        mn = [mu[n]+mut[n] for n in range(13)]

        s.A = Point(0,0)
        s.B = Point(-mn[0], -mn[1])

        s.C = fromPoint(s.A, mn[2], angle)
        s.ac = Line(s.A, s.C)

        s.D = linkage(s.C, mn[3], s.B, mn[4])
        s.cd = Line(s.C, s.D)
        s.bd = Line(s.B, s.D)

        s.E = linkage(s.B, mn[5], s.C, mn[6])
        s.be = Line(s.B, s.E)
        s.ce = Line(s.C, s.E)

        s.F = linkage(s.D, mn[7], s.B, mn[8])
        s.df = Line(s.D, s.F)
        s.bf = Line(s.B, s.F)

        s.G = linkage(s.F, mn[9], s.E, mn[10])
        s.fg = Line(s.F, s.G)
        s.eg = Line(s.E, s.G)

        s.H = linkage(s.G, mn[11], s.E, mn[12])
        s.gh = Line(s.G, s.H)
        s.EH = Line(s.E, s.H)


        return s.H


class Point:
    def __init__(self, x, y):
        self.x, self.y = float(x), float(y)
    def __str__(self):
        return "(%.2f, %.2f)" % (self.x, self.y)

class Line:
    def __init__(self, p1, p2):
        self.p1, self.p2 = p1, p2
    def length(self):
        return sqrt((p1.x-p2.x)**2 + (p1.y-p2.y)**2)

def fromPoint(point, distance, angle):
    angle = radians(angle)
    return Point(point.x + distance * cos(angle),
        point.y + distance * sin(angle))

def distance(p1, p2):
    return sqrt( (p1.x - p2.x)**2 + (p1.y - p2.y)**2 )

def ccw(p1, p2, px):
    """ Test if px is at the right side of the line p1p2 """
    ax, ay, bx, by = p1.x, p1.y, p2.x, p2.y
    cx, cy = px.x, px.y
    return (bx-ax)*(cy-ay)-(by-ay)*(cx-ax) < 0

def inter(p1, l1, p2, l2):
    l1 = float(l1)
    l2 = float(l2)
    dx,dy = p2.x-p1.x, p2.y-p1.y
    d = sqrt(dx**2 + dy**2)                             # distance between the centers
    a = (l1**2 - l2**2 + d**2) / (2*d)                  # distance from first center to the radical line
    M = Point(p1.x + (dx * a/d), p1.y + (dy * a/d))     # intersection of centerline with radical line
    h = sqrt(l1**2 - a**2)                              # distance from the midline to any of the points
    rx,ry = -dy*(h/d), dx*(h/d)
    # There are two results, but only one (the correct side of the line) must be chosen
    R1 = Point(M.x + rx, M.y + ry)
    R2 = Point(M.x - rx, M.y - ry)
    test1 = ccw(p1, p2, R1)
    test2 = ccw(p1, p2, R2)
    if test1:
        return R1
    else:
        return R2


###############################33

mec = Mechanism()
stepcurve = [mec.assemble(p) for p in xrange(360)]

window=gtk.Window()
panel = gtk.VBox()
window.add(panel)
toppanel = gtk.HBox()
panel.pack_start(toppanel)

class Canvas(gtk.DrawingArea):
    def __init__(self):
        gtk.DrawingArea.__init__(self)
        self.connect("expose_event", self.expose)

    def expose(self, widget, event):
        cr = widget.window.cairo_create()
        rect = self.get_allocation()
        w = rect.width
        h = rect.height
        cr.translate(w*0.85, h*0.3)
        scale = 1
        cr.scale(scale, -scale)
        cr.set_line_width(1)

        def paintpoint(p):
            cr.arc(p.x, p.y, 1.2, 0, 2*pi)
            cr.set_source_rgb(1,1,1)
            cr.fill_preserve()
            cr.set_source_rgb(0,0,0)
            cr.stroke()

        def paintline(l):
            cr.move_to(l.p1.x, l.p1.y)
            cr.line_to(l.p2.x, l.p2.y)
            cr.stroke()

        for i in mec.__dict__:
            if mec.__dict__[i].__class__.__name__ == 'Line':
                paintline(mec.__dict__[i])

        for i in mec.__dict__:
            if mec.__dict__[i].__class__.__name__ == 'Point':
                paintpoint(mec.__dict__[i])

        cr.move_to(stepcurve[0].x, stepcurve[0].y)
        for p in stepcurve[1:]:
            cr.line_to(p.x, p.y)
        cr.close_path()
        cr.set_source_rgb(1,0,0)
        cr.set_line_join(cairo.LINE_JOIN_ROUND)
        cr.stroke()

class FootPath(gtk.DrawingArea):
    def __init__(self):
        gtk.DrawingArea.__init__(self)
        self.connect("expose_event", self.expose)

    def expose(self, widget, event):
        cr = widget.window.cairo_create()
        rect = self.get_allocation()
        w = rect.width
        h = rect.height

        cr.save()
        cr.translate(w/2, h/2)

        scale = 20
        cr.scale(scale, -scale)

        cr.translate(40,92)

        twocurves = stepcurve.extend(stepcurve)

        cstart = 305
        cr.set_source_rgb(0,0.5,0)
        for p in stepcurve[cstart:cstart+121]:
            cr.arc(p.x, p.y, 0.1, 0, 2*pi)
            cr.fill()

        cr.move_to(stepcurve[cstart].x, stepcurve[cstart].y)
        for p in stepcurve[cstart+1:cstart+121]:
            cr.line_to(p.x, p.y)
        cr.set_line_join(cairo.LINE_JOIN_ROUND)
        cr.restore()
        cr.set_source_rgb(1,0,0)
        cr.set_line_width(1)
        cr.stroke()




        cr.save()
        cr.translate(w/2, h/2)
        scale = 20
        cr.scale(scale, -scale)
        cr.translate(40,92)

        cr.move_to(stepcurve[cstart+120].x, stepcurve[cstart+120].y)
        for p in stepcurve[cstart+120+1:cstart+360+1]:
            cr.line_to(p.x, p.y)
        cr.restore()
        cr.set_source_rgb(0,0,1)
        cr.set_line_width(1)
        cr.stroke()



canvas = Canvas()
canvas.set_size_request(140,150)
toppanel.pack_start(canvas, False, False)

toppanel.pack_start(gtk.VSeparator(), False, False)

footpath = FootPath()
footpath.set_size_request(1000,-1)
toppanel.pack_start(footpath, True, True)


def changeangle(par):
    mec.assemble(par.get_value()-60)
    canvas.queue_draw()
angleadjust = gtk.Adjustment(value=0, lower=0, upper=360, step_incr=1)
angleScale = gtk.HScale(adjustment=angleadjust)
angleScale.set_value_pos(gtk.POS_LEFT)
angleScale.connect("value-changed", changeangle)
panel.pack_start(angleScale, False, False)


window.set_position(gtk.WIN_POS_CENTER)
window.show_all()
gtk.main()

Try the demo!

enter image description here

This is a fascinating question, though I think somewhat beyond the scope of Stack Overflow: it's not something that going to be solved in a few minutes, so I'll stick an outline here and update it if I make any progress. There are going to be three parts to any approach:

  1. Scoring the footprint: does the linkage break? does the footprint have the right kind of shape? how flat is it? how smooth is the motion? does it spend enough time in the flat portion?

  2. Searching for good values of the magic numbers. It's not clear that this has to be an evolutionary algorithm (though I can see why the idea of such an algorithm would appeal to Theo Jansen as it fits in with the animal metaphor in his art); perhaps other approaches like local search (hill climbing) or simulated annealing would be productive.

  3. Searching for good configurations of the arms. This is where an evolutionary approach seems like it might be most worthwhile.

You can try out different magic numbers in my Javascript/canvas demo to see what kinds of motion you can get (CD = 55.4 is quite entertaining, for example). There's a whole mathematical theory of linkages, by the way, that connects the configuration spaces of linkages to topological manifolds.


I added some simple scoring to the demo. The ground score is the fraction of the cycle that the foot spends on the ground, which I take to be all points whose y-coordinate is within some tolerance of the lowest point. The drag score is the biggest difference between any two horizontal velocities while the foot is on the ground. (It's always negative, so that higher values = small differences in velocities = better.)

But here's where the difficulty comes in. In order to program any kind of search, I need to be able to combine these scores. But how do I balance them against each other? The Jansen magic numbers give me groundScore: 0.520; dragScore: -0.285. If I set AC=10, GH=65, EH=50, i get groundScore: 0.688; dragScore: -0.661. Nearly 70% of the time the foot is on the ground. But the take-off is draggy. Is it better or worse than Jansen's?

I think that getting actual engineering feedback in order to determine a good score is going to be the big problem here, not the actual search.

Is string interning really useful?

11 votes

I was having a conversation about strings and various languages a while back, and the topic of string interning came up. Apparently Java and the .NET framework do this automatically with all strings, as well as several scripting languages. Theoretically, it saves memory because you don't end up with multiple copies of the same string, and it saves time because string equality comparisons are a simple pointer comparison instead of an O(N) run through each character of the string.

But the more I think about it, the more skeptical I grow of the concept's benefits. It seems to me that the advantages are mostly theoretical:

  • First off, to use automatic string interning, all strings must be immutable, which makes a lot of string processing tasks harder than they need to be. (And yes, I've heard all the arguments for immutability in general. That's not the point.)
  • Every time a new string is created, it has to be checked against the string interning table, which is at least a O(N) operation. (EDIT: Where N is the size of the string, not the size of the table, since this was confusing people.) So unless the ratio of string equality comparisons to new string creation is pretty high, it's unlikely that the net time saved is a positive value.
  • If the string equality table uses strong references, the strings will never get garbage collected when they're no longer needed, thus wasting memory. On the other hand, if the table uses weak references, then the string class requires some sort of finalizer to remove the string from the table, thus slowing down the GC process. (Which could be pretty significant, depending on how the string intern table is implemented. Worst case, deleting an item from a hash table can require an O(N) rebuild of the entire table under certain circumstances.)

This is just the result of me thinking about implementation details. Is there something I've missed? Does string interning actually provide any significant benefits in the general case?

EDIT 2: All right, apparently I was operating from a mistaken premise. The person I was talking to never pointed out that string interning was optional for newly-created strings, and in fact gave the strong impression that the opposite was true. Thanks to Jon for setting the matter straight. Another accepted answer for him.

No, Java and .NET don't do it "automatically with all strings". They (well, Java and C#) do it with constant string expressions expressed in bytecode/IL, and on demand via the String.intern and String.Intern (.NET) methods. The exact situation in .NET is interesting, but basically the C# compiler will guarantee that every reference to an equal string constant within an assembly ends up referring to the same string object. That can be done efficiently at type initialization time, and can save a bunch of memory.

It doesn't happen every time a new string is created.

(On the string immutability front, I for one am extremely glad that strings are immutable. I don't want to have to take a copy every time I receive a parameter etc, thank you very much. I haven't seen it make string processing tasks harder, either...)

And as others have pointed out, looking up a string in a hash table isn't generally an O(n) operation, unless you're incredibly unlucky with hash collisions...

Personally I don't use string interning in user-land code; if I want some sort of cache of strings I'll create a HashSet<string> or something similar. That can be useful in various situations where you expect to come across the same strings several times (e.g. XML element names) but with a simple collection you don't pollute a system-wide cache.

Class decorators vs function decorators

11 votes

In python there are two ways to declare decorators:

Class based

class mydecorator(object):
    def __init__(self, f):
        self.f = f

    def __call__(self, *k, **kw):
        # before f actions
        self.f(*k, **kw)
        # after f actions

Function based

def mydecorator(f):
    def decorator(*k, **kw):
        # before f actions
        f(*k, **kw)
        # after f actions

    return decorator          

Is there any difference between these declarations? In which cases each of them should be used?

If you want to keep state in the decorator you should use a class.

For example, this does not work

def mydecorator(f):
    x = 0 
    def decorator():
        x += 1 # x is a nonlocal name and cant be modified
        return f(x)
    return decorator 

There are many workarounds for this but the simplest way is to use a class

class mydecorator(object):
    def __init__(self, f):
        self.f = f
        self.x = 0

    def __call__(self, *k, **kw):
        self.x += 1
        return f(self.x)

How to efficiently do many tasks a "little later" in Python?

11 votes

I have a process, that needs to perform a bunch of actions "later" (after 10-60 seconds usually). The problem is that those "later" actions can be a lot (1000s), so using a Thread per task is not viable. I know for the existence of tools like gevent and eventlet, but one of the problem is that the process uses zeromq for communication so I would need some integration (eventlet already has it).

What I'm wondering is What are my options? So, suggestions are welcome, in the lines of libraries (if you've used any of the mentioned please share your experiences), techniques (Python's "coroutine" support, use one thread that sleeps for a while and checks a queue), how to make use of zeromq's poll or eventloop to do the job, or something else.


Update: Added a little bounty. C'mon fellas, I'm asking for suggestions only, not a ready-made solution.

consider using a priority queue with one or more worker threads to service the tasks. The main thread can add work to the queue, with a timestamp of the soonest it should be serviced. Worker threads pop work off the queue, sleep until the time of priority value is reached, do the work, and then pop another item off the queue.

How about a more fleshed out answer. mklauber makes a good point. If there's a chance all of your workers might be sleeping when you have new, more urgent work, then a queue.PriorityQueue isn't really the solution, although a "priority queue" is still the technique to use, which is available from the heapq module. Instead, we'll make use of a different synchronization primitive; a condition variable, which in python is spelled threading.Condition.

The approach is fairly simple, peek on the heap, and if the work is current, pop it off and do that work. If there was work, but it's scheduled into the future, just wait on the condition until then, or if there's no work at all, sleep forever.

The producer does it's fair share of the work; every time it adds new work, it notifies the condition, so if there are sleeping workers, they'll wake up and recheck the queue for newer work.

import heapq, time, threading

START_TIME = time.time()
SERIALIZE_STDOUT = threading.Lock()
def consumer(message):
    """the actual work function.  nevermind the locks here, this just keeps
       the output nicely formatted.  a real work function probably won't need
       it, or might need quite different synchronization"""
    SERIALIZE_STDOUT.acquire()
    print time.time() - START_TIME, message
    SERIALIZE_STDOUT.release()

def produce(work_queue, condition, timeout, message):
    """called to put a single item onto the work queue."""
    prio = time.time() + float(timeout)
    condition.acquire()
    heapq.heappush(work_queue, (prio, message))
    condition.notify()
    condition.release()

def worker(work_queue, condition):
    condition.acquire()
    stopped = False
    while not stopped:
        now = time.time()
        if work_queue:
            prio, data = work_queue[0]
            if data == 'stop':
                stopped = True
                continue
            if prio < now:
                heapq.heappop(work_queue)
                condition.release()
                # do some work!
                consumer(data)
                condition.acquire()
            else:
                condition.wait(prio - now)
        else:
            # the queue is empty, wait until notified
            condition.wait()
    condition.release()

if __name__ == '__main__':
    # first set up the work queue and worker pool
    work_queue = []
    cond = threading.Condition()
    pool = [threading.Thread(target=worker, args=(work_queue, cond))
            for _ignored in range(4)]
    map(threading.Thread.start, pool)

    # now add some work
    produce(work_queue, cond, 10, 'Grumpy')
    produce(work_queue, cond, 10, 'Sneezy')
    produce(work_queue, cond, 5, 'Happy')
    produce(work_queue, cond, 10, 'Dopey')
    produce(work_queue, cond, 15, 'Bashful')
    time.sleep(5)
    produce(work_queue, cond, 5, 'Sleepy')
    produce(work_queue, cond, 10, 'Doc')

    # and just to make the example a bit more friendly, tell the threads to stop after all
    # the work is done
    produce(work_queue, cond, float('inf'), 'stop')
    map(threading.Thread.join, pool)

Best way to do conditional assignment in python

11 votes

I tend to use this a lot, but it's ugly:

a = (lambda x: x if x else y)(get_something())

So I wrote this function:

def either(val, alt):
    if val:
        return val
    else:
        return alt

So you can do:

a = either(get_something(), y)

Is there a built-in function for this like isnull in T-SQL? Or am I missing something really obvious?

The or operator does what you want:

get_something() or y

In fact, it's chainable, like COALESCE (and unlike ISNULL). The following expression evaluates to the left-most argument that converts to True.

A or B or C

How to find a fix number of (almost) fixed proportion rectangles with opencv?

10 votes

I am writing a simple fly tracking software and I would love some input from opencv experts.

The image I have looks pretty much like:

sample image for fly tracking

I used to do tracking using kmeans and PIL/numpy but I re-wrote everything to use blob detection in opencv. Tracking works OK but I would also like to automatize division of ROI. What I need to do is find each of the 32 grooves that appear in the picture, where flies live. See the black rectangle on the image as example of what I mean.

I think cornerHarris may be what I need but how do I specify only the grooves and not each single rectangle found in the image? All those grooves have proportions of roughly 10:1.

Thanks!

I don't think cvCornerHarris is even close to what you need.

A much better start would be to experiment with the demo available at: OpenCV-2.3.0/samples/cpp/squares.cpp. This technique uses Canny(), dilate() and findCountour().

Right out of the box, this demo outputs:

enter image description here

I believe that with a few tweaks here and there you can have your party started.

Can a Python list, set or dictionary be implemented invisibly using a database?

9 votes

The Python native capabilities for lists, sets & dictionaries totally rock. Is there a way to continue using the native capability when the data becomes really big? The problem I'm working on involved matching (intersection) of very large lists. I haven't pushed the limits yet -- actually I don't really know what the limits are -- and don't want to be surprised with a big reimplementation after the data grows as expected.

Is it reasonable to deploy on something like Google App Engine that advertises no practical scale limit and continue using the native capability as-is forever and not really think about this?

Is there some Python magic that can hide whether the list, set or dictionary is in Python-managed memory vs. in a DB -- so physical deployment of data can be kept distinct from what I do in code?

How do you, Mr. or Ms. Python Super Expert, deal with lists, sets & dicts as data volume grows?

I'm not quite sure what you mean by native capabilities for lists, sets & dictionaries. However, you can create classes that emulate container types and sequence types by defining some methods with special names. That means that you could create a class that behaves like a list, but stores its data in a SQL database or on GAE datastore. Simply speaking, this is what an ORM does. However, mapping objects to a database is very complicated and it is probably not a good idea to invent your own ORM, but to use an existing one.

I'm afraid there is no one-size-fits-all solution. Especially GAE is not some kind of of Magic Fairy Dust you can sprinkle on your code to make it scale. There are several limitations you have to keep in mind to create an application that can scale. Some of them are general, like computational complexity, others are specific to the environment your code runs in. E.g. on GAE the maximum response time is limited to 30 seconds and querying the datastore works different that on other databases.

It's hard to give any concrete advice without knowing your specific problem, but I doubt that GAE is the right solution.

In general, if you want to work with large datasets, you either have to keep that in mind from the start or you will have to rework your code, algorithms and data structures as the datasets grow.

how to create uncollectable garbage in python?

7 votes

I have a large long-running server, and, over weeks the memory usage steadily climbs.

Generally, as pointed out below, its unlikely that leaks are my problem; however, I have not got a lot to go on so I want to see if there are any leaks.

Getting at console output is tricky so I'm not running with gc.set_debug(). This is not a big problem though, as I have easily added an API to get it to run gc.collect() and then iterate through gc.garbage and send the results back out to me over HTTP.

My problem is that running it locally for a short time my gc.garbage is always empty. I can't test my bit of code that lists the leaks before I deploy it.

Is there a trivial recipe for creating an uncollectable bit of garbage so I can test my code that lists the garbage?

Any cycle of finalizable objects (that is, objects with a __del__ method) is uncollectable (because the garbage collector does not know which order to run the finalizers in):

>>> class Finalizable:
...     def __del__(self): pass
...
>>> a = Finalizable()
>>> b = Finalizable()
>>> a.x = b
>>> b.x = a
>>> del a
>>> del b
>>> import gc
>>> gc.collect()
4
>>> gc.garbage
[<__main__.Finalizable instance at 0x1004e0b48>,
 <__main__.Finalizable instance at 0x1004e73f8>]

But as a general point, it seems unlikely to me that your problem is due to uncollectable garbage, unless you are in the habit of using finalizers. It's more likely due to the accumulation of live objects, or to fragmentation of memory (since Python uses a non-moving collector).

Detect argument passing convention of a C library function

7 votes

With pure Python functions you can pass arguments either by order (e.g. foo(1, 2, 3)) or by name (e.g. foo(a=1, c=3, b=2)).

Functions defined in C modules can use either convention. You cannot say range(stop=10, step=2), and so it is with most but not all functions implemented using C interface.

Is there a way to determine the argument passing convention of a function from within Python?

It appears to be an open bug: there is simply no way to tell this. Also, the issue is implementation-dependent: your code might work in (for example) PyPy, though I can't confirm this.

The devs on the bug page aren't sure whether to change the documentation style or the implementation, but I get the impression it's not a pressing issue for them either way.