Best questions in April 2012

Algorithm improvement for Coca-Cola can shape recognition

255 votes

One of the most interesting projects I've worked on in the past couple of years, while I was still a student, was a final project about image processing. The goal was to develop a system able to recognize Coca-Cola cans (note that I'm stressing the word cans; you'll see why in a minute). You can see a sample below, with the can recognized in the green rectangle, including scale and rotation.

Template matching

Some constraints on the project:

  • The background could be very noisy.
  • The can could have any scale or rotation or even orientation (within reasonable limits).
  • The image could have some degree of fuzziness (contours might not be really straight).
  • There could be Coca-Cola bottles in the image, and the algorithm should only detect the can!
  • The brightness of the image could vary a lot (so you can't rely "too much" on color detection).
  • The can could be partly hidden on the sides or the middle (and possibly partly hidden behind the bottle!).
  • There could be no cans at all in the image, in which case you had to find nothing and write a message saying so.

So you could end up with tricky things like this (which in this case had my algorithm totally fail):

Total fail

I did this project a while ago, had a lot of fun doing it, and ended up with a decent implementation. Here are some details about it:

Language: Done in C++ using OpenCV library.

Pre-processing: By image pre-processing I mean transforming the image into a more raw form to give to the algorithm. I used three steps:

  1. Changing color domain from RGB to HSV (Hue Saturation Value) and filtering based on "red" hue, saturation above a certain threshold to avoid orange-like colors, and filtering of low value to avoid dark tones. The end result was a binary black and white image, where all white pixels represent the pixels that match this threshold. Obviously there is still a lot of crap in the image, but this reduces the number of dimensions you have to work with. [Binarized image]
  2. Noise filtering using median filtering (taking the median pixel value of all neighbors and replacing the pixel with this value) to reduce noise.
  3. Using the Canny Edge Detection Filter to get the contours of all items after the two preceding steps. [Contour detection]

Algorithm: The algorithm I chose for this task was taken from this (awesome) book on feature extraction and is called the Generalized Hough Transform (pretty different from the regular Hough Transform). It basically says a few things:

  • You can describe an object in space without knowing its analytical equation (which is the case here).
  • It is resistant to image deformations such as scaling and rotation, as it will basically test your image for every combination of scale factor and rotation factor.
  • It uses a base model (a template) that the algorithm will "learn".
  • Each pixel remaining in the contour image will vote for another pixel which will supposedly be the center (in terms of gravity) of your object, based on what it learned from the model.

In the end, you get a heat map of the votes. Here, for example, all the pixels of the can's contour vote for its center of gravity, so many votes land on the same pixel and you see a peak in the heat map, as below.

GHT

Once you have that, a simple threshold-based heuristic can give you the location of the center pixel, from which you can derive the scale and rotation and then plot your little rectangle around it (final scale and rotation factor will obviously be relative to your original template). In theory at least...
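The voting idea can be sketched in a few lines of plain C++. This is a deliberately reduced version, not the implementation described above: it handles translation only (no scale or rotation search), it does not index offsets by gradient direction as a full Generalized Hough Transform does, and it does not use OpenCV. The names Point, Peak, learn_offsets and vote_for_center are made up for the illustration.

```cpp
#include <cstddef>
#include <vector>

struct Point { int x; int y; };

// Accumulator cell with the most votes, and how many votes it received.
struct Peak { Point center; int votes; };

// "Learning" the template: for every contour point, remember the offset
// that leads from that point to the reference point (here the centroid).
std::vector<Point> learn_offsets(const std::vector<Point>& contour) {
    std::vector<Point> offsets;
    if (contour.empty()) return offsets;
    long sum_x = 0, sum_y = 0;
    for (const Point& p : contour) { sum_x += p.x; sum_y += p.y; }
    const long n = static_cast<long>(contour.size());
    const Point c{ static_cast<int>(sum_x / n), static_cast<int>(sum_y / n) };
    offsets.reserve(contour.size());
    for (const Point& p : contour) offsets.push_back(Point{ c.x - p.x, c.y - p.y });
    return offsets;
}

// Voting: every edge pixel casts one vote per learned offset. Pixels that
// really belong to the object all agree on one cell, which becomes the peak.
Peak vote_for_center(const std::vector<Point>& edges,
                     const std::vector<Point>& offsets,
                     int width, int height) {
    std::vector<int> acc(static_cast<std::size_t>(width) *
                         static_cast<std::size_t>(height), 0);
    Peak best{ Point{ -1, -1 }, 0 };
    for (const Point& e : edges) {
        for (const Point& o : offsets) {
            const int cx = e.x + o.x;
            const int cy = e.y + o.y;
            if (cx < 0 || cy < 0 || cx >= width || cy >= height) continue;
            int& cell = acc[static_cast<std::size_t>(cy) * width + cx];
            ++cell;
            if (cell > best.votes) best = Peak{ Point{ cx, cy }, cell };
        }
    }
    return best;
}
```

Even in this reduced form the cost is (edge pixels) × (template points) per image. Repeating it for every scale and rotation candidate multiplies that again, which fits the slowness reported in the results.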

Results: Now, while this approach worked in the basic cases, it was severely lacking in some areas:

  • It is extremely slow! I can't stress this enough. Almost a full day was needed to process the 30 test images, obviously because I had a very high scaling factor for rotation and translation, since some of the cans were very small.
  • It was completely lost when bottles were in the image, and for some reason almost always found the bottle instead of the can (perhaps because bottles were bigger, thus had more pixels, thus more votes).
  • Fuzzy images were also no good, since the votes ended up in pixels at random locations around the center, thus ending with a very noisy heat map.
  • Invariance in translation and rotation was achieved, but not in orientation, meaning that a can that was not directly facing the camera objective wasn't recognized.

Can you help me improve my specific algorithm, using exclusively OpenCV features, to resolve the four specific issues mentioned?

I hope some people will also learn something out of it as well, after all I think not only people who ask questions should learn :)

An alternative approach would be to extract features (keypoints) using Scale Invariant Feature Transform (SIFT) or Speeded Up Robust Features (SURF).

Both are implemented in OpenCV 2.3.1.

You can find a nice code example using features here.

Both algorithms are invariant to scaling and rotation. Since they work with features, you can also handle occlusion (as long as enough keypoints are visible).

Image source: tutorial example

The processing takes a few hundred ms for SIFT. SURF is a bit faster, but still not suitable for real-time applications. ORB uses FAST, which is weaker regarding rotation invariance.

The original papers

Are the days of passing const std::string & as a parameter over?

101 votes

I heard a recent talk by Herb Sutter who suggested that the reasons to pass std::vector and std::string by const & are largely gone. He suggested that writing a function such as the following is now preferable:

std::string do_something ( std::string inval )
{
   std::string return_val;
   // ... do stuff ...
   return return_val;
}

I understand that the return_val will be an rvalue at the point the function returns and can therefore be returned using move semantics, which are very cheap. However, inval is still much larger than the size of a reference (which is usually implemented as a pointer). This is because a std::string has various components including a pointer into the heap and a member char[] for short string optimization. So it seems to me that passing by reference is still a good idea.

Can anyone explain why Herb might have said this?

The reason Herb said what he said is because of cases like this.

Let's say I have function A which calls function B, which calls function C. And A passes a string through B and into C. A does not know or care about C; all A knows about is B. That is, C is an implementation detail of B.

Let's say that A is defined as follows:

void A()
{
  B("value");
}

If B and C take the string by const&, then it looks something like this:

void B(const std::string &str)
{
  C(str);
}

void C(const std::string &str)
{
  //Do something with `str`. Does not store it.
}

All well and good. You're just passing pointers around, no copying, no moving, everyone's happy. C takes a const& because it doesn't store the string. It simply uses it.

Now, I want to make one simple change: C needs to store the string somewhere.

void C(const std::string &str)
{
  //Do something with `str`.
  m_str = str;
}

Hello, copy constructor and potential memory allocation (ignore SSO). C++11's move semantics are supposed to make it possible to remove needless copy-constructing, right? And A passes a temporary; there's no reason why C should have to copy the data. It should just abscond with what was given to it.

Except it can't. Because it takes a const&.

If I change C to take its parameter by value, that just causes B to do the copy into that parameter; I gain nothing.

So if I had just passed str by value through all of the functions, relying on std::move to shuffle the data around, we wouldn't have this problem. If someone wants to hold on to it, they can. If they don't, oh well.

Is it more expensive? Yes; moving into a value is more expensive than using references. Is it less expensive than the copy? Not for small strings with SSO. Is it worth doing?

It depends on your use case. How much do you hate memory allocations?
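Here is a minimal sketch of the by-value chain described above. The Store struct is a hypothetical stand-in for wherever C keeps its string; A, B and C mirror the names used in the answer, and A returns the stored string only so that the result can be inspected.

```cpp
#include <string>
#include <utility>

// Hypothetical holder for the member that C stores into.
struct Store {
    std::string m_str;

    // C is a sink: it takes the string by value and moves it into the member.
    void C(std::string str) { m_str = std::move(str); }

    // B does not care what C does; it just forwards ownership.
    void B(std::string str) { C(std::move(str)); }
};

std::string A() {
    Store s;
    s.B("value");  // one std::string is built from the literal, then only moved
    return s.m_str;
}
```

If the caller passes an lvalue instead (std::string x = "keep"; s.B(x);), exactly one copy is made at the call to B and x is left untouched. That is the same single copy the const& version would have paid inside C.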

Why does GCC generate such radically different assembly for nearly the same C code?

89 votes

While writing an optimized ftol function I found some very odd behaviour in GCC 4.6.1. Let me show you the code first (for clarity I marked the differences):

fast_trunc_one, C:

int fast_trunc_one(int i) {
    int mantissa, exponent, sign, r;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);
    sign = i & 0x80000000;

    if (exponent < 0) {
        r = mantissa << -exponent;                       /* diff */
    } else {
        r = mantissa >> exponent;                        /* diff */
    }

    return (r ^ -sign) + sign;                           /* diff */
}

fast_trunc_two, C:

int fast_trunc_two(int i) {
    int mantissa, exponent, sign, r;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);
    sign = i & 0x80000000;

    if (exponent < 0) {
        r = (mantissa << -exponent) ^ -sign;             /* diff */
    } else {
        r = (mantissa >> exponent) ^ -sign;              /* diff */
    }

    return r + sign;                                     /* diff */
}

Seems the same, right? Well, GCC disagrees. After compiling with gcc -O3 -S -Wall -o test.s test.c this is the assembly output:

fast_trunc_one, generated:

_fast_trunc_one:
LFB0:
    .cfi_startproc
    movl    4(%esp), %eax
    movl    $150, %ecx
    movl    %eax, %edx
    andl    $8388607, %edx
    sarl    $23, %eax
    orl $8388608, %edx
    andl    $255, %eax
    subl    %eax, %ecx
    movl    %edx, %eax
    sarl    %cl, %eax
    testl   %ecx, %ecx
    js  L5
    rep
    ret
    .p2align 4,,7
L5:
    negl    %ecx
    movl    %edx, %eax
    sall    %cl, %eax
    ret
    .cfi_endproc

fast_trunc_two, generated:

_fast_trunc_two:
LFB1:
    .cfi_startproc
    pushl   %ebx
    .cfi_def_cfa_offset 8
    .cfi_offset 3, -8
    movl    8(%esp), %eax
    movl    $150, %ecx
    movl    %eax, %ebx
    movl    %eax, %edx
    sarl    $23, %ebx
    andl    $8388607, %edx
    andl    $255, %ebx
    orl $8388608, %edx
    andl    $-2147483648, %eax
    subl    %ebx, %ecx
    js  L9
    sarl    %cl, %edx
    movl    %eax, %ecx
    negl    %ecx
    xorl    %ecx, %edx
    addl    %edx, %eax
    popl    %ebx
    .cfi_remember_state
    .cfi_def_cfa_offset 4
    .cfi_restore 3
    ret
    .p2align 4,,7
L9:
    .cfi_restore_state
    negl    %ecx
    sall    %cl, %edx
    movl    %eax, %ecx
    negl    %ecx
    xorl    %ecx, %edx
    addl    %edx, %eax
    popl    %ebx
    .cfi_restore 3
    .cfi_def_cfa_offset 4
    ret
    .cfi_endproc

That's an extreme difference. This actually shows up on the profile too: fast_trunc_one is around 30% faster than fast_trunc_two. Now my question: what is causing this?

Updated to sync with the OP's edit

By tinkering with the code, I've managed to see how GCC optimizes the first case.

Before we can understand why they are so different, first we must understand how GCC optimizes fast_trunc_one().

Believe it or not, fast_trunc_one() is being optimized to this:

int fast_trunc_one(int i) {
    int mantissa, exponent;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);

    if (exponent < 0) {
        return (mantissa << -exponent);             /* diff */
    } else {
        return (mantissa >> exponent);              /* diff */
    }
}

This produces the exact same assembly as the original fast_trunc_one() - register names and everything.

Notice that there are no xors in the assembly for fast_trunc_one(). That's what gave it away for me.


How so?


Step 1: sign = -sign

First, let's take a look at the sign variable. Since sign = i & 0x80000000;, there are only two possible values that sign can take:

  • sign = 0
  • sign = 0x80000000

Now recognize that in both cases, sign == -sign. Therefore, when I change the original code to this:

int fast_trunc_one(int i) {
    int mantissa, exponent, sign, r;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);
    sign = i & 0x80000000;

    if (exponent < 0) {
        r = mantissa << -exponent;
    } else {
        r = mantissa >> exponent;
    }

    return (r ^ sign) + sign;
}

It produces the exact same assembly as the original fast_trunc_one(). I'll spare you the assembly, but it is identical - register names and all.


Step 2: Mathematical reduction: x + (y ^ x) = y

sign can only take one of two values, 0 or 0x80000000.

  • When x = 0, then x + (y ^ x) = y trivially holds.
  • Adding and xoring by 0x80000000 is the same. It flips the sign bit. Therefore x + (y ^ x) = y also holds when x = 0x80000000.

Therefore, x + (y ^ x) reduces to y. And the code simplifies to this:

int fast_trunc_one(int i) {
    int mantissa, exponent, sign, r;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);
    sign = i & 0x80000000;

    if (exponent < 0) {
        r = (mantissa << -exponent);
    } else {
        r = (mantissa >> exponent);
    }

    return r;
}

Again, this compiles to the exact same assembly - register names and all.


This above version finally reduces to this:

int fast_trunc_one(int i) {
    int mantissa, exponent;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);

    if (exponent < 0) {
        return (mantissa << -exponent);             /* diff */
    } else {
        return (mantissa >> exponent);              /* diff */
    }
}

which is pretty much exactly what GCC generates in the assembly.
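The two facts used in Steps 1 and 2 are easy to check mechanically. The sketch below is C++ rather than C and uses unsigned 32-bit arithmetic so that wrap-around is well defined; the helper names are invented for this illustration and are not part of the original code.

```cpp
#include <cstdint>

// The only non-zero value that i & 0x80000000 can produce.
constexpr std::uint32_t kSignBit = 0x80000000u;

// Step 1: negating the value modulo 2^32 gives the same value back.
// This is true for 0 and for 0x80000000, and for nothing else.
constexpr bool negation_is_identity(std::uint32_t sign) {
    return static_cast<std::uint32_t>(0u - sign) == sign;
}

// Step 2: x + (y ^ x) == y modulo 2^32. For x == 0x80000000 both the xor
// and the add flip only the top bit, so the two operations cancel.
constexpr bool add_xor_cancels(std::uint32_t x, std::uint32_t y) {
    return static_cast<std::uint32_t>(x + (y ^ x)) == y;
}
```

Both helpers return true for x = 0 and x = kSignBit with any y, and false for most other inputs (for example x = 1, y = 0). The reduction is therefore only valid because sign is known to be one of those two values.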


So why doesn't the compiler optimize fast_trunc_two() to the same thing?

The key part in fast_trunc_one() is the x + (y ^ x) = y optimization. In fast_trunc_two() the x + (y ^ x) expression is being split across the branch.

I suspect that might be enough to confuse GCC to not make this optimization. (It would need to hoist the ^ -sign out of the branch and merge it into the r + sign at the end.)

For example, this produces the same assembly as fast_trunc_one():

int fast_trunc_two(int i) {
    int mantissa, exponent, sign, r;

    mantissa = (i & 0x07fffff) | 0x800000;
    exponent = 150 - ((i >> 23) & 0xff);
    sign = i & 0x80000000;

    if (exponent < 0) {
        r = ((mantissa << -exponent) ^ -sign) + sign;             /* diff */
    } else {
        r = ((mantissa >> exponent) ^ -sign) + sign;              /* diff */
    }

    return r;                                     /* diff */
}

Why are Standard iterator ranges [begin, end) instead of [begin, end]?

84 votes

Why does the Standard define end() as one past the end, instead of at the actual end?

The best argument easily is the one made by Dijkstra himself:

  • You want the size of the range to be a simple difference end − begin;

  • including the lower bound is more "natural" when sequences degenerate to empty ones, and also because the alternative (excluding the lower bound) would require the existence of a "one-before-the-beginning" sentinel value.

You still need to justify why you start counting at zero rather than one, but that wasn't part of your question.

The wisdom behind the [begin, end) convention pays off time and again when you have any sort of algorithm that deals with multiple nested or iterated calls to range-based constructions, which chain naturally. By contrast, using a doubly-closed range would incur off-by-ones and extremely unpleasant and noisy code. For example, consider a partition [n0, n1)[n1, n2)[n2,n3). Another example is the standard iteration loop for (it = begin; it != end; ++it), which runs end - begin times. The corresponding code would be much less readable if both ends were inclusive – and imagine how you'd handle empty ranges.

Finally, we can also make a nice argument why counting should start at zero: With the half-open convention for ranges that we just established, if you are given a range of N elements (say to enumerate the members of an array), then 0 is the natural "beginning" so that you can write the range as [0, N), without any awkward offsets or corrections.

In a nutshell: the fact that we don't see the number 1 everywhere in range-based algorithms is a direct consequence of, and motivation for, the [begin, end) convention.
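A small sketch of the chaining argument, with function names made up for the illustration: splitting a range at any point reuses the split iterator as the end of the first part and the beginning of the second, and the empty range needs no special handling.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Sums the half-open range [first, last). When first == last the range is
// empty and the result is 0, without any extra check.
int sum_range(std::vector<int>::const_iterator first,
              std::vector<int>::const_iterator last) {
    return std::accumulate(first, last, 0);
}

// Partitions v into [begin, split) and [split, end). The same iterator ends
// one part and starts the next, so there is no +1 or -1 anywhere.
// Precondition: mid <= v.size().
int sum_in_two_parts(const std::vector<int>& v, std::size_t mid) {
    const auto split = v.begin() + static_cast<std::ptrdiff_t>(mid);
    return sum_range(v.begin(), split) + sum_range(split, v.end());
}
```

For any valid mid, including 0 and v.size(), sum_in_two_parts returns the same total as summing the whole vector. With closed ranges the two extreme splits would each need their own case.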

What is this style of syntax in C?

75 votes

From sys.c line 123:

void *sys_call_table[__NR_syscalls] = 
{
    [0 ... __NR_syscalls-1] = sys_ni_syscall,
#include <asm/unistd.h>
};

sys_call_table is a generic pointer to arrays, I can see that. However what is the notation:

[0 ... __NR_syscalls-1]

What is the ...?


EDIT:
I learned another C trick here: #include <asm/unistd.h> will be preprocessed and replaced with its content, which then overrides individual entries set by [0 ... __NR_syscalls-1].

It is initialized using Designated Initializers.

The range-based initialization is a GNU (GCC) extension.

To initialize a range of elements to the same value, write `[first ... last] = value'. This is a GNU extension. For example,

 int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

It is not portable. Compiling with -pedantic will tell you so.

Why / when would it be appropriate to override ToString?

56 votes

I'm studying C# and I wonder what the point and benefit of overriding ToString might be, as shown in the example below.

Could this be done in some simpler way, using a common method without the override?

public string GetToStringItemsHeadings
{
    get { return string.Format("{0,-20} {1, -20}", "Office Email", "Private Email"); }
}


public override string ToString()
{
    string strOut = string.Format("{0,-20} {1, -20}", m_work, m_personal);
    return strOut;
}

  • Do you need to override ToString? No.

  • Can you get a string representation of your object in another way? Yes.

But by using ToString you are using a method that is common to all objects and thus other classes know about this method. For instance, whenever the .NET framework wants to convert an object to a string representation, ToString is a prime candidate (there are others, if you want to provide more elaborate formatting options).

Concretely,

Console.WriteLine(yourObject);

would invoke yourObject.ToString().

Which classes cannot be subclassed?

46 votes

Is there any rule about which built-in and standard library classes are not subclassable ("final")?

As of Python 3.3, here are a few examples:

  • bool
  • function
  • operator.itemgetter
  • slice

I found a question which deals with the implementation of "final" classes, both in C and pure Python.

I would like to understand what reasons may explain why a class is chosen to be "final" in the first place.

There seem to be two reasons for a class to be "final" in Python.

1. Violation of Class Invariant

Classes that follow Singleton pattern have an invariant that there's a limited (pre-determined) number of instances. Any violation of this invariant in a subclass will be inconsistent with the class' intent, and would not work correctly. Examples:

  • bool: True, False; see Guido's comments
  • NoneType: None
  • NotImplementedType: NotImplemented
  • ellipsis: Ellipsis

There may be cases other than the Singleton pattern in this category but I'm not aware of any.

2. No Persuasive Use Case

A class implemented in C requires additional work to allow subclassing (at least in CPython). Doing such work without a convincing use case is not very attractive, so volunteers are less likely to come forward. Examples:

Note 1:

I originally thought there were valid use cases, but simply insufficient interest, in subclassing of function and operator.itemgetter. Thanks to @agf for pointing out that the use cases offered here and here are not convincing (see @agf comments to the question).

Note 2:

My concern is that another Python implementation might accidentally allow subclassing a class that's final in CPython. This may result in non-portable code (a use case may be weak, but someone might still write code that subclasses function if their Python supports it). This can be resolved by marking in Python documentation all built-in and standard library classes that cannot be subclassed, and requiring that all implementations follow CPython behavior in that respect.

Note 3:

The message produced by CPython in all the above cases is:

TypeError: type 'bool' is not an acceptable base type

It is quite cryptic, as numerous questions on this subject show. I'll submit a suggestion to add a paragraph to the documentation that explains final classes, and maybe even change the error message to:

TypeError: type 'bool' is final (non-extensible)

What is the purpose of the extra braces in Switch case?

46 votes

I'm curious about this thing... see example

switch(x)
{
    case(a):
        {
        //do stuff
        }
        break;
    case(b):
        //do stuff
        break;
}

All my life I've done it like case b, but since C# allows me to use it, and Visual Studio allows me to collapse that thing, I am curious: what is the real difference between case a (with those braces) and case b?

Braces {} are used to define a scope for a set of operations. Bizarrely, the following will compile and work:

private void ConnectionStateChange(object sender, StateChangeEventArgs e)
{
    string s = "hi";
    switch(s)
    {
        case "hi":
            {
                int a = 1;
                a++;
            }
            {
                int a = 2;
                a++;
            }
            break;
    }

    {
        int a = 1;
        a++;
    }
    {
        int a = 2;
        a++;
    }
}

As you can see, in that one method I've created four variables, each called a. Each is entirely separate because, as local variables, they exist only within their own scope.

Does that make some sort of sense?

Braces around string literal in char array declaration valid? (e.g. char s[] = {"Hello World"})

46 votes

By accident I found that the line char s[] = {"Hello World"}; is properly compiled and seems to be treated the same as char s[] = "Hello World";. Isn't the first ({"Hello World"}) an array containing one element that is an array of char, so the declaration for s should read char *s[]? In fact if I change it to char *s[] = {"Hello World"}; the compiler accepts it as well, as expected.

Searching for an answer, the only place I found which mentioned this is this one but there is no citing of the standard.

So my question is: why does the line char s[] = {"Hello World"}; compile, although the left side is of type array of char and the right side is of type array of array of char?

Following is a working program:

#include<stdio.h>
int main() {
    char s[] = {"Hello World"};
    printf("%s", s); // Same output if line above is char s[] = "Hello World";
    return 0;
}

Thanks for any clarifications.

P.S. My compiler is gcc-4.3.4.

It's allowed because the standard says so: C99 section 6.7.8, §14:

An array of character type may be initialized by a character string literal, optionally enclosed in braces. Successive characters of the character string literal (including the terminating null character if there is room or if the array is of unknown size) initialize the elements of the array.

What this means is that both

char s[] = { "Hello World" };

and

char s[] = "Hello World";

are nothing more than syntactic sugar for

char s[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0 };
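The equivalence can be checked directly. The sketch below is C++ rather than C (C++ also accepts a string literal enclosed in braces as the initializer of a char array), and the function names are made up for the illustration.

```cpp
#include <cstddef>
#include <cstring>

// Size of the array created by the braced form: 11 characters plus the
// terminating null character.
std::size_t braced_size() {
    char braced[] = { "Hello World" };
    return sizeof braced;
}

// Returns true when all three initializer forms produce arrays of the same
// size with the same contents.
bool initializer_forms_agree() {
    char braced[]  = { "Hello World" };
    char plain[]   = "Hello World";
    char spelled[] = { 'H', 'e', 'l', 'l', 'o', ' ', 'W', 'o', 'r', 'l', 'd', 0 };
    return sizeof braced == sizeof plain
        && sizeof plain == sizeof spelled
        && std::memcmp(braced, plain, sizeof plain) == 0
        && std::memcmp(plain, spelled, sizeof plain) == 0;
}
```

In all three cases s is an array of 12 char, not an array of pointers.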

On a related note (same section, §11), C also allows braces around scalar initializers like

int foo = { 42 };

which, incidentally, fits nicely with the syntax for compound literals

(int){ 42 }

How to write iOS app purely in C

46 votes

I read the following in "Learn C Before Objective-C?":

Usually I then replace some Obj-C code with pure C code (after all you can mix them as much as you like, the content of an Obj-C method can be entirely, pure C code)

Is this true?

Could I build an iPhone app purely in the C programming language?

Damn, it took me a while but I got it:

main.m:

int main(int argc, char *argv[])
{
    @autoreleasepool {
        // Nothing here has changed, just simply launching the app
        return UIApplicationMain(argc, argv, nil, @"AppDelegate");
    }
}

AppDelegate.m:

#import <objc/runtime.h>
#import <objc/message.h>

// This is equivalent to creating a @class with one public variable named 'window'.
struct AppDel
{
    Class isa;

    id window;
};

// This is a strong reference to the class of the AppDelegate 
// (same as [AppDelegate class])
Class AppDelClass;

// this is the entry point of the application, same as -application:didFinishLaunchingWithOptions:
// note the fact that we use `void *` for the 'application' and 'options' fields, as we need no reference to them for this to work. A generic id would suffice here as well.
BOOL AppDel_didFinishLaunching(struct AppDel *self, SEL _cmd, void *application, void *options)
{
    // we +alloc and -initWithFrame: our window here, so that we can have it show on screen (eventually).
    // this entire method is the objc-runtime based version of the standard View-Based application's launch code, so nothing here really should surprise you.
    // one thing important to note, though is that we use `sel_getUid()` instead of @selector().
    // this is because @selector is an objc language construct, and the application would not have been created in C if I used @selector.
    self->window = objc_msgSend(objc_getClass("UIWindow"), sel_getUid("alloc"));
    self->window = objc_msgSend(self->window, sel_getUid("initWithFrame:"), (struct CGRect) { 0, 0, 320, 480 });

    // here, we are creating our view controller, and our view. note the use of objc_getClass, because we cannot reference UIViewController directly in C.
    id viewController = objc_msgSend(objc_msgSend(objc_getClass("UIViewController"), sel_getUid("alloc")), sel_getUid("init"));

    // creating our custom view class, there really isn't too much 
    // to say here other than we are hard-coding the screen's bounds, 
    // because returning a struct from a `objc_msgSend()` (via 
    // [[UIScreen mainScreen] bounds]) requires a different function call
    // and is finicky at best.
    id view = objc_msgSend(objc_msgSend(objc_getClass("View"), sel_getUid("alloc")), sel_getUid("initWithFrame:"), (struct CGRect) { 0, 0, 320, 480 });

    // here we simply add the view to the view controller, and add the viewController to the window.
    objc_msgSend(objc_msgSend(viewController, sel_getUid("view")), sel_getUid("addSubview:"), view);
    objc_msgSend(self->window, sel_getUid("setRootViewController:"), viewController);

    // finally, we display the window on-screen.
    objc_msgSend(self->window, sel_getUid("makeKeyAndVisible"));

    return YES;
}

// note the use of the gcc attribute extension (constructor). 
// Basically, this lets us run arbitrary code before program startup,
// for more information read here: http://stackoverflow.com/questions/2053029
__attribute__((constructor))
static void initAppDel()
{
    // This is objc-runtime gibberish at best. We are creating a class with the 
    // name "AppDelegate" that is a subclass of "UIResponder". Note we do not need
    // to register for the UIApplicationDelegate protocol, that really is simply for 
    // Xcode's autocomplete, we just need to implement the method and we are golden.
    AppDelClass = objc_allocateClassPair(objc_getClass("UIResponder"), "AppDelegate", 0);

    // Here, we tell the objc runtime that we have a variable named "window" of type 'id'
    class_addIvar(AppDelClass, "window", sizeof(id), 0, "@");

    // We tell the objc-runtime that we have an implementation for the method
    // -application:didFinishLaunchingWithOptions:, and link that to our custom 
    // function defined above. Notice the final parameter. This tells the runtime
    // the types of arguments received by the function.
    class_addMethod(AppDelClass, sel_getUid("application:didFinishLaunchingWithOptions:"), (IMP) AppDel_didFinishLaunching, "i@:@@");

    // Finally we tell the runtime that we have finished describing the class and 
    // we can let the rest of the application use it.
    objc_registerClassPair(AppDelClass);
}

View.m:

#include <objc/runtime.h>

// This is a strong reference to the class of our custom view,
// In case we need it in the future.
Class ViewClass;

// This is a simple -drawRect implementation for our class. We could have 
// used a UILabel  or something of that sort instead, but I felt that this 
// stuck with the C-based mentality of the application.
void View_drawRect(id self, SEL _cmd, struct CGRect rect)
{
    // We are simply getting the graphics context of the current view, 
    // so we can draw to it
    CGContextRef context = UIGraphicsGetCurrentContext();

    // Then we set its fill color to white so that we clear the background.
    // Note the cast to (CGFloat []). Otherwise, this would give a warning
    //  saying "invalid cast from type 'int' to 'CGFloat *', or 
    // 'extra elements in initializer'. Also note the assumption of RGBA.
    // If this wasn't a demo application, I would strongly recommend against this,
    // but for the most part you can be pretty sure that this is a safe move 
    // in an iOS application.
    CGContextSetFillColor(context, (CGFloat []){ 1, 1, 1, 1 });

    // here, we simply add and draw the rect to the screen
    CGContextAddRect(context, (struct CGRect) { 0, 0, 320, 480 });
    CGContextFillPath(context);

    // and we now set the drawing color to red, then add another rectangle
    // and draw to the screen
    CGContextSetFillColor(context, (CGFloat []) { 1, 0, 0, 1 });
    CGContextAddRect(context, (struct CGRect) { 10, 10, 20, 20 });
    CGContextFillPath(context);
}

// Once again we use the (constructor) attribute. generally speaking, 
// having many of these is a very bad idea, but in a small application 
// like this, it really shouldn't be that big of an issue.
__attribute__((constructor))
static void initView()
{
    // Once again, just like the app delegate, we tell the runtime to 
    // create a new class, this time a subclass of 'UIView' and named 'View'.
    ViewClass = objc_allocateClassPair(objc_getClass("UIView"), "View", 0);

    // and again, we tell the runtime to add a function called -drawRect: 
    // to our custom view. Note that there is an error in the type-specification
    // of this method, as I do not know the @encode sequence of 'CGRect' off 
    // of the top of my head. As a result, there is a chance that the rect 
    // parameter of the method may not get passed properly.
    class_addMethod(ViewClass, sel_getUid("drawRect:"), (IMP) View_drawRect, "v@:");

    // And again, we tell the runtime that this class is now valid to be used. 
    // At this point, the application should run and display the screenshot shown below.
    objc_registerClassPair(ViewClass);    
}

It's ugly, but it works.

If you would like to download this, you can get it from my dropbox here

ScreenShot

What happens to memory after '\0' in a C string?

44 votes

Surprisingly simple/stupid/basic question, but I have no idea: Suppose I want to return the user of my function a C-string, whose length I do not know at the beginning of the function. I can place only an upper bound on the length at the outset, and, depending on processing, the size may shrink.

The question is, is there anything wrong with allocating enough heap space (the upper bound) and then terminating the string well short of that during processing? i.e. If I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly, and (b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called? Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?

To give this some context, let's say I want to remove consecutive duplicates, like this:

input "Hello oOOOo !!" --> output "Helo oOo !"

... and some code below showing how I'm pre-computing the size resulting from my operation, effectively performing processing twice to get the heap size right.

char* RemoveChains(const char* str)
{
    if (str == NULL) {
        return NULL;
    }
    if (strlen(str) == 0) {
        char* outstr = (char*)malloc(1);
        *outstr = '\0';
        return outstr;
    }
    const char* original = str; // for reuse
    char prev = *str++;       // [prev][str][str+1]...
    unsigned int outlen = 1;  // first char auto-counted

    // Determine length necessary by mimicking processing
    while (*str) {
        if (*str != prev) { // new char encountered
            ++outlen;
            prev = *str; // restart chain
        }
        ++str; // step pointer along input
    }

    // Declare new string to be perfect size
    char* outstr = (char*)malloc(outlen + 1);
    outstr[outlen] = '\0';
    outstr[0] = original[0];
    outlen = 1;

    // Construct output
    prev = *original++;
    while (*original) {
        if (*original != prev) {
            outstr[outlen++] = *original;
            prev = *original;
        }
        ++original;
    }
    return outstr;
}

If I stick a '\0' into the middle of the allocated memory, does

(a.) free() still work properly, and

Yes.

(b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?

Depends. Often, when you allocate large amounts of heap space, the system first allocates virtual address space - as you write to the pages some actual physical memory is assigned to back it (and that may later get swapped out to disk when your OS has virtual memory support). Famously, this distinction between wasteful allocation of virtual address space and actual physical/swap memory allows sparse arrays to be reasonably memory efficient on such OSs.

Now, the granularity of this virtual addressing and paging is in memory page sizes - that might be 4k, 8k, 16k...? Most OSs have a function you can call to find out the page size. So, if you're doing a lot of small allocations then rounding up to page sizes is wasteful, and if you have a limited address space relative to the amount of memory you really need to use then depending on virtual addressing in the way described above won't scale (for example, 4GB RAM with 32-bit addressing). On the other hand, if you have a 64-bit process running with say 32GB of RAM, and are doing relatively few such string allocations, you have an enormous amount of virtual address space to play with and the rounding up to page size won't amount to much.

But - note the difference between writing throughout the buffer then terminating it at some earlier point (in which case the once-written-to memory will have backing memory and could end up in swap) versus having a big buffer in which you only ever write to the first bit then terminate (in which case backing memory is only allocated for the used space rounded up to page size).
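To put rough numbers on that distinction, here is a small sketch in Python (not C). It only queries the page size and does the rounding arithmetic; it assumes the buffer starts on a page boundary and that the OS backs pages lazily, neither of which malloc guarantees. The helper name pages_backed is made up for illustration.

```python
import mmap

PAGE_SIZE = mmap.PAGESIZE  # page size reported by the operating system


def pages_backed(bytes_written: int, page_size: int = PAGE_SIZE) -> int:
    """Pages needing physical backing when only the first `bytes_written`
    bytes of a page-aligned buffer are ever written (assumption: lazy backing)."""
    return -(-bytes_written // page_size)  # ceiling division


buffer_size = 1024 * 1024  # upper bound reserved for the string
used = 100                 # bytes actually written, terminator included

print("page size:", PAGE_SIZE)
print("pages spanned by the whole buffer:", pages_backed(buffer_size))
print("pages touched by the short string:", pages_backed(used))
```

Under those assumptions the short string touches a single page, however large the reserved upper bound is.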

It's also worth pointing out that on many operating systems heap memory may not be returned to the operating system until the process terminates: instead, the malloc/free library notifies the OS when it needs to grow the heap (e.g. using sbrk() on UNIX or VirtualAlloc() on Windows). In that sense, free() memory is free for your process to re-use, but not free for other processes to use. Some operating systems do optimise this - for example, using a distinct and independently releasable memory region for very large allocations.

Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?

Again, it depends on how many such allocations you're dealing with. If there are a great many relative to your virtual address space / RAM - you want to explicitly let the memory library know not all the originally requested memory is actually needed using realloc(), or you could even use strdup() to allocate a new block more tightly based on actual needs (then free() the original) - depending on your malloc/free library implementation that might work out better or worse, but very few applications would be significantly affected by any difference.

Sometimes your code may be in a library where you can't guess how many string instances the calling application will be managing - in such cases it's better to provide slower behaviour that never gets too bad... so lean towards shrinking the memory blocks to fit the string data (a set number of additional operations, so it doesn't affect big-O efficiency) rather than having an unknown proportion of the original string buffer wasted (in a pathological case - zero or one character used after arbitrarily large allocations). As a performance optimisation you might only bother returning memory if the unused space is >= the used space - tune to taste, or make it caller-configurable.

When NOT to call super() method when overriding?

44 votes

When I write my own Android custom class, I extend a native class. Then, when I override a base method, I always call the super method, just as I do in onCreate, onStop, etc.

I thought that was the rule, since from the very beginning the Android team has advised us to call super in every method override.

But in many books I can see that developers more experienced than myself often omit the call to super, and I really doubt they do it out of ignorance. For example, look at this basic SAX parser class where super is omitted in startElement, characters and endElement:

public class SAXParser extends DefaultHandler{
    public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
        if(qName.equalsIgnoreCase("XXY")) {
            //do something
        }
    }

    public void characters(char[] ch, int start, int length) throws SAXException {
        //do something
    }

    public void endElement(String uri, String localName, String qName) throws SAXException {
        if(qName.equalsIgnoreCase("XXY")) {
            //do something
        } else {
            //do something
        }
    }
}

If you create an override method via Eclipse or any other IDE, a call to super will always be generated as part of the automated process.

This was just a simple example. Books are full of similar code.

How do they know when you must call super and when you can omit the call?

PS. Don't get hung up on this specific example. It was picked at random from many similar ones.

(This may sound like a beginner question, but I am really confused.)

By calling the super method, you're not overriding the behavior of the method, you're extending it.

A call to super will perform any logic the class you're extending has defined for that method. Keep in mind that the point at which you call super's implementation within your override can matter. For instance:

public class A { 
    public void save() { 
         // Perform save logic
    }
}

public class B extends A {
    private Object b;
    @Override
    public void save() { 
        super.save(); // Performs the save logic for A
        save(b); // Perform additional save logic
    }
}

A call to B.save() will perform the save() logic for both A and B, in that order. If you didn't call super.save() inside B.save(), A.save() wouldn't be called at all. And if you called super.save() after save(b), A's save logic would run after B's additional logic.

If you want to override super's behavior (that is, fully ignore its implementation and provide it all yourself), you shouldn't be calling super.

In the SAXParser example you provide, the implementations of DefaultHandler for those methods are just empty, so that subclasses can override them and provide a behavior for those methods. In the javadoc for this method this is also pointed out.

public void startElement (String uri, String localName,
    String qName, Attributes attributes) throws SAXException {
    // no op
}

About the super() default call in code generated by IDEs: as @barsju pointed out in his comment, in each constructor there's an implicit call to super() (even if you don't write it in your code), which means, in that context, a call to super's default constructor. The IDE just writes it down for you, but it would also get called if you removed it. Also notice that when implementing constructors, super() or any of its variants with arguments (e.g. super(x,y,z)) can only be called at the very beginning of the constructor.

How foreach actually works

43 votes

Let me prefix this by saying that I know what foreach is, does and how to use it. This question concerns how it works under the bonnet, and I don't want any answers along the lines of "this is how you loop an array with foreach".


For a long time I assumed that foreach worked with the array itself. Then I found many references to the fact that it works with a copy of the array, and I have since assumed this to be the end of the story. But I recently got into a discussion on the matter, and after a little experimentation found that this was not in fact 100% true.

Let me show what I mean. For the following test cases, we will be working with the following array:

$array = array(1, 2, 3, 4, 5);

Test case 1:

foreach ($array as $item) {
  echo "$item\n";
  $array[] = $item;
}
print_r($array);

/* Output:
  1
  2
  3
  4
  5
  Array
  (
      [0] => 1
      [1] => 2
      [2] => 3
      [3] => 4
      [4] => 5
      [5] => 1
      [6] => 2
      [7] => 3
      [8] => 4
      [9] => 5
  )
*/

This clearly shows that we are not working directly with the source array - otherwise the loop would continue forever, since we are constantly pushing items onto the array during the loop. But just to be sure this is the case:

Test case 2:

foreach ($array as $key => $item) {
  $array[$key + 1] = $item + 2;
  echo "$item\n";
}

print_r($array);

/*
  1
  2
  3
  4
  5
  Array
  (
      [0] => 1
      [1] => 3
      [2] => 4
      [3] => 5
      [4] => 6
      [5] => 7
  )
*/

This backs up our initial conclusion: we are working with a copy of the source array during the loop, otherwise we would see the modified values during the loop. But...

If we look in the manual, we find this statement:

When foreach first starts executing, the internal array pointer is automatically reset to the first element of the array.

Right... this seems to suggest that foreach relies on the array pointer of the source array. But we've just proved that we're not working with the source array, right? Well, not entirely.

Test case 3:

// Move the array pointer on one to make sure it doesn't affect the loop
var_dump(each($array));

foreach ($array as $item) {
  echo "$item\n";
}

var_dump(each($array));

/* Output
  array(4) {
    [1]=>
    int(1)
    ["value"]=>
    int(1)
    [0]=>
    int(0)
    ["key"]=>
    int(0)
  }
  1
  2
  3
  4
  5
  bool(false)
*/

So, despite the fact that we are not working directly with the source array, we are working directly with the source array pointer - the fact that the pointer is at the end of the array at the end of the loop shows this. Except this can't be true - if it was, then test case 1 would loop forever.

The PHP manual also states:

As foreach relies on the internal array pointer changing it within the loop may lead to unexpected behavior.

Well, let's find out what that "unexpected behavior" is (technically, any behavior is unexpected since I no longer know what to expect).

Test case 4:

foreach ($array as $key => $item) {
  echo "$item\n";
  each($array);
}

/* Output
  1
  2
  3
  4
  5
*/

Test case 5:

foreach ($array as $key => $item) {
  echo "$item\n";
  reset($array);
}

/* Output
  1
  2
  3
  4
  5
*/

...nothing that unexpected there, in fact it seems to support the "copy of source" theory.


The Question

Please can someone explain what is going on here? My C++-fu is not good enough for me to be able to extract a proper conclusion simply by looking at the PHP source code. I would appreciate it if someone could translate it into English for me.

It seems to me that foreach works with a copy of the array, but sets the array pointer of the source array to the end of the array after the loop.

  • Is this correct and the whole story?
  • If not, what is it really doing?
  • Is there any situation where using functions that adjust the array pointer (each(), reset() et al.) during a foreach could affect the outcome of the loop?

In example 3 you don't modify the array. In all other examples you modify either the contents or the internal array pointer. This is important when it comes to PHP arrays because of the semantics of the assignment operator.

The assignment operator for arrays in PHP works more like a lazy clone. Unlike in most languages, assigning a variable that contains an array to another variable clones the array. However, the actual cloning is not done until it is needed: the clone takes place only when either of the variables is modified (copy-on-write).

Here is an example:

$a = array(1,2,3);
$b = $a;  // This is lazy cloning of $a. For the time 
          // being $a and $b point to the same internal
          // data structure.

$a[] = 3; // Here $a changes, which triggers the actual 
          // cloning. From now on, $a and $b are two 
          // different data structures. The same would
          // happen if there were a change in $b.

Coming back to your test cases, you can easily imagine that foreach creates some kind of iterator with a reference to the array. This reference works exactly like the variable $b in my example. However, the iterator along with the reference live only during the loop and then, they are both discarded. Now you can see that, in all cases but 3, the array is modified during the loop, while this extra reference is alive. This triggers a clone, and that explains what's going on here!

Here is an excellent article for another side effect of this copy-on-write behaviour: The PHP Ternary Operator: Fast or not?

Why should I make the underlying type of an Enum Int32 instead of byte?

42 votes

Given the following enum:

public enum Operations_PerHourType : byte
{
    Holes = 1,
    Pieces = 2,
    Sheets = 3,
    Strips = 4,
    Studs = 5
}

When I run the Microsoft code analysis tool, it tells me:

CA1028 : Microsoft.Design : If possible, make the underlying type of 'Enums.Operations_PerHourType' System.Int32 instead of 'byte'.

It will never have more than a couple possible values, so I declared it as a byte. Why would they recommend using int32? More values for future scalability? Or is there a performance improvement?

Have a look on MSDN for the reason.

Here is an excerpt:

An enumeration is a value type that defines a set of related named constants. By default, the System.Int32 data type is used to store the constant value. Even though you can change this underlying type, it is not necessary or recommended for most scenarios. Note that no significant performance gain is achieved by using a data type that is smaller than Int32. If you cannot use the default data type, you should use one of the Common Language System (CLS)-compliant integral types, Byte, Int16, Int32, or Int64 to make sure that all values of the enumeration can be represented in CLS-compliant programming languages.

Sum range of int's in List<int>

40 votes

I reckon this will be quite trivial but I can't work out how to do it. I have a List<int> and I want to sum a range of the numbers.

Say my list is:

var list = new List<int>()
{
    1, 2, 3, 4
};

How would I get the sum of the first 3 objects? The result being 6. I tried using Enumerable.Range but couldn't get it to work, and I'm not sure if that's the best way of going about it.

Without doing:

int sum = list[0] + list[1] + list[2];

Thanks!

You can accomplish this by using Take & Sum:

var list = new List<int>()
{
    1, 2, 3, 4
};

// 1 + 2 + 3
int sum = list.Take(3).Sum(); // Result: 6

If you want to sum a range beginning elsewhere, you can use Skip:

var list = new List<int>()
{
    1, 2, 3, 4
};

// 3 + 4
int sum = list.Skip(2).Take(2).Sum(); // Result: 7

Or, reorder your list using OrderBy or OrderByDescending and then sum:

var list = new List<int>()
{
    1, 2, 3, 4
};

// 3 + 4
int sum = list.OrderByDescending(x => x).Take(2).Sum(); // Result: 7

As you can see, there are a number of ways to accomplish this task (or related tasks). See Take, Sum, Skip, OrderBy & OrderByDescending documentation for further information.

Why does Convert.ToString(null) return a different value if you cast null?

34 votes

Convert.ToString(null)

returns

null

As I expected.

But

Convert.ToString(null as object)

returns

""

Why are these different?

There are 2 overloads of ToString that come into play here:

Convert.ToString(object o);
Convert.ToString(string s);

The C# compiler essentially tries to pick the most specific overload which will work with the input. A null value is convertible to any reference type. In this case string is more specific than object and hence it will be picked as the winner.

With null as object, you've fixed the type of the expression as object. This means it's no longer compatible with the string overload, and the compiler picks the object overload as it's the only compatible one remaining.

The really hairy details of how this tie breaking works are covered in section 7.4.3 of the C# language spec.

32 votes

I have a multiply-add kernel inside my application and I want to increase its performance.

I use an Intel Core i7-960 (3.2 GHz clock) and have already manually implemented the kernel using SSE intrinsics as follows:

 for(int i=0; i<iterations; i+=4) {
    y1 = _mm_set_ss(output[i]);
    y2 = _mm_set_ss(output[i+1]);
    y3 = _mm_set_ss(output[i+2]);
    y4 = _mm_set_ss(output[i+3]);

    for(k=0; k<ksize; k++){
        for(l=0; l<ksize; l++){
            w  = _mm_set_ss(weight[i+k+l]);

            x1 = _mm_set_ss(input[i+k+l]);
            y1 = _mm_add_ss(y1,_mm_mul_ss(w,x1));
            …
            x4 = _mm_set_ss(input[i+k+l+3]);
            y4 = _mm_add_ss(y4,_mm_mul_ss(w,x4));
        }
    }
    _mm_store_ss(&output[i],y1);
    _mm_store_ss(&output[i+1],y2);
    _mm_store_ss(&output[i+2],y3);
    _mm_store_ss(&output[i+3],y4);
 }

I know I can use packed FP vectors to increase the performance and I have already done so successfully, but I want to know why the single scalar code isn't able to meet the processor's peak performance.

The performance of this kernel on my machine is ~1.6 FP operations per cycle, while the maximum would be 2 FP operations per cycle (since FP add + FP mul can be executed in parallel).

If I'm reading the generated assembly code correctly, the ideal schedule would look as follows, where the mov instruction takes 3 cycles, the switch latency from the load domain to the FP domain for the dependent instructions takes 2 cycles, the FP multiply takes 4 cycles and the FP add takes 3 cycles. (Note that the dependence from the multiply -> add doesn't incur any switch latency because the operations belong to the same domain).

schedule

According to the measured performance (~80% of the maximum theoretical performance) there is an overhead of ~3 instructions per 8 cycles.

I am trying to either:

  • get rid of this overhead, or
  • explain where it comes from

Of course there is the problem with cache misses & data misalignment which can increase the latency of the move instructions, but are there any other factors that could play a role here? Like register read stalls or something?

I hope my problem is clear, thanks in advance for your responses!


Update: The assembly of the inner-loop looks as follows:

...
Block 21: 
  movssl  (%rsi,%rdi,4), %xmm4 
  movssl  (%rcx,%rdi,4), %xmm0 
  movssl  0x4(%rcx,%rdi,4), %xmm1 
  movssl  0x8(%rcx,%rdi,4), %xmm2 
  movssl  0xc(%rcx,%rdi,4), %xmm3 
  inc %rdi 
  mulss %xmm4, %xmm0 
  cmp $0x32, %rdi 
  mulss %xmm4, %xmm1 
  mulss %xmm4, %xmm2 
  mulss %xmm3, %xmm4 
  addss %xmm0, %xmm5 
  addss %xmm1, %xmm6 
  addss %xmm2, %xmm7 
  addss %xmm4, %xmm8 
  jl 0x401b52 <Block 21> 
...

I noticed in the comments that:

  • The loop takes 5 cycles to execute.
  • It's "supposed" to take 4 cycles. (since there are 4 adds and 4 multiplies)

However, your assembly shows 5 SSE movssl instructions. According to Agner Fog's tables, all floating-point SSE move instructions have a reciprocal throughput of at least 1 cycle per instruction on Nehalem.

Since you have 5 of them, you can't do better than 5 cycles/iteration.


So in order to get to peak performance, you need to reduce the number of loads. I can't immediately see how to do that in this particular case, but it might be possible.

One common approach is tiling, where you add nesting levels to improve locality. Although it's mostly used to improve cache access, it can also be applied at the register level to reduce the number of loads and stores that are needed.

Ultimately, your goal is to reduce the number of loads to below the number of adds/muls, so this might be the way to go.

Scope resolution operator

32 votes

I accidentally happened to find this in one of the source codes I was looking at. So, I'm giving a similar smaller example here.

In the file test.h:

#include<iostream>

class test{
    int i;
public:
    test(){}
    //More functions here
};

In the file test.cpp:

#include "test.h"

int main()
{
    test test1;
    test::test test2;
    test::test::test test3;
    return 0;
}

First of all, is there a reason to declare test2 that way? Secondly, this code compiles just fine in g++ version 4.4.3 and lower versions. Is there something in the C++ standard, saying, scope resolution operators are ignored when there is no need to resolve scope?

This code is not valid.

It was a bug in g++ that it accepted the code. See "g++ does not treat injected class name correctly." The bug was resolved as fixed in 2009, so it should be fixed in any recent version of g++.

Is C# Decimal Rounding Inconsistent?

32 votes

I've been fighting decimal precision in C# coming from a SQL Decimal (38,30) and I've finally made it all the way to a rounding oddity. I know I'm probably overlooking the obvious here, but I need a little insight.

The problem I'm having is that C# doesn't produce what I would consider to be consistent output.

decimal a = 0.387518769125m;
decimal b = 0.3875187691250002636113061835m;

Console.WriteLine(Math.Round(a, 11));
Console.WriteLine(Math.Round(b, 11));
Console.WriteLine(Math.Round(a, 11) == Math.Round(b, 11));

Yields

0.38751876912
0.38751876913
False

Uhh, 0.38751876913? Really? What am I missing here?

From MSDN:

If the digit in the decimals position is odd, it is changed to an even digit. Otherwise, it is left unchanged.

Why am I seeing inconsistent results? The additional precision isn't changing the 'digit in the decimals position'...

From MSDN:

If there is a single non-zero digit in d to the right of the decimals decimal position and its value is 5, the digit in the decimals position is rounded up if it is odd, or left unchanged if it is even. If d has fewer fractional digits than decimals, d is returned unchanged.

In your first case

decimal a = 0.387518769125m;
Console.WriteLine(Math.Round(a, 11));

there is a single digit to the right of the 11th place, and that digit is 5. Therefore, since the digit in position 11 (a 2) is even, it is left unchanged. Thus, you get

0.38751876912

In your second case

decimal b = 0.3875187691250002636113061835m;
Console.WriteLine(Math.Round(b, 11));

there is more than one non-zero digit to the right of the 11th place, so the midpoint rule does not apply. Therefore, this is straight-up grade-school rounding: you round up if the next digit is greater than 4, otherwise you round down. Since the digit to the right of the 11th place is more than 4 (it's a 5), we round up, so you see

0.38751876913

Why am I seeing inconsistent results?

You're not. The results are completely consistent with the documentation.
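The same rule can be checked outside .NET. The sketch below uses Python's decimal module, whose ROUND_HALF_EVEN mode is the round-half-to-even rule quoted above; it is an analogy to Math.Round(decimal, int), not the .NET implementation, and the helper name round_half_even is made up for illustration.

```python
from decimal import Decimal, ROUND_HALF_EVEN


def round_half_even(value: str, places: int) -> Decimal:
    """Round a decimal literal to `places` fractional digits, ties to even."""
    quantum = Decimal(1).scaleb(-places)  # e.g. 1E-11 for places=11
    return Decimal(value).quantize(quantum, rounding=ROUND_HALF_EVEN)


a = round_half_even("0.387518769125", 11)                  # exact tie
b = round_half_even("0.3875187691250002636113061835", 11)  # just above the tie

print(a)       # 0.38751876912
print(b)       # 0.38751876913
print(a == b)  # False
```

The first value sits exactly on the midpoint, so the even digit 2 is kept; the second is slightly above the midpoint, so it rounds up regardless of the tie rule.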

Difference between the built-in pow() and math.pow() for floats, in Python?

32 votes

Is there a difference in the results returned by Python's built-in pow(x, y) (no third argument) and the values returned by math.pow()?

Edit: I am only interested in the case of two float arguments.

I am asking this question because the documentation for math.pow() implies that pow(x, y) (i.e. x**y) is essentially the same as math.pow(x, y):

math.pow(x, y)

Return x raised to the power y. Exceptional cases follow Annex ‘F’ of the C99 standard as far as possible. In particular, pow(1.0, x) and pow(x, 0.0) always return 1.0, even when x is a zero or a NaN. If both x and y are finite, x is negative, and y is not an integer then pow(x, y) is undefined, and raises ValueError.

Changed in version 2.6: The outcome of 1**nan and nan**0 was undefined.

Note the last line: the documentation implies that the behavior of math.pow() is that of the exponentiation operator ** (and therefore of pow(x, y)). Is this officially guaranteed?

Background: My goal is to provide an implementation of both the built-in pow() and of math.pow() for numbers with uncertainty that behaves in the same way as with regular Python floats (same numerical results, same exceptions, same results for corner cases, etc.). I have already implemented something that works quite well, but there are some corner cases that need to be handled.

Quick Check

From the signatures, we can tell that they are different:

pow(x, y[, z])

math.pow(x, y)

Also, trying it in the shell will give you a quick idea:

>>> pow is math.pow
False

Testing the differences

Another way to understand the differences in behaviour between the two functions is to test for them:

import math
import traceback
import sys

inf = float("inf")
NaN = float("nan")

vals = [inf, NaN, 0.0, 1.0, 2.2, -1.0, -0.0, -2.2, -inf, 1, 0, 2]

tests = set([])

for vala in vals:
  for valb in vals:
    tests.add( (vala, valb) )
    tests.add( (valb, vala) )


for a,b in tests:
  print("math.pow(%f,%f)"%(a,b) )
  try:
    print("    %f "%math.pow(a,b))
  except:
    traceback.print_exc()

  print("__builtins__.pow(%f,%f)"%(a,b) )
  try:
    print("    %f "%__builtins__.pow(a,b))
  except:
    traceback.print_exc()

We can then notice some subtle differences. For example:

math.pow(0.000000,-2.200000)
    ValueError: math domain error

__builtins__.pow(0.000000,-2.200000)
    ZeroDivisionError: 0.0 cannot be raised to a negative power

There are other differences, and the test list above is not complete (no long numbers, no complex, etc...), but this will give us a pragmatic list of how the two functions behave differently. I would also recommend extending the above test to check for the type that each function returns. You could probably write something similar that creates a report of the differences between the two functions.
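As a starting point for such a report, here is a minimal sketch that compares the two functions on a few inputs. It assumes Python 3 (under Python 2 some results differ), and the helper name outcome is made up for illustration.

```python
import math


def outcome(func, x, y):
    """Return func(x, y), or the name of the exception type it raises."""
    try:
        return func(x, y)
    except Exception as exc:
        return type(exc).__name__


# Zero raised to a negative power: both fail, with different exception types.
print(outcome(math.pow, 0.0, -2.2))  # ValueError
print(outcome(pow, 0.0, -2.2))       # ZeroDivisionError

# Integer arguments: math.pow converts to float, built-in pow stays in int.
print(type(outcome(math.pow, 2, 3)).__name__)  # float
print(type(outcome(pow, 2, 3)).__name__)       # int

# Negative base, fractional exponent (Python 3): math.pow raises,
# built-in pow returns a complex number.
print(outcome(math.pow, -8.0, 0.5))            # ValueError
print(type(outcome(pow, -8.0, 0.5)).__name__)  # complex
```

Comparing exception type names rather than messages keeps the check stable, since the message wording can vary between Python versions.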

math.pow()

math.pow() handles its arguments very differently from the builtin ** or pow(). This comes at the cost of flexibility. Having a look at the source, we can see that the arguments to math.pow() are cast directly to doubles:

static PyObject *
math_pow(PyObject *self, PyObject *args)
{
    PyObject *ox, *oy;
    double r, x, y;
    int odd_y;

    if (! PyArg_UnpackTuple(args, "pow", 2, 2, &ox, &oy))
        return NULL;
    x = PyFloat_AsDouble(ox);
    y = PyFloat_AsDouble(oy);
/*...*/

The checks are then carried out against the doubles for validity, and then the result is passed to the underlying C math library.

builtin pow()

The built-in pow() (same as the ** operator), on the other hand, behaves very differently: it uses the object's own implementation of the ** operator, which the end user can override if need be by replacing a number's __pow__(), __rpow__() or __ipow__() method.

For built-in types, it is instructive to study the differences between the power functions implemented for different numeric types, for example float, long and complex.

Overriding the default behaviour

Emulating numeric types is described here. Essentially, if you are creating a new type for numbers with uncertainty, what you will have to do is provide the __pow__(), __rpow__() and possibly __ipow__() methods for your type. This will allow your numbers to be used with the operator:

class Uncertain:
  def __init__(self, x, delta=0):
    self.delta = delta
    self.x = x
  def __pow__(self, other):
    return Uncertain(
      self.x**other.x, 
      Uncertain._propagate_power(self, other)
    )
  @staticmethod
  def _propagate_power(A, B):
    # First-order error propagation for f = A**B:
    #   df/dA = B * A**(B-1),  df/dB = A**B * ln(A)
    return math.sqrt(
      ((B.x*(A.x**(B.x-1)))**2)*A.delta*A.delta +
      (((A.x**B.x)*math.log(A.x))**2)*B.delta*B.delta
    )

In order to override math.pow() you will have to monkey patch it to support your new type:

def new_pow(a,b):
    _a = Uncertain(a)
    _b = Uncertain(b)
    return _a ** _b

math.pow = new_pow

Note that for this to work you'll have to wrangle the Uncertain class to cope with an Uncertain instance as an input to __init__().
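One way that wrangling might look is sketched below. This is an illustration, not a complete numeric type: it assumes independent errors, first-order propagation and a positive base (so that log is defined), and it adds an __rpow__ so that plain numbers can appear on the left-hand side.

```python
import math


class Uncertain:
    def __init__(self, x, delta=0.0):
        if isinstance(x, Uncertain):
            # Copy an existing instance instead of nesting it;
            # the delta argument is ignored in this case.
            x, delta = x.x, x.delta
        self.x = x
        self.delta = delta

    def __pow__(self, other):
        other = Uncertain(other)  # accepts plain numbers and Uncertain alike
        value = self.x ** other.x
        # Partial derivatives of A**B: B*A**(B-1) and A**B*ln(A).
        d_base = other.x * self.x ** (other.x - 1) * self.delta
        d_exp = value * math.log(self.x) * other.delta
        return Uncertain(value, math.hypot(d_base, d_exp))

    def __rpow__(self, other):
        # Called for `number ** Uncertain`.
        return Uncertain(other) ** self


r = Uncertain(2.0, 0.1) ** 3
print(r.x, r.delta)
```

With __init__ accepting either kind of argument, a wrapper such as new_pow above can call Uncertain(a) without checking the type first.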