Best questions in February 2011

C# 5 async CTP: why is internal "state" set to 0 in generated code before EndAwait call?

92 votes

Yesterday I was giving a talk about the new C# "async" feature, in particular delving into what the generated code looked like, and the GetAwaiter() / BeginAwait() / EndAwait() calls.

We looked in some detail at the state machine generated by the C# compiler, and there were two aspects we couldn't understand:

  • Why the generated class contains a Dispose() method and a $__disposing variable, which never appear to be used (and the class doesn't implement IDisposable).
  • Why the internal state variable is set to 0 before any call to EndAwait(), when 0 normally appears to mean "this is the initial entry point".

I suspect the first point could be answered by doing something more interesting within the async method, although if anyone has any further information I'd be glad to hear it. This question is more about the second point, however.

Here's a very simple piece of sample code:

using System.Threading.Tasks;

class Test
{
    static async Task<int> Sum(Task<int> t1, Task<int> t2)
    {
        return await t1 + await t2;
    }
}

... and here's the code which gets generated for the MoveNext() method which implements the state machine. This is copied directly from Reflector - I haven't fixed up the unspeakable variable names:

public void MoveNext()
{
    try
    {
        this.$__doFinallyBodies = true;
        switch (this.<>1__state)
        {
            case 1:
                break;

            case 2:
                goto Label_00DA;

            case -1:
                return;

            default:
                this.<a1>t__$await2 = this.t1.GetAwaiter<int>();
                this.<>1__state = 1;
                this.$__doFinallyBodies = false;
                if (this.<a1>t__$await2.BeginAwait(this.MoveNextDelegate))
                {
                    return;
                }
                this.$__doFinallyBodies = true;
                break;
        }
        this.<>1__state = 0;
        this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();
        this.<a2>t__$await4 = this.t2.GetAwaiter<int>();
        this.<>1__state = 2;
        this.$__doFinallyBodies = false;
        if (this.<a2>t__$await4.BeginAwait(this.MoveNextDelegate))
        {
            return;
        }
        this.$__doFinallyBodies = true;
    Label_00DA:
        this.<>1__state = 0;
        this.<2>t__$await3 = this.<a2>t__$await4.EndAwait();
        this.<>1__state = -1;
        this.$builder.SetResult(this.<1>t__$await1 + this.<2>t__$await3);
    }
    catch (Exception exception)
    {
        this.<>1__state = -1;
        this.$builder.SetException(exception);
    }
}

It's long, but the important lines for this question are these:

// End of awaiting t1
this.<>1__state = 0;
this.<1>t__$await1 = this.<a1>t__$await2.EndAwait();

// End of awaiting t2
this.<>1__state = 0;
this.<2>t__$await3 = this.<a2>t__$await4.EndAwait();

In both cases the state is changed again afterwards before it's next obviously observed... so why set it to 0 at all? If MoveNext() were called again at this point (either directly or via Dispose) it would effectively start the async method again, which would be wholly inappropriate as far as I can tell... if and MoveNext() isn't called, the change in state is irrelevant.

Is this simply a side-effect of the compiler reusing iterator block generation code for async, where it may have a more obvious explanation?

Important disclaimer

Obviously this is just a CTP compiler. I fully expect things to change before the final release - and possibly even before the next CTP release. This question is in no way trying to claim this is a flaw in the C# compiler or anything like that. I'm just trying to work out whether there's a subtle reason for this that I've missed :)

Okay, I finally have a real answer. I sort of worked it out on my own, but only after Lucian Wischik from the VB part of the team confirmed that there really is a good reason for it. Many thanks to him - and please visit his blog, which rocks.

The value 0 here is only special because it's not a valid state which you might be in just before the await in a normal case. In particular, it's not a state which the state machine may end up testing for elsewhere. I believe that using any non-positive value would work just as well: -1 isn't used for this as it's logically incorrect, as -1 normally means "finished". I could argue that we're giving an extra meaning to state 0 at the moment, but ultimately it doesn't really matter. The point of this question was finding out why the state is being set at all.

The value is relevant if the await ends in an exception which is caught. We can end up coming back to the same await statement again, but we mustn't be in the state meaning "I'm just about to come back from that await" as otherwise all kinds of code would be skipped. It's simplest to show this with an example. Note that I'm now using the second CTP, so the generated code is slightly different to that in the question.

Here's the async method:

static async Task<int> FooAsync()
{
    var t = new SimpleAwaitable();

    for (int i = 0; i < 3; i++)
    {
        try
        {
            Console.WriteLine("In Try");
            return await t;
        }                
        catch (Exception)
        {
            Console.WriteLine("Trying again...");
        }
    }
    return 0;
}

Conceptually, the SimpleAwaitable can be any awaitable - maybe a task, maybe something else. For the purposes of my tests, it always returns false for IsCompleted, and throws an exception in GetResult.

Here's the generated code for MoveNext:

public void MoveNext()
{
    int returnValue;
    try
    {
        int num3 = state;
        if (num3 == 1)
        {
            goto Label_ContinuationPoint;
        }
        if (state == -1)
        {
            return;
        }
        t = new SimpleAwaitable();
        i = 0;
      Label_ContinuationPoint:
        while (i < 3)
        {
            // Label_ContinuationPoint: should be here
            try
            {
                num3 = state;
                if (num3 != 1)
                {
                    Console.WriteLine("In Try");
                    awaiter = t.GetAwaiter();
                    if (!awaiter.IsCompleted)
                    {
                        state = 1;
                        awaiter.OnCompleted(MoveNextDelegate);
                        return;
                    }
                }
                else
                {
                    state = 0;
                }
                int result = awaiter.GetResult();
                awaiter = null;
                returnValue = result;
                goto Label_ReturnStatement;
            }
            catch (Exception)
            {
                Console.WriteLine("Trying again...");
            }
            i++;
        }
        returnValue = 0;
    }
    catch (Exception exception)
    {
        state = -1;
        Builder.SetException(exception);
        return;
    }
  Label_ReturnStatement:
    state = -1;
    Builder.SetResult(returnValue);
}

I had to move Label_ContinuationPoint to make it valid code - otherwise it's not in the scope of the goto statement - but that doesn't affect the answer.

Think about what happens when GetResult throws its exception. We'll go through the catch block, increment i, and then loop round again (assuming i is still less than 3). We're still in whatever state we were before the GetResult call... but when we get inside the try block we must print "In Try" and call GetAwaiter again... and we'll only do that if state isn't 1. Without the state = 0 assignment, it will use the existing awaiter and skip the Console.WriteLine call.

It's a fairly tortuous bit of code to work through, but that just goes to show the kinds of thing that the team has to think about. I'm glad I'm not responsible for implementing this :)

Top bad practices in PHP

75 votes

I've learned from Stack Overflow that some of the code I've been writing so far is considered "bad practice". Like using tons of global variables, create_function() etc. I'm curious to know more about this.

Post examples of bad practices in PHP and why. Please only post one bad practice per answer, this way the worst can get the most votes.

for($i = 0; $i < count($array); $i++)

God kills a kitten every time you call count() inside a loop.

Redefining NULL

62 votes

I'm writing C code for a system where address 0x0000 is valid and contains port I/O. Therefore, any possible bugs that access a NULL pointer will remain undetected and at the same time cause dangerous behaviour.

For this reason I wish to redefine NULL to be another address, to for example an address that isn't valid. If I accidentally access such an address I will get a hardware interrupt where I can handle the error. I happen to have access to stddef.h for this compiler, so I can actually alter the standard header and redefine NULL.

My question is: will this conflict with the C standard? As far as I can tell from 7.17 in the standard, the macro is implementation-defined. Is there anything elsewhere in the standard stating that NULL must be 0?

Another issue is that plenty of compilers perform static initialization by setting everything to zero, no matter the data type. Even though the standard says that the compiler should set integers to zero and pointers to NULL. If I would redefine NULL for my compiler, then I know that such static initialization will fail. Could I regard that as incorrect compiler behaviour even though I boldly altered compiler headers manually? Because I know for certain that this particular compiler does not access the NULL macro when doing static initialization.

The C standard does not require null pointers to be at the machine's address zero. HOWEVER, casting a 0 constant to a pointer value must result in a NULL pointer (§6.3.2.3/3), and evaluating the null pointer as a boolean must be false. This can be a bit awkward if you really do want a zero address, and NULL is not the zero address.

Nevertheless, with (heavy) modifications to the compiler and standard library, it's not impossible to have NULL be represented with an alternate bit pattern while still remaining strictly conformant to the standard library. It is not sufficient to simply change the definition of NULL itself however, as then NULL would evaluate to true.

Specifically, you would need to:

  • Arrange for literal zeros in assignments to pointers (or casts to pointers) to be converted into some other magic value such as -1.
  • Arrange for equality tests between pointers and a constant integer 0 to check for the magic value instead (§6.5.9/6)
  • Arrange for all contexts in which a pointer type is evaluated as a boolean to check for equality to the magic value instead of checking for zero. This follows from the equality testing semantics, but the compiler may implement it differently internally. See §6.5.13/3, §6.5.14/3, §6.5.15/4, §6.5.3.3/5, §6.8.4.1/2, §6.8.5/4
  • As caf pointed out, update the semantics for initialization of static objects (§6.7.8/10) and partial compound initializers (§6.7.8/21) to reflect the new null pointer representation.
  • Create an alternate way to access true address zero.

There are some things you do not have to handle. For example:

int x = 0;
void *p = (void*)x;

After this, p is NOT guaranteed to be a null pointer. Only constant assignments need be handled (this is a good approach for accessing true address zero). Likewise:

int x = 0;
assert(x == (void*)0); // CAN BE FALSE

Also:

void *p = NULL;
int x = (int)p;

x is not guaranteed to be 0.

In short, this very condition was apparently considered by the C language committee, and considerations made for those who would choose an alternate representation for NULL. All you have to do now is make major changes to your compiler, and hey presto you're done :)

As a side note, it may be possible to implement these changes with a source code transformation stage before the compiler proper. That is, instead of the normal flow of preprocessor -> compiler -> assembler -> linker, you'd add a preprocessor -> NULL transformation -> compiler -> assembler -> linker. Then you could do transformations like:

p = 0;
if (p) { ... }
/* becomes */
p = (void*)-1;
if ((void*)(p) != (void*)(-1)) { ... }

This would require a full C parser, as well as a type parser and analysis of typedefs and variable declarations to determine which identifiers correspond to pointers. However, by doing this you could avoid having to make changes to the code generation portions of the compiler proper. clang may be useful for implementing this - I understand it was designed with transformations like this in mind. You would still likely need to make changes to the standard library as well of course.

Why are C# 4 optional parameters defined on interface not enforced on implementing class?

53 votes

I noticed that with the optional parameters in C# 4 if you specify a parameter as optional on an interface you DON'T have to make that parameter optional on any implementing class:

public interface MyInterface
{
    void TestMethod(bool flag=false);
}

public class MyClass : MyInterface
{
    public void TestMethod(bool flag)
    {
        Console.WriteLine(flag);
    }
}

and therefore:

var obj = new MyClass();        
obj.TestMethod(); // compiler error

var obj2 = new MyClass() as MyInterface;
obj2.TestMethod(); // prints false

Does anyone know why optional parameters are designed to work this way?

On one hand I suppose the ability to override any default values specified on the interfaces is useful though to be honest I'm not sure if you should even be able to specify default values on the interface as that should be an implementation decision.

On the other hand, this disconnect means you can't always use the concrete class and the interface interchangeably. This of course, wouldn't be a problem if the default value is specified on the implementation, but then if you're exposing your concrete class as the interface (using some IOC framework to inject the concrete class for instance) then really there's no point having the default value as the caller will have to always provide it anyway.

UPDATE: This question was the subject of my blog on May 12th 2011. Thanks for the great question!

Suppose you have an interface as you describe, and a hundred classes that implement it. Then you decide to make one of the parameters of one of the interface's methods optional. Are you suggesting that the right thing to do is for the compiler to force the developer to find every implementation of that interface method, and make the parameter optional as well?

Suppose we did that. Now suppose the developer did not have the source code for the implementation:


// in metadata:
public class B 
{ 
    public void TestMethod(bool b) {}
}

// in source code
interface MyInterface 
{ 
    public void TestMethod(bool b = false); 
}
class D : B, MyInterface {}
// Legal because D's base class has a public method 
// that implements the interface method

How is the author of D supposed to make this work? Are they required in your world to call up the author of B on the phone and ask them to please ship them a new version of B that makes the method have an optional parameter?

That's not going to fly. What if two people call up the author of B, and one of them wants the default to be true and one of them wants it to be false? What if the author of B simply refuses to play along?

Perhaps in that case they would be required to say:

class D : B, MyInterface 
{
    public new void TestMethod(bool b = false)
    {
        base.TestMethod(b);
    }
}

The proposed feature seems to add a lot of inconvenience for the programmer with no corresponding increase in representative power. What's the compelling benefit of this feature which justifies the increased cost to the user?

Is C open source?

53 votes

This is probably a stupid question, but I've been wondering about this for a while. Does C (or any other low-level language, for that matter) even have source, or is the compiler the part that "does all the work", including parsing? If so, couldn't different compilers have different C dialects? Where does the stdlib factor into this? I would really like to know how this works.

The C language is not a piece of software but a defined standard, so one wouldn't say that it's open-source, but rather that it's an open standard.

There are a gazillion different compilers for C however, and many of those are indeed open-source. The most notable example is GCC's C compiler, which is all under the GNU General Public License (GPL), an open-source license.

There are more options. Watcom is open-source, for instance. There is no shortage of open-source C compilers, but without a doubt the most widespread one, at least in the non-Windows world, is GCC.

For Windows, your best bet is probably Watcom or GCC by using Cygwin or MinGW.

text-overflow:ellipsis in Firefox 4?

48 votes

The text-overflow:ellipsis; CSS property must be one of the few things that Microsoft has done right for the web.

All the other browsers now support it... except Firefox.

The Firefox developers have been arguing over it since 2005 but despite the obvious demand for it, they can't seem to actually bring themselves to implement it (even an experimental -moz- implementation would be sufficient).

A few years ago, someone worked out a way to hack Firefox 3 to make it support an ellipsis. The hack uses the -moz-binding feature to implement it using XUL. Quite a number of sites are now using this hack.

The bad news? Firefox 4 is removing the -moz-binding feature, which means this hack won't work any more.

So as soon as Firefox 4 is released (later this month, I hear), we're going to be back to the problem of having it not being able to support this feature.

So my question is: Is there any other way around this? (I'm trying to avoid falling back to a Javascript solution if at all possible)

[EDIT]
Lots of up-votes, so I'm obviously not the only one who wants to know, but I've got one answer so far which basically says 'use javascript'. I'm still hoping for a solution that will either not need JS at all, or at worst only use it as a fall-back where the CSS feature doesn't work. So I'm going to post a bounty on the question, on the off chance that someone, somewhere has found an answer.

Spudley, you could achieve the same thing by writing a small JavaScript code:

var limit = 50;
var ellipsis = "...";
if( $('#limitedWidthTextBox').val().length() > limit)
{
   var trimmedText = $('#limitedWidthTextBox').val().substring(0, limit-4); // -4 to include the ellipsis size and also since it is an index
   trimmedText += elipsis;
   $('#limitedWidthTextBox').val(trimmedText);
}

I understand that there should be some way that all browsers support this natively (without an javascript) but, thats what we have at this point.

EDIT Also, you could make it more neat by attaching a css class to all those fixed width field say fixWidth and then do something like the following:

$(document).ready(function() {
   $(.fixWidth).each(function() {
      var limit = 50;
      var ellipsis = "...";
      if( $('#limitedWidthTextBox').val().length() > limit)
      {
         var trimmedText = $('#limitedWidthTextBox').val().substring(0, limit-4); // -4 to include the ellipsis size and also since it is an index
         trimmedText += elipsis;
         $('#limitedWidthTextBox').val(trimmedText);
      }
   }
}

How can jquery deferred be used?

43 votes

jQuery 1.5 brings the new Deferred object and the attached methods .when, .Deferred and ._Deferred.

For those who havn't used .Deferred before I've annotated the source for it

What are the possible usages of these new methods, how do we go about fitting them into patterns?

I have already read the API and the source, so I know what it does. My question is how can we use these new features in everyday code?

I have a simple example of a buffer class that calls AJAX request in order. (Next one start after previous one finishes).

/* Class: Buffer
 *  methods: append
 *
 *  Constructor: takes a function which will be the task handler to be called
 *
 *  .append appends a task to the buffer. Buffer will only call a task when the 
 *  previous task has finished
 */
var Buffer = function(handler) {
    var tasks = [];
    // empty resolved deferred object
    var deferred = $.when();

    // handle the next object
    function handleNextTask() {
        // if the current deferred task has resolved and there are more tasks
        if (deferred.isResolved() && tasks.length > 0) {
            // grab a task
            var task = tasks.shift();
            // set the deferred to be deferred returned from the handler
            deferred = handler(task);
            // if its not a deferred object then set it to be an empty deferred object
            if (!(deferred && deferred.promise)) {
                deferred = $.when();
            }
            // if we have tasks left then handle the next one when the current one 
            // is done.
            if (tasks.length > 0) {
                deferred.done(handleNextTask);
            }
        }
    }

    // appends a task.
    this.append = function(task) {
        // add to the array
        tasks.push(task);
        // handle the next task
        handleNextTask();
    };
};

I'm looking for demonstrations and possible uses of .Deferred and .when.

It would also be lovely to see examples of ._Deferred.

Linking to the new jQuery.ajax source for examples is cheating.

Bounty: Show us what techniques are available when we abstract away whether an operation is synchronously or asynchronously done.

The best use case I can think of is in caching AJAX responses. Here's a modified example from Rebecca Murphey's intro post on the topic:

var cache = {};

function getData( val ){

    // return either the cached value or an
    // jqXHR object (which contains a promise)
    return cache[ val ] || $.ajax('/foo/', {
        data: { value: val },
        dataType: 'json',
        success: function( resp ){
            cache[ val ] = resp;
        }
    });
}

$.when(getData('foo')).then(function(resp){
    // do something with the response, which may
    // or may not have been retreived using an
    // XHR request.
});

Basically, if the value has already been requested once before it's returned immediately from the cache. Otherwise, an AJAX request fetches the data and adds it to the cache. The $.when/.then doesn't care about any of this; all you need to be concerned about is using the response, which is passed to the .then() handler in both cases.

Deferreds are perfect for when the task may or may not operate asynchronously, and you want to abstract that condition out of the code.

Another real world example using the $.when helper:

$.when( $.getJSON('/some/data/'), $.get('template.tpl') ).then(function( data, tmpl ){

    $( tmpl ) // create a jQuery object out of the template
        .tmpl( data) // compile it
        .appendTo( "#target" ); // insert it into the DOM

});

PHP 2-way encryption: I need to store passwords that can be retrieved

41 votes

I am creating an application that will store passwords, which the user can retrieve and see. The passwords are for a hardware device, so checking against hashes are out of the question.

What I need to know is:

  1. How do I encrypt and decrypt a password in PHP?

  2. What is the safest algorithm to encrypt the passwords with?

  3. Where do I store the private key?

  4. Instead of storing the private key, is it a good idea to require users to enter the private key any time they need a password decrypted? (Users of this application can be trusted)

  5. In what ways can the password be stolen and decrypted? What do I need to be aware of?

Personally, I would use mcrypt like others posted. But there is much more to note...

  1. How do I encrypt and decrypt a password in PHP?

    See below for a strong class that takes care of everything for you:

  2. What is the safest algorithm to encrypt the passwords with?

    safest? any of them. The safest method if you're going to encrypt is to protect against information disclosure vulnerabilities (XSS, remote inclusion, etc). If it gets out, the attacker can eventually crack the encryption (no encryption is 100% un-reversible without the key - As @NullUserException points out this is not entirely true. There are some encryption schemes that are impossible to crack such as OneTimePad).

  3. Where do I store the private key?

    What I would do is use 3 keys. One is user supplied, one is application specific and the other is user specific (like a salt). The application specific key can be stored anywhere (in a config file outside of the web-root, in an environmental variable, etc). The user specific one would be stored in a column in the db next to the encrypted password. The user supplied one would not be stored. Then, you'd do something like this:

    $key = $userKey . $serverKey . $userSuppliedKey;
    

    The benefit there, is that any 2 of the keys can be compromised without the data being compromised. If there's a SQL Injection attack, they can get the $userKey, but not the other 2. If there's a local server exploit, they can get $userKey and $serverKey, but not the third $userSuppliedKey. If they go beat the user with a wrench, they can get the $userSuppliedKey, but not the other 2 (but then again, if the user is beaten with a wrench, you're too late anyway).

  4. Instead of storing the private key, is it a good idea to require users to enter the private key any time they need a password decrypted? (Users of this application can be trusted)

    Absolutely. In fact, that's the only way I would do it. Otherwise you'd need to store an unencrypted version in a durable storage format (shared memory such as APC or memcached, or in a session file). That's exposing yourself to additional compromises. Never store the unencrypted version of the password in anything except a local variable.

  5. In what ways can the password be stolen and decrypted? What do I need to be aware of?

    Any form of compromise of your systems will let them view encrypted data. If they can inject code or get to your filesystem, they can view decrypted data (since they can edit the files that decrypt the data). Any form of Replay or MITM attack will also give them full access to the keys involved. Sniffing the raw HTTP traffic will also give them the keys.

    Use SSL for all traffic. And make sure nothing on the server has any kind of vulnerabilities (CSRF, XSS, SQL Injection, Privilege Escalation, Remote Code Execution, etc).

Edit: Here's a PHP class implementation of a strong encryption method:

/**
 * A class to handle secure encryption and decryption of arbitrary data
 *
 * Note that this is not just straight encryption.  It also has a few other
 *  features in it to make the encrypted data far more secure.  Note that any
 *  other implementations used to decrypt data will have to do the same exact
 *  operations.  
 *
 * Security Benefits:
 *
 * - Uses Key stretching
 * - Hides the Initialization Vector
 * - Does HMAC verification of source data
 *
 */
class Encryption {

    /**
     * @var string $cipher The mcrypt cipher to use for this instance
     */
    protected $cipher = '';

    /**
     * @var int $mode The mcrypt cipher mode to use
     */
    protected $mode = '';

    /**
     * Constructor!
     *
     * @param string $cipher The MCRYPT_* cypher to use for this instance
     * @param int    $mode   The MCRYPT_MODE_* mode to use for this instance
     */
    public function __construct($cipher, $mode) {
        $this->cipher = $cipher;
        $this->mode = $mode;
    }

    /**
     * Decrypt the data with the provided key
     *
     * @param string $data The encrypted datat to decrypt
     * @param string $key  The key to use for decryption
     * 
     * @returns string|false The returned string if decryption is successful
     *                           false if it is not
     */
    public function decrypt($data, $key) {
        $key = $this->stretch($key);
        $iv = $this->getIv($data, $key);
        if ($iv === false) {
            return false; //Invalid IV, so we can't continue
        }
        $de = mcrypt_decrypt($this->cipher, $key, $data, $this->mode, $iv);
        if (!$de || strpos($de, ':') === false) return false;

        list ($hmac, $data) = explode(':', $de, 2);
        $data = rtrim($data, "\0");

        if ($hmac != hash_hmac('sha1', $data, $key)) {
            return false;
        }
        return $data;
    }

    /**
     * Encrypt the supplied data using the supplied key
     * 
     * @param string $data The data to encrypt
     * @param string $key  The key to encrypt with
     *
     * @returns string The encrypted data
     */
    public function encrypt($data, $key) {
        $key = $this->stretch($key);
        $data = hash_hmac('sha1', $data, $key) . ':' . $data;

        $iv = $this->generateIv();
        $enc = mcrypt_encrypt($this->cipher, $key, $data, $this->mode, $iv);

        return $this->storeIv($enc, $iv, $key);
    }

    /**
     * Generate an Initialization Vector based upon the class's cypher and mode
     *
     * @returns string The initialization vector
     */
    protected function generateIv() {
        $size = mcrypt_get_iv_size($this->cipher, $this->mode);
        return mcrypt_create_iv($size, MCRYPT_RAND);
    }

    /**
     * Extract a stored initialization vector from an encrypted string
     *
     * This will shorten the $data pramater by the removed vector length.
     * 
     * @see Encryption::storeIv()
     *
     * @param string &$data The encrypted string to process.
     * @param string $key   The supplied key to extract the IV with
     *
     * @returns string The initialization vector that was stored
     */
    protected function getIv(&$data, $key) {
        $size = mcrypt_get_iv_size($this->cipher, $this->mode);
        $iv = '';
        for ($i = $size - 1; $i >= 0; $i--) {
            $pos = hexdec($key[$i]);
            $iv = substr($data, $pos, 1) . $iv;
            $data = substr_replace($data, '', $pos, 1);
        }
        if (strlen($iv) != $size) {
            return false;
        }
        return $iv;
    }

    /**
     * Store the Initialization Vector inside the encrypted string.
     *
     * We will need the IV later to decrypt the data, so we need to
     * make it available.  We don't want to just append it, since that
     * could open MITM style attacks on the data.  So we'll hide it 
     * using the key to determine exactly how to hide it.  That way,
     * without knowing the key, it should be impossible to get the IV.
     *
     * @param string $data The data to hide the IV within
     * @param string $iv   The IV to hide
     * @param string $key  The key to use to hide the IV with
     *
     * @returns string The $data parameter with the hidden IV
     */
    protected function storeIv($data, $iv, $key) {
        for ($i = 0; $i < strlen($iv); $i++) {
            $offset = hexdec($key[$i]);
            $data = substr_replace($data, $iv[$i], $offset, 0);
        }
        return $data;
    }

    /**
     * Stretch the key using a simple hmac based stretching algorythm
     *
     * We want to use sha1 here over something stronger since Blowfish
     * expects a key between 4 and 56 bytes.  Sha1 produces a 40 byte
     * hash, so it should be good for these purposes.  This also allows
     * an arbitrary key of any length to be used for encryption.
     *
     * Another benefit of streching the kye is that it actually slows
     * down any potential brute force attacks. 
     *
     * We use 5000 runs for the stretching since it's a good balance
     * between brute force protection and system load.  We could increase
     * this if we were paranoid, but it shouldn't be necessary.
     *
     * @see http://en.wikipedia.org/wiki/Key_stretching
     *
     * @param string $key The key to stretch
     *
     * @returns string A 40 character hex string with the stretched key
     */
    protected function stretch($key) {
        $hash = sha1($key);
        $runs = 0;
        do {
            $hash = hash_hmac('sha1', $hash, $key);
        } while ($runs++ < 5000);
        return $hash;
    }

}

Usage:

$e = new Encryption(MCRYPT_BlOWFISH, MCRYPT_MODE_CBC);
$encryptedData = $e->encrypt($data, 'key');

Then, to decrypt:

$e2 = new Encryption(MCRYPT_BlOWFISH, MCRYPT_MODE_CBC);
$data = $e2->decrypt($encryptedData, 'key');

Note that I used $e2 the second time to show you different instances will still properly decrypt the data.

Now, how does it work/why use it over another solution:

  1. Keys

    • The keys are not directly used. Instead, the key is stretched by a 5000 round hmac cycle.

    • The key is used to store the Initialization Vector. That way, without the proper key, you cannot even get the IV. So MITM attacks should be averted.

  2. Data Integrity

    • All stored data is hmac'd to verify that it has not been tampered with prior to encryption. When it's decrypted, it's hmac'ed again (against the key) to determine if it was tampered with and that the correct result was returned.
  3. Encryption:

    • It uses mcrypt to actually perform the encryption. I would suggest using MCRYPT_BLOWFISH cypher and MCRYPT_MODE_CBC for the mode. It's strong enough, and still fairly fast (an encryption and decryption cycle takes about 1/2 second on my machine).

Now, as to point 3 from the first list, what that would give you is a function like this:

function makeKey($userKey, $serverKey, $userSuppliedKey) {
    $key = hash_hmac('sha512', $userKey, $serverKey);
    $key = hash_hmac('sha512', $key, $userSuppliedKey);
    return $key;
}

You could stretch it in the makeKey() function, but since it's going to be stretched later, there's not really a huge point to doing so.

As far as the storage size, it depends on the plain text. Blowfish uses a 8 byte block size, so you'll have:

  • 8 bytes for the initialization vector
  • 40 bytes for the hmac
  • 1 byte for the hmac separator :
  • data length
  • Padding so that (41 + data length) % 8 == 0

So for a 16 character data source, there will be 41 + 16 (57) characters of data to be encrypted. So that means the actual encrypted data size is 64 bytes due to padding. Then add the 8 bytes for the IV and the total stored size is 72 bytes. So there's at best a 49 character overhead, and at worst a 56 character overhead...

I hope that helps...

Why does .NET 4.0 sort this array differently than .NET 3.5?

40 votes

This stackoverflow question raised an interesting question about sorting double arrays with NaN values. The OP posted the following code:

static void Main(string[] args)
{
    double[] someArray = { 4.0, 2.0, double.NaN, 1.0, 5.0, 3.0, double.NaN, 10.0, 9.0, 8.0 };

    foreach (double db in someArray)
    {
        Console.WriteLine(db);
    }

    Array.Sort(someArray);
    Console.WriteLine("\n\n");
    foreach (double db in someArray)
    {
        Console.WriteLine(db);
    }

    Console.ReadLine();
}

When you run this under the .NET 3.5 framework, the array is sorted as follows:

1,4,NaN,2,3,5,8,9,10,NaN

When you run it under .NET 4.0, the array is sorted somewhat more logically:

NaN,NaN,1,2,3,4,5,8,9,10

I can understand why it would sort weirdly in .NET 3.5 (because NaN is not equal to, less than, or greater than, anything). I can also understand why it would sort the way it does in .NET 4.0. My question is, why did this change from 3.5 to 4.0? And where is the Microsoft documentation for this change?

It's a bug fix. The feedback report with the bug details is here.

For statement in C

38 votes

Possible Duplicate: Help with C puzzle

This morning i had a job interview and they gave me this problem:

You should change one character to print "*" 42 times. You can replace, add or remove ONLY one character.

Still i cant figured it, i've tried over and over again.

#include <stdio.h>
main(){
    int i,n = 42;

   for(i = 0; i < n; i--){
       printf("*");
   }
}

#include <stdio.h>
main(){
    int i,n = 42;

   for(i = 0; -i < n; i--){    //added a minus here
       printf("*");
   }
}

But to be honest, I think that this is a silly interview question.

How does today's (Jules Verne) Google's Doodle work?

38 votes

I am sure many of you have already checked on today's (2011-02-08) Google's doodle (link to article on CNN if doodle changes). It was awesome and I tried figuring out about its implementation in Firebug, some things I found out was that it has about 3 layers of images (for 3D effect) which are pan and rotated (-moz-transform:rotate()), etc. What I didn't found about were (and my questions):

  • How it hid our mouse cursor when you hold on the handle, I know it's cursor:none in CSS but I still saw this CSS for the handle:

    #verne-drag {
        background: url("logos/2011/verne-hp.png") no-repeat scroll 1000px 1000px transparent;
        cursor: pointer;/*here its pointer not none*/
        height: 150px;
        left: 565px;
        position: absolute;
        top: 15px;
        width: 150px;
        z-index: 700;
    }
    
  • How it allowed dragging of handle so and swapping between 9 images according to position at the same time.

  • Shed some light on its Javascript (I didn't find one in firebug...only that usual script for search, and this little code which just calculates mod (what about possible code other tasks)

    google.doodle.mod = function (a, n) {return a % n;};
    2 /* !eval(new String("google.doodle.mod = function(a,n);)) */
    

So simply point me out how its implemented (I have mentioned 3 but include other points which might not be that obvious).


Image Resources for reference:

The Sprite for resources

Link to other 3 images (They were so long that was not feasible to show here)

Big Fishes, shark
Giant Tail
Under water fauna
Sky


Update

Myles Gray here has made a great contribution by re-implementing (and making it more readable) the Javascript Code, CSS and HTML to show us how Doodle was implemented.

Here is the link for you all to check it out:

http://jsfiddle.net/Mutant_Tractor/jRkND/16/ <-- Latest Revision

This is the best I could do with making all of their code readable:

http://jsfiddle.net/Mutant_Tractor/jRkND/16/

Optimising Python dictionary access code

37 votes

Question:

I've profiled my Python program to death, and there is one function that is slowing everything down. It uses Python dictionaries heavily, so I may not have used them in the best way. If I can't get it running faster, I will have to re-write it in C++, so is there anyone who can help me optimise it in Python?

I hope I've given the right sort of explanation, and that you can make some sense of my code! Thanks in advance for any help.

My code:

This is the offending function, profiled using line_profiler and kernprof. I'm running Python 2.7

I'm particularly puzzled by things like lines 363, 389 and 405, where an if statement with a comparison of two variables seems to take an inordinate amount of time.

I've considered using NumPy (as it does sparse matrices) but I don't think it's appropriate because: (1) I'm not indexing my matrix using integers (I'm using object instances); and (2) I'm not storing simple data types in the matrix (I'm storing tuples of a float and an object instance). But I'm willing to be persuaded about NumPy. If anyone knows about NumPy's sparse matrix performance vs. Python's hash tables, I'd be interested.

Sorry I haven't given a simple example that you can run, but this function is tied up in a much larger project and I couldn't work out how to set up a simple example to test it, without giving you half of my code base!

Timer unit: 3.33366e-10 s
File: routing_distances.py
Function: propagate_distances_node at line 328
Total time: 807.234 s

Line #   Hits         Time  Per Hit   % Time  Line Contents
328                                               @profile
329                                               def propagate_distances_node(self, node_a, cutoff_distance=200):
330                                                       
331                                                   # a makes sure its immediate neighbours are correctly in its distance table
332                                                   # because its immediate neighbours may change as binds/folding change
333    737753   3733642341   5060.8      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
334    512120   2077788924   4057.2      0.1              use_neighbour_link = False
335                                                       
336    512120   2465798454   4814.9      0.1              if(node_b not in self.node_distances[node_a]): # a doesn't know distance to b
337     15857     66075687   4167.0      0.0                  use_neighbour_link = True
338                                                       else: # a does know distance to b
339    496263   2390534838   4817.1      0.1                  (node_distance_b_a, next_node) = self.node_distances[node_a][node_b]
340    496263   2058112872   4147.2      0.1                  if(node_distance_b_a > neighbour_distance_b_a): # neighbour distance is shorter
341        81       331794   4096.2      0.0                      use_neighbour_link = True
342    496182   2665644192   5372.3      0.1                  elif((None == next_node) and (float('+inf') == neighbour_distance_b_a)): # direct route that has just broken
343        75       313623   4181.6      0.0                      use_neighbour_link = True
344                                                               
345    512120   1992514932   3890.7      0.1              if(use_neighbour_link):
346     16013     78149007   4880.3      0.0                  self.node_distances[node_a][node_b] = (neighbour_distance_b_a, None)
347     16013     83489949   5213.9      0.0                  self.nodes_changed.add(node_a)
348                                                           
349                                                           ## Affinity distances update
350     16013     86020794   5371.9      0.0                  if((node_a.type == Atom.BINDING_SITE) and (node_b.type == Atom.BINDING_SITE)):
351       164      3950487  24088.3      0.0                      self.add_affinityDistance(node_a, node_b, self.chemistry.affinity(node_a.data, node_b.data))     
352                                                   
353                                                   # a sends its table to all its immediate neighbours
354    737753   3549685140   4811.5      0.1          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].iteritems():
355    512120   2129343210   4157.9      0.1              node_b_changed = False
356                                               
357                                                       # b integrates a's distance table with its own
358    512120   2203821081   4303.3      0.1              node_b_chemical = node_b.chemical
359    512120   2409257898   4704.5      0.1              node_b_distances = node_b_chemical.node_distances[node_b]
360                                                       
361                                                       # For all b's routes (to c) that go to a first, update their distances
362  41756882 183992040153   4406.3      7.6              for node_c, (distance_b_c, node_after_b) in node_b_distances.iteritems(): # Think it's ok to modify items while iterating over them (just not insert/delete) (seems to work ok)
363  41244762 172425596985   4180.5      7.1                  if(node_after_b == node_a):
364                                                               
365  16673654  64255631616   3853.7      2.7                      try:
366  16673654  88781802534   5324.7      3.7                          distance_b_a_c = neighbour_distance_b_a + self.node_distances[node_a][node_c][0]
367    187083    929898684   4970.5      0.0                      except KeyError:
368    187083   1056787479   5648.8      0.0                          distance_b_a_c = float('+inf')
369                                                                   
370  16673654  69374705256   4160.7      2.9                      if(distance_b_c != distance_b_a_c): # a's distance to c has changed
371    710083   3136751361   4417.4      0.1                          node_b_distances[node_c] = (distance_b_a_c, node_a)
372    710083   2848845276   4012.0      0.1                          node_b_changed = True
373                                                                   
374                                                                   ## Affinity distances update
375    710083   3484577241   4907.3      0.1                          if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
376     99592   1591029009  15975.5      0.1                              node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
377                                                                   
378                                                               # If distance got longer, then ask b's neighbours to update
379                                                               ## TODO: document this!
380  16673654  70998570837   4258.1      2.9                      if(distance_b_a_c > distance_b_c):
381                                                                   #for (node, neighbour_distance) in node_b_chemical.neighbours[node_b].iteritems():
382   1702852   7413182064   4353.4      0.3                          for node in node_b_chemical.neighbours[node_b]:
383   1204903   5912053272   4906.7      0.2                              node.chemical.nodes_changed.add(node)
384                                                       
385                                                       # Look for routes from a to c that are quicker than ones b knows already
386  42076729 184216680432   4378.1      7.6              for node_c, (distance_a_c, node_after_a) in self.node_distances[node_a].iteritems():
387                                                           
388  41564609 171150289218   4117.7      7.1                  node_b_update = False
389  41564609 172040284089   4139.1      7.1                  if(node_c == node_b): # a-b path
390    512120   2040112548   3983.7      0.1                      pass
391  41052489 169406668962   4126.6      7.0                  elif(node_after_a == node_b): # a-b-a-b path
392  16251407  63918804600   3933.1      2.6                      pass
393  24801082 101577038778   4095.7      4.2                  elif(node_c in node_b_distances): # b can already get to c
394  24004846 103404357180   4307.6      4.3                      (distance_b_c, node_after_b) = node_b_distances[node_c]
395  24004846 102717271836   4279.0      4.2                      if(node_after_b != node_a): # b doesn't already go to a first
396   7518275  31858204500   4237.4      1.3                          distance_b_a_c = neighbour_distance_b_a + distance_a_c
397   7518275  33470022717   4451.8      1.4                          if(distance_b_a_c < distance_b_c): # quicker to go via a
398    225357    956440656   4244.1      0.0                              node_b_update = True
399                                                           else: # b can't already get to c
400    796236   3415455549   4289.5      0.1                      distance_b_a_c = neighbour_distance_b_a + distance_a_c
401    796236   3412145520   4285.3      0.1                      if(distance_b_a_c < cutoff_distance): # not too for to go
402    593352   2514800052   4238.3      0.1                          node_b_update = True
403                                                                   
404                                                           ## Affinity distances update
405  41564609 164585250189   3959.7      6.8                  if node_b_update:
406    818709   3933555120   4804.6      0.2                      node_b_distances[node_c] = (distance_b_a_c, node_a)
407    818709   4151464335   5070.7      0.2                      if((node_b.type == Atom.BINDING_SITE) and (node_c.type == Atom.BINDING_SITE)):
408    104293   1704446289  16342.9      0.1                          node_b_chemical.add_affinityDistance(node_b, node_c, self.chemistry.affinity(node_b.data, node_c.data))
409    818709   3557529531   4345.3      0.1                      node_b_changed = True
410                                                       
411                                                       # If any of node b's rows have exceeded the cutoff distance, then remove them
412  42350234 197075504439   4653.5      8.1              for node_c, (distance_b_c, node_after_b) in node_b_distances.items(): # Can't use iteritems() here, as deleting from the dictionary
413  41838114 180297579789   4309.4      7.4                  if(distance_b_c > cutoff_distance):
414    206296    894881754   4337.9      0.0                      del node_b_distances[node_c]
415    206296    860508045   4171.2      0.0                      node_b_changed = True
416                                                               
417                                                               ## Affinity distances update
418    206296   4698692217  22776.5      0.2                      node_b_chemical.del_affinityDistance(node_b, node_c)
419                                                       
420                                                       # If we've modified node_b's distance table, tell its chemical to update accordingly
421    512120   2130466347   4160.1      0.1              if(node_b_changed):
422    217858   1201064454   5513.1      0.0                  node_b_chemical.nodes_changed.add(node_b)
423                                                   
424                                                   # Remove any neighbours that have infinite distance (have just unbound)
425                                                   ## TODO: not sure what difference it makes to do this here rather than above (after updating self.node_distances for neighbours)
426                                                   ##       but doing it above seems to break the walker's movement
427    737753   3830386968   5192.0      0.2          for (node_b, neighbour_distance_b_a) in self.neighbours[node_a].items(): # Can't use iteritems() here, as deleting from the dictionary
428    512120   2249770068   4393.1      0.1              if(neighbour_distance_b_a > cutoff_distance):
429       150       747747   4985.0      0.0                  del self.neighbours[node_a][node_b]
430                                                           
431                                                           ## Affinity distances update
432       150      2148813  14325.4      0.0                  self.del_affinityDistance(node_a, node_b)

Explanation of my code:

This function maintains a sparse distance matrix representing the network distance (sum of edge weights on the shortest path) between nodes in a (very big) network. To work with the complete table and use the Floyd-Warshall algorithm would be very slow. (I tried this first, and it was orders of magnitude slower than the current version.) So my code uses a sparse matrix to represent a thresholded version of the full distance matrix (any paths with a distance greater than 200 units are ignored). The network topolgy changes over time, so this distance matrix needs updating over time. To do this, I am using a rough implementation of a distance-vector routing protocol: each node in the network knows the distance to each other node and the next node on the path. When a topology change happens, the node(s) associated with this change update their distance table(s) accordingly, and tell their immediate neighbours. The information spreads through the network by nodes sending their distance tables to their neighbours, who update their distance tables and spread them to their neighbours.

There is an object representing the distance matrix: self.node_distances. This is a dictionary mapping nodes to routing tables. A node is an object that I've defined. A routing table is a dictionary mapping nodes to tuples of (distance, next_node). Distance is the graph distance from node_a to node_b, and next_node is the neighbour of node_a that you must go to first, on the path between node_a and node_b. A next_node of None indicates that node_a and node_b are graph neighbours. For example, a sample of a distance matrix could be:

self.node_distances = { node_1 : { node_2 : (2.0, None),
                                   node_3 : (5.7, node_2),
                                   node_5 : (22.9, node_2) },
                        node_2 : { node_1 : (2.0, None),
                                   node_3 : (3.7, None),
                                   node_5 : (20.9, node_7)},
                        ...etc...

Because of topology changes, two nodes that were far apart (or not connected at all) can become close. When this happens, entries are added to this matrix. Because of the thresholding, two nodes can become too far apart to care about. When this happens, entries are deleted from this matrix.

The self.neighbours matrix is similar to self.node_distances, but contains information about the direct links (edges) in the network. self.neighbours is continually being modified externally to this function, by the chemical reaction. This is where the network topology changes come from.

The actual function that I'm having problems with: propagate_distances_node() performs one step of the distance-vector routing protocol. Given a node, node_a, the function makes sure that node_a's neighbours are correctly in the distance matrix (topology changes). The function then sends node_a's routing table to all of node_a's immediate neighbours in the network. It integrates node_a's routing table with each neighbour's own routing table.

In the rest of my program, the propagate_distances_node() function is called repeatedly, until the distance matrix converges. A set, self.nodes_changed, is maintained, of the nodes that have changed their routing table since they were last updated. On every iteration of my algorithm, a random subset of these nodes are chosen and propagate_distances_node() is called on them. This means the nodes spread their routing tables asynchronously and stochastically. This algorithm converges on the true distance matrix when the set self.nodes_changed becomes empty.

The "affinity distances" parts (add_affinityDistance and del_affinityDistance) are a cache of a (small) sub-matrix of the distance matrix, that is used by a different part of the program.

The reason I'm doing this is that I'm simulating computational analogues of chemicals participating in reactions, as part of my PhD. A "chemical" is a graph of "atoms" (nodes in the graph). Two chemicals binding together is simulated as their two graphs being joined by new edges. A chemical reaction happens (by a complicated process that isn't relevant here), changing the topology of the graph. But what happens in the reaction depends on how far apart the different atoms are that make up the chemicals. So for each atom in the simulation, I want to know which other atoms it is close to. A sparse, thresholded distance matrix is the most efficient way to store this information. Since the topology of the network changes as the reaction happens, I need to update the matrix. A distance-vector routing protocol is the fastest way I could come up with of doing this. I don't need a more compliacted routing protocol, because things like routing loops don't happen in my particular application (because of how my chemicals are structured). The reason I'm doing it stochastically is so that I can interleve the chemical reaction processes with the distance spreading, and simulate a chemical gradually changing shape over time as the reaction happens (rather than changing shape instantly).

The self in this function is an object representing a chemical. The nodes in self.node_distances.keys() are the atoms that make up the chemical. The nodes in self.node_distances[node_x].keys() are nodes from the chemical and potentially nodes from any chemicals that the chemical is bound to (and reacting with).

Update:

I tried replacing every instance of node_x == node_y with node_x is node_y (as per @Sven Marnach's comment), but it slowed things down! (I wasn't expecting that!) My original profile took 807.234s to run, but with this modification it increased to 895.895s. Sorry, I was doing the profiling wrong! I was using line_by_line, which (on my code) had far too much variance (that difference of ~90 seconds was all in the noise). When profiling it properly, is is detinitely faster than ==. Using CProfile, my code with == took 34.394s, but with is, it took 33.535s (which I can confirm is out of the noise).

Update: Existing libraries

I'm unsure as to whether there will be an existing library that can do what I want, since my requirements are unusual: I need to compute the shortest-path lengths between all pairs of nodes in a weighted, undirected graph. I only care about path lengths that are lower than a threshold value. After computing the path lengths, I make a small change to the network topology (adding or removing an edge), and then I want to re-compute the path lengths. My graphs are huge compared to the threshold value (from a given node, most of the graph is further away than the threshold), and so the topology changes don't affect most of the shortest-path lengths. This is why I am using the routing algorithm: because this spreads topology-change information through the graph structure, so I can stop spreading it when it's gone further than the threshold. i.e., I don't need to re-compute all the paths each time. I can use the previous path information (from before the topology change) to speed up the calculation. This is why I think my algorithm will be faster than any library implementations of shortest-path algorithms. I've never seen routing algorithms used outside of actually routing packets through physical networks (but if anyone has, then I'd be interested).

NetworkX was suggested by @Thomas K. It has lots of algorithms for calculating shortest paths. It has an algorithm for computing the all-pairs shortest path lengths with a cutoff (which is what I want), but it only works on unweighted graphs (mine are weighted). Unfortunately, its algorithms for weighted graphs don't allow the use of a cutoff (which might make them slow for my graphs). And none of its algorithms appear to support the use of pre-calculated paths on a very similar network (i.e. the routing stuff).

igraph is another graph library that I know of, but looking at its documentation, I can't find anything about shortest-paths. But I might have missed it - its documentation doesn't seem very comprehensive.

NumPy might be possible, thanks to @9000's comment. I can store my sparse matrix in a NumPy array if I assign a unique integer to each instance of my nodes. I can then index a NumPy array with integers instead of node instances. I will also need two NumPy arrays: one for the distances and one for the "next_node" references. This might be faster than using Python dictionaries (I don't know yet).

Does anyone know of any other libraries that might be useful?

Update: Memory usage

I'm running Windows (XP), so here is some info about memory usage, from Process Explorer. The CPU usage is at 50% because I have a dual-core machine.

global memory usage my program's memory usage

My program doesn't run out of RAM and start hitting the swap. You can see that from the numbers, and from the IO graph not having any activity. The spikes on the IO graph are where the program prints to the screen to say how it's doing.

However, my program does keep using up more and more RAM over time, which is probably not a good thing (but it's not using up much RAM overall, which is why I didn't notice the increase until now).

And the distance between the spikes on the IO graph increases over time. This is bad - my program prints to the screen every 100,000 iterations, so that means that each iteration is taking longer to execute as time goes on... I've confirmed this by doing a long run of my program and measuring the time between print statements (the time between each 10,000 iterations of the program). This should be constant, but as you can see from the graph, it increases linearly... so something's up there. (The noise on this graph is because my program uses lots of random numbers, so the time for each iteration varies.)

lag between print statements increasing over time

After my program's been running for a long time, the memory usage looks like this (so it's definitely not running out of RAM):

global memory usage - after a long run my program's memory usage - after a long run

node_after_b == node_a will try to call node_after_b.__eq__(node_a):

>>> class B(object):
...     def __eq__(self, other):
...         print "B.__eq__()"
...         return False
... 
>>> class A(object):
...     def __eq__(self, other):
...         print "A.__eq__()"
...         return False
... 
>>> a = A()
>>> b = B()
>>> a == b
A.__eq__()
False
>>> b == a
B.__eq__()
False
>>> 

Try to override Node.__eq__() with an optimized version before resorting to C.

UPDATE

I made this little experiment (python 2.6.6):

#!/usr/bin/env python
# test.py
class A(object):
    def __init__(self, id):
        self.id = id

class B(A):
    def __eq__(self, other):
        return self.id == other.id

@profile
def main():
    list_a = []
    list_b = []
    for x in range(100000):
        list_a.append(A(x))
        list_b.append(B(x))

    ob_a = A(1)
    ob_b = B(1)
    for ob in list_a:
        if ob == ob_a:
            x = True
        if ob is ob_a:
            x = True
        if ob.id == ob_a.id:
            x = True
        if ob.id == 1:
            x = True
    for ob in list_b:
        if ob == ob_b:
            x = True
        if ob is ob_b:
            x = True
        if ob.id == ob_b.id:
            x = True
        if ob.id == 1:
            x = True

if __name__ == '__main__':
    main()

Results:

Timer unit: 1e-06 s

File: test.py Function: main at line 10 Total time: 5.52964 s

Line #      Hits         Time  Per Hit % Time  Line Contents
==============================================================
    10                                           @profile
    11                                           def main():
    12         1            5      5.0      0.0      list_a = []
    13         1            3      3.0      0.0      list_b = []
    14    100001       360677      3.6      6.5      for x in range(100000):
    15    100000       763593      7.6     13.8          list_a.append(A(x))
    16    100000       924822      9.2     16.7          list_b.append(B(x))
    17
    18         1           14     14.0      0.0      ob_a = A(1)
    19         1            5      5.0      0.0      ob_b = B(1)
    20    100001       500454      5.0      9.1      for ob in list_a:
    21    100000       267252      2.7      4.8          if ob == ob_a:
    22                                                       x = True
    23    100000       259075      2.6      4.7          if ob is ob_a:
    24                                                       x = True
    25    100000       539683      5.4      9.8          if ob.id == ob_a.id:
    26         1            3      3.0      0.0              x = True
    27    100000       271519      2.7      4.9          if ob.id == 1:
    28         1            3      3.0      0.0              x = True
    29    100001       296736      3.0      5.4      for ob in list_b:
    30    100000       472204      4.7      8.5          if ob == ob_b:
    31         1            4      4.0      0.0              x = True
    32    100000       283165      2.8      5.1          if ob is ob_b:
    33                                                       x = True
    34    100000       298839      3.0      5.4          if ob.id == ob_b.id:
    35         1            3      3.0      0.0              x = True
    36    100000       291576      2.9      5.3          if ob.id == 1:
    37         1            3      3.0      0.0              x = True

I was very surprised:

  • "dot" access (ob.property) seems to be very expensive (line 25 versus line 27).
  • there was not much difference between is and '==', at least for simple objects

Then I tried with more complex objects and results are consistent with the first experiment.

Are you swapping a lot? If your dataset is so large that it does not fit available RAM, I guess you may experience some kind of I/O contention related to virtual memory fetches.

Are you running Linux? If so, could you post a vmstat of your machine while running your program? Send us the output of something like:

vmstat 10 100

Good luck!

UPDATE (from comments by OP)

I sugested playing with sys.setcheckinterval and enable/disable the GC. The rationale is that for this particular case (huge number of instances) the default GC reference count check is somewhat expensive and its default interval is away too often.

Yes, I had previously played with sys.setcheckinterval. I changed it to 1000 (from its default of 100), but it didn't do any measurable difference. Disabling Garbage Collection has helped - thanks. This has been the biggest speedup so far - saving about 20% (171 minutes for the whole run, down to 135 minutes) - I'm not sure what the error bars are on that, but it must be a statistically significant increase. – Adam Nellis Feb 9 at 15:10

My guess:

I think the Python GC is based on reference count. From time to time it will check the reference count for every instance; since you are traversing these huge in-memory structures, in your particular case the GC default frequency (1000 cycles?) is away too often - a huge waste. – Yours Truly Feb 10 at 2:06

How fast is D compared to C++ ?

36 votes

I like some features of D, but would be interested if they come with a runtime penalty?

To compare, I implemented a simple program that computes scalar products of many short vectors both in C++ and in D. The result is surprising:

  • D: 18.9 s [see below for final runtime]
  • C++: 3.8 s

Is C++ really almost five times as fast or did I make a mistake in the D program (it is my first one, please forgive me)?


Summary of answers: D is as fast as C++ (for numerical computations) if

  1. one uses the correct optimization options (-O -release -inline -m64)
  2. uses gdc/gdmd instead of dmd
  3. uses correct D calling conventions (in vector_t x instead of const ref vector_t x in the arguments to function scalar_product below; --- while replacing the for loops by foreach looks nicer, but does not affect runtimes).

Here all the figures on my machine:

  • D (const ref arguments, dmd compiler, correct optimization): 9.6s
  • D (in arguments, dmd compiler, correct optimization): 6.1s
  • D (in arguments, gdmd compiler, correct optimization): 3.9s
  • C++: 3.8s

I used gdc/gdmd 4.3. The combination D(const ref arguments, gdmd compiler) did not compile and thus has not been tested. Aside: gdc/gdmd did not have the datetime library yet and compiling it from sources failed for me, so it seems not as easy to handle as dmd right now.

Thanks CyberShadow and GMan and all others!


Continuation of original question:

I compiled C++ with g++ -O3 (gcc-snapshot 2011-02-19) and D with dmd -O (dmd 2.052) on a moderate recent linux desktop. The results are reproducible over several runs and standard deviations negligible.

Here the C++ program:

#include <iostream>
#include <random>
#include <chrono>
#include <string>

#include <vector>
#include <array>

typedef std::chrono::duration<long, std::ratio<1, 1000>> millisecs;
template <typename _T>
long time_since(std::chrono::time_point<_T>& time) {
      long tm = std::chrono::duration_cast<millisecs>( std::chrono::system_clock::now() - time).count();
  time = std::chrono::system_clock::now();
  return tm;
}

const long N = 20000;
const int size = 10;

typedef int value_type;
typedef long long result_type;
typedef std::vector<value_type> vector_t;
typedef typename vector_t::size_type size_type;

inline value_type scalar_product(const vector_t& x, const vector_t& y) {
  value_type res = 0;
  size_type siz = x.size();
  for (size_type i = 0; i < siz; ++i)
    res += x[i] * y[i];
  return res;
}

int main() {
  auto tm_before = std::chrono::system_clock::now();

  // 1. allocate and fill randomly many short vectors
  vector_t* xs = new vector_t [N];
  for (int i = 0; i < N; ++i) {
    xs[i] = vector_t(size);
      }
  std::cerr << "allocation: " << time_since(tm_before) << " ms" << std::endl;

  std::mt19937 rnd_engine;
  std::uniform_int_distribution<value_type> runif_gen(-1000, 1000);
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < size; ++j)
      xs[i][j] = runif_gen(rnd_engine);
  std::cerr << "random generation: " << time_since(tm_before) << " ms" << std::endl;

  // 2. compute all pairwise scalar products:
  time_since(tm_before);
  result_type avg = 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) 
      avg += scalar_product(xs[i], xs[j]);
  avg = avg / N*N;
  auto time = time_since(tm_before);
  std::cout << "result: " << avg << std::endl;
  std::cout << "time: " << time << " ms" << std::endl;
}

And here the D version:

import std.stdio;
import std.datetime;
import std.random;

const long N = 20000;
const int size = 10;

alias int value_type;
alias long result_type;
alias value_type[] vector_t;
alias uint size_type;

value_type scalar_product(const ref vector_t x, const ref vector_t y) {
  value_type res = 0;
  size_type siz = x.length;
  for (size_type i = 0; i < siz; ++i)
    res += x[i] * y[i];
  return res;
}

int main() {   
  auto tm_before = Clock.currTime();

  // 1. allocate and fill randomly many short vectors
  vector_t[] xs;
  xs.length = N;
  for (int i = 0; i < N; ++i) {
    xs[i].length = size;
  }
  writefln("allocation: %i ", (Clock.currTime() - tm_before));
  tm_before = Clock.currTime();

  for (int i = 0; i < N; ++i)
    for (int j = 0; j < size; ++j)
      xs[i][j] = uniform(-1000, 1000);
  writefln("random: %i ", (Clock.currTime() - tm_before));
  tm_before = Clock.currTime();

  // 2. compute all pairwise scalar products:
  result_type avg = cast(result_type) 0;
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) 
      avg += scalar_product(xs[i], xs[j]);
  avg = avg / N*N;
  writefln("result: %d", avg);
  auto time = Clock.currTime() - tm_before;
  writefln("scalar products: %i ", time);

  return 0;
}

To enable all optimizations and disable all safety checks, compile your D program with the following DMD flags:

-O -inline -release -noboundscheck

EDIT: I've tried your programs with g++, dmd and gdc. dmd does lag behind, but gdc achieves performance very close to g++. The commandline I used was gdmd -O -release -inline (gdmd is a wrapper around gdc which accepts dmd options).

Looking at the assembler listing, it looks like neither dmd nor gdc inlined scalar_product, but g++/gdc did emit MMX instructions, so they might be auto-vectorizing the loop.

Lisp and Erlang Atoms, Ruby and Scheme Symbols. How useful are they?

34 votes

How useful is the feature of having an atom data type in a programming language?

A few programming languages have the concept of atom or symbol to represent a constant of sorts. There are a few differences among the languages I have come across (Lisp, Ruby and Erlang), but it seems to me that the general concept is the same. I am interested in programming language design, and I was wondering what value does having an atom type provide in real life. Other languages such as Python, Java, C# seem to be doing quite well without it.

I have no real experience of Lisp or Ruby (I know the syntaxes, but haven't used either in a real project). I have used Erlang enough to be used to the concept there.

A short example that shows how the ability to manipulate symbols leads to cleaner code: (Code is in Scheme, a dialect of Lisp).

(define men '(socrates plato aristotle))

(define (man? x) 
    (contains? men x))

(define (mortal? x) 
    (man? x))

;; test

> (mortal? 'socrates)
=> #t

You can write this program using character strings or integer constants. But the symbolic version has certain advantages. A symbol is guaranteed to be unique in the system. This makes comparing two symbols as fast as comparing two pointers. This is obviously faster than comparing two strings. Using integer constants allows people to write meaningless code like:

(define SOCRATES 1)
;; ...

(mortal? SOCRATES)
(mortal? -1) ;; ??

Probably a detailed answer to this question could be found in the book Common Lisp: A Gentle Introduction to Symbolic Computation.

Preferred way of modifying elements that have yet to be created (besides events)

33 votes

There are a lot of questions about binding future manipulations to non-existent elements that all end up answered with live/delegate. I am wondering how to run an arbitrary callback (to add a class or trigger a plugin, for example) to all existing elements that match a selector and all future elements that match that same selector that are yet to be created.

It seems that the main functionality of the livequery plugin made it into the core but the other part, attaching arbitrary callbacks got lost along the way somehow.

Another common answer is event delegation but what if one doesn't have access to all of the vendor code that is creating the elements to have it trigger the events?


Here is some real-world code:

// with livequery
$('input[type=text], input[type=password], textarea, .basic_form .block select, .order_form .form_item select, .order_form .form_item input')
    .livequery(function(){
        $(this)
            .focus(function(){
                $(this).addClass('active');
            })
            .blur(function(){
                $(this).removeClass('active');
            })
            .addClass('text');
    });

// with live
$('input[type=text], input[type=password], textarea, .basic_form .block select, .order_form .form_item select, .order_form .form_item input')
    .live('focus', function(){
            $(this).addClass('active');
        })
    .live('blur', function(){
            $(this).removeClass('active');
        });
    // now how to add the class to future elements?
    // (or apply another plugin or whatever arbitrary non-event thing)

One approach would be to monitor when new nodes are added/removed and re-trigger our selectors. Thanks to @arnorhs we know about the DOMNodeInserted event, which I would ignore the cross-browser problems in the hope that those small IE patches could someday land upstream to jQuery or knowing the jQuery DOM functions could be wrapped.

Even if we could ensure that the DOMNodeInserted fired cross-browser, however, it would be ridiculous to bind to it with multiple selectors. Hundreds of elements can be created at any time, and making potentially dozens of selector calls on each of those elements would crawl.

My best idea so far

Would it maybe be better to monitor DOMNodeInserted/Deleted and/or hook into jQuery's DOM manipulation routines to only set a flag that a "re-init" should happen? Then there could just be a timer that checks that flag every x seconds, only running all those selectors/callbacks when the DOM has actually changed.

That could still be really bad if you were adding/removing elements in great numbers at a fast rate (like with animation or __). Having to re-parse the DOM once for each saved selector every x seconds could be too intense if x is low, and the interface would appear sluggish if x is high.

Any other novel solutions?

I will add a bounty when it lets me. I have added a bounty for the most novel solution!

Basically what I am getting at is a more aspect-oriented approach to manipulating the DOM. One that can allow that new elements are going to be created in the future, and they should be created with the initial document.ready modifications applied to them as well.

JS has been able to do so much magic lately that I'm hoping it will be obvious.

In my opinion, the DOM Level 3 events DOMNodeInsertedhelp (which fires only for nodes) and DOMSubtreeModifiedhelp (which fires for virtually any modification, like attribute changes) are your best shot to accomplish that task.

Of course, the big downside of those events is, that the Internet Explorers of this world don't support them
(...well, IE9 does).

The other reasonable solution for this problem, is to hook into any method Which can modify the DOM. But then we have to ask, what is our scope here?

Is it just enough to deal with DOM modification methods from a specific library like jQuery? What if for some reason another library is modifying the DOM or even a native method ?

If it's just for jQuery, we don't need .sub() at all. We could write hooks in the form of:

HTML

<div id="test">Test DIV</div>

JS

(function(_append, _appendTo, _after, _insertAfter, _before, _insertBefore) {
    $.fn.append = function() {
        this.trigger({
            type: 'DOMChanged',
            newnode: arguments[0]
        });
        return _append.apply(this, arguments);
    };
    $.fn.appendTo = function() {
        this.trigger({
            type: 'DOMChanged',
            newnode: this
        });
        return _appendTo.apply(this, arguments);
    };
    $.fn.after = function() {
        this.trigger({
             type: 'DOMChanged',
             newnode: arguments[0]
         });
        return _after.apply(this, arguments);
    };

    // and so forth

}($.fn.append, $.fn.appendTo, $.fn.after, $.fn.insertAfter, $.fn.before, $.fn.insertBefore));

$('#test').bind('DOMChanged', function(e) {
    console.log('New node: ', e.newnode);
});

$('#test').after('<span>new span</span>');
$('#test').append('<p>new paragraph</p>');
$('<div>new div</div>').appendTo($('#test'));

A live example of the above code can be found here: http://www.jsfiddle.net/RRfTZ/1/

This of course requires a complete list of DOMmanip methods. I'm not sure if you can overwrite native methods like .appendChild() with this approach. .appendChild is located in Element.prototype.appendChild for instance, might be worth a try.

update

I tested overwriting Element.prototype.appendChild etc. in Chrome, Safari and Firefox (official latest release). Works in Chrome and Safari but not in Firefox!


There might be other ways to tackle the requirement. But I can't think of a single approach which is really satisfying, like counting / watching all descendents of a node (which would need an interval or timeouts, eeek).

Conclusion

A mixture of DOM Level 3 events where supported and hooked DOMmanip methods is probably the best you can do here.

List<int> test = {1, 2, 3} - is it a feature or a bug?

33 votes

Hello everybody!

As you know, it is not allowed to use the Array-initialisation syntax with Lists. It will give a compile-time error. Example:

List<int> test = { 1, 2, 3} 
// At compilation the following error is shown:
// Can only use array initializer expressions to assign to array types. 

However today I did the following (very simplified):

class Test
{
     public List<int> Field;
}

List<Test> list = new List<Test>
{
    new Test { Field = { 1, 2, 3 } }
};

The code above compiles just fine, but when run it will give a "Object references is not set to an object" run-time error.

I would expect that code to give a compile-time error. My question to you is: Why doesn't it, and are there any good reasons for when such a scenario would run correctly?

This has been tested using .NET 3.5, both .Net and Mono compilers.

Cheers.

I think this is a by-design behavior. The Test = { 1, 2, 3 } is compiled into code that calls Add method of the list stored in the Test field.

The reason why you're getting NullReferenceException is that Test is null. If you initialize the Test field to a new list, then the code will work:

class Test {    
  public List<int> Field = new List<int>(); 
}  

// Calls 'Add' method three times to add items to 'Field' list
var t = new Test { Field = { 1, 2, 3 } };

It is quite logical - if you write new List<int> { ... } then it creates a new instance of list. If you don't add object construction, it will use the existing instance (or null). As far as I can see, the C# spec doesn't contain any explicit translation rule that would match this scenario, but it gives an example (see Section 7.6.10.3):

A List<Contact> can be created and initialized as follows:

var contacts = new List<Contact> {
    new Contact {
        Name = "Chris Smith",
        PhoneNumbers = { "206-555-0101", "425-882-8080" }
    },
    new Contact {
        Name = "Bob Harris",
        PhoneNumbers = { "650-555-0199" }
    }
};

which has the same effect as

var contacts = new List<Contact>();
Contact __c1 = new Contact();
__c1.Name = "Chris Smith";
__c1.PhoneNumbers.Add("206-555-0101");
__c1.PhoneNumbers.Add("425-882-8080");
contacts.Add(__c1);
Contact __c2 = new Contact();
__c2.Name = "Bob Harris";
__c2.PhoneNumbers.Add("650-555-0199");
contacts.Add(__c2);

where __c1 and __c2 are temporary variables that are otherwise invisible and inaccessible.

How can I say "love" without character or digits in JavaScript?

33 votes

Inspired by Ryan Barnett's PPT of BlackHat DC 2011, especially the code below:

($=[$=[]][(__=!$+$)[_=-~-~-~$]+({}+$)[_/_]+ ($$=($_=!''+$)[_/_]+$_[+$])])()[__[_/_]+__ [_+~$]+$_[_]+$$](_/_)

Yesterday was special day for lovers, so I tried to write something similar. Which basically alert "I love you" without any character or digits.

e.g. "I" can be obtained from ((_=-~[])/--_+[])[_]

we have "[object Object]", "true", "false", "NaN", "Infinity" to use, I cannot figure out a way to get "v" this way.

I tried to think of String.fromCharCode(), (Ryan already get window reference for us, so in theory, we can window["String"]["fromCharCode"](118)) however I miss "S" and "C" character here. Also think about window["eval"](...), again, I have no "v".

Just try to explain a little bit, [] is empty, when apply +/-/~ operate to it, it converts to number 0, and ~[] gives 1, 1/0 gives Infinitey. Then it comes to 1/0 + [], they will both converted to string for the add, which gives "Infinity", and "Infinity"[_] == "Infinity"[0] == "I"...

The original code of Ryan is more complex, it utilized a lot more, includes scope, special return value, etc. (this is another story)

This might not seem to be a great idea to do things, but just very interesting.

With help with meze, I was able to produce this for Firefox:

($=($=[$=[]][(__=!$+$)[_=-~-~-~$]+(_$={}+$)[_/_]+ ($$=($_=!''+$)[_/_]+$_[+$])])())[__[_/_]+__ [_+~$]+$_[_]+$$]((_$_=(__$=-~[])/--__$+[])[__$]+_$[_+++_]+__[__$=-~-~[]]+_$[-~[]]+($[_$[$__=_+_]+_$[++$__]+_$[++$__]+_$[++$__]+_$[++$__]+_$[++$__]]+[])[
$__+$__+--_]+__[++_]+_$[$__=_+--_]+_$_[_+++_]+_$[_/_]+$_[__$]);

it basically is alert("I love you"), many thanks! If only I get the help yesterday, which I have not post this yet :(

JavaScript is beautiful, some varibles for your reference:

$_ = "true"
__ = "false"
_$ = "[object Object]"
$$ = "rt"
_$_ = "Infinity"
_ = 3 = 4 = 3 = 4 = 3
$  = window
$__ = 8  = 13
__$  = 0 = 2

Some variables are reused many times, will not try to leave details, it is not a fun job :) I am happy, we are finally here! This actually has lots of potential, as we now have "v", and lots of digits, we will in theory possible to eval() lots of... things easier. I will show this to my wife, hope she enjoys the _$-+()...

example as your reference: http://jsfiddle.net/Y4wqw/

btw, we can shorten the code a bit, as we already have reference to sort(), which can be used instead of window["Object"] to get the "native code" => "v", here it is:

($=($_$=($=[$=[]][(__=!$+$)[_=-~-~-~$]+(_$={}+$)[_/_]+ ($$=($_=!''+$)[_/_]+$_[+$])]))())[__[_/_]+__ [_+~$]+$_[_]+$$]((_$_=(__$=-~[])/--__$+[])[__$]+_$[_+++_]+__[__$=-~-~[]]+_$[-~[]]+($_$+[])[(__$<<__$<<__$)-_+~[]]+$_[--_]+_$[$__=_+++_]+_$_[_+--_]+_$[_/_]+$_[__$]);

Again, it works only in Firefox, might not try to migrate to other browser. And I love Firefox.

Well at least in Firefox, JavaScript native objects return function Object() { [native code] }, which has 'v'. So if we have window and Object, then i suppose we could do:

(window["Object"]+0)[29];

Does Ruby on Rails affect how a web page looks?

32 votes

Most of the time, whenever I hit a website that looks "bubbly" in nature, and all prettified in those pastel-like colors, I think to myself, "This was probably done with Rails." And, lo and behold, after some digging into the site's information pages I discover this is actually true. So, I pose the question, not knowing much about Rails but enough about Django to understand how the database stuff works:

Does RoR have any display-specific qualities that affect how a web page looks? Or do all RoR devs naturally use the same Adobe tools to make everything look so ubiquitous?

Ruby on Rails is a server side technology, so it doesn't lend any specific quality to the user visible design. That said, it is a "trendy" technology so people who are likely to write their back-end code with RoR are likely to choose a particular "Web 2.0" style for their views.

It's kind of like saying "I notice that douches tend to wear backwards baseball caps and wear really baggy jeans. Does wearing a backwards baseball cap affect the tightness of one's jeans?" And the answer would be, no, however, a douche is likely to choose both of these clothing items.

"int main (vooid)"? How does that work?

31 votes

I recently had to type in a small C test program and, in the process, I made a spelling mistake in the main function by accidentally using vooid instead of void.

And yet it still worked.

Reducing it down to its smallest complete version, I ended up with:

int main (vooid) {
    return 42;
}

This does indeed compile (gcc -Wall -o myprog myprog.c) andm when run, it returns 42.

How exactly is this valid code?


Here's a transcript cut and pasted from my bash shell to show what I'm doing:

pax$ cat qq.c
int main (vooid) {
    return 42;
}

pax$ rm qq ; gcc -Wall -o qq qq.c ; ./qq

pax$ echo $?
42

Isn't it just using "old-style" function-declaration syntax; you're implicitly declaring an int parameter called vooid?

How do I track down the cause of a StackOverflowException in .NET?

28 votes

I get a StackOverflowException when I run the following code:

private void MyButton_Click(object sender, EventArgs e) {
  MyButton_Click_Aux();
}

private static volatile int reportCount;

private static void MyButton_Click_Aux() {
  try { /*remove because stack overflows without*/ }
  finally {
    var myLogData = new ArrayList();
    myLogData.Add(reportCount);
    myLogData.Add("method MyButtonClickAux");
    Log(myLogData);
  }
}

private static void Log(object logData) {
  // my log code is not matter
}

What could be causing the StackOverflowException?

I know how to stop it from happening

I just don't know why it causes it (yet). And it would appear you have indeed found a bug either in the .Net BCL or, more likely, in the JIT.

I just commented out all the lines in the MyButton_Click_Aux method and then started bringing them back in, one by one.

Take off the volatile from the static int and you'll no longer get a StackOverflowException.

Now to research why... Clearly something to do with Memory Barriers is causing an issue - perhaps somehow forcing the MyButton_Click_Aux method to call itself...

UPDATE

Okay so other people are finding that .Net 3.5 is not an issue.

I'm using .Nt 4 as well so these comments relate to that:

As I said, take the volatile off and it works.

Equally, if you put the volatile back on and remove the try/finally it also works:

private static void MyButton_Click_Aux()
{
  //try { /*remove because stack overflows without*/ }
  //finally
  //{
    var myLogData = new ArrayList(); 
    myLogData.Add(reportCount); 
    //myLogData.Add("method MyButtonClickAux");
    //Log(myLogData);
  //}
}  

I also wondered if it was something to do with the uninitialised reportCount when the try/finally is in. But it makes no difference if you initialise it to zero.

I'm looking at the IL now - although it might require someone with some ASM chaps to get involved...

Final Update As I say, this really is going to require analysis of the JIT output to really understand what's happening and whilst I find it fun to analyse assembler - I feel it's probably a job for someone in Microsoft so this bug can actually be confirmed and fixed! That said - it appears to be a pretty narrow set of circumstances.

I've moved over to a release build to get rid of all the IL noise (nops etc) for analysis.

This has, however, had a complicating impact on the diagnosis. I thought I had it but didn't - but now I know what it is.

I tried this code:

private static void MyButton_Click_Aux()
{
  try { }
  finally
  {
    var myLogData = new ArrayList();
    Console.WriteLine(reportCount);
    //myLogData.Add("method MyButtonClickAux");
    //Log(myLogData);
  }
}

With the int as volatile. It runs without fault. Here's the IL:

.maxstack 1
L_0000: leave.s L_0015
L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
L_0007: pop 
L_0008: volatile. 
L_000a: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
L_000f: call void [mscorlib]System.Console::WriteLine(int32)
L_0014: endfinally 
L_0015: ret 
.try L_0000 to L_0002 finally handler L_0002 to L_0015

Then we look at the minimum code required to get the error again:

private static void MyButton_Click_Aux()
{
  try { }
  finally
  {
    var myLogData = new ArrayList();
    myLogData.Add(reportCount);
  }
}

And it's IL:

.maxstack 2
.locals init (
    [0] class [mscorlib]System.Collections.ArrayList myLogData)
L_0000: leave.s L_001c
L_0002: newobj instance void [mscorlib]System.Collections.ArrayList::.ctor()
L_0007: stloc.0 
L_0008: ldloc.0 
L_0009: volatile. 
L_000b: ldsfld int32 modreq([mscorlib]System.Runtime.CompilerServices.IsVolatile) WindowsFormsApplication1.Form1::reportCount
L_0010: box int32
L_0015: callvirt instance int32 [mscorlib]System.Collections.ArrayList::Add(object)
L_001a: pop 
L_001b: endfinally 
L_001c: ret 
.try L_0000 to L_0002 finally handler L_0002 to L_001c

The difference? Well there's two that I spotted - boxing of the volatile int, and a virtual call. So I setup these two classes:

public class DoesNothingBase
{
  public void NonVirtualFooBox(object arg) { }
  public void NonVirtualFooNonBox(int arg) { }

  public virtual void FooBox(object arg) { }
  public virtual void FooNonBox(int arg) { }
}

public class DoesNothing : DoesNothingBase
{
  public override void FooBox(object arg) { }
  public override void FooNonBox(int arg) { }
}

And then tried each of these four versions of the offending method:

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.FooNonBox(reportCount);
}

Which works.

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.NonVirtualFooNonBox(reportCount);
}

Which also works.

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.FooBox(reportCount);
}

Oops - StackOverflowException.

And:

try { }
finally
{
  var doesNothing = new DoesNothing();
  doesNothing.NonVirtualFooBox(reportCount);
}

Oops again! StackOverflowException!

We could go further with this - but the issue is, I feel, clearly caused by the boxing of the volatile int whilst inside the finally block of a try/catch... I put the code inside the try, and no problem. I added a catch clause (and put the code in there), also no problem.

It could also apply to the boxing of other value types I guess.

So, to summarise - in .Net 4.0 - in both debug and release builds - the boxing of a volatile int in a finally block appears to cause the JIT to generate code that ends up filling the stack. The fact that the stack trace simply shows 'external code' also supports this proposition.

There's even a possibility that it can't always be reproduced and might even depend on the layout and size of the code that is generated by the try/finally. It's clearly something to do with an errant jmp or something similar being generated to the wrong location which eventually repeats one or more push commands to the stack. The idea that that is being caused actually by a box operation is, frankly, fascinating!

Final Final Update

If you look at the MS Connect bug that @Hasty G found (answer further down) - you see there that the bug manifests in a similar fashion, but with a volatile bool in a catch statement.

Also - MS queued a fix for this after getting it to repro - but no hotfix available yet after 7 months. I've gone on record before as being in support of MS Connect, so I'll say no more - I don't think I need to!

Final Final Final Update (23/02/2011)

It is fixed - but not yet released. Quote from the MS Team on the MS Connect bug:

Yes, it's fixed. We're in the process of figuring out how best to ship a fix. It is already fixed in 4.5, but we'd really like to fix a batch of code generation bugs prior to 4.5 release.