Best questions in October 2011

What does the C ??!??! operator do?!

192 votes

I saw a line of C that looked like this:

!ErrorHasOccured() ??!??! HandleError();

It compiled correctly and seems to run ok. It seems to like it's checking if an error has occurred, and if it has, it handles it, but I'm not really sure what it's actually doing or how it's doing it. It does look like the programmer is trying express his feelings about errors.

I have never seen the ??!??! before in any programming language, and I can't find documentation for it anywhere. (Google doesn't help with search terms like ??!??!). What does it do and how does the code sample work?

It's a trigraph that translates to |. So it says:

!ErrorHasOccured() || HandleError();

which, due to short circuiting, is equivalent to:

if (ErrorHasOccured())
    HandleError();

Guru of the Week (deals with C++ but relevant here), where I picked this up.

Possible origin of trigraphs or as @DwB points out in the comments it's more likely due to EBCDIC being difficult (again). This discussion on the IBM developerworks board seems to support that theory.

From ISO/IEC 9899:1999 §5.2.1.1, footnote 12 (h/t @Random832):

The trigraph sequences enable the input of characters that are not defined in the Invariant Code Set as described in ISO/IEC 646, which is a subset of the seven-bit US ASCII code set.

Throwing the fattest people off of an overloaded airplane.

91 votes

Let's say you've got an airplane, and it is low on fuel. Unless the plane drops 3000 pounds of passenger weight, it will not be able to reach the next airport. To save the maximum number of lives, we would like to throw the heaviest people off of the plane first.

And oh yeah, there are millions of people on the airplane, and we would like an optimal algorithm to find the heaviest passengers, without necessarily sorting the entire list.

This is a proxy problem for something I'm trying to code in C++. I would like to do a "partial_sort" on the passenger manifest by weight, but I don't know how many elements I'm going to need. I could implement my own "partial_sort" algorithm ("partial_sort_accumulate_until"), but I'm wondering if there's any easier way to do this using standard STL.

One way would be to use a min heap. Here's how you'd do it, assuming you had a MinHeap class. (Yes, my example is in C#. I think you get the idea.)

int targetTotal = 3000;
int totalWeight = 0;
// this creates an empty heap!
var myHeap = new MinHeap<Passenger>(/* need comparer here to order by weight */);
foreach (var pass in passengers)
{
    if (totalWeight < targetTotal)
    {
        // unconditionally add this passenger
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
    else if (pass.Weight > myHeap.Peek().Weight)
    {
        // If this passenger is heavier than the lightest
        // passenger already on the heap,
        // then remove the lightest passenger and add this one
        var oldPass = myHeap.RemoveFirst();
        totalWeight -= oldPass.Weight;
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
}

// At this point, the heaviest people are on the heap,
// but there might be too many of them.
// Remove the lighter people until we have the minimum necessary
while ((totalWeight - myHeap.Peek().Weight) > targetTotal)
{
    var oldPass = myHeap.RemoveFirst();
    totalWeight -= oldPass.Weight; 
}
// The heap now contains the passengers who will be thrown overboard.

According to the standard references, running time should be proportional to n log k, where n is the number of passengers and k is the maximum number of items on the heap. If we assume that passengers' weights will typically be 100 lbs or more, then it's unlikely that the heap will contain more than 30 items at any time.

The worst case would be if the passengers are presented in order from lowest weight to highest. That would require that every passenger be added to the heap, and every passenger be removed from the heap. Still, with a million passengers and assuming that the lightest weighs 100 lbs, the n log k works out to a reasonably small number.

If you get the passengers' weights randomly, performance is much better. I use something quite like this for a recommendation engine (I select the top 200 items from a list of several million). I typically end up with only 50,000 or 70,000 items actually added to the heap.

I suspect that you'll see something quite similar: the majority of your candidates will be rejected because they're lighter than the lightest person already on the heap. And Peek is an O(1) operation.

For a more information about the performance of heap select and quick select, see When theory meets practice. Short version: if you're selecting fewer than 1% of the total number of items, then heap select is a clear winner over quick select. More than 1%, then use quick select or a variant like Introselect.

What is the purpose of a stack? Why do we need it? (MSIL)

86 votes

So I am learning MSIL right now to learn to debug my C# .net applications.

I've always wondered: what is the purpose of the stack?

Just to put my question in context:
Why is there a transfer from memory to stack or "loading?" On the other hand, why is there a transfer from stack to memory or "storing"? Why not just have them all placed in the memory? (reflective question)

  • is it because it's faster?
  • is it because it's RAM based?
  • for efficiency?

I'm trying to grasp this to help me understand IL codes much more deeply.

Thanks for the help.

I've always wondered: what is the purpose of the stack?

I assume you mean the evaluation stack of the MSIL language, and not the actual per-thread stack at runtime.

Why is there a transfer from memory to stack or "loading?" On the other hand, why is there a transfer from stack to memory or "storing"? Why not just have them all placed in the memory?

MSIL is a "virtual machine" language. Compilers like the C# compiler generate IL, and then at runtime another compiler called the JIT (Just In Time) compiler turns the IL into actual machine code that can execute.

So first lets answer the question "why have MSIL at all?" Why not just have the C# compiler write out machine code?

Because it is cheaper to do it this way. Suppose we didn't do it that way; suppose each language has to have its own machine code generator. You have twenty different languages: C#, JScript .NET, Visual Basic, Iron Python, F#... And suppose you have ten different processors. How many code generators do you have to write? 20 x 10 = 200 code generators. That's a lot of work. Now suppose you want to add a new processor. You have to write the code generator for it twenty times, one for each language.

Furthermore, it is difficult and dangerous work. Writing efficient code generators for chips that you are not an expert on is a hard job! Compiler designers are experts on the semantic analysis of their language, not on efficient register allocation of new chip sets.

Now suppose we do it the IL way. How many IL generators do you have to write? One per language. How many jit compilers do you have to write? One per processor. Total: 20 + 10 = 30 code generators. Moreover, the language-to-IL generator is easy to write because IL is a simple language, and the IL-to-machine-code generator is also easy to write because IL is a simple language. We get rid of all of the intricacies of C# and VB and whatnot and "lower" everything to a simple language that is easy to write a jitter for.

Having an intermediate language lowers the cost of producing a new language compiler dramatically. It also lowers the cost of supporting a new chip dramatically. You want to support a new chip, you find some experts on that chip and have them write an IL jitter and you're done; you then support all those languages on your chip.

OK, so we've established why we have MSIL; because having an intermediate language lowers costs. Why then is the language a "stack machine"?

Because stack machines are conceptually very simple for language compiler writers to deal with. Stacks are a simple, easily understood mechanism for describing computations. Stack machines are also conceptually very easy for jit compiler writers to deal with. Using a stack is a simplifying abstraction, and therefore again, it lowers our costs.

You ask "why have a stack at all?" Why not just do everything directly out of memory? Well, let's think about that. Suppose you want to generate IL code for:

int x = A() + B() + C() + 10;

Suppose we have the convention that "add", "call", "store" and so on always take their arguments off the stack and put their result (if there is one) on the stack. To generate IL code for this C# we just say something like:

load the address of x // stack now contains address of x
call A()              // stack contains address of x and result of A()
call B()              // addr of x, result of A(), result of B()
add                   // addr of x, result of A() + B()
call C()              // addr of x, result of A() + B(), result of C()
add                   // addr of x, result of A() + B() + C()
load 10               // addr of x, result of A() + B() + C(), 10
add                   // addr of x, result of A() + B() + C() + 10
store in address      // result is now stored in x, stack is empty.

Now suppose we did it without a stack. We'll do it your way, where every opcode takes the addresses of its operands and the address to which it stores its result:

allocate temporary store T1 for result of A()
call A() with address of T1
allocate temporary store T2 for result of B()
call B() with address of T2
allocate temporary store T3 for result of first addition
add contents of T1 to T2, then store the result into address of T3
allocate temporary store T4 for result of C()
call C() with address of T4
allocate temporary store T5 for result of second addition
...

you see how this goes? Our code is getting huge because we have to explicitly allocate all the temporary storage that would normally by convention just go on the stack. Worse, our opcodes themselves are all getting enormous because they all now have to take as an argument the address that they're going to write their result into, and the address of each operand. An "add" instruction that knows that it is going to take two things off the stack and put one thing on can be a single byte. An add instruction that takes two operand addresses and a result address is going to be enormous.

We use stack-based opcodes because stacks solve the common problem. Namely: I want to allocate some temporary storage, use it very soon and then get rid of it quickly when I'm done. By making the assumption that we have a stack at our disposal we can make the opcodes very small and the code very terse.

UPDATE: Some additional thoughts

Incidentally, this idea of drastically lowering costs by (1) specifing a virtual machine, (2) writing compilers that target the VM language, and (3) writing implementations of the VM on a variety of hardware, is not a new idea at all. It did not originate with MSIL, LLVM, Java bytecode, or any other modern infrastructures. The earliest implementation of this strategy I'm aware of is the pcode machine from 1966.

The first I personally heard of this concept was when I learned how the Infocom implementors managed to get Zork running on so many different machines so well. They specified a virtual machine called the Z-machine and then made Z-machine emulators for all the hardware they wanted to run their games on. This had the added enormous benefit that they could implement virtual memory management on primitive 8-bit systems; a game could be larger than would fit into memory because they could just page the code in from disk when they needed it and discard it when they needed to load new code.

What's the point of const pointers?

74 votes

I'm not talking about pointers to const values, but const pointers themselves.

I'm learning C and C++ beyond the very basic stuff and just until today I realized that pointers are passed by value to functions, which makes sense. This means that inside a function I can make the copied pointer point to some other value without affecting the original pointer from the caller.

So what's the point of having a function header that says:

void foo(int* const ptr);

Inside such a function you cannot make ptr point to something else because it's const and you don't want it to be modified, but a function like this:

void foo(int* ptr);

Does the work just as well! because the pointer is copied anyways and the pointer in the caller is not affected even if you modify the copy. So what's the advantage of const?

Update: There are illegal sites like programmersgoodies.com linking to this question / answer. Sites like these are violating the StackOverflow attribution requirements.


Answer:

const is a tool which you should use in pursuit of a very important C++ concept:

Find bugs at compile-time, rather than run-time, by getting the compiler to enforce what you mean.

Even though it doesn't change the functionality, adding const generates a compiler error when you're doing things you didn't mean to do. Imagine the following typo:

void foo(int* ptr)
{
    ptr = 0;// oops, I meant *ptr = 0
}

If you use int* const, this would generate a compiler error because you're changing the value to ptr. Adding restrictions via syntax is a good thing in general. Just don't take it too far -- the example you gave is a case where most people don't bother using const.

Pattern to avoid nested try catch blocks?

74 votes

Consider a situation where I have three (or more) ways of performing a calculation, each of which can fail with an exception. In order to attempt each calculation until we find one that succeeds, I have been doing the following:

double val;

try { val = calc1(); }
catch (Calc1Exception e1)
{ 
    try { val = calc2(); }
    catch (Calc2Exception e2)
    {
        try { val = calc3(); }
        catch (Calc3Exception e3)
        {
            throw new NoCalcsWorkedException();
        }
    }
}

Is there any accepted pattern which achieves this in a nicer way? Of course I could wrap each calculation in a helper method which returns null on failure, and then just use the ?? operator, but is there a way of doing this more generally (i.e. without having to write a helper method for each method I want to use)? I've thought about writing a static method using generics which wraps any given method in a try/catch and returns null on failure, but I'm not sure how I would go about this. Any ideas?

As far as possible, don't use exceptions for control flow or unexceptional circumstances.

But to answer your question directly (assuming all the exception-types are the same):

Func<double>[] calcs = { calc1, calc2, calc3 };

foreach(var calc in calcs)
{
   try { return calc(); }
   catch (CalcException){  }
} 

throw new NoCalcsWorkedException();

What is x after "x = x++"?

69 votes

Possible Duplicate:
Is there a difference between x++ and ++x in java?
Why does this go into an infinite loop?

What happens (behind the curtains) when this is executed?

int x = 7;
x = x++;

I compiled and executed this. x is still 7 even after the entire statement. In my book, it says that x is incremented!

x = x++;

is equivalent to

int tmp = x;
x++;
x = tmp;

Immutable objects that reference each other?

62 votes

Today I was trying to wrap my head around immutable objects that reference each other. I came to the conclusion that you can't possibly do that without using lazy evaluation but in the process I wrote this (in my opinion) interesting code.

public class A
{
    public string Name { get; private set; }
    public B B { get; private set; }
    public A()
    {
        B = new B(this);
        Name = "test";
    }
}

public class B
{
    public A A { get; private set; }
    public B(A a)
    {
        //a.Name is null
        A = a;
    }
}

What I find interesting is that I cannot think of another way to observe object of type A in a state that is not yet fully constructed and that includes threads. Why is this even valid? Are there any other ways to observe the state of an object that is not fully constructed?

Why is this even valid?

Why do you expect it to be invalid?

Because a constructor is supposed to guarantee that the code it contains is executed before outside code can observe the state of the object.

Correct. But the compiler is not responsible for maintaining that invariant. You are. If you write code that breaks that invariant, and it hurts when you do that, then stop doing that.

Are there any other ways to observe the state of an object that is not fully constructed?

Sure. For reference types, all of them involve somehow passing "this" out of the constructor, obviously, since the only user code that holds the reference to the storage is the constructor. Some ways the constructor can leak "this" are:

  • Put "this" in a static field and reference it from another thread
  • make a method call or constructor call and pass "this" as an argument
  • make a virtual call -- particularly nasty if the virtual method is overridden by a derived class, because then it runs before the derived class ctor body runs.

I said that the only user code that holds a reference is the ctor, but of course the garbage collector also holds a reference. Therefore, another interesting way in which an object can be observed to be in a half-constructed state is if the object has a destructor, and the constructor throws an exception (or gets an asynchronous exception like a thread abort; more on that later.) In that case, the object is about to be dead and therefore needs to be finalized, but the finalizer thread can see the half-initialized state of the object. And now we are back in user code that can see the half-constructed object!

Destructors are required to be robust in the face of this scenario. A destructor must not depend on any invariant of the object set up by the constructor being maintained, because the object being destroyed might never have been fully constructed.

Another crazy way that a half-constructed object could be observed by outside code is of course if the destructor sees the half-initialized object in the scenario above, and then copies a reference to that object to a static field, thereby ensuring that the half-constructed, half-finalized object is rescued from death. Please do not do that. Like I said, if it hurts, don't do it.

If you're in the constructor of a value type then things are basically the same, but there are some small differences in the mechanism. The language requires that a constructor call on a value type creates a temporary variable that only the ctor has access to, mutate that variable, and then do a struct copy of the mutated value to the actual storage. That ensures that if the constructor throws, then the final storage is not in a half-mutated state.

Note that since struct copies are not guaranteed to be atomic, it is possible for another thread to see the storage in a half-mutated state; use locks correctly if you are in that situation. Also, it is possible for an asynchronous exception like a thread abort to be thrown halfway through a struct copy. These non-atomicity problems arise regardless of whether the copy is from a ctor temporary or a "regular" copy. And in general, very few invariants are maintained if there are asynchronous exceptions.

In practice, the C# compiler will optimize away the temporary allocation and copy if it can determine that there is no way for that scenario to arise. For example, if the new value is initializing a local that is not closed over by a lambda and not in an iterator block, then S s = new S(123); just mutates s directly.

For more information on how value type constructors work, see:

http://blogs.msdn.com/b/ericlippert/archive/2010/10/11/debunking-another-myth-about-value-types.aspx

And for more information on how C# language semantics try to save you from yourself, see:

http://blogs.msdn.com/b/ericlippert/archive/2008/02/15/why-do-initializers-run-in-the-opposite-order-as-constructors-part-one.aspx

http://blogs.msdn.com/b/ericlippert/archive/2008/02/18/why-do-initializers-run-in-the-opposite-order-as-constructors-part-two.aspx

I seem to have strayed from the topic at hand. In a struct you can of course observe an object to be half-constructed in the same ways -- copy the half-constructed object to a static field, call a method with "this" as an argument, and so on. (Obviously calling a virtual method on a more derived type is not a problem with structs.) And, as I said, the copy from the temporary to the final storage is not atomic and therefore another thread can observe the half-copied struct.


Now let's consider the root cause of your question: how do you make immutable objects that reference each other?

Typically, as you've discovered, you don't. If you have two immutable objects that reference each other then logically they form a directed cyclic graph. You might consider simply building an immutable directed graph! Doing so is quite easy. An immutable directed graph consists of:

  • An immutable list of immutable nodes, each of which contains a value.
  • An immutable list of immutable node pairs, each of which has the start and end point of a graph edge.

Now the way you make nodes A and B "reference" each other is:

A = new Node("A");
B = new Node("B");
G = Graph.Empty.AddNode(A).AddNode(B).AddEdge(A, B).AddEdge(B, A);

And you're done, you've got a graph where A and B "reference" each other.

The problem, of course, is that you cannot get to B from A without having G in hand. Having that extra level of indirection might be unacceptable.

Efficiency of premature return in a function

61 votes

This is a situation I encounter frequently as an inexperienced programmer and am wondering about particularly for an ambitious, speed-intensive project of mine I'm trying to optimize. For the major C-like languages (C, objC, C++, Java, C#, etc) and their usual compilers, will these two functions run just as efficiently? Is there any difference in the compiled code?

void foo1(bool flag)
{
    if (flag)
    {
        //Do stuff
        return;
    }

    //Do different stuff
}

void foo2(bool flag)
{
    if (flag)
    {
        //Do stuff
    }
    else
    {
        //Do different stuff
    }
}

Basically, is there ever a direct efficiency bonus/penalty when breaking or returning early? How is the stackframe involved? Are there optimized special cases? Are there any factors (like inlining or the size of "Do stuff") that could affect this significantly?

I'm always a proponent of improved legibility over minor optimizations (I see foo1 a lot with parameter validation), but this comes up so frequently that I'd like to set aside all worry once and for all.

And I'm aware of the pitfalls of premature optimization... ugh, those are some painful memories.

EDIT: I accepted an answer, but EJP's answer explains pretty succinctly why the use of a return is practically negligible (in assembly, the return creates a 'branch' to the end of the function, which is extremely fast. The branch alters the PC register and may also affect the cache and pipeline, which is pretty minuscule.) For this case in particular, it literally makes no difference because both the if/else and the return create the same branch to the end of the function.

There is no difference at all:

=====> cat test_return.cpp
extern void something();
extern void something2();

void test(bool b)
{
    if(b)
    {
        something();
    }
    else
        something2();
}
=====> cat test_return2.cpp
extern void something();
extern void something2();

void test(bool b)
{
    if(b)
    {
        something();
        return;
    }
    something2();
}
=====> rm -f test_return.s test_return2.s
=====> g++ -S test_return.cpp 
=====> g++ -S test_return2.cpp 
=====> diff test_return.s test_return2.s
=====> rm -f test_return.s test_return2.s
=====> clang++ -S test_return.cpp 
=====> clang++ -S test_return2.cpp 
=====> diff test_return.s test_return2.s
=====> 

Meaning no difference in generated code whatsoever even without optimization in two compilers

What is happening here in this C++ code?

54 votes

Can anyone please explain what is going in this C++ code. It compiles and executes fine on Linux.

#include <iostream>
using namespace std;
int main = ( cout << "Hello world!\n", 195 );

The number "195" is the code of RET instruction on x86.

The C++ compiler (gcc in my case) is unable to recognize that "main" wasn't declared as a function. The compiler only sees that there is the "main" symbol, and presumes that it refers to a function.

The C++ code

int main = ( cout << "Hello world!\n", 195 );

is initializing a variable at file-scope. This initialization code is executed before the C/C++ environment calls main(), but after it initializes the "cout" variable. The initialization prints "Hello, world!\n", and sets the value of variable "main" to 195. After all initialization is done, the C/C++ environment makes a call to "main". The program returns immediately from this call because we put a RET instruction (code 195) at the address of "main".

Sample GDB output:

$ gdb ./a
(gdb) break _fini
Breakpoint 1 at 0x8048704
(gdb) print main
$1 = 0
(gdb) disass &main
Dump of assembler code for function main:
   0x0804a0b4 <+0>:     add    %al,(%eax)
   0x0804a0b6 <+2>:     add    %al,(%eax)
End of assembler dump.
(gdb) run
Starting program: /home/atom/a 
Hello world!

Breakpoint 1, 0x08048704 in _fini ()
(gdb) print main
$2 = 195
(gdb) disass &main
Dump of assembler code for function main:
   0x0804a0b4 <+0>:     ret    
   0x0804a0b5 <+1>:     add    %al,(%eax)
   0x0804a0b7 <+3>:     add    %al,(%eax)
End of assembler dump.

Any idea why I need to cast an integer literal to (int) here?

53 votes

In the following example

int i = -128;
Integer i2 = (Integer) i; // compiles

Integer i3 = (Integer) -128; /*** Doesn't compile ***/

Integer i4 = (Integer) (int) -128; // compiles
Integer i4 = -128; // compiles
Integer i5 = (int) -128; // compiles
Integer i6 = (Integer) (-128); // compiles
Integer i7 = (Integer) 0-128; // compiles

I can't cast -128 with (Integer) but I can cast (int) -128.

I always thought -128 was of int type and casting it with (int) should be redundant.

The error on the line with i3 is

cannot find symbol variable Integer

I tried this with Java 6 update 29 and Java 7 update 1.

EDIT: You get the same behaviour with +128 instead of -128. It does appear to be confusion between unary and binary operators.

The compiler tries to substract 128 from (Integer) instead of casting -128 to Integer. Add () to fix it

Integer i3 = (Integer) -128; // doesn't compile
Integer i3 = (Integer) (-128); // compiles

According to BoltClock in the comments the cast to int works as intended, becaus it is a reserved word and therefore can't be interpreted as an identifier, which makes sense to me.

And Bringer128 found the JLS Reference 15.16.

 CastExpression:
    ( PrimitiveType Dimsopt ) UnaryExpression
    ( ReferenceType ) UnaryExpressionNotPlusMinus

As you can see, casting to a primitive type requires any UnaryExpression, whereas casting to a reference type requires a UnaryExpressionNotPlusMinus. These are defined just before the CastExpression at JLS 15.15.

Does IF perform better than IF-ELSE?

51 votes

Which one of these blocks of code performs better, and which one of them is more readable? I'd guess the gain would be negligible, particularly in the second block. I am just curious.

Block #1

string height;
string width;
if (myFlag == 1)
{
    height = "60%";
    width = "60%";
}
else
{
    height = "80%";
    width = "80%";
}

Block #2

string height = "80%";
string width = "80%";
if (myFlag == 1)
{
    height = "60%";
    width = "60%";
}

Updated

The results when i tested the above code were that both the blocks performed the same

Block #1

myFlag = 1:   3 Milliseconds
myFlag = 0:   3 Milliseconds

Block #2

myFlag = 1:   3 Milliseconds
myFlag = 0:   3 Milliseconds

But one important thing i noticed here(thanks to Matthew Steeples answer here) is that since the block of code that i have tested has not used the variables height and width except for assignment in the if-else and if blocks of Code Block-1 and 2 respectively, the compiler has optimized the IL code by completely removing the if and if-else blocks in question thus showing invalid results for our test here.

I have updated both the code blocks to write the values of both height and width to a file thus using them again and forcing the compiler to run our test blocks(I hope), but you can observe from the code that the actual file writing part does not effect our test results

This is the updated results, C# and IL Code

Results

Block #1

myFlag = 1:   1688 Milliseconds
myFlag = 0:   1664 Milliseconds

Block #2

myFlag = 1:   1700 Milliseconds
myFlag = 0:   1677 Milliseconds

C#.net Code

Block #1

    public long WithIfAndElse(int myFlag)
    {
        Stopwatch myTimer = new Stopwatch();
        string someString = "";
        myTimer.Start();
        for (int i = 0; i < 1000000; i++)
        {
            string height;
            string width;
            if (myFlag == 1)
            {
                height = "60%";
                width = "60%";
            }
            else
            {
                height = "80%";
                width = "80%";
            }
            someString = "Height: " + height + Environment.NewLine + "Width: " + width;
        }
        myTimer.Stop();
        File.WriteAllText("testifelse.txt", someString);
        return myTimer.ElapsedMilliseconds;
    }

Block #2

    public long WithOnlyIf(int myFlag)
    {
         Stopwatch myTimer = new Stopwatch();
        string someString = "";
        myTimer.Start();
        for (int i = 0; i < 1000000; i++)
        {
            string height = "80%";
            string width = "80%";
            if (myFlag == 1)
            {
                height = "60%";
                width = "60%";
            }
            someString = "Height: " + height + Environment.NewLine + "Width: " + width;
        }
        myTimer.Stop();
        File.WriteAllText("testif.txt", someString);
        return myTimer.ElapsedMilliseconds;
    }

IL Code Generated By ildasm.exe

Block #1

.method public hidebysig instance int64  WithIfAndElse(int32 myFlag) cil managed
{
  // Code size       144 (0x90)
  .maxstack  3
  .locals init ([0] class [System]System.Diagnostics.Stopwatch myTimer,
           [1] string someString,
           [2] int32 i,
           [3] string height,
           [4] string width,
           [5] string[] CS$0$0000)
  IL_0000:  newobj     instance void [System]System.Diagnostics.Stopwatch::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldstr      ""
  IL_000b:  stloc.1
  IL_000c:  ldloc.0
  IL_000d:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Start()
  IL_0012:  ldc.i4.0
  IL_0013:  stloc.2
  IL_0014:  br.s       IL_0070
  IL_0016:  ldarg.1
  IL_0017:  ldc.i4.1
  IL_0018:  bne.un.s   IL_0029
  IL_001a:  ldstr      "60%"
  IL_001f:  stloc.3
  IL_0020:  ldstr      "60%"
  IL_0025:  stloc.s    width
  IL_0027:  br.s       IL_0036
  IL_0029:  ldstr      "80%"
  IL_002e:  stloc.3
  IL_002f:  ldstr      "80%"
  IL_0034:  stloc.s    width
  IL_0036:  ldc.i4.5
  IL_0037:  newarr     [mscorlib]System.String
  IL_003c:  stloc.s    CS$0$0000
  IL_003e:  ldloc.s    CS$0$0000
  IL_0040:  ldc.i4.0
  IL_0041:  ldstr      "Height: "
  IL_0046:  stelem.ref
  IL_0047:  ldloc.s    CS$0$0000
  IL_0049:  ldc.i4.1
  IL_004a:  ldloc.3
  IL_004b:  stelem.ref
  IL_004c:  ldloc.s    CS$0$0000
  IL_004e:  ldc.i4.2
  IL_004f:  call       string [mscorlib]System.Environment::get_NewLine()
  IL_0054:  stelem.ref
  IL_0055:  ldloc.s    CS$0$0000
  IL_0057:  ldc.i4.3
  IL_0058:  ldstr      "Width: "
  IL_005d:  stelem.ref
  IL_005e:  ldloc.s    CS$0$0000
  IL_0060:  ldc.i4.4
  IL_0061:  ldloc.s    width
  IL_0063:  stelem.ref
  IL_0064:  ldloc.s    CS$0$0000
  IL_0066:  call       string [mscorlib]System.String::Concat(string[])
  IL_006b:  stloc.1
  IL_006c:  ldloc.2
  IL_006d:  ldc.i4.1
  IL_006e:  add
  IL_006f:  stloc.2
  IL_0070:  ldloc.2
  IL_0071:  ldc.i4     0xf4240
  IL_0076:  blt.s      IL_0016
  IL_0078:  ldloc.0
  IL_0079:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Stop()
  IL_007e:  ldstr      "testifelse.txt"
  IL_0083:  ldloc.1
  IL_0084:  call       void [mscorlib]System.IO.File::WriteAllText(string,
                                                                   string)
  IL_0089:  ldloc.0
  IL_008a:  callvirt   instance int64 [System]System.Diagnostics.Stopwatch::get_ElapsedMilliseconds()
  IL_008f:  ret
} // end of method frmResearch::WithIfAndElse

Block #2

.method public hidebysig instance int64  WithOnlyIf(int32 myFlag) cil managed
{
  // Code size       142 (0x8e)
  .maxstack  3
  .locals init ([0] class [System]System.Diagnostics.Stopwatch myTimer,
           [1] string someString,
           [2] int32 i,
           [3] string height,
           [4] string width,
           [5] string[] CS$0$0000)
  IL_0000:  newobj     instance void [System]System.Diagnostics.Stopwatch::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldstr      ""
  IL_000b:  stloc.1
  IL_000c:  ldloc.0
  IL_000d:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Start()
  IL_0012:  ldc.i4.0
  IL_0013:  stloc.2
  IL_0014:  br.s       IL_006e
  IL_0016:  ldstr      "80%"
  IL_001b:  stloc.3
  IL_001c:  ldstr      "80%"
  IL_0021:  stloc.s    width
  IL_0023:  ldarg.1
  IL_0024:  ldc.i4.1
  IL_0025:  bne.un.s   IL_0034
  IL_0027:  ldstr      "60%"
  IL_002c:  stloc.3
  IL_002d:  ldstr      "60%"
  IL_0032:  stloc.s    width
  IL_0034:  ldc.i4.5
  IL_0035:  newarr     [mscorlib]System.String
  IL_003a:  stloc.s    CS$0$0000
  IL_003c:  ldloc.s    CS$0$0000
  IL_003e:  ldc.i4.0
  IL_003f:  ldstr      "Height: "
  IL_0044:  stelem.ref
  IL_0045:  ldloc.s    CS$0$0000
  IL_0047:  ldc.i4.1
  IL_0048:  ldloc.3
  IL_0049:  stelem.ref
  IL_004a:  ldloc.s    CS$0$0000
  IL_004c:  ldc.i4.2
  IL_004d:  call       string [mscorlib]System.Environment::get_NewLine()
  IL_0052:  stelem.ref
  IL_0053:  ldloc.s    CS$0$0000
  IL_0055:  ldc.i4.3
  IL_0056:  ldstr      "Width: "
  IL_005b:  stelem.ref
  IL_005c:  ldloc.s    CS$0$0000
  IL_005e:  ldc.i4.4
  IL_005f:  ldloc.s    width
  IL_0061:  stelem.ref
  IL_0062:  ldloc.s    CS$0$0000
  IL_0064:  call       string [mscorlib]System.String::Concat(string[])
  IL_0069:  stloc.1
  IL_006a:  ldloc.2
  IL_006b:  ldc.i4.1
  IL_006c:  add
  IL_006d:  stloc.2
  IL_006e:  ldloc.2
  IL_006f:  ldc.i4     0xf4240
  IL_0074:  blt.s      IL_0016
  IL_0076:  ldloc.0
  IL_0077:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Stop()
  IL_007c:  ldstr      "testif.txt"
  IL_0081:  ldloc.1
  IL_0082:  call       void [mscorlib]System.IO.File::WriteAllText(string,
                                                                   string)
  IL_0087:  ldloc.0
  IL_0088:  callvirt   instance int64 [System]System.Diagnostics.Stopwatch::get_ElapsedMilliseconds()
  IL_008d:  ret
} // end of method frmResearch::WithOnlyIf

So we can say that the IF-Else block(Block #1) runs faster than the if block(Block #2) as pointed by many in this forum.

Test Results

10,000,000 iterations of Block 1

myFlag = 0:    23.8ns per iteration
myFlag = 1:    23.8ns per iteration

10,000,000 iterations of Block 2

myFlag = 0:    23.8ns per iteration
myFlag = 1:    46.8ns per iteration

Block 2 is 96% slower than Block 1. Makes sense, since Block 2 does twice the work in the pessimistic case.

i prefer either case, depending on the situation. If myFlag is rarely ever 1, then it want it to stand out as the edge case that we have to handle. If both are equally likely, i want the if-else syntax. But that's preference, not fact.


Decades ago, the intel 80286 dual pipeline would stall if a conditional jump was taken, rather than falling through to the next instruction. By the time of the Pentium that went away; the CPU pre-fetches both branch paths. But in the back of my mind i still have a twinge of fear whenever i write code that has the most common outcome in the else clause. Every time i have to remind myself that it doesn't matter anymore.


Int32 reps = 10000000;

private void Block1(int myFlag)
{
    String width;
    String height;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < reps; i++)
    {
        if (myFlag == 1)
        {
            width = String.Format("{0:g}%", 60);
            height = String.Format("{0:g}%", 60);
        }
        else
        {
            width = String.Format("{0:g}%", 80);
            height = String.Format("{0:g}%", 80);
        }
    }
    sw.Stop();
    Double time = (Double)sw.Elapsed.Ticks / Stopwatch.Frequency * 1000000000.0 / reps;
    MessageBox.Show(time.ToString() + " ns");
}

private void Block2(int myFlag)
{
    String width;
    String height;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < reps; i++)
    {
        width = String.Format("{0:g}%", 80);
        height = String.Format("{0:g}%", 80);
        if (myFlag == 1)
        {
            width = String.Format("{0:g}%", 60);
            height = String.Format("{0:g}%", 60);
        }
    }
    sw.Stop();

    Double time = (Double)sw.Elapsed.Ticks / Stopwatch.Frequency * 1000000000.0 / reps;
    MessageBox.Show(time.ToString() + " ns");
}
  • String.Format makes IF 96% slower
  • GetPercentageString(0.60) makes IF 96% slower

const
   reps = 10000000;

procedure Block1(myflag: Integer);
var
   width, height: string;
   i: Integer;
   t1, t2: Int64;
   time: Extended;
   freq: Int64;
begin
   QueryPerformanceCounter(t1);
   for i := 1 to reps do
   begin
      if myFlag = 1 then
      begin
         width := '60%';
         height := '60%';
      end
      else
      begin
         width := '80%';
         height := '80%';
      end;
   end;
   QueryPerformanceCounter(t2);
   QueryPerformanceFrequency(freq);

   time := (t2-t1) / freq * 1000000000 / reps;
   ShowMessage(FloatToStr(time)+ 'ns');
end;

procedure Block2(myflag: Integer);
var
   width, height: string;
   i: Integer;
   t1, t2: Int64;
   time: Extended;
   freq: Int64;
begin
   QueryPerformanceCounter(t1);
   for i := 1 to reps do
   begin
      width := '80%';
      height := '80%';
      if myFlag = 1 then
      begin
         width := '60%';
         height := '60%';
      end;
   end;
   QueryPerformanceCounter(t2);
   QueryPerformanceFrequency(freq);

   time := (t2-t1) / freq * 1000000000 / reps;
   ShowMessage(FloatToStr(time)+ 'ns');
end;

Doing twice the amount of work takes roughly twice the amount of time.

Answer: IF does not perform better than IF-ELSE.


enter image description here

Why does the is operator return false when given null?

Asked on Mon, 03 Oct 2011 by Gebb c#
46 votes

It seems to me that the is operator is a bit inconsistent.

bool Test()
{
    // Returns false, but should return true.
    return null is string;
}

One expects that the null value belongs to any reference (or nullable) type. And indeed, the C# language specification says something which supports this hypothesis, for example (6.1.6 Implicit reference conversions):

The implicit reference conversions are:
...
• From the null literal to any reference-type.

The description (7.10.10 The is operator) of the is operator starts by saying that the expression (E is T) will result in true when a reference conversion from E to T exists, but then the authors go on by explicitly excluding the case when E is the null literal or has a null value.

Why do they do that? To me it seems counterintuitive.

You're staring at an empty driveway.

Someone asks you "can your driveway hold a Honda Civic?"

Yes. Yes it can.

Someone points you at a second driveway. It is also empty. They ask "Can the current contents of my driveway fit in your driveway?"

Yes, obviously. Both driveways are empty! So clearly the contents of one can fit in the other, because there are no contents of either in the first place.

Someone asks you "Does your driveway contain a Honda Civic?"

No, it does not.

You're thinking that the is operator answers the second question: given this value, does it fit in a variable of that type? Does a null reference fit into a variable of this type? Yes it does.

That is not the question that the is operator answers. The question that the is operator answers is the third question. y is X does not ask "is y a legal value of a variable of type X?" It asks "Is y a valid reference to an object of type X?" Since a null reference is not a valid reference to any object of any type, the answer is "no". That driveway is empty; it doesn't contain a Honda Civic.

Another way to look at it is that y is X answers the question "if I said y as X, would I get a non-null result? If y is null, clearly the answer is no!


To look a little deeper at your question:

One expects that the null value belongs to any reference (or nullable) type

One would be implicitly assuming that a type is a set of values, and that assignment compatibility of a value y with a variable of type X is nothing more nor less than checking whether y is a member of set x.

Though that is an extremely common way of looking at types, that is not the only way of looking at types, and it is not the way that C# looks at types. Null references are members of no type in C#; assignment compatibility is not merely checking a set to see if it contains a value. Just because a null reference is assignment compatible with a variable of reference type X does not mean that null is a member of type X. The "is assignment compatible with" relation and the "is a member of type" relation obviously have a lot of overlap, but they are not identical in the CLR.

If musings about type theory interest you, check out my recent articles on the subject:

http://blogs.msdn.com/b/ericlippert/archive/2011/08/29/what-is-this-thing-you-call-a-quot-type-quot-part-one.aspx

http://blogs.msdn.com/b/ericlippert/archive/2011/09/07/what-is-this-thing-you-call-a-quot-type-quot-part-two.aspx

What does y -= m < 3 mean?

43 votes

While looking through some example C code, I came across this:

y -= m < 3;

What does this do? It it some kind of condensed for loop or something? It's impossible to google for as far as I know.

m < 3 is either 1 or 0, depending on the truth value.

So y=y-1 when m<3 is true, or y=y-0 when m>=3

What is the name of this bad practice / anti-pattern?

41 votes

I'm trying to explain to my team why this is bad practice, and am looking for an anti-pattern reference to help in my explanation. This is a very large enterprise app, so here's a simple example to illustrate what was implemented:

public void ControlStuff()
    {
        var listOfThings = LoadThings();
        var listOfThingsThatSupportX = new string[] {"ThingA","ThingB", "ThingC"};
        foreach (var thing in listOfThings)
        {
            if(listOfThingsThatSupportX.Contains(thing.Name))
            {
                DoSomething();
            }
        }
    }

I'm suggesting that we add a property to the 'Things' base class to tell us if it supports X, since the Thing subclass will need to implement the functionality in question. Something like this:

public void ControlStuff()
    {
        var listOfThings = LoadThings();
        foreach (var thing in listOfThings)
        {
            if (thing.SupportsX)
            {
                DoSomething();
            }
        }
    }
class ThingBase
{
    public virtual bool SupportsX { get { return false; } }
}
class ThingA : ThingBase
{
    public override bool SupportsX { get { return true; } }
}
class ThingB : ThingBase
{
}

So, it's pretty obvious why the first approach is bad practice, but what's this called? Also, is there a pattern better suited to this problem than the one I'm suggesting?

Normally a better approach (IMHO) would be to use interfaces instead of inheritance

then it is just a matter of checking whether the object has implemented the interface or not.

How to approach number guessing game(with a twist) algorithm?

41 votes

I am learning programming (python and algo’s) and was trying to work on a project that I find interesting. I have created a few basic python scripts but I’m not sure how to approach a solution to a game I am trying to build.

Here’s how the game will work:

users will be given items with a value. For example

Apple = 1
Pears = 2
Oranges  = 3

They will then get a chance to choose any combo of them they like (i.e. 100 apples, 20 pears, and 1 oranges). The only output the computer gets is the total value(in this example, its currently $143). The computer will try to guess what they have. Which obviously it won’t be able to get correctly the first turn.

         Value  quantity(day1)  value(day1)
Apple    1      100             100
Pears    2      20              40
Orange   3      1               3
Total           121             143

The next turn the user can modify their numbers but no more than 5% of the total quantity (or some other percent we may chose. I’ll use 5% for example.). The prices of fruit can change(at random) so the total value may change based on that also(for simplicity I am not changing fruit prices in this example). Using the above example, on day 2 of the game, the user returns a value of $152 and $164 on day 3. Here's an example.

quantity(day2)  %change(day2)   value(day2) quantity(day3)  %change(day3)   value(day3)
104                             104         106                             106
21                              42          23                              46
2                               6           4                               12
127             4.96%           152         133             4.72%           164

*(I hope the tables show up right, I had to manually space them so hopefully its not just doing it on my screen, if it doesn't work let me know and I'll try to upload a screenshot).

I am trying to see if I can figure out what the quantities are over time(assuming the user will have the patience to keep entering numbers). I know right now my only restriction is the total value cannot be more than 5% so I cannot be within 5% accuracy right now so the user will be entering it forever.

What I have done so far

Here’s my solution so far(not much). Basically I take all the values and figure out all the possible combos of them(I am done this part). Then I take all the possible combos and put them in a database as a dictionary(so for example for $143, there could be a dictionary entry {apple:143, Pears:0, Oranges :0}..all the way to {apple:0, Pears:1, Oranges :47}. I do this each time I get a new number so I have a list of all possibilities.

Here’s where I’m stuck. Is using the rules above, how can I figure out the best possible solution? I think I’ll need a fitness function that automatically compares the two days data and removes any possibilities that have more than 5% variance of the previous days data.

Questions:

So my question with user changing the total and me having a list of all the probabilities, how should I approach this? What do I need to learn? Is there any algorithms out there or theories that I can use that are applicable? Or, to help me understand my mistake, can you suggest what rules I can add to make this goal feasible(if its not in its current state. I was thinking adding more fruits and saying they must pick at least 3,etc..)? Also, I only have a vague understanding of genetic algorithms but I thought I could use them here, if is there something I can use?

I'm very very eager to learn so any advice or tips would be greatly appreciated(just please don't tell me this game is impossible).

Thanks in advance.

UPDATE: Getting feedback that this is hard to solve. So I thought I'd add another condition to the game that won't interfere with what the player is doing(game stays the same for them) but everyday the value of the fruits change price(randomly). Would that make it easier to solve? Because within a 5% movement and certain fruit value changes, only a few combo's are probable over time. Day 1, anything is possible and getting a close enough range is almost impossible, but as the prices of fruits change and the user can only choose a 5% change, then shouldn't(over time) the range be narrow and narrow. In the above example, if prices are volatile enough I think I could brute force a solution that gave me a range to guess in, but I'm trying to figure out if there's a more elegant solution or other solutions to keep narrowing this range over time.

UPDATE2: After reading and asking around, I believe this is a hidden markov/Viterbi problem that tracks the changes in fruit prices as well as total sum(weighting the last data point the heaviest). I'm not sure how to apply the relationship though. I think this is the case and could be wrong but at the least I'm starting to suspect this is a some type of machine learning problem.

Update3: I am created a test case(with smaller numbers) and a generator to help automate the user generated data and I am trying to create a graph from it to see what's more likely. Here's the code, along with the total values and comments on what the users actually fruit quantities are.

#!/usr/bin/env python
import itertools

#Fruit price data
fruitPriceDay1 = {'Apple':1,'Pears':2,'Oranges':3}
fruitPriceDay2 = {'Apple':2,'Pears':3,'Oranges':4}
fruitPriceDay3 = {'Apple':2,'Pears':4,'Oranges':5}

#generate possibilities for testing(Warning..will not scale with large numbers)
def possibilityGenerator(target_sum, apple, pears, oranges):
    allDayPossible = {}
    counter = 1
    apple_range = range(0, target_sum + 1, apple)
    pears_range = range(0, target_sum + 1, pears)
    oranges_range = range(0, target_sum + 1, oranges)
    for i, j, k in itertools.product(apple_range, pears_range, oranges_range):
        if i + j + k == target_sum:
            currentPossible = {}
            #print counter
            #print 'Apple', ':', i/apple, ',', 'Pears', ':', j/pears, ',', 'Oranges', ':', k/oranges
            currentPossible['apple'] = i/apple
            currentPossible['pears'] = j/pears
            currentPossible['oranges'] = k/oranges
            #print currentPossible
            allDayPossible[counter] = currentPossible
            counter = counter +1
    return allDayPossible

#total sum being returned by user for value of fruits            
totalSumDay1=26 # computer does not know this but users quantities are apple: 20, pears 3, oranges 0 at the current prices of the day
totalSumDay2=51 # computer does not know this but users quantities are apple: 21, pears 3, oranges 0 at the current prices of the day
totalSumDay3=61 # computer does not know this but users quantities are apple: 20, pears 4, oranges 1 at the current prices of the day
graph = {}
graph['day1'] = possibilityGenerator(totalSumDay1, fruitPriceDay1['Apple'], fruitPriceDay1['Pears'], fruitPriceDay1['Oranges'] )
graph['day2'] = possibilityGenerator(totalSumDay2, fruitPriceDay2['Apple'], fruitPriceDay2['Pears'], fruitPriceDay2['Oranges'] )
graph['day3'] = possibilityGenerator(totalSumDay3, fruitPriceDay3['Apple'], fruitPriceDay3['Pears'], fruitPriceDay3['Oranges'] )
#sample of dict = 1 : {'oranges': 0, 'apple': 0, 'pears': 0}..70 : {'oranges': 8, 'apple': 26, 'pears': 13}
print graph

We'll combine graph-theory and probability:

On the 1st day, build a set of all feasible solutions. Lets denote the solutions set as A1={a1(1), a1(2),...,a1(n)}.

On the second day you can again build the solutions set A2.

Now, for each element in A2, you'll need to check if it can be reached from each element of A1 (given x% tolerance). If so - connect A2(n) to A1(m). If it can't be reached from any node in A1(m) - you can delete this node.

Basically we are building a connected directed acyclic graph.

All paths in the graph are equally likely. You can find an exact solution only when there is a single edge from Am to Am+1 (from a node in Am to a node in Am+1).

Sure, some nodes appear in more paths than other nodes. The probability for each node can be directly deduced based on the number of paths that contains this node.

By assigning a weight to each node, which equals to the number of paths that leads to this node, there is no need to keep all history, but only the previous day.

Also, have a look at non-negative-values linear diphantine equations - A question I asked a while ago. The accepted answer is a great way to enumarte all combos in each step.

C# 4.0 Compiler Crash

39 votes

This code sample is not able to be compiled. Any work arounds out there?

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    using church = Func<dynamic, dynamic, dynamic>;

    class Program
    {
        static void Main(string[] args)
        {
            church True = (a, b) => a;
            church False = (a, b) => b;

            Func<church, church, church> And = (x, y) => x(y(True, False), False);
        }
    }
}

Error 6 Internal Compiler Error (0xc0000005 at address 5476A4CC): likely culprit is 'EMITIL'. An internal error has occurred in the compiler. To work around this problem, try simplifying or changing the program near the locations listed below. Locations at the top of the list are closer to the point at which the internal error occurred. Errors such as this can be reported to Microsoft by using the /errorreport option. TestApplication

I reproduced the crash using VS2010 (WinXP 64).

Two workarounds:

1. don't use the using alias

The following code compiles cleanly on VS2010:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Func<dynamic, dynamic, dynamic> True = (a, b) => a;
            Func<dynamic, dynamic, dynamic> False = (a, b) => b;

            Func<Func<dynamic, dynamic, dynamic>, 
                 Func<dynamic, dynamic, dynamic>,
                 Func<dynamic, dynamic, dynamic> > And 
                = (x, y) => x(y(True, False), False);
        }
    }
}

2. use the Mono compiler

No problem with Mono 2.10 compiler (dmcs).

[mono] /tmp @ dmcs test.cs
test.cs(18,42): warning CS0219: The variable `And' is assigned but its value is never used
Compilation succeeded - 1 warning(s)
[mono] /tmp @ ./test.exe 
[mono] /tmp @ 

This was tested on linux.

  1. You can run binaries created with mono on Windows .NET.
  2. Mono compiler comes with an installer MSI and runs on Windows as well.

What is normalized UTF-8 all about?

39 votes

The ICU project (which also now has a PHP library) contains the classes needed to help normalize UTF-8 strings to make it easier to compare values when searching.

However, I'm trying to figure out what this means for applications. For example, in which cases do I want "Canonical Equivalence" instead of "Compatibility equivalence", or vis-versa?

Everything You Never Wanted to Know about Unicode Normalization

Canonical Normalization

Unicode includes multiple ways to encode some characters, most notably accented characters. Canonical normalization changes the code points into a canonical encoding form. The resulting code points should appear identical to the original ones barring any bugs in the fonts or rendering engine.

When To Use

Because the results appear identical, it is always safe to apply canonical normalization to a string before storing or displaying it, as long as you can tolerate the result not being bit for bit identical to the input.

Canonical normalization comes in 2 forms: NFD and NFC. The two are equivalent in the sense that one can convert between these two forms without loss. Comparing two strings under NFC will always give the same result as comparing them under NFD.

NFD

NFD has the characters fully expanded out. This is the faster normalization form to calculate, but the results in more code points (i.e. uses more space).

If you just want to compare two strings that are not already normalized, this is the preferred normalization form unless you know you need compatability normalization.

NFC

NFC recombines code points when possible after running the NFD algorithm. This takes a little longer, but results in shorter strings.

Compatibility Normalization

Unicode also includes many characters that really do not belong, but were used in legacy character sets. Unicode added these to allow text in those character sets to be processed as Unicode, and then be converted back without loss.

Compatibility normalization converts these to the corresponding sequence of "real" characters, and also performs canonical normalization. The results of compatibility normalization may not appear identical to the originals.

Characters that include formatting information are replaced with ones that do not. For example the character gets converted to 9. Others don't involve formatting differences. For example the roman numeral character is converted to the regular letters IX.

Obviously, once this transformation has been performed, it is no longer possible to losslessly convert back to the original character set.

When to use

The Unicode Consortium suggests thinking of compatibility normalization like a ToUpperCase transform. It is something that may be useful in some circumstances, but you should not just apply it willy-nilly.

An excellent use case would be a search engine since you would probably want a search for 9 to match .

One thing you should probably not do is display the result of applying compatibility normalization to the user.

NFKC/NFKD

Compatibility normalization form comes in two forms NFKD and NFKC. They have the same relationship as between NFD and C.

Any string in NFKC is inherently also in NFC, and the same for the NFKD and NFD. Thus NFKD(x)=NFD(NFKC(x)), and NFKC(x)=NFC(NFKD(x)), etc.

Conclusion

If in doubt, go with canonical normalization. Choose NFC or NFD based on the space/speed tradeoff applicable, or based on what is required by something you are interoperating with.

Rake "already initialized constant" warning

31 votes

Trying to run rake cucumber:ok and am getting this error:

/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rack-1.3.4/lib/rack/backports/uri/common_192.rb:53: warning: already initialized constant WFKV_

Then:
Command failed with status (1): [/Users/dev/.rbenv/versions/1.9.2-p290/bin...]

I am pretty new to Rails and Google didn't turn anything up for this error.

EDIT: I've tried adding bundle exec and that makes no difference.

Here's what I got with --trace:

/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/file_utils.rb:53:in `block in create_shell_runner'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/file_utils.rb:45:in `call'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/file_utils.rb:45:in `sh'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/file_utils_ext.rb:36:in `sh'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/cucumber-1.1.0/lib/cucumber/rake/task.rb:104:in `run'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/cucumber-1.1.0/lib/cucumber/rake/task.rb:193:in `block in define_task'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:205:in `call'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:205:in `block in execute'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:200:in `each'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:200:in `execute'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:158:in `block in invoke_with_call_chain'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/1.9.1/monitor.rb:201:in `mon_synchronize'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:151:in `invoke_with_call_chain'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/task.rb:144:in `invoke'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:112:in `invoke_task'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:90:in `block (2 levels) in top_level'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:90:in `each'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:90:in `block in top_level'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:129:in `standard_exception_handling'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:84:in `top_level'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:62:in `block in run'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:129:in `standard_exception_handling'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/lib/rake/application.rb:59:in `run'
/Users/dev/.rbenv/versions/1.9.2-p290/lib/ruby/gems/1.9.1/gems/rake-0.9.2/bin/rake:32:in `<top (required)>'
/Users/dev/.rbenv/versions/1.9.2-p290/bin/rake:19:in `load'
/Users/dev/.rbenv/versions/1.9.2-p290/bin/rake:19:in `<main>'
Tasks: TOP => cucumber:ok

I started having the same problem this evening. It seems to be related to Rack 1.3.4. I fixed it by adding this to my Gemfile:

gem 'rack', '1.3.3'

Then running:

bundle update rack

Incidentally, I tried Bozhidar's suggestion before this, but to no avail.

Bizarre use of conditional operator in Linux

31 votes

In the 3.0.4 Linux kernel, mm/filemap.c has this line of code:

retval = retval ?: desc.error;

I've tried compiling a similar minimal test case with gcc -Wall and don't get any warnings; the behavior seems identical to:

retval = retval ? retval : desc.error;

Looking at the C99 standard, I can't figure out what formally describes this behavior. Why is this OK?

As several others have said, this is a GCC extension, not part of any standard. You'll get a warning for it if you use the -pedantic switch.

The point of this extension is not really visible in this case, but imagine if instead it was

retval = foo() ?: desc.error;

With the extension, foo() is called only once. Without it, you have to introduce a temporary variable to avoid calling foo() twice.

Why is '+' not understood by Python sets?

30 votes

I would like to know why this is valid:

set(range(10)) - set(range(5))

but this is not valid:

set(range(10)) + set(range(5))

Is it because '+' could mean both intersection and union?

Python sets don't have an implementation for the + operator.

You can use | for set union and & for set intersection.

Sets do implement - as set difference. You can also use ^ for symmetric set difference (i.e., it will return a new set with only the objects that appear in one set but do not appear in both sets).