Best questions in June 2012

Why is processing a sorted array faster than an unsorted array?

1088 votes

Here is a piece of code that shows some very peculiar performance. For some strange reason, sorting the data miraculously speeds up the code by almost 6x:

#include <algorithm>
#include <ctime>
#include <iostream>

int main()
{
    // generate data
    const unsigned arraySize = 32768;
    int data[arraySize];

    for (unsigned c = 0; c < arraySize; ++c)
        data[c] = std::rand() % 256;


    // !!! with this, the next loop runs faster
    std::sort(data, data + arraySize);


    // test
    clock_t start = clock();
    long long sum = 0;

    for (unsigned i = 0; i < 100000; ++i)
    {
        // primary loop
        for (unsigned c = 0; c < arraySize; ++c)
        {
            if (data[c] >= 128)
                sum += data[c];
        }
    }

    double elapsedTime = static_cast<double>(clock() - start) / CLOCKS_PER_SEC;

    std::cout << elapsedTime << std::endl;
    std::cout << "sum = " << sum << std::endl;
}
  • Without std::sort(data, data + arraySize);, the code runs in 11.54 seconds.
  • With the sorted data, the code runs in 1.93 seconds.

Initially I thought this might be just a language or compiler anomaly. So I tried it Java...

import java.util.Arrays;
import java.util.Random;

public class Main
{
    public static void main(String[] args)
    {
        // generate data
        int arraySize = 32768;
        int data[] = new int[arraySize];

        Random rnd = new Random(0);
        for (int c = 0; c < arraySize; ++c)
            data[c] = rnd.nextInt() % 256;


        // !!! with this, the next loop runs faster
        Arrays.sort(data);


        // test
        long start = System.nanoTime();
        long sum = 0;

        for (int i = 0; i < 100000; ++i)
        {
            // primary loop
            for (int c = 0; c < arraySize; ++c)
            {
                if (data[c] >= 128)
                    sum += data[c];
            }
        }

        System.out.println((System.nanoTime() - start) / 1000000000.0);
        System.out.println("sum = " + sum);
    }
}

with a similar but less extreme result.


My first thought was that sorting brings the data into cache, but my next thought was how silly that is because the array was just generated.

What is going on? Why is a sorted array faster than an unsorted array? The code is summing up some independent terms, the order should not matter.

You are the victim of branch prediction fail.


What is Branch Prediction?

Consider a railroad junction:

enter image description here Image by Mecanismo, from Wikimedia Commons: http://commons.wikimedia.org/wiki/File:Entroncamento_do_Transpraia.JPG

Now for the sake of argument, suppose this is back in the 1800s - before long distance or radio communication.

You are the operator of a junction and you hear a train coming. You have no idea which way it will go. You stop the train to ask the captain which direction he wants. And then you set the switch appropriately.

Trains are heavy and have a lot of momentum. So they take forever to start up and slow down.

Is there a better way? You guess which direction the train will go!

  • If you guessed right, it continues on.
  • If you guessed wrong, the captain will stop, back up, and yell at you to flip the switch. Then it can restart down the other path.

If you guess right every time, the train will never have to stop.
If you guess wrong too often, the train will spend a lot of time stopping, backing up, and restarting.


Consider an if-statement: At the processor level, it is a branch instruction:

enter image description here

You are a processor and you see a branch. You have no idea which way it will go. What do you do? You halt execution and wait until the previous instructions are complete. Then you continue down the correct path.

Modern processors are complicated and have long pipelines. So they take forever to "warm up" and "slow down".

Is there a better way? You guess which direction the branch will go!

  • If you guessed right, you continue executing.
  • If you guessed wrong, you need to flush the pipeline and roll back to the branch. Then you can restart down the other path.

If you guess right every time, the execution will never have to stop.
If you guess wrong too often, you spend a lot of time stalling, rolling back, and restarting.


This is branch prediction. I admit it's not the best analogy since the train could just signal the direction with a flag. But in computers, the processor doesn't know which direction a branch will go until the last moment.

So how would you strategically guess to minimize the number of times that the train must back up and go down the other path? You look at the past history! If the train goes left 99% of the time, then you guess left. If it alternates, then you alternate your guesses. If it goes one way every 3 times, you guess the same...

In other words, you try to identify a pattern and follow it. This is more or less how branch predictors work.

Most applications have well-behaved branches. So modern branch predictors will typically achieve >90% hit rates. But when faced with unpredictable branches with no recognizable patterns, branch predictors are virtually useless.

Further Reading: http://en.wikipedia.org/wiki/Branch_predictor


As hinted from above, the culprit is this if-statement:

if (data[c] >= 128)
    sum += data[c];

Notice that the data is evenly distributed between 0 and 255. When the data is sorted, roughly the first half of the iterations will not enter the if-statement. After that, they will all enter the if-statement.

This is very friendly to the branch predictor since the branch consecutively goes the same direction many times. Even a simple saturating counter will correctly predict the branch except for the few iterations after it switches direction.

Quick visualization:

T = branch taken
N = branch not taken

data[] = 0, 1, 2, 3, 4, ... 126, 127, 128, 129, 130, ... 250, 251, 252, ...
branch = N  N  N  N  N  ...   N    N    T    T    T  ...   T    T    T  ...

       = NNNNNNNNNNNN ... NNNNNNNTTTTTTTTT ... TTTTTTTTTT  (easy to predict)

However, when the data is completely random, the branch predictor is rendered useless because it can't predict random data. Thus there will probably be around 50% misprediction. (no better than random guessing)

data[] = 226, 185, 125, 158, 198, 144, 217, 79, 202, 118,  14, 150, 177, 182, 133, ...
branch =   T,   T,   N,   T,   T,   T,   T,  N,   T,   N,   N,   T,   T,   T,   N  ...

       = TTNTTTTNTNNTTTN ...   (completely random - hard to predict)

So what can be done?

If the compiler isn't able to optimize the branch into a conditional move, you can try some hacks if you are willing to sacrifice readability for performance.

Replace:

if (data[c] >= 128)
    sum += data[c];

with:

int t = (data[c] - 128) >> 31;
sum += ~t & data[c];

This eliminates the branch and replaces it with some bitwise operations.

(Note that this hack is not strictly equivalent to the original if-statement. But in this case, it's valid for all the input values of data[].)

Benchmarks: Core i7 920 @ 3.5 GHz

C++ - Visual Studio 2010 - x64 Release

//  Branch - Random
seconds = 11.777

//  Branch - Sorted
seconds = 2.352

//  Branchless - Random
seconds = 2.564

//  Branchless - Sorted
seconds = 2.587

Java - Netbeans 7.1.1 JDK 7 - x64

//  Branch - Random
seconds = 10.93293813

//  Branch - Sorted
seconds = 5.643797077

//  Branchless - Random
seconds = 3.113581453

//  Branchless - Sorted
seconds = 3.186068823

Observations:

  • With the Branch: There is a huge difference between the sorted and unsorted data.
  • With the Hack: There is no difference between sorted and unsorted data.
  • In the C++ case, the hack is actually a tad slower than with the branch when the data is sorted.

A general rule of thumb is to avoid data-dependent branching in critical loops. (such as in this example)


Update :

  • GCC 4.6.1 with -O3 or -ftree-vectorize on x64 is able to generate a conditional move. So there is no difference between the sorted and unsorted data - both are fast.

  • VC++ 2010 is unable to generate conditional moves for this branch even under /Ox.

  • Intel Compiler 11 does something miraculous. It interchanges the two loops, thereby hoisting the unpredictable branch to the outer loop. So not only is it immune the mispredictions, it is also twice as fast as whatever VC++ and GCC can generate! In other words, ICC took advantage of the test-loop to defeat the benchmark...

  • If you give the Intel Compiler the branchless code, it just out-right vectorizes it... and is just as fast as with the branch (with the loop interchange).

This goes to show the even the mature modern compilers can vary wildly in their ability to optimize code...

Why are my balls disappearing?

145 votes

Pardon the funny title. I've created a little graphic demo of 200 balls bouncing and colliding, both against the walls and each other. You can see what I have currently here: http://www.exeneva.com/html5/multipleBallsBouncingAndColliding/

The problem is that whenever they collide with each other, they disappear. I'm not sure why. Can someone take a look and help me out?

UPDATE: Apparently the balls array has balls with coordinates of NaN. Below is the code where I push balls to the array. I'm not entirely sure how the coordinates are getting NaN.

// Variables
var numBalls = 200;  // number of balls
var maxSize = 15;
var minSize = 5;
var maxSpeed = maxSize + 5;
var balls = new Array();
var tempBall;
var tempX;
var tempY;
var tempSpeed;
var tempAngle;
var tempRadius;
var tempRadians;
var tempVelocityX;
var tempVelocityY;

// Find spots to place each ball so none start on top of each other
for (var i = 0; i < numBalls; i += 1) {
  tempRadius = 5;
  var placeOK = false;
  while (!placeOK) {
    tempX = tempRadius * 3 + (Math.floor(Math.random() * theCanvas.width) - tempRadius * 3);
    tempY = tempRadius * 3 + (Math.floor(Math.random() * theCanvas.height) - tempRadius * 3);
    tempSpeed = 4;
    tempAngle = Math.floor(Math.random() * 360);
    tempRadians = tempAngle * Math.PI/180;
    tempVelocityX = Math.cos(tempRadians) * tempSpeed;
    tempVelocityY = Math.sin(tempRadians) * tempSpeed;

    tempBall = {
      x: tempX, 
      y: tempY, 
      nextX: tempX, 
      nextY: tempY, 
      radius: tempRadius, 
      speed: tempSpeed,
      angle: tempAngle,
      velocityX: tempVelocityX,
      velocityY: tempVelocityY,
      mass: tempRadius
    };
    placeOK = canStartHere(tempBall);
  }
  balls.push(tempBall);
}

Your error comes from this line initially:

var direction1 = Math.atan2(ball1.velocitY, ball1.velocityX);

You have ball1.velocitY (which is undefined) instead of ball1.velocityY. So Math.atan2 is giving you NaN, and that NaN value is propagating through all your calculations.

This is not the source of your error, but there is something else that you might want to change on these four lines:

ball1.nextX = (ball1.nextX += ball1.velocityX);
ball1.nextY = (ball1.nextY += ball1.velocityY);
ball2.nextX = (ball2.nextX += ball2.velocityX);
ball2.nextY = (ball2.nextY += ball2.velocityY);

You don't need the extra assignments, and can just use the += operator alone:

ball1.nextX += ball1.velocityX;
ball1.nextY += ball1.velocityY;
ball2.nextX += ball2.velocityX;
ball2.nextY += ball2.velocityY;

Is it possible to make a div 50px less than 100% in CSS3?

132 votes

Is it possible to make a div 50px less than 100% in pure CSS? I want the <div> to be only 50px less than 100%. I don't want any JavaScript.

Yes you can. Without using the IE's expression(), you can do that in CSS3 by using calc().

div {
    width: 100%;
    width: -webkit-calc(100% - 50px);
    width: -moz-calc(100% - 50px);
    width: calc(100% - 50px);
}

Demo: http://jsfiddle.net/thirtydot/Nw3yd/66/

This will make your life so much easier. It is currently supported in the 3 main browsers: Firefox, Google Chrome (WebKit), and IE9: http://caniuse.com/calc

MDN: https://developer.mozilla.org/en/CSS/-moz-calc

~x + ~y == ~(x + y) is always false?

117 votes

Does this code always evaluate to false? Both variables are two's complement signed ints.

~x + ~y == ~(x + y)

I feel like there should be some number that satisfies the conditions. I tried testing the numbers between -5000 and 5000 but never achieved equality. Is there a way to set up an equation to find the solutions to the condition?

Will swapping one for the other cause an insidious bug in my program?

Assume for the sake of contradiction that there exists some x and some y (mod 2n) such that

~(x+y) == ~x + ~y

By two's complement*, we know that,

      -x == ~x + 1
<==>  -1 == ~x + x

Noting this result, we have,

      ~(x+y) == ~x + ~y
<==>  ~(x+y) + (x+y) == ~x + ~y + (x+y)
<==>  ~(x+y) + (x+y) == (~x + x) + (~y + y)
<==>  ~(x+y) + (x+y) == -1 + -1
<==>  ~(x+y) + (x+y) == -2
<==>  -1 == -2

Hence, a contradiction. Therefore, ~(x+y) != ~x + ~y for all x and y (mod 2n).


*It is interesting to note that on a machine with one's complement arithmetic, the equality actually holds true for all x and y. This is because under one's complement, ~x = -x. Thus, ~x + ~y == -x + -y == -(x+y) == ~(x+y).

fork() branches more than expected?

94 votes

Consider the following piece of code:

int main()
{
    int i;
    for(i = 0; i < 2; i++)
    {
        fork();
        printf(".");
    }
}

This program output is 8 dots. How can that be possible? Should not there be 6 dots instead?

The fork() primitive often stretches the imagination. Until you get a feel for it, you should trace out on paper what each operation is and account for the number of processes. Don't forget that fork() creates a near-perfect copy of the current process. The most significant difference (for most purposes) is that fork()'s return value differs between parent and child. (Since this code ignores the return value, it makes no difference.)

So, at first, there is one process. That creates a second process, both of which print a dot and loop. On their second iteration, each creates another copy, so there are four processes print a dot, and then exit. So we can easily account for six dots, like you expect.

However, what printf() really does is buffer its output. So the first dot from when there were only two processes does not appear when written. Those dots remain in the buffer—which is duplicated at fork(). It is not until the process is about to exit that the buffered dot appears. Four processes printing a buffered dot, plus the new one gives 8 dots.

If you wanted to avoid that behavior, call fflush(stdout); after printf().

Why does ",,," == Array(4) in Javascript?

92 votes

Boot up your interpreter/console and try the comparison

> ",,," == Array(4)
True

Why? At first I thought maybe since you could think of ",,," as an array of four characters with a '\0' terminating slice, that might be why, but

"..." == Array(4)

Returns "False". So... why? I know it's some idiosyncratic bit of duck typing in Javascript, but just curious what underlines this behavior. Gleaned this from Zed Shaw's excellent presentation here btw: http://vimeo.com/43380467

Thanks,

Nathan

Because the right hand operand is converted to a string and the string representation of Array(4) is ,,,:

> Array(4).toString()
  ",,,"

If you use the array constructor function and pass a number, it sets the length of the array to that number. So you can say you have four empty indexes (same as [,,,]) and the default string representation of arrays is a comma-separated list of its elements:

> ['a','b','c'].toString()
  "a,b,c"

How the comparison works is described in section 11.9.3 of the specification. There you will see (x == y):

8. If Type(x) is either String or Number and Type(y) is Object,
return the result of the comparison x == ToPrimitive(y).

(arrays are objects in JavaScript)

and if you follow the ToPrimitive method you will eventually find that it it calls toString.

Why do people say there is modulo bias when using a random number generator?

78 votes

I have seen this question asked a lot but never seen a true concrete answer to it. So I am going to post one here which will hopefully help people understand why exactly there is "modulo bias" when using a random number generator, like rand() in C++.

So rand() is a pseudo-random number generator which chooses a natural number between 0 and RAND_MAX, which is a constant defined in cstdlib (see this article for a general overview on rand()).

Now what happens if you want to generate a random number between say 0 and 2. For the sake of explanation, lets say RAND_MAX was 10 and I decide that the best way to generate a random number between 0 and 2 is to do rand()%3. Assuming rand() does generate each number between 0 and 10 with equal probability, (this is arguable but for this post I will assume it does), why would rand()%3 not produce the numbers between 0 and 2 with equal probability? When rand() returns 0, 3, 6, or 9, rand()%3 == 0. When rand() returns 1, 4, 7, or 10, rand()%3 == 1. When rand() returns 2, 5, or 8, rand()%3 == 2. Now if we analyze this statistically, we very quickly see that the probability of getting a 0 is 4/11, 1 is 4/11 but 2 is 3/11. This does not generate the numbers between 0 and 2 with equal probability. Of course for small ranges this might not be the biggest issue but for a larger range this could skew the distribution, biasing the smaller numbers.

So when does rand()%n return a range of numbers from 0 to n-1 with equal probability? When RAND_MAX%n == n - 1. In this case, along with our earlier assumption rand() does return a number between 0 and RAND_MAX with equal probability, the modulo classes of n would also be equally distributed.

So how do we solve this problem? One way is to keep generating random numbers till you get a number in your desired range:

int x; 
do
{
    x = rand();
} while (x >= n);

Hope that helps everyone!

Works cited and further reading:

Why does ("foo" === new String("foo")) evaluate to false in JavaScript?

73 votes

I was going to start using === all the time when comparing string values, but now I find that

"foo" === new String("foo")

is false, and same with this:

var f = "foo", g = new String("foo");
f === g; // false

Of course:

f == g; // true

So is it recommended to always use == for string comparison, or always convert variables to strings before comparing?

"foo" is a string primitive. (this concept does not exist in C# or Java)

new String("foo") is boxed string object.

The === operator behaves differently on primitives and objects.
When comparing primitives (of the same type), === will return true if they both have the same value.

When comparing objects, === will return true only if they refer to the same object (comparing by reference). Thus, new String("a") !== new String("a").

In your case, === returns false because the operands are of different types (one is a primitive and the other is an object).


Primitives are not objects at all.
The typeof operator will not return "object" for primitives.

When you try to access a property of a primitive (using it as an object), the Javascript language will box it to an object, creating a new object every time. This is described in the specification.

This is why you cannot put properties on primitives:

var x = "a";
x.property = 2;
alert(x.property) //undefined

Each time you write x.property, a different boxed String object is created.

When should null values of Boolean be used?

58 votes

Java boolean allows values of true and false while Boolean allows true, false, and null. I have started to convert my booleans to Booleans. This can cause crashes in tests such as

Boolean set = null;
...
if (set) ...

while the test

if (set != null && set) ...

seems contrived and error-prone.

When, if ever, is it useful to use Booleans with null values? If never, then what are the main advantages of the wrapped object?

UPDATE: There has been such a lot of valuable answers that I have summarised some of it in my own answer. I am at best an intermediate in Java so I have tried to show the things that I find useful. Note that the question is "incorrectly phrased" (Boolean cannot "have a null value") but I have left it in case others have the same misconception

Use boolean rather than Boolean every time you can. This will avoid many NullPointerExceptions and make your code more robust.

Boolean is useful, for example

  • to store booleans in a collection (List, Map, etc.)
  • to represent a nullable boolean (coming from a nullable boolean column in a database, for example). The null value might mean "we don't know if it's true or false" in this context.
  • each time a method needs an Object as argument, and you need to pass a boolean value. For example, when using reflection or methods like MessageFormat.format().

Benefit of Polymorphism

51 votes

When I started to look for the benefits of polymorphism, I found with this question here. But here I was unable to find my answer. Let me tell what I want to find. Here I have some classes:

class CoolingMachines{
    public void startMachine(){
        //No implementationion
    }
    public void stopMachine(){
        //No implementationion
    }
}

class Refrigerator extends CoolingMachines{
    public void startMachine(){
        System.out.println("Refrigerator Starts");
    }
    public void stopMachine(){
        System.out.println("Refrigerator Stop");
    }
    public void trip(){
        System.out.println("Refrigerator Trip");
    }
}

class AirConditioner extends CoolingMachines{
    public void startMachine(){
        System.out.println("AC Starts");
    }
    public void stopMachine(){
        System.out.println("AC Stop");
    }
}

public class PolymorphismDemo {
    CoolingMachines cm = new Refrigerator();
    Refrigerator rf = new Refrigerator();
}

Now here I created two objects in the Demo class and are references of Refrigerator. I have completely understood that from the rf object I am able to call the trip() method of Refrigerator, but that method will be hidden for the cm object. Now my question is why should I use polymorphism or why should I use

CoolingMachines cm = new Refrigerator();

when I am OK with

Refrigerator rf = new Refrigerator();

Is polymorphic object's efficiency is good or light in weight? What is the basic purpose and difference between both of these objects? Is there any difference between cm.start(); and rf.start()?

It is useful when you handle lists... A short example:

List<CoolingMachines> coolingMachines = ... // a list of CoolingMachines 
for (CoolingMachine current : coolingMachines) {
    current.start();
}

Or when you want to allow a method to work with any subclass of CoolingMachines

Benefits of pure function

51 votes

Today i was reading about pure function, got confused with its use:

A function is said to be pure if it returns same set of values for same set of inputs and does not have any observable side effects.

e.g. strlen() is a pure function while rand() is an impure one.

__attribute__ ((pure)) int fun(int i)
{
    return i*i;
}

int main()
{
    int i=10;
    printf("%d",fun(i));//outputs 100
    return 0;
}

http://ideone.com/33XJU

The above program behaves in the same way as in the absence of pure declaration.

What are the benefits of declaring a function as pure[if there is no change in output]?

pure lets the compiler know that it can make certain optimisations about the function: imagine a bit of code like

for (int i = 0; i < 1000; i++)
{
    printf("%d", fun(10));
}

With a pure function, the compiler can know that it needs to evaluate fun(10) once and once only, rather than 1000 times. For a complex function, that's a big win.

C# - Is it possible to save a Type (using "typeof()") in an enum?

47 votes

So I'm creating a game in XNA, C# 4.0, and I need to manage a lot of PowerUps (which in code are all inherited from class "PowerUp"), and to handle back-end management of the PowerUps I currently have an enum, PowerupEffectType, with a value for each child class of PowerUp. Eventually in the code I need to do conversions from PowerupEffectType to the Powerup type (of class Type, achieved usually with typeof([class name])).

Since this is a group project, I want to marry each value of PowerupEffectType to its corresponding class Type as well as possible, i.e.: Not just expect my other programmers to use switch statements to do the conversion manually, and making sure additions/expansions later involve as few changes in as few places as possible. I have a few options for this, and the best I've discovered so far is creating enum pseudo-methods that condense everything down to a single switch statement (99% of what I want), thanks to some tips I found here: http://msdn.microsoft.com/en-us/library/bb383974.aspx

But I'm trying to take it one step further - can I save a Type in an enum? I know you can save enums as a specific type (link: http://msdn.microsoft.com/en-us/library/cc138362.aspx), but Type isn't one of them. The current choices are byte, sbyte, short, ushort, int, uint, long, and ulong. Is there any feasible way to save a convert a Type to any of the above data types and back?

Just to be clear, this is what I WISH I could do, and I'm looking for a way to do:

// (Assuming 'LightningPowerup', 'FirePowerup', and 'WaterPowerup' are
// all declared classes that inherit from a single base class)

public enum PowerupEffectType
{
    LIGHTNING = typeof(LightningPowerup),
    FIRE = typeof(FirePowerup),
    WATER = typeof(WaterPowerup)
}

Is there any way to do this, or am I just overcomplicating a solution to a problem that's already 99% complete?

Thanks in advance!

You can't do it as the value of an enum, but you could specify it in an attribute:

using System;
using System.Runtime.CompilerServices;

[AttributeUsage(AttributeTargets.Field)]
public class EffectTypeAttribute : Attribute
{
    public Type Type { get; private set; }

    public EffectTypeAttribute(Type type)
    {
        this.Type = type;
    }
}

public class LightningPowerup {}
public class FirePowerup {}
public class WaterPowerup {}

public enum PowerupEffectType
{
    [EffectType(typeof(LightningPowerup))]
    Lightning,
    [EffectType(typeof(FirePowerup))]
    Fire,
    [EffectType(typeof(WaterPowerup))]
    Water
}

You can then extract those attribute values at execution time with reflection. However, I would personally just create a dictionary:

private static Dictionary<PowerupEffectType, Type> EffectTypeMapping =
    new Dictionary<PowerupEffectType, Type>
{
    { PowerupEffectType.Lightning, typeof(LightningPowerup) },
    { PowerupEffectType.Fire, typeof(FirePowerup) },
    { PowerupEffectType.Water, typeof(WaterPowerup) }
};

No need for a special attribute, no need to extract the values with fiddly reflection code.

Free memory allocated in a different function?

45 votes

I'm trying to learn C and I'm currently trying to write a basic stack data structure, but I can't seem to get basic malloc/free right.

Here's the code I've been using (I'm just posting a small part here to illustrate a specific problem, not the total code, but the error message was generated just by running this example code in valgrind)

#include <stdio.h>
#include <stdlib.h>

typedef struct Entry {
    struct Entry *previous;
    int value;
} Entry;

void destroyEntry(Entry entry);

int main(int argc, char *argv[])
{
    Entry* apple;
    apple = malloc(sizeof(Entry));
    destroyEntry(*(apple));
    return 0;
}

void destroyEntry(Entry entry)
{
    Entry *entry_ptr = &entry;
    free(entry_ptr);
    return;
}

When I run it through valgrind with --leak-check=full --track-origins=yes, I get the following error:

==20674== Invalid free() / delete / delete[] / realloc()
==20674==    at 0x4028E58: free (vg_replace_malloc.c:427)
==20674==    by 0x80485B2: destroyEntry (testing.c:53)
==20674==    by 0x8048477: main (testing.c:26)
==20674==  Address 0xbecc0070 is on thread 1's stack

I think this error means that the destroyEntry function is not allowed to modify memory allocated explicitly in main. Is that right? Why can't I just free the memory I allocated in main in another function? (and is this behavior somehow specific to main?)

Whenever you pass a parameter to a function, a copy is made, and the function works on that copy. So in your case, you are trying to free a copy of the original object, which doesn't make any sense.

You should modify your function to take a pointer, and then you can have it call free directly on that pointer.

Case sensitivity of Java class names

41 votes

If one writes two public Java classes with the same case-insensitive name in different directories then both classes are not usable at runtime. (I tested this on Windows, Mac and Linux with several with several versions of the HotSpot JVM. I would not be surprised if there other JVMs where they are usable simultaneously.) For example, if I create a class named a and one named A like so:

// lowercase/src/testcase/a.java
package testcase;
public class a {
    public static String myCase() {
        return "lower";
    }
}

// uppercase/src/testcase/A.java 
package testcase;
public class A {
    public static String myCase() {
        return "upper";
    }
}

Three eclipse projects containing the code above are available from my website.

If try I calling myCase on both classes like so:

System.out.println(A.myCase());
System.out.println(a.myCase());

The typechecker succeeds, but when I run the class file generate by the code directly above I get:

Exception in thread "main" java.lang.NoClassDefFoundError: testcase/A (wrong name: testcase/a)

In Java, names are in general case sensitive. Some file systems (e.g. Windows) are case insensitive, so I'm not surprised the above behavior happens, but it seems wrong. Unfortunately the Java specifications are oddly non-commital about which classes are visible. The Java Language Specification (JLS), Java SE 7 Edition (Section 6.6.1, page 166) says:

If a class or interface type is declared public, then it may be accessed by any code, provided that the compilation unit (ยง7.3) in which it is declared is observable.

In Section 7.3, the JLS defines observability of a compilation unit in extremely vague terms:

All the compilation units of the predefined package java and its subpackages lang and io are always observable. For all other packages, the host system determines which compilation units are observable.

The Java Virtual Machine Specification is similarly vague (Section 5.3.1):

The following steps are used to load and thereby create the nonarray class or interface C denoted by [binary name] N using the bootstrap class loader [...] Otherwise, the Java virtual machine passes the argument N to an invocation of a method on the bootstrap class loader to search for a purported representation of C in a platform-dependent manner.

All of this leads to four questions in descending order of importance:

  1. Are there any guarantees about which classes are loadable by the default class loader(s) in every JVM? In other words, can I implement a valid, but degenerate JVM, that won't load any classes except those in java.lang and java.io?
  2. If there are any guarantees, does the behavior in the example above violate the guarantee (i.e. is the behavior a bug)?
  3. Is there any way to make HotSpot load a and A simultaneously? Would writing a custom class loader work?

  • Are there any guarantees about which classes are loadable by the bootstrap class loader in every JVM?

The core bits and pieces of the language, plus supporting implementation classes. Not guaranteed to include any class that you write. (The normal JVM loads your classes in a separate classloader from the bootstrap one, and in fact the normal bootstrap loader loads its classes out of a JAR normally, as this makes for more efficient deployment than a big old directory structure full of classes.)

  • If there are any guarantees, does the behavior in the example above violate the guarantee (i.e. is the behavior a bug)?
  • Is there any way to make "standard" JVMs load a and A simultaneously? Would writing a custom class loader work?

Java loads classes by mapping the full name of the class into a filename that is then searched for on the classpath. Thus testcase.a goes to testcase/a.class and testcase.A goes to testcase/A.class. Some filesystems mix these things up, and may serve the other up when one is asked for. Others get it right (in particular, the variant of the ZIP format used in JAR files is fully case-sensitive and portable). There is nothing that Java can do about this (though an IDE could handle it for you by keeping the .class files away from the native FS, I don't know if any actually do and the JDK's javac most certainly isn't that smart).

However that's not the only point to note here: class files know internally what class they are talking about. The absence of the expected class from the file just means that the load fails, leading to the NoClassDefFoundError you received. What you got was a problem (a mis-deployment in at least some sense) that was detected and dealt with robustly. Theoretically, you could build a classloader that could handle such things by keeping searching, but why bother? Putting the class files inside a JAR will fix things far more robustly; those are handled correctly.

More generally, if you're running into this problem for real a lot, take to doing production builds on a Unix with a case-sensitive filesystem (a CI system like Jenkins is recommended) and find which developers are naming classes with just case differences and make them stop as it is very confusing!

Including header files in C/C++ just once

37 votes

Is it ever useful to include a header file more than once in C or C++?

If the mechanism is never used, why would the compiler ever worry about including a file twice; if it really were useless, wouldn't it be more convenient if newer compilers made sure every header is included only once?

Edit:

I understand that there are standard ways of doing things like include guards and pragma once, but why should you have to specify even that? Shouldn't it be the default behavior of the compiler to include files only once?

Yes, it's useful when generating code with the preprocessor, or doing tricks like Boost.PP does.

For an example, see X Macros. The basic idea is that the file contains the body of the macro and you #define the arguments and then #include it. Here's a contrived example:

macro.xpp

std::cout << MESSAGE;
#undef MESSAGE

file.cpp:

int main() {
# define MESSAGE "hello world"
# include "macro.xpp"
}

This also allows you to use #if and friends on the arguments, something that normal macros can't do.

Java Synchronization Not Working as Expected

34 votes

I have a "simple" 4 class example that reliably shows unexpected behavior from java synchronization on multiple machines. As you can read below, given the contract of the java sychronized keyword, Broke Synchronization should never be printed from the class TestBuffer.

Here are the 4 classes that will reproduce the issue (at least for me). I'm not interested in how to fix this broken example, but rather why it breaks in the first place.

Sync Issue - Controller.java

Sync Issue - SyncTest.java

Sync Issue - TestBuffer.java

Sync Issue - Tuple3f.java

And here is the output I get when I run it:

java -cp . SyncTest
Before Adding
Creating a TestBuffer
Before Remove
Broke Synchronization
1365192
Broke Synchronization
1365193
Broke Synchronization
1365194
Broke Synchronization
1365195
Broke Synchronization
1365196
Done

UPDATE: @Gray has the simplest example that breaks thus far. His example can be found here: Strange JRC Race Condition

Based on the feedback I've gotten from others, it looks like the issue may occur on Java 64-bit 1.6.0_20-1.6.0_31 (unsure about newer 1.6.0's) on Windows and OSX. Nobody has been able to reproduce the issue on Java 7. It may also require a multi-core machine to reproduce the issue.

ORIGINAL QUESTION:

I have a class which provides the following methods:

  • remove - Removes the given item from the list
  • getBuffer - Iterates over all the items in the list

I've reduced the problem down to the 2 functions below, both of which are in the same object and they're both synchronized. Unless I am mistaken, "Broke Synchronization" should never be printed because insideGetBuffer should always be set back to false before remove can be entered. However, in my application it is printing "Broke Synchronization" when I have 1 thread calling remove repeatedly while the other calls getBuffer repeatedly. The symptom is that I get a ConcurrentModificationException.

See Also:

Very strange race condition which looks like a JRE issue

I think you are indeed looking at a JVM bug in the OSR. Using the simplified program from @Gray (slight modifications to print an error message) and some options to mess with/print JIT compilation, you can see what is going on with the JIT. And, you can use some options to control that to a degree that can suppress the issue, which lends a lot of evidence to this being a JVM bug.

Running as:

java -XX:+PrintCompilation -XX:CompileThreshold=10000 phil.StrangeRaceConditionTest

you can get an error condition (like others about 80% of the runs) and the compilation print somewhat like:

 68   1       java.lang.String::hashCode (64 bytes)
 97   2       sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
104   3       java.math.BigInteger::mulAdd (81 bytes)
106   4       java.math.BigInteger::multiplyToLen (219 bytes)
111   5       java.math.BigInteger::addOne (77 bytes)
113   6       java.math.BigInteger::squareToLen (172 bytes)
114   7       java.math.BigInteger::primitiveLeftShift (79 bytes)
116   1%      java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
121   8       java.math.BigInteger::montReduce (99 bytes)
126   9       sun.security.provider.SHA::implCompress (491 bytes)
138  10       java.lang.String::charAt (33 bytes)
139  11       java.util.ArrayList::ensureCapacity (58 bytes)
139  12       java.util.ArrayList::add (29 bytes)
139   2%      phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
158  13       java.util.HashMap::indexFor (6 bytes)
159  14       java.util.HashMap::hash (23 bytes)
159  15       java.util.HashMap::get (79 bytes)
159  16       java.lang.Integer::valueOf (32 bytes)
168  17 s     phil.StrangeRaceConditionTest::getBuffer (66 bytes)
168  18 s     phil.StrangeRaceConditionTest::remove (10 bytes)
171  19 s     phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
172   3%      phil.StrangeRaceConditionTest::strangeRaceConditionTest @ 36 (76 bytes)
ERRORS //my little change
219  15      made not entrant  java.util.HashMap::get (79 bytes)

There are three OSR replacements (the ones with the % annotation on the compile ID). My guess is that it is the third one, which is the loop calling remove(), that is responsible for the error. This can be excluded from JIT via a .hotspot_compiler file located in the working directory with the following contents:

exclude phil/StrangeRaceConditionTest strangeRaceConditionTest

When you run the program again, you get this output:

CompilerOracle: exclude phil/StrangeRaceConditionTest.strangeRaceConditionTest
 73   1       java.lang.String::hashCode (64 bytes)
104   2       sun.nio.cs.UTF_8$Decoder::decodeArrayLoop (553 bytes)
110   3       java.math.BigInteger::mulAdd (81 bytes)
113   4       java.math.BigInteger::multiplyToLen (219 bytes)
118   5       java.math.BigInteger::addOne (77 bytes)
120   6       java.math.BigInteger::squareToLen (172 bytes)
121   7       java.math.BigInteger::primitiveLeftShift (79 bytes)
123   1%      java.math.BigInteger::multiplyToLen @ 138 (219 bytes)
128   8       java.math.BigInteger::montReduce (99 bytes)
133   9       sun.security.provider.SHA::implCompress (491 bytes)
145  10       java.lang.String::charAt (33 bytes)
145  11       java.util.ArrayList::ensureCapacity (58 bytes)
146  12       java.util.ArrayList::add (29 bytes)
146   2%      phil.StrangeRaceConditionTest$Buffer::<init> @ 38 (62 bytes)
165  13       java.util.HashMap::indexFor (6 bytes)
165  14       java.util.HashMap::hash (23 bytes)
165  15       java.util.HashMap::get (79 bytes)
166  16       java.lang.Integer::valueOf (32 bytes)
174  17 s     phil.StrangeRaceConditionTest::getBuffer (66 bytes)
174  18 s     phil.StrangeRaceConditionTest::remove (10 bytes)
### Excluding compile: phil.StrangeRaceConditionTest::strangeRaceConditionTest
177  19 s     phil.StrangeRaceConditionTest$Buffer::remove (34 bytes)
324  15      made not entrant  java.util.HashMap::get (79 bytes)

and the problem does not appear (at least not in the repeated attempts that I've made).

Also, if you change the JVM options a bit, you can cause the problem to go away. Using either of the following I cannot get the problem to appear.

java -XX:+PrintCompilation -XX:CompileThreshold=100000 phil.StrangeRaceConditionTest
java -XX:+PrintCompilation -XX:FreqInlineSize=1 phil.StrangeRaceConditionTest

Interestingly, the compilation output for both of these still show the OSR for the remove loop. My guess (and it is a big one) is that delaying the JIT via the compilation threshold or changing the FreqInlineSize cause changes to the OSR processing in these cases that bypass a bug that you are otherwise hitting.

See here for info on the JVM options.

See here and here for information on the output of -XX:+PrintCompilation and how to mess with what the JIT does.

Difference between java.util.Random and java.security.SecureRandom

33 votes

My team got handed over some server side code (in Java) that generates random tokens and I have a question regarding the same -

The purpose of these tokens is fairly sensitive - used for session id, password reset links etc. So they do need to be cryptographically random to avoid somebody guessing them or brute force them feasibly. The token is a "long" so it is 64 bits long.

The code currently uses the java.util.Random class to generate these tokens. The documentation (http://docs.oracle.com/javase/7/docs/api/java/util/Random.html) for java.util.Random clearly states the following:

Instances of java.util.Random are not cryptographically secure. Consider instead using SecureRandom to get a cryptographically secure pseudo-random number generator for use by security-sensitive applications.

However, the way the code is currently using java.util.Random is this - It instantiates the java.security.SecureRandom class and then uses the SecureRandom.nextLong() method to obtain the seed that is used for instantiating the java.util.Randomclass. Then it uses java.util.Random.nextLong() method to generate the token.

So my question now - Is it still insecure given that the java.util.Random is being seeded using java.security.SecureRandom? Do I need to modify the code so that it uses java.security.SecureRandom exclusively to generate the tokens?

Regards

EDIT: Re @Tom's question - Currently the code seed's the Random once at startup

The standard Oracle JDK 7 implementation uses what's called a Linear Congruential Generator to produce random values in java.util.Random.

Taken from java.util.Random source code (JDK 7u2), from a comment on the method protected int next(int bits), which is the one that generates the random values:

This is a linear congruential pseudorandom number generator, as defined by D. H. Lehmer and described by Donald E. Knuth in The Art of Computer Programming, Volume 3: Seminumerical Algorithms, section 3.2.1.

Predictability of Linear Congruential Generators

Hugo Krawczyk wrote a pretty good paper about how these LCGs can be predicted ("How to predict congruential generators"). If you're lucky and interested, you may still find a free, downloadable version of it on the web. And there's plenty more research that clearly shows that you should never use an LCG for security-critical purposes. This also means that your random numbers are predictable right now, something you don't want for session IDs and the like.

How to break a Linear Congruential Generator

The assumption that an attacker would have to wait for the LCG to repeat after a full cycle is wrong. Even with an optimal cycle (the modulus m in its recurrence relation) it is very easy to predict future values in much less time than a full cycle. After all, it's just a bunch of modular equations that need to be solved, which becomes easy as soon as you have observed enough output values of the LCG.

The security doesn't improve with a "better" seed. It simply doesn't matter if you seed with a random value generated by SecureRandom or even produce the value by rolling a dice several times.

An attacker will simply compute the seed from the output values observed. This takes significantly less time than 2^48 in the case of java.util.Random. Disbelievers may try out this experiment, where it is shown that you can predict future Random outputs observing only two(!) output values in time roughly 2^16. It takes not even a second on a modern computer to predict the output of your random numbers right now.

Conclusion

Replace your current code. Use SecureRandom exclusively. Then at least you will have a little guarantee that the result will be hard to predict. If you want the properties of a cryptographically secure PRNG (in your case, that's what you want), then you have to go with SecureRandom only. Being clever about changing the way it was supposed to be used will almost always result in something less secure...

How can I specialize a C++ template for a range of values?

32 votes

Is there a way to have a template specialization based on a range of values instead of just one? I know the following code is not valid C++ code but it shows what I would like to do. I'm writing code for a 8-bit machine, so there is a difference in speed for using ints and chars.

template<unsigned SIZE>
class circular_buffer {
   unsigned char buffer[SIZE];
   unsigned int head; // index
   unsigned int tail; // index
};

template<unsigned SIZE <= 256>
class circular_buffer {
   unsigned char buffer[SIZE];
   unsigned char head; // index
   unsigned char tail; // index
};

Try std::conditional:

#include <type_traits>

template<unsigned SIZE>
class circular_buffer {

    typedef typename
        std::conditional< SIZE < 256,
                          unsigned char,
                          unsigned int
                        >::type
        index_type;

    unsigned char buffer[SIZE];
    index_type head;
    index_type tail;
};

If your compiler doesn't yet support this part of C++11, there's equivalent in boost libraries.

Then again, it's easy to roll your own (credit goes to KerrekSB):

template <bool, typename T, typename F>
struct conditional {
    typedef T type;
};

template <typename T, typename F>  // partial specialization on first argument
struct conditional<false, T, F> {
    typedef F type;
}; 

Does moving a vector invalidate iterators?

31 votes

If I have an iterator to a vector, then I move-construct or move-assign another vector from that vector, does that iterator still point to a valid element in the new vector? Here's a simple example:

#include <vector>
#include <iostream>

int main(int argc, char *argv[])
{
    std::vector<int>::iterator a_iter;
    std::vector<int> b;
    {
        std::vector<int> a{1, 2, 3, 4, 5};
        a_iter = a.begin() + 2;
        b = std::move(a);
    }
    std::cout << *a_iter << std::endl;
    return 0;
}

Is a_iter still valid since a has been moved into b, or is the iterator invalidated by the move? For reference, std::vector::swap does not invalidate iterators.

While it might be reasonable to assume that iterators are still valid after a move, I don't think the Standard actually guarantees this. Therefore, the iterators are in an undefined state after the move.


There is no reference I can find in the Standard which specifically states that iterators that existed before a move are still valid after the move.

On the surface, it would seem to be perfectly reasonable to assume that an iterator is typically implemented as pointers in to the controlled sequence. If that's the case, then the iterators would still be valid after the move.

But the implementation of an iterator is implementation-defined. Meaning, so long as the iterator on a particular platform meets the requirements set forth by the Standard, it can be implemented in any way whatsoever. It could, in theory, be implemented as a combination of a pointer back to the vector class along with an index. If that's the case, then the iterators would become invalid after the move.

Whether or not an iterator is actually implemented this way is irrelevant. It could be implemented this way, so without a specific guarantee from the Standard that post-move iterators are still valid, you cannot assume that they are. Bear in mind also that there is such a guarantee for iterators after a swap. This was specifically clarified from the previous Standard. Perhaps it was simply an oversight of the Std comittee to not make a similar clarification for iterators after a move, but in any case there is no such guarantee.

Therefore, the long and the short of it is you can't assume your iterators are still good after a move.

EDIT:

23.2.1/11 in Draft n3242 states that:

Unless otherwise specified (either explicitly or by defining a function in terms of other functions), invoking a container member function or passing a container as an argument to a library function shall not invalidate iterators to, or change the values of, objects within that container.

This might lead one to conclude that the iterators are valid after a move, but I disagree. In your example code, a_iter was an iterator in to the vector a. After the move, that container, a has certainly been changed. My conclusion is the above clause does not apply in this case.

Regex with -, ::, ( and )

28 votes

I need to split the string

(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)

into

string[0] = (age-is-25::OR::last_name-is-qa6)

string[1] = AND

string[2] = (age-is-20::OR::first_name-contains-test)

I tried writing so many regex expressions, but nothing works as expected.

Using the following regex, Matcher.groupCount() which returns 2 but assigning results to an arraylist returns null as the elements.

Pattern pattern = Pattern.compile("(\\)::)?|(::\\()?");

I tried to split it using ):: or ::(.

I know the regex looks too stupid, but being a beginner this is the best I could write.

You can use positive lookahead and lookbehind to match the first and last parentheses.

String str = "(age-is-25::OR::last_name-is-qa6)::AND::(age-is-20::OR::first_name-contains-test)";

for (String s : str.split("(?<=\\))::|::(?=\\()"))
    System.out.println(s);

Outputs:

(age-is-25::OR::last_name-is-qa6)
AND
(age-is-20::OR::first_name-contains-test)

Just a note however: It seems like you are parsing some kind of recursive language. Regular expressions are not good at doing this. If you are doing advanced parsing I would recommend you to look at other parsing methods.