Best questions in May 2013

\d is less efficient than [0-9]

513 votes

I made a comment yesterday on an answer where someone had used [0123456789] in a regular expression rather than [0-9] or \d. I said it was probably more efficient to use a range or digit specifier than a character set.

I decided to test that out today and found out to my surprise that (in the C# regex engine at least) \d appears to be less efficient than either of the other two which don't seem to differ much. Here is my test output over 10000 random strings of 1000 random characters with 5077 actually containing a digit:

Regular expression \d           took 00:00:00.2141226 result: 5077/10000
Regular expression [0-9]        took 00:00:00.1357972 result: 5077/10000  63.42 % of first
Regular expression [0123456789] took 00:00:00.1388997 result: 5077/10000  64.87 % of first

It's a surprise to me for two reasons:

  1. I would have thought the range would be implemented much more efficiently than the set.
  2. I can't understand why \d is worse than [0-9]. Is there more to \d than simply shorthand for [0-9]?

Here is the test code:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
using System.Text.RegularExpressions;

namespace SO_RegexPerformance
{
    class Program
    {
        static void Main(string[] args)
        {
            var rand = new Random(1234);
            var strings = new List<string>();
            //10K random strings
            for (var i = 0; i < 10000; i++)
            {
                //Generate random string
                var sb = new StringBuilder();
                for (var c = 0; c < 1000; c++)
                {
                    //Add a-z randomly
                    sb.Append((char)('a' + rand.Next(26)));
                }
                //In roughly 50% of them, put a digit
                if (rand.Next(2) == 0)
                {
                    //Replace one character with a digit, 0-9
                    sb[rand.Next(sb.Length)] = (char)('0' + rand.Next(10));
                }
                strings.Add(sb.ToString());
            }

            var baseTime = testPerfomance(strings, @"\d");
            Console.WriteLine();
            var testTime = testPerfomance(strings, "[0-9]");
            Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
            testTime = testPerfomance(strings, "[0123456789]");
            Console.WriteLine("  {0:P2} of first", testTime.TotalMilliseconds / baseTime.TotalMilliseconds);
        }

        private static TimeSpan testPerfomance(List<string> strings, string regex)
        {
            var sw = new Stopwatch();

            int successes = 0;

            var rex = new Regex(regex);

            sw.Start();
            foreach (var str in strings)
            {
                if (rex.Match(str).Success)
                {
                    successes++;
                }
            }
            sw.Stop();

            Console.Write("Regex {0,-12} took {1} result: {2}/{3}", regex, sw.Elapsed, successes, strings.Count);

            return sw.Elapsed;
        }
    }
}

\d checks all Unicode digits, while [0-9] is limited to these 10 characters. For example, Persian digits, ۱۲۳۴۵۶۷۸۹, are an example of Unicode digits which are matched with \d, but not [0-9].

You can generate a list of all such characters using the following code:

var sb = new StringBuilder();
for(UInt16 i = 0; i < UInt16.MaxValue; i++)
{
    string str = Convert.ToChar(i).ToString();
    if (Regex.IsMatch(str, @"\d"))
        sb.Append(str);
}
Console.WriteLine(sb.ToString());

Which generates:

0123456789٠١٢٣٤٥٦٧٨٩۰۱۲۳۴۵۶۷۸۹߀߁߂߃߄߅߆߇߈߉०१२३४५६७८९০১২৩৪৫৬৭৮৯੦੧੨੩੪੫੬੭੮੯૦૧૨૩૪૫૬૭૮૯୦୧୨୩୪୫୬୭୮୯௦௧௨௩௪௫௬௭௮௯౦౧౨౩౪౫౬౭౮౯೦೧೨೩೪೫೬೭೮೯൦൧൨൩൪൫൬൭൮൯๐๑๒๓๔๕๖๗๘๙໐໑໒໓໔໕໖໗໘໙༠༡༢༣༤༥༦༧༨༩၀၁၂၃၄၅၆၇၈၉႐႑႒႓႔႕႖႗႘႙០១២៣៤៥៦៧៨៩᠐᠑᠒᠓᠔᠕᠖᠗᠘᠙᥆᥇᥈᥉᥊᥋᥌᥍᥎᥏᧐᧑᧒᧓᧔᧕᧖᧗᧘᧙᭐᭑᭒᭓᭔᭕᭖᭗᭘᭙᮰᮱᮲᮳᮴᮵᮶᮷᮸᮹᱀᱁᱂᱃᱄᱅᱆᱇᱈᱉᱐᱑᱒᱓᱔᱕᱖᱗᱘᱙꘠꘡꘢꘣꘤꘥꘦꘧꘨꘩꣐꣑꣒꣓꣔꣕꣖꣗꣘꣙꤀꤁꤂꤃꤄꤅꤆꤇꤈꤉꩐꩑꩒꩓꩔꩕꩖꩗꩘꩙0123456789

What is "cache-friendly" code?

157 votes

Could someone possibly give an example of "cache unfriendly code" and the "cache friendly" version of that code?

How can I make sure I write cache-efficient code?

Preliminaries

Modern computer architectures feature complex memory hierarchies: registers, typically several levels of cache within the CPU chip (L1, L2, L3, instruction caches, ...), RAM, HDDs (armed with their own caches) and so forth. The basic mantra is: fast memory is expensive. This is the core reason for the advanced caching we see today. Caching is one of the main methods to reduce the impact of latency (cfr. the Herb Sutter talk I linked at the start). To paraphrase Herb Sutter (cfr. links below): increasing bandwidth is easy, but we can't buy our way out of latency.

Data is always retrieved through the memory hierarchy (smallest == fastest to slowest). A cache hit/miss usually refers to a hit/miss in the highest level of cache in the CPU -- by highest level I mean the largest == slowest. The cache hit rate is crucial for performance, since every cache miss results in fetching data from RAM (or worse ...) which takes a lot of time (hundreds of cycles for RAM, tens of millions of cycles for HDD). In comparison, reading data from the (highest level) cache typically takes only a handful of cycles.

In modern computer architectures, the performance bottleneck is leaving the CPU die (e.g. accessing RAM or higher). This will only get worse over time. The increase in processor frequency is currently no longer relevant to increase performance. The problem is memory access. Hardware design efforts in CPUs therefore currently focus heavily on optimizing caches, prefetching, pipelines and concurrency. For instance, modern CPUs spend around 85% of die on caches and up to 99% for storing/moving data!

There is quite a lot to be said on the subject. Here are a few great references about caches, memory hierarchies and proper programming:

Main concepts for cache-friendly code

A very important aspect of cache-friendly code is all about the principle of locality, the goal of which is to place related data close in memory to allow efficient caching. In terms of the CPU cache, it's important to be aware of cache lines to understand how this works: How do cache lines work?

The following particular aspects are of high importance to optimize caching:

  1. Temporal locality: when a given memory location was accessed, it is likely that the same location is accessed again in the near future. Ideally, this information will still be cached at that point.
  2. Spatial locality: this refers to placing related data close to eachother. Caching happens on many levels, not just in the CPU. For example, when you read from RAM, typically a larger chunk of memory is fetched than what was specifically asked for because very often the program will require that data soon. HDD caches follow the same line of thought. Specifically for CPU caches, the notion of cache lines is important.

Use appropriate containers

A simple example of cache-friendly versus cache-unfriendly is 's std::vector versus std::list. Elements of a std::vector are stored in contiguous memory, and as such accessing them is much more cache-friendly than accessing elements in a std::list, which stores its content all over the place. This is due to spatial locality.

A very nice illustration of this is given by Bjarne Stroustrup in this youtube clip (thanks to @Magtheridon96 for the link!).

Don't neglect the cache in data structure and algorithm design

Whenever possible, try to adapt your data structures and order of computations in a way that allows maximum use of the cache. An common technique in this regard is cache blocking, which is of extreme importance in high-performance computing (cfr. for example ATLAS).

Know and exploit the implicit structure of data

Another simple example, which many people in the field sometimes forget is column-major (ex. ,) vs. row-major ordering (ex. ,) for storing two dimensional arrays. For example, consider the following matrix:

1 2
3 4

In row-major ordering, this is stored in memory as 1 2 3 4; in column-major ordering this would be stored as 1 3 2 4. It is easy to see that implementations which do not exploit this ordering will quickly run into (easily avoidable!) cache issues. Unfortunately, I see stuff like this very often in my domain (machine learning). @MatteoItalia showed this example in more detail in his answer.

When fetching a certain element of a matrix from memory, elements near it will be fetched as well and stored in a cache line. If the ordering is exploited, this will result in fewer memory accesses (because the next few values which are needed for subsequent computations are already in a cache line).

For simplicity, assume the cache comprises a single cache line which can contain 2 matrix elements and that when a given element is fetched from memory, the next one is too. Say we want to take the sum over all elements in the example 2x2 matrix above (lets call it M):

Exploiting the ordering (e.g. changing column index first in ):

M[0][0] (memory) + M[0][1] (cached) + M[1][0] (memory) + M[1][1] (cached)
= 1 + 2 + 3 + 4
--> 2 cache hits, 2 memory accesses

Not exploiting the ordering (e.g. changing row index first in ):

M[0][0] (memory) + M[1][0] (memory) + M[0][1] (memory) + M[1][1] (memory)
= 1 + 3 + 2 + 4
--> 0 cache hits, 4 memory accesses

In this simple example, exploiting the ordering approximately doubles execution speed (since memory access requires much more cycles than computing the sums). In practice the performance difference can be much larger.

Avoid unpredictable branches

Modern architectures feature pipelines and compilers are becoming very good at reordering code to minimize delays due to memory access. When your critical code contains (unpredictable) branches, it is hard or impossible to prefetch data. This will indirectly lead to more cache misses.

This is explained very well here (thanks to @0x90 for the link): Why is processing a sorted array faster than an unsorted array?

Avoid virtual functions

In the context of , virtual methods represent a controversial issue with regard to cache misses (a general consensus exists that they should be avoided when possible in terms of performance). Virtual functions can induce cache misses during look up, but this only happens if the specific function is not called often (otherwise it would likely be cached), so this is regarded as a non-issue by some. For reference about this issue, check out: What is the performance cost of having a virtual method in a C++ class?

Common problems

A common problem in modern architectures with multiprocessor caches is called false sharing. This occurs when each individual processor is attempting to use data in another memory region and attempts to store it in the same cache line. This causes the cache line -- which contains data another processor can use -- to be overwritten again and again. Effectively, different threads make each other wait by inducing cache misses in this situation. See also (thanks to @Matt for the link): How and when to align to cache line size?

An extreme symptom of poor caching in RAM memory (which is probably not what you mean in this context) is so-called thrashing. This occurs when the process continuously generates page faults (e.g. accesses memory which is not in the current page) which require disk access.

Android Studio installation on Windows 7 fails, no JDK found

123 votes

I downloaded Android Studio and attempted to launch the program.

This is running on Windows 7 64-bit with Java 1.7. During the installation my Java 1.7 is detected, and the rest of the installation goes through just fine. However, when attempting to launch the application from the desktop icon, nothing happens. Looking at the task manager, a new process from the CMD is loaded. This is because it's attempting to run the batch file studio.bat.

When I execute via CMD, I get the following error:

ERROR: cannot start Android Studio. No JDK found. Please validate either ANDROID_STUDIO_JDK, or JDK_HOME or JAVA_HOME points to valid JDK installation. ECHO is off. Press any key to continue . . .

I've attempted to open the idea properties file to see if there was something I could configure for this ANDROID_STUDIO_JDK or something like that. However, I found nothing. I hope some of you can let me know if you were able to install this or if you are having problems as well.

Adding a system variable JDK_HOME with value c:\Program Files\Java\jdk1.7.0_21\ worked for me.

Latest Java release can be downloaded here.

Missing return statement in a non-void method compiles

104 votes

I encountered a situation where a non-void method is missing a return statement and the code still compiles. I know that the statements after the while loop are unreachable(dead code) and would never be executed. But why doesn't the compiler even warn about returning something? Or why would a language allow us to have a non-void method having an infinite loop and not returning anything?

public int doNotReturnAnything() {
    while(true) {
        //do something
    }
    //no return statement
}

If I add a break statement(even a conditional one) in the while loop, the compiler complains of the infamous errors: 'Method does not return a value'(Eclipse) and 'Not all code paths return a value'(Visual Studio)

public int doNotReturnAnything() {
    while(true) {
        if(mustReturn) break;
        //do something
    }
    //no return statement
}

This is true of both Java and C#

Why would a language allow us to have a non-void method having an infinite loop and not returning anything?

The rule for non-void methods is every code path that returns must return a value, and that rule is satisfied in your program: zero out of zero code paths that return do return a value. The rule is not "every non-void method must have a code path that returns".

This enables you to write stub-methods like:

IEnumerator IEnumerable.GetEnumerator() 
{ 
    throw new NotImplementedException(); 
}

That's a non-void method. It has to be a non-void method in order to satisfy the interface. But it seems silly to make this implementation illegal because it does not return anything.

That your method has an unreachable end point because of a goto (remember, a while(true) is just a more pleasant way to write goto) instead of a throw (which is another form of goto) is not relevant.

Why doesn't the compiler even warn about returning something?

Because the compiler has no good evidence that the code is wrong. Someone wrote while(true) and it seems likely that the person who did that knew what they were doing.

Where can I read more about reachability analysis in C#?

See my articles on the subject, here:

http://ericlippert.com/category/reachability/

And you might also consider reading the C# specification.

Python syntax for "if a or b or c but not all of them"

89 votes

I have a python script that can receive either zero or three command line arguments. (Either it runs on default behavior or needs all three values specified.) What's the ideal syntax for something like:

if a and (not b or not c) or b and (not a or not c) or c and (not b or not a):

Apologies if this is an existing question -- my theory is too shallow to know what words to search for.

If you mean a minimal form, go with this:

if (not a or not b or not c) and (a or b or c):

Which translates the title of your question.

UPDATE: as correctly said by Volatility and Supr, you can apply De Morgan's law and obtain equivalent:

if (a or b or c) and not (a and b and c):

My advice is to use whichever form is more significant to you and to other programmers. The first means "there is something false, but also something true", the second "There is something true, but not everything". If I were to optimize or do this in hardware, I would choose the second, here just choose the most readable (also taking in consideration the conditions you will be testing and their names). I picked the first.

What is the fastest integer division supporting division by zero no matter what the result is?

74 votes

Summary:

I'm looking for the fastest way to calculate

(int) x / (int) y

without getting an exception for y==0. Instead I just want an arbitrary result.


Background:

When coding image processing algorithms I often need to divide by an (accumulated) alpha value. The most simple variant is plain C code with integer arithmetic. My problem is that I typically get a division by zero error for result pixels with alpha==0. However this are exactly the pixels where the result doesn't matter at all: I don't care about color values of pixels with alpha==0.


Details:

I'm looking for something like:

result = (y==0)? 0 : x/y;

or

result = x / MAX( y, 1 );

x and y are positive integers. The code is executed a huge number of times in a nested loop, so I'm looking for a way to get rid of the conditional branching.

When y does not exceed the byte range, I'm happy with the solution

unsigned char kill_zero_table[256] = { 1, 1, 2, 3, 4, 5, 6, 7, [...] 255 };
[...]
result = x / kill_zero_table[y];

But this obviously does not work well for bigger ranges.

I guess the final question is: Whats the fastest bit twiddling hack changing 0 to any other integer value, while leaving all other values unchanged?


Clarifications

I'm not 100% sure that branching is too expensive. However, different compilers are used, so I prefer benchmarking with little optimizations (which is indeed questionable).

For sure, compilers are great when it comes to bit twiddling, but I can't express the "don't care" result in C, so the compiler will never be able to use the full range of optimizations.

Code should be fully C compatible, main platforms are Linux 64 Bit with gcc & clang and MacOS.

Inspired by some of the comments I got rid of the branch on my Pentium and gcc compiler using

int f (int x, int y)
{
        y += y == 0;
        return x/y;
}

The compiler basically recognizes that it can use a condition flag of the test in the addition.

As per request the assembly:

.globl f
    .type   f, @function
f:
    pushl   %ebp
    xorl    %eax, %eax
    movl    %esp, %ebp
    movl    12(%ebp), %edx
    testl   %edx, %edx
    sete    %al
    addl    %edx, %eax
    movl    8(%ebp), %edx
    movl    %eax, %ecx
    popl    %ebp
    movl    %edx, %eax
    sarl    $31, %edx
    idivl   %ecx
    ret

As this turned out to be such a popular question and answer, I'll elaborate a bit more. The above example is based on programming idiom that a compiler recognizes. In the above case a boolean expression is used in integral arithmetic and the use of condition flags are invented in hardware for this purpose. In general condition flags are only accessible in C through using idiom. That is why it so hard to make a portable multiple precision integer library in C without resorting to (inline) assembly. My guess is that most decent compilers will understand the above idiom.

Another way of avoiding branches, as also remarked in some of the above comments, is predicated execution. I therefore took philipp's first code and my code and ran it through the compiler from ARM and the GCC compiler for the ARM architecture, which features predicated execution. Both compilers avoid the branch in both samples of code:

Philipp's version with the ARM compiler:

f PROC
        CMP      r1,#0
        BNE      __aeabi_idivmod
        MOVEQ    r0,#0
        BX       lr

Philipp's version with GCC:

f:
        subs    r3, r1, #0
        str     lr, [sp, #-4]!
        moveq   r0, r3
        ldreq   pc, [sp], #4
        bl      __divsi3
        ldr     pc, [sp], #4

My code with the ARM compiler:

f PROC
        RSBS     r2,r1,#1
        MOVCC    r2,#0
        ADD      r1,r1,r2
        B        __aeabi_idivmod

My code with GCC:

f:
        str     lr, [sp, #-4]!
        cmp     r1, #0
        addeq   r1, r1, #1
        bl      __divsi3
        ldr     pc, [sp], #4

All versions still need a branch to the division routine, because this version of the ARM doesn't have hardware for a division, but the test for y == 0 is fully implemented through predicated execution.

GetType() can lie?

74 votes

Based on the following question asked a few days ago in SO: GetType() and polymorphism and reading Eric Lippert's answer, I started thinking if making GetType() not be virtual really ensured that an object could not lie about its Type.

Specifically, Eric's answer states the following:

The framework designers are not going to add an incredibly dangerous feature such as allowing an object to lie about its type merely to make it consistent with three other methods on the same type.

Now the question is: can I make an object that does lie about its type without it being immediately obvious? I may be profoundly wrong here and I'd love clarification if that is the case, but consider the following code:

public interface IFoo
{
    Type GetType();
}

And the following two implementations of said interface:

public class BadFoo : IFoo
{
    Type IFoo.GetType()
    {
        return typeof(int);
    }
}

public class NiceFoo : IFoo
{
}

Then if you run the following simple program:

static void Main(string[] args)
{
    IFoo badFoo = new BadFoo();
    IFoo niceFoo = new NiceFoo();
    Console.WriteLine("BadFoo says he's a '{0}'", badFoo.GetType().ToString());
    Console.WriteLine("NiceFoo says he's a '{0}'", niceFoo.GetType().ToString());
    Console.ReadLine();
}

Sure enough badFoo outputs an erroneous Type.

Now I don't know if this has any serious implications based on Eric describing this behavior as an "incredibly dangerous feature", but could this pattern pose a credible threat?

Nice question! The way I see it, you could only really mislead a fellow developer if GetType was virtual on object, which it isn't.

What you did is akin to shadowing GetType, like this:

public class BadFoo
{
    public new Type GetType()
    {
        return typeof(int);
    }
}

with this class (and using the sample code from the MSDN for the GetType() method) you could indeed have:

int n1 = 12;
BadFoo foo = new BadFoo();

Console.WriteLine("n1 and n2 are the same type: {0}",
                  Object.ReferenceEquals(n1.GetType(), foo.GetType())); 
// output: 
// n1 and n2 are the same type: True

so, yikes, you've successfully lied, right? Well, yes and no... Consider that using this as an exploit would mean using your BadFoo instance as an argument to a method somewhere, that expects likely an object or a common base type for a hierarchy of objects. Something like this:

public void CheckIfInt(object ob)
{
    if(ob.GetType() == typeof(int))
    {
        Console.WriteLine("got an int! Initiate destruction of Universe!");
    }
    else
    {
        Console.WriteLine("not an int");
    }
}

but CheckIfInt(foo) prints "not an int".

So, basically (back to your example), you could really only exploit your "lying type" with code that someone wrote against your IFoo interface, which is very explicit about the fact that it has a "custom" GetType() method.

Only if GetType() was virtual on object you would be able to craft a "lying" type that could be used with methods like CheckIfInt above to create havoc in libraries written by someone else.

Why does my application spend 24% of its life doing a null check?

72 votes

I've got a performance critical binary decision tree, and I'd like to focus this question on a single line of code. The code for the binary tree iterator is below with the results from running performance analysis against it.

        public ScTreeNode GetNodeForState(int rootIndex, float[] inputs)
        {
0.2%        ScTreeNode node = RootNodes[rootIndex].TreeNode;

24.6%       while (node.BranchData != null)
            {
0.2%            BranchNodeData b = node.BranchData;
0.5%            node = b.Child2;
12.8%           if (inputs[b.SplitInputIndex] <= b.SplitValue)
0.8%                node = b.Child1;
            }

0.4%        return node;
        }

BranchData is a field, not a property. I did this to prevent the risk of it not being inlined.

The BranchNodeData class is as follows:

public sealed class BranchNodeData
{
    /// <summary>
    /// The index of the data item in the input array on which we need to split
    /// </summary>
    internal int SplitInputIndex = 0;

    /// <summary>
    /// The value that we should split on
    /// </summary>
    internal float SplitValue = 0;

    /// <summary>
    /// The nodes children
    /// </summary>
    internal ScTreeNode Child1;
    internal ScTreeNode Child2;
}

As you can see, the while loop / null check is a massive hit on performance. The tree is massive, so I would expect searching for a leaf to take a while, but I'd like to understand the disproportionate amount of time spent on that one line.

I've tried:

  • Separating the Null check from the while - it's the Null check that's the hit.
  • Adding a boolean field to the object and checking against that, it made no difference. It doesn't matter what's being compared, it's the comparison that's the issue.

Is this a branch prediction issue? If so, what can I do about it? If anything?

I won't pretend to understand the CIL, but I'll post it for anyone does so they can try to scrape some information from it.

.method public hidebysig
instance class OptimalTreeSearch.ScTreeNode GetNodeForState (
    int32 rootIndex,
    float32[] inputs
) cil managed
{
    // Method begins at RVA 0x2dc8
    // Code size 67 (0x43)
    .maxstack 2
    .locals init (
        [0] class OptimalTreeSearch.ScTreeNode node,
        [1] class OptimalTreeSearch.BranchNodeData b
    )

    IL_0000: ldarg.0
    IL_0001: ldfld class [mscorlib]System.Collections.Generic.List`1<class OptimalTreeSearch.ScRootNode> OptimalTreeSearch.ScSearchTree::RootNodes
    IL_0006: ldarg.1
    IL_0007: callvirt instance !0 class [mscorlib]System.Collections.Generic.List`1<class OptimalTreeSearch.ScRootNode>::get_Item(int32)
    IL_000c: ldfld class OptimalTreeSearch.ScTreeNode OptimalTreeSearch.ScRootNode::TreeNode
    IL_0011: stloc.0
    IL_0012: br.s IL_0039
    // loop start (head: IL_0039)
        IL_0014: ldloc.0
        IL_0015: ldfld class OptimalTreeSearch.BranchNodeData OptimalTreeSearch.ScTreeNode::BranchData
        IL_001a: stloc.1
        IL_001b: ldloc.1
        IL_001c: ldfld class OptimalTreeSearch.ScTreeNode OptimalTreeSearch.BranchNodeData::Child2
        IL_0021: stloc.0
        IL_0022: ldarg.2
        IL_0023: ldloc.1
        IL_0024: ldfld int32 OptimalTreeSearch.BranchNodeData::SplitInputIndex
        IL_0029: ldelem.r4
        IL_002a: ldloc.1
        IL_002b: ldfld float32 OptimalTreeSearch.BranchNodeData::SplitValue
        IL_0030: bgt.un.s IL_0039

        IL_0032: ldloc.1
        IL_0033: ldfld class OptimalTreeSearch.ScTreeNode OptimalTreeSearch.BranchNodeData::Child1
        IL_0038: stloc.0

        IL_0039: ldloc.0
        IL_003a: ldfld class OptimalTreeSearch.BranchNodeData OptimalTreeSearch.ScTreeNode::BranchData
        IL_003f: brtrue.s IL_0014
    // end loop

    IL_0041: ldloc.0
    IL_0042: ret
} // end of method ScSearchTree::GetNodeForState

Edit: I decided to do a branch prediction test, I added an identical if within the while, so we have

while (node.BranchData != null)

and

if (node.BranchData != null)

inside that. I then ran performance analysis against that, and it took six times longer to execute the first comparison as it did to execute the second comparison that always returned true. So it looks like it is indeed a branch prediction issue - and I'm guessing there's nothing I can do about it?!

Another Edit

The above result would also occur if node.BranchData had to be loaded from the RAM for the while check - it would then be cached for the if statement.


This is my third question on a similar topic. This time I'm focusing on a single line of code. My other questions on this subject are:

The tree is massive

By far the most expensive thing a processor ever does is not executing instructions, it is accessing memory. The execution core of a modern CPU is many times faster than the memory bus. A problem related to distance, the further an electrical signal has to travel, the harder it gets to get that signal delivered to the other end of the wire without it being corrupted. The only cure for that problem is to make it go slower. A big problem with the wires that connect the CPU to the RAM in your machine, you can pop the case and see the wires.

Processors have a counter-measure for this problem, they use caches, buffers that store a copy of the bytes in RAM. An important one is the L1 cache, typically 16 kilobytes for data and 16 kilobytes for instructions. Small, allowing it to be close to the execution engine. Reading bytes from the L1 cache typically takes 2 or 3 CPU cycles. Next up is the L2 cache, bigger and slower. Upscale processors also have an L3 cache, bigger and slower yet. As process technology improves, those buffers take less space and automatically becomes faster as they get closer to the core, a big reason why newer processors are better and how they manage to use an ever increasing number of transistors.

Those caches are however not a perfect solution. The processor will still stall on a memory access if the data is not available in one of the caches. It cannot continue until the very slow memory bus has supplied the data. Losing a fat hundred CPU cycles is possible on a single instruction.

Tree structures are a problem, they are not cache friendly. Their nodes tend to be scattered throughout the address space. The fastest way to access memory is by reading from sequential addresses. The unit of storage for the L1 cache is 64 bytes. Or in other words, once the processor reads one byte, the next 63 are very fast since they'll be present in the cache.

Which makes an array by far the most efficient data structure. Also the reason that the .NET List<> class isn't a list at all, it uses an array for storage. The same for other collection types, like Dictionary, structurally not remotely similar to an array, but internally implemented with arrays.

So your while() statement is very likely to be suffering from CPU stalls because it is dereferencing a pointer to access the BranchData field. The next statement is very cheap because the while() statement already did the heavy lifting of retrieving the value from memory. Assigning the local variable is cheap, a processor uses a buffer for writes.

Not otherwise a simple problem to solve, flattening your tree into arrays is very likely to be unpractical. Not in the least because you typically cannot predict in which order the nodes of the tree will be visited. A red-black tree might help, it isn't clear from the question. So a simple conclusion to draw is that it is already running as fast you can hope for. And if you need it to go faster then you'll need better hardware with a faster memory bus. DDR4 is going mainstream this year.

Is it expensive to use try-catch blocks even if an exception is never thrown?

71 votes

We know that it is expensive to catch exceptions. But, is it also expensive to use a try-catch block in Java even if an exception is never thrown?

I found the Stack Overflow question/answer Why are try blocks expensive?, but it is for .NET.

try has almost no expense at all. Instead of doing the work of setting up the try at runtime, the code's metadata is structured at compile time such that when an exception is thrown, it now does a relatively expensive operation of walking up the stack and seeing if any try blocks exist that would catch this exception. From a layman's perspective, try may as well be free. It's actually throwing the exception that costs you - but unless you're throwing hundreds or thousands of exceptions, you still won't notice the cost.


try has some minor costs associated with it. Java cannot do some optimizations on code in a try block that it would otherwise do. For example, Java will often re-arrange instructions in a method to make it run faster - but Java also needs to guarantee that if an exception is thrown, the method's execution is observed as though its statements, as written in the source code, executed in order up to some line.

Because in a try block an exception can be thrown (at any line in the try block! Some exceptions are thrown asynchronously, such as by calling stop on a Thread (which is deprecated), and even besides that OutOfMemoryError can happen almost anywhere) and yet it can be caught and code continue to execute afterwards in the same method, it is more difficult to reason about optimizations that can be made, so they are less likely to happen. (Someone would have to program the compiler to do them, reason about and guarantee correctness, etc. It'd be a big pain for something meant to be 'exceptional') But again, in practice you won't notice things like this.

Why C# behaves differently on two int array syntaxes

66 votes

Array in C# is co-variant implicitly on reference type:

object[] listString = new string[] { "string1", "string2" };

But not on value type, so if you change string to int, you will get compiled error:

object[] listInt = new int[] {0, 1}; // compile error

Now, the concern is when you declare int array like two syntaxes below which do not explicitly declare the type int, just only differentiate on new[], compiler will treat differently:

object[] list1 = { 0, 1 };       //compile successfully
object[] list2 = new[] {0, 1};    //compile error

You will get object[] list1 = { 0, 1 }; compiled successfully, but object[] list2= new[] {0, 1}; compiled error.

It seems the C# compiler treats

object[] list1 = { 0, 1 };

as

object[] list1 = new object[]{ 0, 1 };

but

object[] list2 = new[] { 0, 1 };

as

object[] list2 = new int[]{ 0, 1 };  //error because of co-variant

Why C# compiler behaves in the different way on this case?

The version that compiles uses an array initializer to initialize list1. The C# language spec, §1.110 ("Array initializers") states:

An array initializer consists of a sequence of variable initializers, enclosed by “{”and “}” tokens and separated by “,” tokens. Each variable initializer is an expression or, in the case of a multi-dimensional array, a nested array initializer.

The context in which an array initializer is used determines the type of the array being initialized. In an array creation expression, the array type immediately precedes the initializer, or is inferred from the expressions in the array initializer. In a field or variable declaration, the array type is the type of the field or variable being declared.

When an array initializer is used in a field or variable declaration, such as:

int[] a = {0, 2, 4, 6, 8};

it is simply shorthand for an equivalent array creation expression:

int[] a = new int[] {0, 2, 4, 6, 8};

So it is obvious that this should compile.

The second version uses an explicit array creation expression, where you instruct the compiler specifically what type of array to create. §1.51.10.4 ("Array creation expressions") states:

An array creation expression of the third form is referred to as an implicitly typed array creation expression. It is similar to the second form, except that the element type of the array is not explicitly given, but determined as the best common type (§1.50.2.14) of the set of expressions in the array initializer.

Therefore, the second version is equivalent to

object[] list2 = new int[] { 0, 1 };

So the question now effectively becomes "why can I not assign an int[] to an object[]", just as you mention at the end of the question. And the answer is also simple, given in §1.109 ("Array covariance"):

Array covariance specifically does not extend to arrays of value-types. For example, no conversion exists that permits an int[] to be treated as an object[].

Type of `this` in static member function?

63 votes

In C++ 5.1.1/3 [expr.prim.general] it says:

The type and value category [of this] are defined within a static member function.

What does this mean? How is it relevant?

Note that:

this shall not appear in the declaration of a static member function

The language in the standard can be traced to n3282, which is a resolution for defects 1207 and 1017. In particular, the language appears in the proposed resolution for defect 1207, and thus should be considered in the context of the standard as it stood at the time that defect was addressed. At that time there was some concern over the rewriting of id-expressions into member access expressions using *this (9.3.1p3), in particular in the context of trailing-return-type declarations (see issue 945).

If we compare the proposed resolution to defect 1207 to the eventual language in n3282 and subsequently in the standard, there is one significant difference to 9.3.1p3:

Defect 1207:

When an id-expression (5.1 [expr.prim]) that is not part of a class member access syntax (5.2.5 [expr.ref]) and not used to form a pointer to member (5.3.1 [expr.unary.op]) is used in the declaration of a member function of class X, if name lookup (3.4 [basic.lookup]) resolves the name...

n3282 and C++11:

When an id-expression (5.1 [expr.prim]) that is not part of a class member access syntax (5.2.5 [expr.ref]) and not used to form a pointer to member (5.3.1 [expr.unary.op]) is used in a member of class X in a context where this can be used (5.1.1 [expr.prim.general]), if name lookup (3.4 [basic.lookup]) resolves the name [...]

It is apparent that the proposed resolution to defect 1207 carried the belief that id-expressions (to a static member) within a static member functions would need to be transformed to *this member access expressions and thus would need access to the type and value category of this. By the time n3282 was written this had been resolved in favour of the qualified-id transformation (also 9.3.1p3) which does not require this, but the language in 5.1.1p3 remained vestigially.

I would recommend raising this issue on the C++ standards discussion newsgroup; it may be possible to get the vestigial language removed editorially.

Libraries do not get added to APK anymore after upgrade to ADT 22

60 votes

I have a rather big Android App project that is referencing several library projects. Everything was fine until i upgraded the eclipse ADT plugin to the newest version (v22). I also upgraded the SDK of course. I do not see any compile errors in eclipse, but when i run the project on the phone i get a NoClassDefFoundError.

java.lang.NoClassDefFoundError: org.acra.ACRA
....

The arca library is included in one of the referenced library project (in the libs folder) and i can see it in the "Android Private Libraries" in the package explorer, also as i said, no compile errors. The project runs fine on everyone else's computer that did not upgrade ADT.

I have already tried a whole bunch of stuff including but not limited to:

  • re-install the android SDK
  • download a fresh ADT bundle
  • delete all my code an get it again from git
  • copy the library in question to the app project
  • comment out the code that uses this library - i just get the same error for the next library

all without any success, so i'm getting really desperate here.

I would be really happy if anyone could give me a hint on how to solve that problem.

Quoting Streets of Boston from his adt-dev post:

When upgrading, the 'Order and Export' of the new 'Android Private Libraries' is not always checked. And the android-support-v4.jar is now in this 'Android Private Libraries' section.

To fix this, go to 'Order and Export' and check 'Android Private Libraries'. Then refresh/clean/rebuild.

After you done this 'fix' for a library project, you may need to just close and re-open any depending project, because they may not see this 'fix' immediately.

Give this a shot and with luck it will solve your problem.

enter image description here

Why does C# allow statements after a case but not before it?

56 votes

Why does C# allow this:

var s = "Nice";
switch (s)
{
    case "HI":
        break;
    const string x = "Nice";
    case x:
        Console.Write("Y");
        break;
}

But not this:

var s = "Nice";
switch (s)
{
    const string x = "Nice";
    case x:
        Console.Write("Y");
        break;
}

Because your indentation is misleading, the first code actually is:

var s = "Nice";
switch (s)
{
    case "HI":
        break;
        const string x = "Nice";
    case x:
        Console.Write("Y");
        break;
}

That is, x is declared inside a case statement (though after a break), where it is valid. However, directly inside a switch statement it’s invalid – the only valid statements there are case and default.

Furthermore, const declarations are evaluated at compile time, so x is defined even though there’s a break statement before.

However, note that the Mono C# compiler will not compile this code, it complains that “the name ‘x’ does not exist in the current scope” so Mono seems to implement more checks than the .NET compiler. However, I can’t find any rules in the C# standard which forbid this use of the const declaration so I assume that the .NET compiler is right and the Mono compiler is wrong.

When splitting an empty string in Python, why does split() return an empty list while split('\n') returns ['']?

55 votes

I am using split('\n') to get lines in one string, and found that ''.split() returns an empty list, [], while ''.split('\n') returns ['']. Is there any specific reason for such a difference?

And is there any more convenient way to count lines in a string?

Question: I am using split('\n') to get lines in one string, and found that ''.split() returns empty list [], while ''.split('\n') returns [''].

The str.split() method has two algorithms. If no arguments are given, it splits on repeated runs of whitespace. However, if an argument is given, it is treated as a single delimiter with no repeated runs.

In the case of splitting an empty string, the first mode (no argument) will return an empty list because the whitespace is eaten and there are no values to put in the result list.

In contrast, the second mode (with an argument such as \n) will produce the first empty field. Consider if you had written '\n'.split('\n'), you would get two fields (one split, gives you two halves).

Question: Is there any specific reason for such a difference?

This first mode is useful when data is aligned in columns with variable amounts of whitespace. For example:

>>> data = '''\
Shasta      California     14,200
McKinley    Alaska         20,300
Fuji        Japan          12,400
'''
>>> for line in data.splitlines():
        print line.split()

['Shasta', 'California', '14,200']
['McKinley', 'Alaska', '20,300']
['Fuji', 'Japan', '12,400']

The second mode is useful for delimited data such as CSV where repeated commas denote empty fields. For example:

>>> data = '''\
Guido,BDFL,,Amsterdam
Barry,FLUFL,,USA
Tim,,,USA
'''
>>> for line in data.splitlines():
        print line.split(',')

['Guido', 'BDFL', '', 'Amsterdam']
['Barry', 'FLUFL', '', 'USA']
['Tim', '', '', 'USA']

Note, the number of result fields is one greater than the number of delimiters. Think of cutting a rope. If you make no cuts, you have one piece. Making one cut, gives two pieces. Making two cuts, gives three pieces. And so it is with Python's str.split(delimiter) method:

>>> ''.split(',')       # No cuts
['']
>>> ','.split(',')      # One cut
['', '']
>>> ',,'.split(',')     # Two cuts
['', '', '']

Question: And is there any more convenient way to count lines in a string?

Yes, there are a couple of easy ways. One uses str.count() and the other uses str.splitlines(). Both ways will give the same answer unless the final line is missing the \n. If the final newline is missing, the str.splitlines approach will give the accurate answer. A faster technique that is also accurate uses the count method but then corrects it for the final newline:

>>> data = '''\
Line 1
Line 2
Line 3
Line 4'''

>>> data.count('\n')                               # Inaccurate
3
>>> len(data.splitlines())                         # Accurate, but slow
4
>>> data.count('\n') + (not data.endswith('\n'))   # Accurate and fast
4    

Question from @Kaz: Why the heck are two very different algorithms shoe-horned into a single function?

The signature for str.split is about 20 years old, and a number of the APIs from that era are strictly pragmatic. While not perfect, the method signature isn't "terrible" either. For the most part, Guido's API design choices have stood the test of time.

The current API is not without advantages. Consider strings such as:

ps_aux_header  = "USER               PID  %CPU %MEM      VSZ"
patient_header = "name,age,height,weight"

When asked to break these strings into fields, people tend to describe both using the same English word, "split". When asked to read code such as fields = line.split() or fields = line.split(','), people tend to correctly interpret the statements as "splits a line into fields".

Microsoft Excel's text-to-columns tool made a similar API choice and incorporates both splitting algorithms in the same tool. People seem to mentally model field-splitting as a single concept even though more than one algorithm is involved.

Where is the 'this' pointer stored in computer memory?

54 votes

Where exactly is the 'this' pointer stored in memory? Is it allocated on the stack, in the heap, or in the data segment?

#include <iostream>
using namespace std;

class ClassA
{
    int a, b;

    public:
        void add()
        {
            a = 10;
            b = 20;
            cout << a << b << endl;
        }
};

int main()
{
    ClassA obj;
    obj.add();
    return 0;
}

In the above code I am calling the member function add() and the receiver object is passed implicitly as the 'this' pointer. Where is this stored in memory?

Other answers have done a very good job explaining how a typical compiler implements this (by passing it as an implicit first parameter to the function).

I think it's also useful to see what the C++ ISO spec explicitly says about this. According to the C++03 ISO spec, §9.3.2/1:

In the body of a nonstatic (9.3) member function, the keyword this is a non-lvalue expression whose value is the address of the object for which the function is called.

It's important to note that this is not a variable - it's an expression, much in the same way that the expression 1 + 2 * 3 is an expression. The value of this expression is permitted to be stored pretty much anywhere. The compiler might put it on the stack and pass it as an implicit parameter to a function, or it might put it in a register, and it conceivably could put it in the heap or in the data segment. The C++ specification deliberately gives the implementation some flexibility here.

I think that the "language-lawyer" answer is "this is completely implementation-defined, and moreover this is technically not a pointer, but an expression that evaluates to a pointer."

Hope this helps!

How can Eclipse create a class with unresolved compilation problems?

51 votes

When I try to compile this class with javac, I get a compilation error and Test.class is not created.

public class Test {
    public static void main(String[] args) {
        int x = 1L;  // <- this cannot compile
    }
}

But when I create this class in Eclipse, I can see that Test.class appears in target/classes. When I try to run this class from command line with java.exe, I get

Exception in thread "main" java.lang.Error: Unresolved compilation problem:
Type mismatch: cannot convert from long to int

Does Eclipse use its own special Java compiler to create a broken .class? How does java.exe know about complilation problems in .class?

This is how the Java compiler knows about the compilation error in the class.

public static void main(String[] paramArrayOfString)
{
    throw new Error("Unresolved compilation problem: \n\tType mismatch: cannot convert from long to int.\n");
}

If you decompile your class file, you can see the above main() method of the class file, which the compiler has generated. This is because of the compiler which Eclipse uses (Eclipse Compiler for Java) is not the same as the standard Java compiler!

What does "this[0]" mean in C#?

Asked on Mon, 20 May 2013 by Cemre c#
51 votes

I was going through some library code and saw a method like:

public CollapsingRecordNodeItemList List
{
    get { return this[0] as CollapsingRecordNodeItemList; }
}

The class that contains this method is not a list or something iterable, so what exactly does this[0] mean?

Look for an indexer in the class.

C# lets you define indexers to allow this sort of access.

Here is an example from the official guide for "SampleCollection".

public T this[int i]
    {
        get
        {
            // This indexer is very simple, and just returns or sets 
            // the corresponding element from the internal array. 
            return arr[i];
        }
        set
        {
            arr[i] = value;
        }
    }

Here is the definition from the official language specification:

An indexer is a member that enables objects to be indexed in the same way as an array. An indexer is declared like a property except that the name of the member is this followed by a parameter list written between the delimiters [ and ]. The parameters are available in the accessor(s) of the indexer. Similar to properties, indexers can be read-write, read-only, and write-only, and the accessor(s) of an indexer can be virtual.

One can find the full and complete definition in section 10.9 Indexers of the specification.

Extracting pairs of words using String.split()

40 votes

Given a String such as

String input = "one two three four five six seven";

Is there a regex that works with String.split() to grab (up to) two words at a time, such that:

String[] pairs = input.split("some regex");
System.out.println(Arrays.toString(pairs));

results in this:

[one two, three four, five six, seven]

Note: This question is about the split regex. It is not about "finding a work-around" or other "making it work another way" solutions.

Is this what you are looking for?
(you can replace \\w with \\S to include all non-space characters but for this example I will leave \\w since it is easier to analyze regex with \\w\\s then \\S\\s)

String input = "one two three four five six seven";
String[] pairs = input.split("(?<!\\G\\w+)\\s");
System.out.println(Arrays.toString(pairs));

output:

[one two, three four, five six, seven]

\G is previous match, (?<!regex) is negative lookbehind.

In split we are trying to

  1. find spaces -> \\s
  2. that are not predicted -> (?<!negativeLookBehind)
  3. by some word -> \\w+
  4. with previously matched (space) -> \\G
  5. before it ->\\G\\w+.

Only confusion that I had at start was how would it work for first space since we want that space to be ignored. Important information is that \\G at start matches start of the String ^.

So before first iteration regex in negative look-behind will look like (?<!^\\w+) and since first space do have ^\\w+ before, it can't be match for split. Next space will not have this problem, so it will be matched and informations about it (like its position in input String) will be stored in \\G and used later in next negative look-behind.

So for 3rd space regex will check if there is previously matched space \\G and word \\w+ before it. Since result of this test will be positive, negative look-behind wont accept it so this space wont be matched, but 4th space wont have this problem because space before it wont be the same as stored in \\G (it will have different position in input String).


Also if someone would like to separate on lets say every 3rd space you can use this form (based on @maybeWeCouldStealAVan's answer which was deleted when I posted this fragment of answer)

input.split("(?<=\\G\\w{1,100}\\s\\w{1,100}\\s\\w{1,100})\\s")

Instead of 100 you can use some bigger value that will be at least the size of length of longest word in String.

Using two values for one switch case statement

36 votes

In my code, the program does something depending on the text entered by the user. My code looks like:

switch (name) {
        case text1: {
            //blah
            break;
        }
        case text2: {
            //blah
            break;
        }
        case text3: {
            //blah
            break;
        }
        case text4: {
            //blah
            break;
        }

However, the code inside cases text1 and text4 is the same. I was therefore wondering if it would be possible for me to implement something like

case text1||text4: {
            //blah
            break;
        }

I know that the || operator won't work in the case statement but is there something similar I can use.

You can use have both CASE statements as follows.

  case text1: 
  case text4:{
            //blah
            break;
        }

SEE THIS EXAMPLE:The code example calculates the number of days in a particular month:

 class SwitchDemo {
    public static void main(String[] args) {

        int month = 2;
        int year = 2000;
        int numDays = 0;

        switch (month) {
            case 1:
            case 3:
            case 5:
            case 7:
            case 8:
            case 10:
            case 12:
                numDays = 31;
                break;
            case 4:
            case 6:
            case 9:
            case 11:
                numDays = 30;
                break;
            case 2:
                if (((year % 4 == 0) && 
                     !(year % 100 == 0))
                     || (year % 400 == 0))
                    numDays = 29;
                else
                    numDays = 28;
                break;
            default:
                System.out.println("Invalid month.");
                break;
        }
        System.out.println("Number of Days = "
                           + numDays);
    }
}
This is the output from the code:

Number of Days = 29

FALLTHROUGH:

Another point of interest is the break statement. Each break statement terminates the enclosing switch statement. Control flow continues with the first statement following the switch block. The break statements are necessary because without them, statements in switch blocks fall through: All statements after the matching case label are executed in sequence, regardless of the expression of subsequent case labels, until a break statement is encountered.

EXAMPLE CODE:

 public class SwitchFallThrough {

    public static void main(String[] args) {
        java.util.ArrayList<String> futureMonths =
            new java.util.ArrayList<String>();

        int month = 8;

        switch (month) {
            case 1:  futureMonths.add("January");
            case 2:  futureMonths.add("February");
            case 3:  futureMonths.add("March");
            case 4:  futureMonths.add("April");
            case 5:  futureMonths.add("May");
            case 6:  futureMonths.add("June");
            case 7:  futureMonths.add("July");
            case 8:  futureMonths.add("August");
            case 9:  futureMonths.add("September");
            case 10: futureMonths.add("October");
            case 11: futureMonths.add("November");
            case 12: futureMonths.add("December");
            default: break;
        }

        if (futureMonths.isEmpty()) {
            System.out.println("Invalid month number");
        } else {
            for (String monthName : futureMonths) {
               System.out.println(monthName);
            }
        }
    }
}
This is the output from the code:

August
September
October
November
December

FROM Java Docs

C sizeof 2500 million is 8 bytes but size of 1250 million x 2 is 4 bytes... why?

36 votes

So I am really confused why sizeof does not evaluate an expression before determining the number of bytes is needed. Why does this happen:

sizeof(2500000000) = 8

But,

sizeof(1250000000 * 2) = 4

Why? How can I get around this?

EDIT:

Is there anyway to determine the memory on runtime?

EDIT #2:

I really did not understand how OS's reserve memory for applications. Ignore my question but look at the voted reply it's very useful! In addition, wikipedia "Stack Overflow". Thank you everyone that participated in this question!

NOTE: This is a homework problem so please be vague or do not lead me to a particular solution, just help clarify my question(s).

2500000000 doesn't fit in an int, so the compiler correctly interprets it as a long (or long long, or a type where it fits). 1250000000 does, and so does 2. The parameter to sizeof isn't evaluated, so the compiler can't possibly know that the multiplication doesn't fit in an int, and so returns the size of an int.

Also, even if the parameter was evaluated, you'd likely get an overflow (and undefined behavior), but probably still resulting in 4.

Here:

#include <iostream>
int main()
{
    long long x = 1250000000 * 2;
    std::cout << x;
}

can you guess the output? If you think it's 2500000000, you'd be wrong. The type of the expression 1250000000 * 2 is int, because the operands are int and int and multiplication isn't automagically promoted to a larger data type if it doesn't fit.

http://ideone.com/4Adf97

So here, gcc says it's -1794967296, but it's undefined behavior, so that could be any number. This number does fit into an int.

In addition, if you cast one of the operands to the expected type (much like you cast integers when dividing if you're looking for a non-integer result), you'll see this working:

#include <iostream>
int main()
{
    long long x = (long long)1250000000 * 2;
    std::cout << x;
}

yields the correct 2500000000.