Best performance questions in March 2012

Is inline assembly language slower than native C++ code?

61 votes

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that add two arrays of size 2000 for 100000 times. Here's the code:

#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}


void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES
        start:
        mov esi,0
        mov ecx,lengthOfArray
        label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}

Here's main():

int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    time_t start = clock();
    calcuC(xC,yC,2000);

    //    calcuAsm(xAsm,yAsm,2000);
    //    for(int i = 0; i < 2000; i++)
    //    {
    //        if(xC[i] != xAsm[i])
    //        {
    //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //            errorOccured = true;
    //            break;
    //        }
    //    }
    //    if(errorOccured)
    //        cout<<"Error occurs!"<<endl;
    //    else
    //        cout<<"Works fine!"<<endl;

    time_t end = clock();

    //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";

    cout<<"time = "<<end - start<<endl;
    return 0;
}

Then I run the program five times to get the cycles of processor, which could be seen as time. Each time I call one of the function mentioned above only.

And here comes the result.

Function of assembly version:

Debug   Release
---------------
732        668
733        680
659        672
667        675
684        694
Average:   677

Function of C++ version:

Debug     Release
-----------------
1068      168
 999      166
1072      231
1002      166
1114      183
Average:  182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as effective as those generated by GCC. It's hard for a common programmer like me to wrote code faster than its opponent generated by a compiler.Does that mean I should not trust the performance of assembly language written by my hands, focus on C++ and forget about assembly language?

Yes, yes yes. Most times.

Compilers can do optimizations that most people can't even imagine (see this short list).
They can take in account inter-procedural optimization and whole-program optimization. Assembly programmer has to make well-defined functions with a well-defined call interface. This prevents many of the optimization methods that compilers use, such as register allocation, constant propagation, common subexpression elimination across functions, scheduling across functions, and other complex, not obvious optimizations (Polytope model, for example). It's not amazing, they can verify in one second what you'll need 2 days to calculate. On RISC architecture guys stopped to worry about this many years ago (Instruction Scheduling, for example, is very hard to tune by hand) and modern CISC CPUs have very long pipelines too.

For some complex microcontrollers even system libraries are written in C instead of assembly because their compilers produce a better (and easy to maintain) final code.

If you write something in assembly, I think you have to consider at least some simple optimizations (take a look at MasmCode. The school-book example for arrays is to unroll the cycle (its size is known at compile time). Do it and run your test again. It could demonstrate why your debug version is slower in pure C++ (no optimizations).
That said, modern compilers sometimes can automatically use some MMX/SIMDx instructions by themselves, and if you don't use them you simply can't compare (I'm not an assembly guru so I don't even try talk about code you wrote). Just for loops this is a short list of loop optimizations of what is common to check for a compiler (do you think you may do it by yourself when your schedules has been decided for a C# program?)

These days it's also really uncommon to need to use assembly language for another reason: the plethora of different CPUs! Do you want to support them all? Each has a specific micro-architecture and some specific instruction set. For small tasks (like this) the compiler usually does it better, and for complex tasks usually the work isn't repaid (and compiler may do better anyway).

You can always produce an example where handmade assembly code is better than compiled code but usually it's a fictional example or a single routine not a true program of 200.000+ lines of C++ code). I think compilers will produce better assembly code 95% times (moreover we don't have to forget that an assembler is a compiler too and it'll do optimizations) and sometimes and only some rare times you may need to write assembly code for few, short, highly used, performance critical routines.

Non-intersecting line segment

41 votes

I would like to find a better algorithm to solve the following problem:

There are N starting points (purple) and N target points (green) in 2D. I want an algorithm that connects starting points to target points by a line segment (brown) without any of these segments intersecting (red) and while minimizing the cumulative length of all segments.

My first effort in C++ was permuting all possible states, find intersection-free states, and among those the state with minimum total segment length. But I think there has to be a better way.

enter image description here

Any idea? Or good keywords for search?

This is Minimum Euclidean Matching in 2D. The link contains a bibliography of what's known about this problem. Given that you want to minimize the total length, the non-intersection constraint is redundant, as the length of any pair of segments that cross can be reduced by uncrossing them.

Why is jQuery.ready recommended when it’s so slow?

22 votes

I have asked a similar question before but I never made my point exactly clear, or at least I think it’s such a relevant question that it’s worth to bring it up and see if anyone can give some insightful thoughts.

When using jQuery, many of us use the jQuery.ready function to execute an init when the DOM has loaded. It has become the de-facto standard way of adding DOM manipulation programs to a web page using jQuery. A related event exists natively some browsers, but jQuery emulates it in other browsers, such as some IE versions. Example:

<head>
<script>
    var init = function() { alert('hello world'); };
    $.ready(init);
</script>

Now, all our tests show that this event can be quite slow. It’s not nearly as slow as window.onload, but it’s still often around 100 ms delay before execution. If FF it can be up to 200-300 ms, especially on refresh.

These are some very important milliseconds, because this is the amount of time the initial layout is shown before any DOM manipulations are made (such as hiding a dropdown). Many times, a layout "flicker" is mainly caused by using a slow DOM ready event, forcing programmers to hide elements using CSS instead and potentially making it less accessible.

Now if we instead place an init function in a script tag before closing the body tag, it will be executed much faster, usually around half the time but sometimes even faster:

<head>
<script>
    var init = function() { alert('hello world'); };
</script>
</head>
<body>
<!-- some HTML -->
<script>init();</script>
</body>

A simple test page that proves the differences: http://jsbin.com/aqifon/10

I mean, we are not talking about barely noticeable differences as some of the "optimization police" promotes when it comes to using effective selectors. We are talking about some major delays when doing DOM manipulations onload. Trying this example in FF, domready can sometimes be more than 100 times slower (300ms vs 2ms).

Now to my question: Why is jQuery.ready recommended to use when it’s obviously much slower that other alternatives? And what are the drawbacks of calling the init before closing the BODY vs using jQuery.ready? It’s arguably more "safe" to use domReady, but in what context is it more safe than the other option? (I’m thinking stuff like document.write and deferred scripts) We have used the BODY way for nearly 5 years on many client sites, and we never run into any problems. It’s just a lot faster.

I’m also wondering, since there is so much fuzz about jsPerf and optimizing selectors for a couple of ms per 10000 executions, how come there is not much talk about this? It’s basically the first delay the user faces, and it seems to be fairly simple to slice 50-100 ms on each page load...

To the point first:

No, there is no disadvantage in calling you init before closing the <body>. It will as you have noticed perform better that relying on $.ready() and will also work with all the browsers flawlessly (even on IE).

Now, there are however reasons to use $.ready(), which in your case they do not probably apply:

  1. $.ready() makes it easy for developers to do stuff in the right order. In particular, the critical thing is to not reference DOM elements that have not been loaded. While this is simple enough, lots of developers still find it confusing. $.ready() is a no-brainer, albeit a slow one.
  2. In you have say several scripts that need to init(), it is not necessarily easy/convenient to manually do that at the end of your body. It requires discipline and knowledge of what these scripts do. In particular you will often see $.ready() in libraries dependent on jQuery, since it makes things work no matter what way developers will use to load the libs.
  3. With Asynchronous Module Definition (for instance require.js) becoming popular as a way to load your javascript, the end of <body/> method is not guaranteed.

The most efficient way to search for an array of strings in another string

16 votes

I have a large arrray of strings that looks something like this: String temp[] = new String[200000].

I have another String, let's call it bigtext. What I need to do is iterate through each entry of temp, checking to see if that entry is found in bigtext and then do some work based on it. So, the skeletal code looks something like this:

for (int x = 0; x < temp.length; x++) {
  if (bigtext.indexOf(temp[x]) > -1 {

  //do some stuff
  } else continue;
}

Because there are so many entries in temp and there are many instances of bigtext as well, I want to do this in the most efficient way. I am wondering if what I've outlined is the most efficient way to iterate through this search of if there are better ways to do this.

Thanks,

Elliott

I think you're looking for an algorithm like Rabin-Karp or Aho–Corasick which are designed to search in parallel for a large number of sub-strings in a text.

Why is math.factorial much slower in Python 2.x than 3.x?

11 votes

I get the following results on my machine:

Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win
32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('factorial(10000)', 'from math import factorial', number=100)
1.9785256226699202
>>>

Python 2.7.2 (default, Jun 12 2011, 15:08:59) [MSC v.1500 32 bit (Intel)] on win
32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.timeit('factorial(10000)', 'from math import factorial', number=100)
9.403801111593792
>>>

I thought this might have something to do with int/long conversion, but factorial(10000L) isn't any faster in 2.7.

Python 2 uses the naive factorial algorithm:

1121 for (i=1 ; i<=x ; i++) {
1122     iobj = (PyObject *)PyInt_FromLong(i);
1123     if (iobj == NULL)
1124         goto error;
1125     newresult = PyNumber_Multiply(result, iobj);
1126     Py_DECREF(iobj);
1127     if (newresult == NULL)
1128         goto error;
1129     Py_DECREF(result);
1130     result = newresult;
1131 }

Python 3 uses the divide-and-conquer factorial algorithm:

1229 * factorial(n) is written in the form 2**k * m, with m odd. k and m are
1230 * computed separately, and then combined using a left shift.

See the Python Bugtacker issue for the discussion. Thanks DSM for pointing that out.

Save object states in .data or attr - Performance vs CSS?

9 votes

In response to my answer yesterday about rotating an Image, Jamund told me to use .data() instead of .attr()

First I thought that he is right, but then I thought about a bigger context... Is it always better to use .data() instead of .attr()? I looked in some other posts like what-is-better-data-or-attr or jquery-data-vs-attrdata

The answers were not satisfactory for me...

So I moved on and edited the example by adding CSS. I thought it might be useful to make a different Style on each image if it rotates. My style was the following:

.rp[data-rotate="0"] {
    border:10px solid #FF0000;
}
.rp[data-rotate="90"] {
    border:10px solid #00FF00;
}
.rp[data-rotate="180"] {
    border:10px solid #0000FF;
}
.rp[data-rotate="270"] {
    border:10px solid #00FF00;
}

Because design and coding are often separated, it could be a nice feature to handle this in CSS instead of adding this functionality into JavaScript. Also in my case the data-rotate is like a special state which the image currently has. So in my opinion it make sense to represent it within the DOM.

I also thought this could be a case where it is much better to save with .attr() then with .data(). Never mentioned before in one of the posts I read.

But then i thought about performance. Which function is faster? I built my own test following:

<!DOCTYPE HTML>
<html>
<head>
<title>test</title>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.7.1/jquery.min.js"></script>
<script type="text/javascript">
function runfirst(dobj,dname){
  console.log("runfirst "+dname);
  console.time(dname+"-attr");
  for(i=0;i<10000;i++){
    dobj.attr("data-test","a"+i);
  }
  console.timeEnd(dname+"-attr");
  console.time(dname+"-data");
  for(i=0;i<10000;i++){
    dobj.data("data-test","a"+i);
  }
  console.timeEnd(dname+"-data");
}
function runlast(dobj,dname){
  console.log("runlast "+dname);
  console.time(dname+"-data");
  for(i=0;i<10000;i++){
    dobj.data("data-test","a"+i);
  }
  console.timeEnd(dname+"-data");
  console.time(dname+"-attr");
  for(i=0;i<10000;i++){
    dobj.attr("data-test","a"+i);
  }
  console.timeEnd(dname+"-attr");  
}
$().ready(function() {
  runfirst($("#rp4"),"#rp4");
  runfirst($("#rp3"),"#rp3");
  runlast($("#rp2"),"#rp2");
  runlast($("#rp1"),"#rp1");
});
</script>
</head>
<body>
    <div id="rp1">Testdiv 1</div>
    <div id="rp2" data-test="1">Testdiv 2</div>
    <div id="rp3">Testdiv 3</div>
    <div id="rp4" data-test="1">Testdiv 4</div>
</body>
</html>

It should also show if there is a difference with a predefined data-test or not.

One result was this:

runfirst #rp4
#rp4-attr: 515ms
#rp4-data: 268ms
runfirst #rp3
#rp3-attr: 505ms
#rp3-data: 264ms
runlast #rp2
#rp2-data: 260ms
#rp2-attr: 521ms
runlast #rp1
#rp1-data: 284ms
#rp1-attr: 525ms

So the .attr() function did always need more time than the .data() function. This is an argument for .data() I thought. Because performance is always an argument!

Then I wanted to post my results here with some questions, and in the act of writing I compared with the questions Stack Overflow showed me (similar titles)

And true enough, there was one interesting post about performance

I read it and run their example. And now I am confused! This test showed that .data() is slower then .attr() !?!! Why is that so?

First I thought it is because of a different jQuery library so I edited it and saved the new one. But the result wasn't changing...

So now my questions to you:

  • Why are there some differences in the performance in these two examples?
  • Would you prefer to use data- HTML5 attributes instead of data, if it represents a state? Although it wouldn't be needed at the time of coding? Why - Why not?

Now depending on the performance:

  • Would performance be an argument for you using .attr() instead of data, if it shows that .attr() is better? Although data is meant to be used for .data()?

UPDATE 1:
I did see that without overhead .data() is much faster. Misinterpreted the data :) But I'm more interested in my second question. :)

Would you prefer to use data- HTML5 attributes instead of data, if it represents a state? Although it wouldn't be needed at the time of coding? Why - Why not?

Are there some other reasons you can think of, to use .attr() and not .data()? e.g. interoperability? because .data() is jquery style and HTML Attributes can be read by all...

UPDATE 2:

As we see from T.J Crowder's speed test in his answer attr is much faster then data! which is again confusing me :) But please! Performance is an argument, but not the highest! So give answers to my other questions please too!

UPDATE 3:

My test seems to be false because of the fire-bug I used while testing! The same file in chrome listed attr faster and a second test on jsperf also says attr is faster

This performance part of the question screams of premature optimization; see below. (Lest you get the wrong idea: I too am frequently guilty of wondering about the same sort of premature optimization question.)

But getting performance out of the way (other points addressed below the graph): As far as I can see, attr is faster than data in jQuery 1.7.1: http://jsperf.com/jquery-setting-attr-vs-data This surprises me. Not that it's remotely likely to matter.

Gratuitous bar graph (longer lines = faster performance):

Gratuitous bar graph from jsperf

Are there some other reasons you can think of, to use .attr() and not .data()?

At least a couple come to mind:

  1. The advantage of data is that it doesn't have to write to the element every time; you only write to the actual element the first time, and from then on jQuery is just updating a value in a JavaScript object it maintains in a separate object cache (connected to the element via a key). (I'm not sure why it's slower than attr; perhaps because of the indirection.)

  2. One thing I dislike about data is that it's not symmetrical: The first time you access data on an element, the data object is seeded with data-* attributes from the element; but from there on out, there is no connection between the two.

    Example (live copy | live source):

    var target = $("#target");
    display("data('foo'): " + target.data("foo"));
    display("data-foo: " + target.attr("data-foo"));
    display("Setting data('foo')");
    target.data("foo", "updated data('foo')");
    display("data('foo'): " + target.data("foo"));
    display("data-foo: " + target.attr("data-foo"));
    display("Setting data-foo");
    target.attr("data-foo", "updated data-foo");
    display("data('foo'): " + target.data("foo"));
    display("data-foo: " + target.attr("data-foo"));
    

    Assuming the #target element starts out with data-foo="bar", the output is:

    data('foo'): bar
    data-foo: bar
    Setting data('foo')
    data('foo'): updated data('foo')
    data-foo: bar
    Setting data-foo
    data('foo'): updated data('foo')
    data-foo: updated data-foo

    That can be confusing and surprising. The way you have to think about it is that the data-* attributes are default values only. I just don't like how they're so dependent on whether you've called data before or not; unless you never write to the data-* attribute directly, you can't be sure what value data will get (the original from the markup, or a value you updated later before you called data). It seems a bit chaotic to me, but if you set yourself rules (never write to data-* attributes directly and only ever use data, for instance), you can avoid the chaos.

  3. When you use attr, you can only store strings. When you use data, you can store any JavaScript value or object reference.


Because performance is always an argument!

Not in 2012. :-) Or at least, it's a lot lower down the list relative to other arguments than it used to be absent a specific, demonstrable performance problem.

Let's look at your runfirst #rp4 results: 10k iterations of attr took 515ms; 10k iterations of data took 268ms. That's 51.5 usec (microseconds, millionths of a second) each vs. 26.8 usec each. So you're wondering whether to use data if it saves you 24.7 usec per operation. Humans perceive things on the order of tenths of seconds. So for it to matter, you have to do this op roughly 4,000 times in a tight loop for a human to notice the difference. That's just not even close to worth worrying about, even in a mousemove handler.

If you're into that kind of territory (4,000/second in a tight loop), you'll probably want to avoid storing the information on the element at all.

Dynamic vs Inline RegExp performance in JavaScript

8 votes

I stumbled upon that performance test, saying that RegExps in JavaScript are not necessarily slow: http://jsperf.com/regexp-indexof-perf

There's one thing i didn't get though: two cases involve something that i believed to be exactly the same:

RegExp('(?:^| )foo(?: |$)').test(node.className);

And

/(?:^| )foo(?: |$)/.test(node.className);

In my mind, those two lines were exactly the same, the second one being some kind of shorthand to create a RegExp object. Still, it's twice faster than the first.

Those cases are called "dynamic regexp" and "inline regexp".

Could someone help me understand the difference (and the performance gap) between these two?

in the second case, the regular expression object is created during the parsing of the language, and in the first case, the RegExp class constructor has to parse an arbitrary string.

Does C code run faster?

7 votes

Is there any performance gain in calling C code from Objective-C?

I've read somewhere that message passing is slower compared to other languages that use function calling. So if I call a C function from Objective-C code, am I avoiding the messaging overhead?

When optimizing for performance, is it recommended practice to code the most critical functions and procedures in C instead of using Objective-C objects?

EDIT:
Given the ammount of answers warning about premature optimization and code readability, I want to clarify that I was not thinking on regular applications, but very specific ones such as:

  • Graphics
  • Encryption or compression algorithms.
  • Maths

And in general, fuctions or procedures that do not need OO design and are intended to be called many times with parameters.

This is a benchmark that compares messaging to calling C functions. Here are the results of calling different implementations of the fibonacci function about 1.4 billion times.

Message Passing 23.495 seconds
IMP Calling     16.033 seconds
C Function      9.709 seconds

So yes, there is some overhead when calling an Objective C method. But, except in some situations, this is not what will impact the performance of your app. In fact, messaging is still more efficient than most other operations such as floating-point division.

In addition, the majority apps spend most time waiting for user input, downloading data, etc.

Two sql for Sorted timestamp date

7 votes

I have 98w rows data. When I want sort my data with pub_time, I found an interest thing.

Here is the SQL:

select * 
from t_p_blog_article_info t  
order by t.pub_time desc

It cost 19s.

select * 
from t_p_blog_article_info t 
where t.pub_time > to_date( '1900-01-01 01:00:00', 'yyyy-mm-dd   hh24:mi:ss ')  
order by t.pub_time desc

It cost 0.2s.

I want to know, why?

You probably have an index on pub_time on your table.

Therefore, the second query can make use of this index to return only those records with non-null dates after the specified date, whereas the first query has to query the whole table.

Slow Regex performance

7 votes

The code below contains a regular expression designed to extract a C# string literal but the performance of the regex matching for input strings of more than a few characters is woeful.

class Program
{
   private static void StringMatch(string s)
    {
        // regex: quote, zero-or-more-(zero-or-more-non-backslash-quote, optional-backslash-anychar), quote
        Match m = Regex.Match(s, "\"(([^\\\\\"]*)(\\\\.)?)*\"");
        if (m.Success)
            Trace.WriteLine(m.Value);
        else
            Trace.WriteLine("no match");
    }

    public static void Main()
    {
        // this first string is unterminated (so the match fails), but it returns instantly
        StringMatch("\"OK");

        // this string is terminated (the match succeeds)
        StringMatch("\"This is a longer terminated string - it matches and returns instantly\"");

        // this string is unterminated (so the match will fail), but it never returns
        StringMatch("\"This is another unterminated string and takes FOREVER to match");
    }
}

I can refactor the regex into a different form, but can anyone offer an explanation why the performance is so bad?

You're running into catastrophic backtracking:

Let's simplify the regex a bit (without the escaped quotes and without the second optional group because, as in your comment, it's irrelevant for the tested strings):

"(([^\\"]*))*" 

([^\\"]*) matches any string except quotes or backslashes. This again is enclosed in an optional group that can repeat any number of times.

Now for the string "ABC, the regex engine needs to try the following permutations:

  • ", ABC
  • ", ABC, <empty string>
  • ", AB, C
  • ", AB, C, <empty string>
  • ", AB, <empty string>, C
  • ", AB, <empty string>, C, <empty string>
  • ", <empty string>, AB, C
  • ", <empty string>, AB, C, <empty string>
  • ", <empty string>, AB, <empty string>, C, <empty string>
  • ", <empty string>, AB, <empty string>, C
  • ", A, BC
  • ", A, BC, <empty string>
  • ", A, <empty string>, BC
  • ", <empty string>, A, BC
  • etc.
  • ", A, B, C
  • ", A, B, C, <empty string>
  • ", A, B, <empty string>, C
  • etc. etc.

each of which then fails because there is no following ".

Also, you're only testing for substrings instead of forcing the regex to match the entire string. And you usually want to use verbatim strings for regexes to cut down on the number of backslashes you need. How about this:

foundMatch = Regex.IsMatch(subjectString, 
    @"\A     # Start of the string
    ""       # Match a quote
    (?:      # Either match...
     \\.     # an escaped character
    |        # or
     [^\\""] # any character except backslash or quote
    )*       # any number of times
    ""       # Match a quote
    \Z       # End of the string", 
    RegexOptions.IgnorePatternWhitespace);