Best performance questions in October 2011

Does IF perform better than IF-ELSE?

51 votes

Which one of these blocks of code performs better, and which one of them is more readable? I'd guess the gain would be negligible, particularly in the second block. I am just curious.

Block #1

string height;
string width;
if (myFlag == 1)
{
    height = "60%";
    width = "60%";
}
else
{
    height = "80%";
    width = "80%";
}

Block #2

string height = "80%";
string width = "80%";
if (myFlag == 1)
{
    height = "60%";
    width = "60%";
}

Updated

The results when i tested the above code were that both the blocks performed the same

Block #1

myFlag = 1:   3 Milliseconds
myFlag = 0:   3 Milliseconds

Block #2

myFlag = 1:   3 Milliseconds
myFlag = 0:   3 Milliseconds

But one important thing i noticed here(thanks to Matthew Steeples answer here) is that since the block of code that i have tested has not used the variables height and width except for assignment in the if-else and if blocks of Code Block-1 and 2 respectively, the compiler has optimized the IL code by completely removing the if and if-else blocks in question thus showing invalid results for our test here.

I have updated both the code blocks to write the values of both height and width to a file thus using them again and forcing the compiler to run our test blocks(I hope), but you can observe from the code that the actual file writing part does not effect our test results

This is the updated results, C# and IL Code

Results

Block #1

myFlag = 1:   1688 Milliseconds
myFlag = 0:   1664 Milliseconds

Block #2

myFlag = 1:   1700 Milliseconds
myFlag = 0:   1677 Milliseconds

C#.net Code

Block #1

    public long WithIfAndElse(int myFlag)
    {
        Stopwatch myTimer = new Stopwatch();
        string someString = "";
        myTimer.Start();
        for (int i = 0; i < 1000000; i++)
        {
            string height;
            string width;
            if (myFlag == 1)
            {
                height = "60%";
                width = "60%";
            }
            else
            {
                height = "80%";
                width = "80%";
            }
            someString = "Height: " + height + Environment.NewLine + "Width: " + width;
        }
        myTimer.Stop();
        File.WriteAllText("testifelse.txt", someString);
        return myTimer.ElapsedMilliseconds;
    }

Block #2

    public long WithOnlyIf(int myFlag)
    {
         Stopwatch myTimer = new Stopwatch();
        string someString = "";
        myTimer.Start();
        for (int i = 0; i < 1000000; i++)
        {
            string height = "80%";
            string width = "80%";
            if (myFlag == 1)
            {
                height = "60%";
                width = "60%";
            }
            someString = "Height: " + height + Environment.NewLine + "Width: " + width;
        }
        myTimer.Stop();
        File.WriteAllText("testif.txt", someString);
        return myTimer.ElapsedMilliseconds;
    }

IL Code Generated By ildasm.exe

Block #1

.method public hidebysig instance int64  WithIfAndElse(int32 myFlag) cil managed
{
  // Code size       144 (0x90)
  .maxstack  3
  .locals init ([0] class [System]System.Diagnostics.Stopwatch myTimer,
           [1] string someString,
           [2] int32 i,
           [3] string height,
           [4] string width,
           [5] string[] CS$0$0000)
  IL_0000:  newobj     instance void [System]System.Diagnostics.Stopwatch::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldstr      ""
  IL_000b:  stloc.1
  IL_000c:  ldloc.0
  IL_000d:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Start()
  IL_0012:  ldc.i4.0
  IL_0013:  stloc.2
  IL_0014:  br.s       IL_0070
  IL_0016:  ldarg.1
  IL_0017:  ldc.i4.1
  IL_0018:  bne.un.s   IL_0029
  IL_001a:  ldstr      "60%"
  IL_001f:  stloc.3
  IL_0020:  ldstr      "60%"
  IL_0025:  stloc.s    width
  IL_0027:  br.s       IL_0036
  IL_0029:  ldstr      "80%"
  IL_002e:  stloc.3
  IL_002f:  ldstr      "80%"
  IL_0034:  stloc.s    width
  IL_0036:  ldc.i4.5
  IL_0037:  newarr     [mscorlib]System.String
  IL_003c:  stloc.s    CS$0$0000
  IL_003e:  ldloc.s    CS$0$0000
  IL_0040:  ldc.i4.0
  IL_0041:  ldstr      "Height: "
  IL_0046:  stelem.ref
  IL_0047:  ldloc.s    CS$0$0000
  IL_0049:  ldc.i4.1
  IL_004a:  ldloc.3
  IL_004b:  stelem.ref
  IL_004c:  ldloc.s    CS$0$0000
  IL_004e:  ldc.i4.2
  IL_004f:  call       string [mscorlib]System.Environment::get_NewLine()
  IL_0054:  stelem.ref
  IL_0055:  ldloc.s    CS$0$0000
  IL_0057:  ldc.i4.3
  IL_0058:  ldstr      "Width: "
  IL_005d:  stelem.ref
  IL_005e:  ldloc.s    CS$0$0000
  IL_0060:  ldc.i4.4
  IL_0061:  ldloc.s    width
  IL_0063:  stelem.ref
  IL_0064:  ldloc.s    CS$0$0000
  IL_0066:  call       string [mscorlib]System.String::Concat(string[])
  IL_006b:  stloc.1
  IL_006c:  ldloc.2
  IL_006d:  ldc.i4.1
  IL_006e:  add
  IL_006f:  stloc.2
  IL_0070:  ldloc.2
  IL_0071:  ldc.i4     0xf4240
  IL_0076:  blt.s      IL_0016
  IL_0078:  ldloc.0
  IL_0079:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Stop()
  IL_007e:  ldstr      "testifelse.txt"
  IL_0083:  ldloc.1
  IL_0084:  call       void [mscorlib]System.IO.File::WriteAllText(string,
                                                                   string)
  IL_0089:  ldloc.0
  IL_008a:  callvirt   instance int64 [System]System.Diagnostics.Stopwatch::get_ElapsedMilliseconds()
  IL_008f:  ret
} // end of method frmResearch::WithIfAndElse

Block #2

.method public hidebysig instance int64  WithOnlyIf(int32 myFlag) cil managed
{
  // Code size       142 (0x8e)
  .maxstack  3
  .locals init ([0] class [System]System.Diagnostics.Stopwatch myTimer,
           [1] string someString,
           [2] int32 i,
           [3] string height,
           [4] string width,
           [5] string[] CS$0$0000)
  IL_0000:  newobj     instance void [System]System.Diagnostics.Stopwatch::.ctor()
  IL_0005:  stloc.0
  IL_0006:  ldstr      ""
  IL_000b:  stloc.1
  IL_000c:  ldloc.0
  IL_000d:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Start()
  IL_0012:  ldc.i4.0
  IL_0013:  stloc.2
  IL_0014:  br.s       IL_006e
  IL_0016:  ldstr      "80%"
  IL_001b:  stloc.3
  IL_001c:  ldstr      "80%"
  IL_0021:  stloc.s    width
  IL_0023:  ldarg.1
  IL_0024:  ldc.i4.1
  IL_0025:  bne.un.s   IL_0034
  IL_0027:  ldstr      "60%"
  IL_002c:  stloc.3
  IL_002d:  ldstr      "60%"
  IL_0032:  stloc.s    width
  IL_0034:  ldc.i4.5
  IL_0035:  newarr     [mscorlib]System.String
  IL_003a:  stloc.s    CS$0$0000
  IL_003c:  ldloc.s    CS$0$0000
  IL_003e:  ldc.i4.0
  IL_003f:  ldstr      "Height: "
  IL_0044:  stelem.ref
  IL_0045:  ldloc.s    CS$0$0000
  IL_0047:  ldc.i4.1
  IL_0048:  ldloc.3
  IL_0049:  stelem.ref
  IL_004a:  ldloc.s    CS$0$0000
  IL_004c:  ldc.i4.2
  IL_004d:  call       string [mscorlib]System.Environment::get_NewLine()
  IL_0052:  stelem.ref
  IL_0053:  ldloc.s    CS$0$0000
  IL_0055:  ldc.i4.3
  IL_0056:  ldstr      "Width: "
  IL_005b:  stelem.ref
  IL_005c:  ldloc.s    CS$0$0000
  IL_005e:  ldc.i4.4
  IL_005f:  ldloc.s    width
  IL_0061:  stelem.ref
  IL_0062:  ldloc.s    CS$0$0000
  IL_0064:  call       string [mscorlib]System.String::Concat(string[])
  IL_0069:  stloc.1
  IL_006a:  ldloc.2
  IL_006b:  ldc.i4.1
  IL_006c:  add
  IL_006d:  stloc.2
  IL_006e:  ldloc.2
  IL_006f:  ldc.i4     0xf4240
  IL_0074:  blt.s      IL_0016
  IL_0076:  ldloc.0
  IL_0077:  callvirt   instance void [System]System.Diagnostics.Stopwatch::Stop()
  IL_007c:  ldstr      "testif.txt"
  IL_0081:  ldloc.1
  IL_0082:  call       void [mscorlib]System.IO.File::WriteAllText(string,
                                                                   string)
  IL_0087:  ldloc.0
  IL_0088:  callvirt   instance int64 [System]System.Diagnostics.Stopwatch::get_ElapsedMilliseconds()
  IL_008d:  ret
} // end of method frmResearch::WithOnlyIf

So we can say that the IF-Else block(Block #1) runs faster than the if block(Block #2) as pointed by many in this forum.

Test Results

10,000,000 iterations of Block 1

myFlag = 0:    23.8ns per iteration
myFlag = 1:    23.8ns per iteration

10,000,000 iterations of Block 2

myFlag = 0:    23.8ns per iteration
myFlag = 1:    46.8ns per iteration

Block 2 is 96% slower than Block 1. Makes sense, since Block 2 does twice the work in the pessimistic case.

i prefer either case, depending on the situation. If myFlag is rarely ever 1, then it want it to stand out as the edge case that we have to handle. If both are equally likely, i want the if-else syntax. But that's preference, not fact.


Decades ago, the intel 80286 dual pipeline would stall if a conditional jump was taken, rather than falling through to the next instruction. By the time of the Pentium that went away; the CPU pre-fetches both branch paths. But in the back of my mind i still have a twinge of fear whenever i write code that has the most common outcome in the else clause. Every time i have to remind myself that it doesn't matter anymore.


Int32 reps = 10000000;

private void Block1(int myFlag)
{
    String width;
    String height;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < reps; i++)
    {
        if (myFlag == 1)
        {
            width = String.Format("{0:g}%", 60);
            height = String.Format("{0:g}%", 60);
        }
        else
        {
            width = String.Format("{0:g}%", 80);
            height = String.Format("{0:g}%", 80);
        }
    }
    sw.Stop();
    Double time = (Double)sw.Elapsed.Ticks / Stopwatch.Frequency * 1000000000.0 / reps;
    MessageBox.Show(time.ToString() + " ns");
}

private void Block2(int myFlag)
{
    String width;
    String height;

    Stopwatch sw = new Stopwatch();
    sw.Start();
    for (int i = 0; i < reps; i++)
    {
        width = String.Format("{0:g}%", 80);
        height = String.Format("{0:g}%", 80);
        if (myFlag == 1)
        {
            width = String.Format("{0:g}%", 60);
            height = String.Format("{0:g}%", 60);
        }
    }
    sw.Stop();

    Double time = (Double)sw.Elapsed.Ticks / Stopwatch.Frequency * 1000000000.0 / reps;
    MessageBox.Show(time.ToString() + " ns");
}
  • String.Format makes IF 96% slower
  • GetPercentageString(0.60) makes IF 96% slower

const
   reps = 10000000;

procedure Block1(myflag: Integer);
var
   width, height: string;
   i: Integer;
   t1, t2: Int64;
   time: Extended;
   freq: Int64;
begin
   QueryPerformanceCounter(t1);
   for i := 1 to reps do
   begin
      if myFlag = 1 then
      begin
         width := '60%';
         height := '60%';
      end
      else
      begin
         width := '80%';
         height := '80%';
      end;
   end;
   QueryPerformanceCounter(t2);
   QueryPerformanceFrequency(freq);

   time := (t2-t1) / freq * 1000000000 / reps;
   ShowMessage(FloatToStr(time)+ 'ns');
end;

procedure Block2(myflag: Integer);
var
   width, height: string;
   i: Integer;
   t1, t2: Int64;
   time: Extended;
   freq: Int64;
begin
   QueryPerformanceCounter(t1);
   for i := 1 to reps do
   begin
      width := '80%';
      height := '80%';
      if myFlag = 1 then
      begin
         width := '60%';
         height := '60%';
      end;
   end;
   QueryPerformanceCounter(t2);
   QueryPerformanceFrequency(freq);

   time := (t2-t1) / freq * 1000000000 / reps;
   ShowMessage(FloatToStr(time)+ 'ns');
end;

Doing twice the amount of work takes roughly twice the amount of time.

Answer: IF does not perform better than IF-ELSE.


enter image description here

How to measure elapsed time in C# and C++

18 votes

I have a simple C# and C++ code that computes a sum of dot products.

The C# code is:

using System;

namespace DotPerfTestCS
{
    class Program
    {
        struct Point3D
        {
            public double X, Y, Z;

            public Point3D(double x, double y, double z)
            {
                X = x;
                Y = y;
                Z = z;
            }
        }

        static void RunTest()
        {
            unchecked
            {
                const int numPoints = 100000;
                const int numIters = 100000000;

                Point3D[] pts = new Point3D[numPoints];
                for (int i = 0; i < numPoints; i++) pts[i] = new Point3D(i, i + 1, i + 2);

                var begin = DateTime.Now;
                double sum = 0.0;
                var u = new Point3D(1, 2, 3);
                for (int i = 0; i < numIters; i++)
                {
                    var v = pts[i % numPoints];
                    sum += u.X * v.X + u.Y * v.Y + u.Z * v.Z;
                }
                var end = DateTime.Now;
                Console.WriteLine("Sum: {0} Time elapsed: {1} ms", sum, (end - begin).TotalMilliseconds);
            }
        }

        static void Main(string[] args)
        {
            for (int i = 0; i < 5; i++) RunTest();
        }
    }
}

and the C++ is

#include <iostream>
#include <vector>
#include <time.h>

using namespace std;

typedef struct point3d
{
    double x, y, z;

    point3d(double x, double y, double z)
    {
        this->x = x;
        this->y = y;
        this->z = z;
    }
} point3d_t;

double diffclock(clock_t clock1,clock_t clock2)
{
    double diffticks=clock1-clock2;
    double diffms=(diffticks*10)/CLOCKS_PER_SEC;
    return diffms;
}

void runTest()
{
    const int numPoints = 100000;
    const int numIters = 100000000;

    vector<point3d_t> pts;
    for (int i = 0; i < numPoints; i++) pts.push_back(point3d_t(i, i + 1, i + 2));

    auto begin = clock();
    double sum = 0.0, dum = 0.0;
    point3d_t u(1, 2, 3);
    for (int i = 0; i < numIters; i++) 
    {
        point3d_t v = pts[i % numPoints];
        sum += u.x * v.x + u.y * v.y + u.z * v.z;
    }
    auto end = clock();
    cout << "Sum: " << sum << " Time elapsed: " << double(diffclock(end,begin)) << " ms" << endl;

}

int main()
{
    for (int i = 0; i < 5; i++) runTest();
    return 0;
}

The C# version (Release x86 with optimization on, x64 is even slower) output is

Sum: 30000500000000 Time elapsed: 551.0299 ms 
Sum: 30000500000000 Time elapsed: 551.0315 ms 
Sum: 30000500000000 Time elapsed: 552.0294 ms
Sum: 30000500000000 Time elapsed: 551.0316 ms 
Sum: 30000500000000 Time elapsed: 550.0315 ms

while C++ (default VS2010 Release build settings) yields

Sum: 3.00005e+013 Time elapsed: 4.27 ms
Sum: 3.00005e+013 Time elapsed: 4.27 ms
Sum: 3.00005e+013 Time elapsed: 4.25 ms
Sum: 3.00005e+013 Time elapsed: 4.25 ms
Sum: 3.00005e+013 Time elapsed: 4.25 ms

Now I would expect the C# code would be a little slower. But 130 times slower seems way too much to me. Can someone please explain to me what is going on here?

EDIT

I am not a C++ programmer and I just took the diffclock code somewhere from the internet without really checking if it's correct.

Using std::difftime the C++ results are

Sum: 3.00005e+013 Time elapsed: 457 ms
Sum: 3.00005e+013 Time elapsed: 452 ms
Sum: 3.00005e+013 Time elapsed: 451 ms
Sum: 3.00005e+013 Time elapsed: 451 ms
Sum: 3.00005e+013 Time elapsed: 451 ms

which seems about right.

Your diffclock code is wrong.

If you change your C++ code to use the std::clock and std::difftime it appears to show the actual runtime:

#include <iostream>
#include <vector>
#include <ctime>

using namespace std;

typedef struct point3d
{
    double x, y, z;

    point3d(double x, double y, double z)
    {
        this->x = x;
        this->y = y;
        this->z = z;
    }
} point3d_t;

void runTest()
{
    const int numPoints = 100000;
    const int numIters = 100000000;

    vector<point3d_t> pts;
    for (int i = 0; i < numPoints; i++) pts.push_back(point3d_t(i, i + 1, i + 2));

    auto begin = clock();
    double sum = 0.0, dum = 0.0;
    point3d_t u(1, 2, 3);
    for (int i = 0; i < numIters; i++) 
    {
        point3d_t v = pts[i % numPoints];
        sum += u.x * v.x + u.y * v.y + u.z * v.z;
    }
    auto end = clock();
    cout << "Sum: " << sum << " Time elapsed: " << double(std::difftime(end,begin)) << " ms" << endl;

}

int main()
{
    for (int i = 0; i < 5; i++) runTest();
    return 0;
}

Results:

Sum: 3.00005e+013 Time elapsed: 346 ms
Sum: 3.00005e+013 Time elapsed: 344 ms
Sum: 3.00005e+013 Time elapsed: 346 ms
Sum: 3.00005e+013 Time elapsed: 347 ms
Sum: 3.00005e+013 Time elapsed: 347 ms

That is running the application in default release mode optimizations, outside of vs2010.

EDIT

As others have pointed out, in C++ using clock() is not the most accurate way to time a function (as in C#, Stopwatch is better than DateTime).

If you're using windows, you can always use the QueryPerformanceCounter for high-resolution timing.

Regular Expressions in C# running slowly

13 votes

I have been doing a little work with regex over the past week and managed to make a lot of progress, however, I'm still fairly n00b. I have a regex written in C#:

string isMethodRegex = 
    @"\b(public|private|internal|protected)?\s*(static|virtual|abstract)?"+
    @"\s*(?<returnType>[a-zA-Z\<\>_1-9]*)\s(?<method>[a-zA-Z\<\>_1-9]+)\s*\"+
    @"((?<parameters>(([a-zA-Z\[\]\<\>_1-9]*\s*[a-zA-Z_1-9]*\s*)[,]?\s*)+)\)";
IsMethodRegex = new Regex(isMethodRegex);

For some reason, when calling the regular expression IsMethodRegex.IsMatch() it hangs for 30+ seconds on the following string:

"\t * Returns collection of active STOP transactions (transaction type 30) "

Does anyone how the internals of Regex works and why this would be so slow on matching this string and not others. I have had a play with it and found that if I take out the * and the parenthesis then it runs fine. Perhaps the regular expression is poorly written?

Any help would be so greatly appreciated.

EDIT: I think the performance issue comes in the way <parameters> matching group is done. I have rearranged to match a first parameter, then any number of successive parameters, or optionally none at all. Also I have changed the \s* between parameter type and name to \s+ (I think this was responsible for a LOT of backtracking because it allows no spaces, so that object could match as obj and ect with \s* matching no spaces) and it seems to run a lot faster:

string isMethodRegex = 
    @"\b(public|private|internal|protected)?\s*(static|virtual|abstract)?"+
    @"\s*(?<returnType>[a-zA-Z\<\>_1-9]*)\s*(?<method>[a-zA-Z\<\>_1-9]+)\s*\"+
    @"((?<parameters>((\s*[a-zA-Z\[\]\<\>_1-9]*\s+[a-zA-Z_1-9]*\s*)"+
    @"(\s*,\s*[a-zA-Z\[\]\<\>_1-9]*\s+[a-zA-Z_1-9]*\s*)*\s*))?\)";

EDIT: As duly pointed out by @Dan, the following is simply because the Regex can exit early.

This is indeed a really bizarre situation, but if I remove the two optional matching at the beginning (for public/private/internal/protected and static/virtual/abstract) then it starts to run almost instantaneously again:

string isMethodRegex = 
    @"\b(public|private|internal|protected)\s*(static|virtual|abstract)"+
    @"(?<returnType>[a-zA-Z\<\>_1-9]*)\s(?<method>[a-zA-Z\<\>_1-9]+)\s*\"+
    @"((?<parameters>(([a-zA-Z\[\]\<\>_1-9]*\s*[a-zA-Z_1-9]*\s*)[,]?\s*)+)\)";
var IsMethodRegex = new Regex(isMethodRegex);

string s = "\t * Returns collection of active STOP transactions (transaction type 30) ";

Console.WriteLine(IsMethodRegex.IsMatch(s));

Technically you could split into four separate Regex's for each possibility to deal with this particular situation. However, as you attempt to deal with more and more complicated scenarios, you will likely run into this performance issue again and again, so this is probably not the ideal approach.

Why is this Javascript much *slower* than its jQuery equivalent?

11 votes

I have a HTML list of about 500 items and a "filter" box above it. I started by using jQuery to filter the list when I typed a letter (timing code added later):

$('#filter').keyup( function() {
    var jqStart = (new Date).getTime();

    var search = $(this).val().toLowerCase();
    var $list = $('ul.ablist > li');

    $list.each( function() {
        if ( $(this).text().toLowerCase().indexOf(search) === -1 )
            $(this).hide();
        else
            $(this).show();
    } );

    console.log('Time: ' + ((new Date).getTime() - jqStart));
} );

However, there was a couple of seconds delay after typing each letter (particularly the first letter). So I thought it may be slightly quicker if I used plain Javascript (I read recently that jQuery's each function is particularly slow). Here's my JS equivalent:

document.getElementById('filter').addEventListener( 'keyup', function () {
    var jsStart = (new Date).getTime();

    var search = this.value.toLowerCase();
    var list = document.querySelectorAll('ul.ablist > li');
    for ( var i = 0; i < list.length; i++ )
    {
        if ( list[i].innerText.toLowerCase().indexOf(search) === -1 )
            list[i].style.display = 'none';
        else
            list[i].style.display = 'block';
    }

    console.log('Time: ' + ((new Date).getTime() - jsStart));
}, false );

To my surprise however, the plain Javascript is up to 10 times slower than the jQuery equivalent. The jQuery version takes around 2-3 seconds to filter on each letter, while the Javascript version takes 17+ seconds! I'm using Google Chrome on Ubuntu Linux.

This isn't for anything really important so it doesn't need to be super efficient. But am I doing something really dumb with my Javascript here?

You could try using textContent instead of innerText , I think it should be faster. Also timing the list-generation and loop separately would tell if there is problem in list-generation.

How to measure memory usage for a Live ASP.NET MVC web application?

8 votes

So right off the bat, not sure if this question is better suited for another StackExchange site.

I've got an ASP.NET MVC 3 web application running on Windows Server 2008 and IIS 7.5

Site runs fine initially, but i can see the memory usage gradually growing. After about 12 hours it's nearly out of memory and the site chokes.

I'm using a lot of caching, so i'm thinking this combined with some possibly memory leaks is the cause of the issue.

So my question - what's the best way (tools, for example) to monitor memory usage on a web server running ASP.NET MVC?

In the past i've used good old' perfmon and put the IIS counters on to measure these things.

It this still the best way, and if so, can someone recommend a good perfmon counter template for my scenario?

Perfmon's counters are still a good technique (and free!). PAL (Performance Analysis of Logs), a free tool, has an ASP.NET perfmon counter template for general health (in addition to generating reports of counter log files based on thresholds).

Check out:

.NET Debugging Demos Lab 7: Memory Leak

.NET Memory Leak Case Study: The Event Handlers That Made The Memory Baloon

Tracking down managed memory leaks (how to find a GC leak)

Determine if your .NET Application has a Memory Leak

Commercial tools like MemProfiler, RedGate's memory profiling tool and JetBrains Profiler are all very good (and all have free trials).

Proper way to generate html dynamically with jQuery

7 votes

I found some different and conflicting answers on this topic.

I am building an application which works mostly with html dynamically generated by jQuery, based on results acquired from underlying API in form of JSON data.

I was told by some of my collegues (personally), that the best way would be to do something like this:

var ul = $("<ul>").addClass("some-ul");
$.each(results, function(index) {
  ul.append($("<li>").html(this).attr("id", index));
});
$("body").append($("<div>").attr("id", "div-id").addClass("some-div").append(ul));

etc. The reason I was told it was that "updates the DOM directly instead of parsing html to achieve it".

However, I see lots of code like this (same example):

var toAppend = '<div class="some-div" id="div-id"><ul>';
$.each(results, function(index) {
  toAppend += '<li id="' + index + '">' + this + '</li>';
});
toAppend += '</ul></div>'

Which I personally consider as not as elegant - but is it better? I googled the issue for a couple of minutes and found this article. Basically, it is about increasing performance drastically by using string concatenation - my "second way".

The main issue of this article is that it has been released in 2009 and discussed jQuery version is 1.3. Today, the current release is version 1.6.4 which can behave quite differently. And this is the issue of most articles on the subject I have already found and I'm also somehow suspicious about their credibility.

That's why I have decided to post the question here and ask - which method of generating DOM is actually the proper one, based on performance?

IMPORTANT EDIT:

I have written a little benchmark to test which approach is better considering performance.

jsFiddle - concatenation version

jsFiddle - array join version

Code:

var text = "lorem ipsum";
var strings = $("#strings");
var objects = $("#objects");
var results = $("#results");

// string concatenation
var start = new Date().getTime();
var toAppend = ['<div class="div-class" id="div-id1"><ul class="ul-class" id="ul-id1">'];
for (var i = 1; i <= 20000; i++) {
    toAppend[i] = '<li class="li-class" id="li-id1-' + i + '">' + text + '</li>';
}
toAppend[i++] = '</ul></div>';
results.append(toAppend.join(""));
strings.html(new Date().getTime() - start);

// jquery objects
var start = new Date().getTime();
var ul = $("<ul>").attr("id", "ul-id2").addClass("ul-class");
for (var i = 0; i < 20000; i++) {
    ul.append($("<li>").attr("id", "li-id2-" + i).addClass("li-class"));
}
results.append($("<div>").attr("id", "div-id2").addClass("div-class").append(ul));
objects.html(new Date().getTime() - start);

It seems that operating on strings is faster (in Firefox 7 about 7 times) than using jQuery objects and methods. But I can be wrong, especially if there are any mistakes or performance-decreasing bugs in this "benchmark's" code. Feel free to make any changes.

Note: I used Array join because of the article mentioned earlier instead of actual concatenation.

EDIT: Based on suggestion by @hradac, I used actual string concatenation in the benchmark and it did in fact improve the times.

First of all, this kind of micro-benchmarking almost never tells you what you want to really know. Second of all, your benchmarks are varied and not equivalent. For example, your first example generates lines that look like this:

<li class="li-class" id="li-id1-493">lorem ipsum</li>

and your second lines like this:

<li id="li-id2-0" class="li-class"></li>

Notice the different element order and the lack of "lorem ipsum" text. Nor is there any attempt to clean out the results div between tests to avoid performance issues as a result of the first 20K results already being there.

But beyond these issues is the question, "Is performance on this really disrupting the client side user experience?" Seriously? You're rendering such a quantity of text this way that you're seeing noticeable differences between the alternative methods rendering the text?

I'll harken back to what others have said, use a templating engine. The fastest ones are quite quick indeed and even have pre-compilation options to allow you to re-render the same template and get quick results. So don't believe me. Instead, believe a demonstration. Here's my jsFiddle to demonstrate the performance of the new JsRender library that is supposed to replace the jQuery Template engine...

http://jsfiddle.net/LZLWM/10/

Note: It can take several seconds for JsRender to load into the Fiddle. It's because I'm pulling it straight out of GitHub and that's not something that GitHub is particularly good at. I don't recommend that in actual practice. It doesn't change the timings though and it's necessary until jsFiddle starts incorporating templating engines as options.

Notice that the second example, much closer to a real-world example generates 20,000 lines using JSON as its starting point in time approximately the same as your fastest test (< 50ms difference on my machine). Note also that both the code and the template are much clearer and easier to work with than any mess of appends and string concatenation is ever going to be. How many iterations am I going to need to get my template right vs. what you're doing?

Use something simple and stop wasting time on this level of micro optimization when it's probably not even necessary. Instead use templates like this (or any of several other good templating engines) and make sure that you've got expires headers turned on, you're using a CDN, you've got gzip compression turned on on your server, etc. All the stuff that YSlow is telling you to do, because that will completely swamp the effects of what you're looking at here.

Heroku shared db vs Amazon RDS Performance

5 votes

I'm in the process of moving all my data from Heroku's shared db to Amazon RDS. Before switching everything over to RDS, I ran some tests locally to make sure my app works fine with it. These tests clearly slow that query time is slower on RDS. For the exact same request, I get:

On Heroku, with heroku shared db:

Completed 200 OK in 98ms (Views: 0.7ms | ActiveRecord: 56.0ms)

Locally, with RDS db instance

Completed 200 OK in 253ms (Views: 0.7ms | ActiveRecord: 127.9ms)

The ActiveRecord times are what I'm worried about here. Am I missing something? Heroku clearly states this about their shared db:

Shared databases are suitable for staging, testing, and low-scale production applications.

And yet it seems to be faster than this RDS instance I'm paying 80$/month for. Is heroku's shared db running locally? Because it's pretty obvious to me that about any database running locally inside my heroku app is going to be faster than any db that lives outside of it. Amazon says that any query taking more than 10ms is considered as a "slow query". But right now it seems that every query is gonna take at least 25ms for the roundtrip alone from the app to amazon's server + the actual query time. Or am I missing something?

From what I understand, Heroku EC2 instances run in the East availability zone, so creating an RDS instance in that same zone is pretty much like giving it a local database (I believe that's how heroku's shared databases work as well).

After setting up a staging environment for my app directly on Heroku and connecting it to my RDS instance, query times were much faster than when I tested it locally (where each SQL query had to make a roundtrip from my local machine to the RDS servers).

The only minor thing that's left unanswered is how to determine in which particular availability sub-zone my heroku app is running, so I can match it with my RDS instance (although it probably doesn't matter as much as the global availability zone).

enter image description here

big array manipulation is very slow in ruby

5 votes

I have the following scenario:

I need to figure out the unique list of ids across a very large set.

So for example, I have 6000 arrays of ids (followers list), each can range in size between 1 and 25000 (their follower list).

I want to get the unique list of ids across all of these arrays of ids (unique followers of followers). Once that is done I need to subtract out another list (another persons follower list) of ids and get a final count.

The final set of unique ids grows to around 60,000,000 records. In ruby when adding the arrays to the big array, it starts to get very slow around a couple of million. Adding to the set takes .1 seconds at first, then grows to taking over 4 seconds at 2 million (no where near where I need to go).

I wrote a test program in java and it does the whole thing in less than a minute.

Maybe I am doing this inefficiently in ruby, or there is another way. Since my main code is proprietary I have wrote a simple test program to simulate the issue:

big_array = []
counter = 0
loop_counter = 0
start_time = Time.now
# final target size of the big array
while big_array.length < 60000000
 loop_counter+=1
 # target size of one persons follower list
 random_size_of_followers = rand(5000)
 follower_list = []
 follower_counter = 0
   while follower_counter < random_size_of_followers
     follower_counter+=1
     # make ids very large so we get good spread and only some amt of dupes
     follower_id = rand(240000000) + 100000
     follower_list << follower_id
   end
 # combine the big list with this list
 big_array = big_array | follower_list
 end_time = Time.now

 # every 100 iterations check where we are and how long each loop and combine takes.
 if loop_counter % 100 == 0
   elapsed_time = end_time - start_time
   average_time = elapsed_time.to_f/loop_counter.to_f
   puts "average time for loop is #{average_time}, total size of big_array is #{big_array.length}"
   start_time = Time.now
   loop_counter = 0
 end
end

Any suggestions, is it time to switch to jruby and move stuff like this to java?

The method you're using there is horribly inefficient, so it's no surprise this is slow. When you're trying to keep track of unique things, an Array requires way more processing than a Hash equivalent.

Here's a simple refactoring that increases the speed about 100x:

all_followers = { }
counter = 0
loop_counter = 0
start_time = Time.now

while (all_followers.length < 60000000)
  # target size of one persons follower list
  random_size_of_followers = 
  follower_list = []

  rand(5000).times do
    follower_id = rand(240000000) + 100000
    follower_list << follower_id
    all_followers[follower_id] = true
  end

 end_time = Time.now

 # every 100 iterations check where we are and how long each loop and combine takes.
 loop_counter += 1

  if (loop_counter % 100 == 0)
    elapsed_time = end_time - start_time
    average_time = elapsed_time.to_f/loop_counter.to_f
    puts "average time for loop is #{average_time}, total size of all_followers is #{all_followers.length}"
    start_time = Time.now
  end
end

The nice thing about a Hash is that it's impossible to have duplicates. If you need to list all the followers at any time, use all_followers.keys to get the IDs.

Hashes take up more memory than their Array equivalents, but this is the price you have to pay for performance. I'd also suspect one of the big memory consumers here is the many individual lists of followers that are generated and seemingly never used, so perhaps you could skip that step entirely.

The key thing here is that the Array | operator is not very efficient, especially when operating on very large arrays.

Does the JIT compiler optimize (inline) unnecessary variable declarations?

4 votes

I've read several articles and questions/answers that conclude the best practice is to let the JIT compiler do all the optimization for inline function calls. Makes sense.

What about inline variable declarations? Does the compiler optimize these as well?

That is, will this:

        Dim h = (a + b + c) / 2       'Half-Perimeter

        If maxEdgeLength / (Math.Sqrt(h * (h - a) * (h - b) * (h - c)) / h) <= MaximumTriangleAspectRatio Then
           'Do stuff here.
        End If

Have better performance than this:

        Dim perimeter = a + b + c   'Perimeter
        Dim h = perimeter / 2       'Half-Perimeter

        Dim area = Math.Sqrt(h * (h - a) * (h - b) * (h - c)) 'Heron's forumula.
        Dim inradius = area / h
        Dim aspectRatio = maxEdgeLength / inradius

        If aspectRatio <= MaximumTriangleAspectRatio Then
            'Do stuff here.
        End If

Of course I prefer the latter because it's easier to read and debug, but I can't afford the performance degradation if it exists.

Note: I have already identified this code as a bottleneck -- No need for retorts about premature optimization. :-)

Temporary variables having names or not is a non-issue.

But you can optimize that inequality significantly.

Your code was:

If maxEdgeLength / (Math.Sqrt(h * (h - a) * (h - b) * (h - c)) / h) <= MaximumTriangleAspectRatio Then

Multiply both sides by the square root, eliminating division (inequality is preserved, because square root cannot return a negative number):

If maxEdgeLength <= (Math.Sqrt(h * (h - a) * (h - b) * (h - c)) / h) * MaximumTriangleAspectRatio Then

Now, square both sides to eliminate that expensive square root:

If maxEdgeLength * maxEdgeLength <= h * (h - a) * (h - b) * (h - c) / h / h * MaximumTriangleAspectRatio * MaximumTriangleAspectRatio Then

Cancel, and multiply by h.

If maxEdgeLength * maxEdgeLength * h <= (h - a) * (h - b) * (h - c) * MaximumTriangleAspectRatio * MaximumTriangleAspectRatio Then

This will be a lot faster. If this calculation is repeated, consider caching the results of part of this expression for even more improvement.

Use comments to explain the formula. Getting rid of a Math.Sqrt call in a bottleneck function is worth writing the expression in a less-than-simple format.