Best performance questions in March 2011

17 votes

I'm currently working on a site, and somewhere in my mass of stylesheets, something is killing performance in IE. Are there any good CSS profilers out there? I'd like a tool that can pinpoint rules that are killing performance.

Before you ask, I've disabled JavaScript, opacity, and box-shadow/text-shadow rules. The page is still jumpy. :/ If I disable all CSS, it runs great.

I need a tool that can profile the page and report where the CSS bottlenecks are.

So, I finally got around to writing a JavaScript function that indexed all of the CSS classes on the page and then toggled them individually while scrolling the page. This immediately pinpointed the errant class, and from there I was able to determine the errant property. It turns out that border-radius on an element that contains many children (e.g. a body-level div) performs incredibly poorly in IE9.

I've started a GitHub repo for my CSS stress test: https://github.com/andyedinborough/stress-css

From there, you can install a bookmarklet to easily run the test on any page.
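
For readers who want the gist of the toggling idea without installing anything, here is a rough sketch. It is not the code from the repo above; the names collectClassNames, stripClass, stressTest and the measure callback are made up for illustration, and it assumes plain HTML elements whose className is a string.

```javascript
// Collect every distinct class name used by the given elements.
function collectClassNames(elements) {
  const names = new Set();
  for (const el of elements) {
    for (const name of String(el.className || '').split(/\s+/)) {
      if (name) names.add(name);
    }
  }
  return [...names].sort();
}

// Remove one class from every element that has it; return an undo function.
function stripClass(elements, name) {
  const touched = [];
  for (const el of elements) {
    const classes = String(el.className || '').split(/\s+/).filter(Boolean);
    if (classes.includes(name)) {
      touched.push([el, el.className]);
      el.className = classes.filter(c => c !== name).join(' ');
    }
  }
  return () => touched.forEach(([el, original]) => { el.className = original; });
}

// For each class: strip it, run the caller-supplied measurement, restore it.
// `measure` is a hypothetical synchronous callback returning a cost (e.g. ms).
function stressTest(elements, measure) {
  const results = [];
  for (const name of collectClassNames(elements)) {
    const restore = stripClass(elements, name);
    results.push({ name, cost: measure() });
    restore();
  }
  // Lowest cost first: the class whose removal helped most is the prime suspect.
  return results.sort((a, b) => a.cost - b.cost);
}
```

In a browser you would pass something like document.querySelectorAll('[class]') as the elements and a function that scrolls the page and times it as measure. Real scroll timing is asynchronous, so treat the synchronous callback here as a simplification.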

Is C# really slower than, say, C++?

17 votes

I've been wondering about this issue for a while now.

Of course there are things in C# that aren't optimized for speed, so using those objects or language tweaks (like LINQ) may cause the code to be slower.

But suppose you don't use any of those tweaks and just compare the same pieces of code in C# and C++ (it's easy to translate one to another). Will it really be that much slower?

I've seen comparisons done that show that C# might be even faster in some cases, because in theory the JIT compiler should optimize the code in real time and get better results:

Managed Or Unmanaged?

We should remember that the JIT compiler compiles the code at run time, but that's a one-time overhead: the same code (once reached and compiled) doesn't need to be compiled again during that run.

The GC doesn't add a lot of overhead either, unless you create and destroy thousands of objects (like using String instead of StringBuilder). And doing that in C++ would also be costly.

Another point that I want to bring up is the better communication between DLLs introduced in .NET. The .NET platform communicates much better than managed COM-based DLLs.

I don't see any inherent reason why the language should be slower, and I don't really think that C# is slower than C++ (both from experience and from the lack of a good explanation).

So, will a piece of code written in C# be slower than the same code in C++?
And if so, then WHY?

Some other references (which talk about this a bit, but with no explanation of WHY):

Why would you want to use C# if its slower than C++?

Warning: The question you've asked is really pretty complex -- probably much more so than you realize. As a result, this is a really long answer.

From a purely theoretical viewpoint, there's probably a simple answer to this: there's (probably) nothing about C# that truly prevents it from being as fast as C++. Despite the theory, however, there are some practical reasons that it is slower at some things under some circumstances.

I'll consider three basic areas of differences: language features, virtual machine execution, and garbage collection. The latter two often go together, but can be independent, so I'll look at them separately.

Language Features

C++ places a great deal of emphasis on templates, and features in the template system that are largely intended to allow as much as possible to be done at compile time, so from the viewpoint of the program, they're "static." Template meta-programming allows completely arbitrary computations to be carried out at compile time (I.e., the template system is Turing complete). As such, essentially anything that doesn't depend on input from the user can be computed at compile time, so at runtime it's simply a constant. Input to this can, however, include things like type information, so a great deal of what you'd do via reflection at runtime in C# is normally done at compile time via template metaprogramming in C++. There is definitely a trade-off between runtime speed and versatility though -- what templates can do, they do statically, but they simply can't do everything reflection can.

The differences in language features mean that almost any attempt at comparing the two languages simply by transliterating some C# into C++ (or vice versa) is likely to produce results somewhere between meaningless and misleading (and the same would be true for most other pairs of languages as well). The simple fact is that for anything larger than a couple lines of code or so, almost nobody is at all likely to use the languages the same way (or close enough to the same way) that such a comparison tells you anything about how those languages work in real life.

Virtual Machine

Like almost any reasonably modern VM, Microsoft's for .NET can and will do JIT (aka "dynamic") compilation. This represents a number of trade-offs though. First, almost every VM (including Microsoft's, I believe) attempts to make intelligent decisions about what to compile and what to interpret. It does this by interpreting the code the first few times it's encountered. It profiles how often it's interpreting particular code, and when it exceeds a certain threshold, figures it's likely to execute enough more that it's worth compiling it to gain execution speed. This has an obvious problem: once in a while, it can guess wrong -- if you have a lot of code that executes just often enough to trigger compilation, but then never gets used again, you're losing pretty badly: nearly all your execution is via the slow, interpreted path, then you pay the price of compilation, but then you get no benefit from the compilation. In fairness, I should add that this is pretty unusual in normal code, but if you actually want to, it's usually pretty easy to trigger it.

Second, optimizing code (like most other optimization problems) is largely an NP-complete problem. For anything but a truly trivial/toy program, you're pretty nearly guaranteed you won't truly "optimize" the result (i.e., you won't find the true optimum) -- the optimizer will simply make the code better than it was previously. Quite a few optimizations that are well known, however, take a substantial amount of time (and, often, memory) to execute. With a JIT compiler, the user is waiting while the compiler runs. Especially when coupled with the first problem (possibility of deriving little or no benefit from compilation), most of the more expensive optimization techniques are ruled out. Static compilation has two advantages: first of all, if it's slow (e.g., building a large system) it's typically carried out on a server, and nobody spends time waiting for it. Second, an executable can be generated once, and used many times by many people. The first minimizes the cost of optimization; the second amortizes the much smaller cost over a much larger number of executions.

As mentioned in the original question (and many other web sites) JIT compilation does have the possibility of greater awareness of the target environment, which should (at least theoretically) offset this advantage. There's no question that this factor does offset at least part of the disadvantage of static compilation. For a few rather specific types of code and target environments, it can even outweigh the advantages of static compilation. At least in my testing and experience, however, this is fairly unusual. Target dependent optimizations mostly seem to either make fairly small differences, or can only be applied (automatically, anyway) to fairly specific types of problems.

Using a VM also has a possibility of improving cache usage. Instructions for a VM are often more compact than native machine instructions. More of them can fit into a given amount of cache memory, so you stand a better chance of any given code being in cache when needed. This can help keep interpreted execution of VM code more competitive (in terms of speed) than most people would initially expect -- you can execute a lot of instructions on a modern CPU in the time taken by one cache miss.

It's also worth mentioning that this factor isn't necessarily different between the two at all. There's nothing preventing (for example) a C++ compiler from producing output intended to run on a virtual machine (with or without JIT). In fact, Microsoft's C++/CLI is nearly that -- an (almost) conforming C++ compiler (albeit, with a lot of conforming extensions) that produces output intended to run on a virtual machine. The reverse is probably also true: in theory a C# compiler that produced native code should be possible as well.

Garbage Collection

From what I've seen, I'd say garbage collection is the poorest-understood of these three factors. Just for an obvious example, the question here mentions: "GC doesn't add a lot of overhead either, unless you create and destroy thousands of objects [...]". In reality, if you create and destroy thousands of objects, the overhead from garbage collection will generally be fairly low. .NET uses a generational scavenger, which is a variety of copying collector. The garbage collector works by starting from "places" (e.g., registers and execution stack) that pointers/references are known to be accessible. It then "chases" those pointers to objects that have been allocated on the heap. It examines those objects for further pointers/references, until it has followed all of them to the ends of any chains, and found all the objects that are (at least potentially) accessible. In the next step, it takes all of the objects that are (or at least might be) in use, and compacts the heap by copying all of them into a contiguous chunk at one end of the memory being managed in the heap. The rest of the memory is then free (modulo finalizers having to be run, but at least in well-written code, they're rare enough that I'll ignore them for the moment).

What this means is that if you create and destroy lots of objects, garbage collection adds very little overhead. The time taken by a garbage collection cycle depends almost entirely on the number of objects that have been created but not destroyed. The primary consequence of creating and destroying objects in a hurry is simply that the GC has to run more often, but each cycle will still be fast. If you create objects and don't destroy them, the GC will run more often and each cycle will be substantially slower as it spends more time chasing pointers to potentially-live objects, and it spends more time copying objects that are still in use.
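
To make that concrete, here is a toy sketch of the tracing step of a copying collector. It is only an illustration, not how the .NET collector is implemented; makeObject and collect are made-up names, and the "heap" is just ordinary JavaScript objects. The point is that the loop only ever touches objects reachable from the roots, so the work is proportional to the survivors, however much garbage was allocated.

```javascript
// Toy object: an id plus a list of references to other toy objects.
function makeObject(id) {
  return { id: id, refs: [] };
}

// Trace from the roots and "copy" every reachable object into a fresh
// to-space array. Unreachable objects are never visited at all.
function collect(roots) {
  const toSpace = [];
  const seen = new Set();
  const pending = roots.slice();
  while (pending.length > 0) {
    const obj = pending.pop();
    if (seen.has(obj)) continue; // already copied (also handles cycles)
    seen.add(obj);
    toSpace.push(obj);
    for (const ref of obj.refs) pending.push(ref);
  }
  return toSpace;
}

// Two live objects and ten thousand short-lived ones.
const root = makeObject('root');
const child = makeObject('child');
root.refs.push(child);
for (let i = 0; i < 10000; i++) makeObject('garbage-' + i);

// Work done equals the number of survivors, not the number of allocations.
console.log(collect([root]).length); // logs 2
```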

To combat this, generational scavenging works on the assumption that objects that have remained "alive" for quite a while are likely to continue remaining alive for quite a while longer. Based on this, it has a system where objects that survive some number of garbage collection cycles get "tenured", and the garbage collector starts to simply assume they're still in use, so instead of copying them at every cycle, it simply leaves them alone. This is a valid assumption often enough that generational scavenging typically has considerably lower overhead than most other forms of GC.

"Manual" memory management is often just as poorly understood. Just for one example, many attempts at comparison assume that all manual memory management follows one specific model as well (e.g., best-fit allocation). This is often little (if any) closer to reality than many people's beliefs about garbage collection (e.g., the widespread assumption that it's normally done using reference counting).

Given the variety of strategies for both garbage collection and manual memory management, it's quite difficult to compare the two in terms of overall speed. Attempting to compare the speed of allocating and/or freeing memory (by itself) is pretty nearly guaranteed to produce results that are meaningless at best, and outright misleading at worst.

Bonus Topic: Benchmarks

Since quite a few blogs, web sites, magazine articles, etc., claim to provide "objective" evidence in one direction or another, I'll put in my two-cents worth on that subject as well.

Most of these benchmarks are a bit like teenagers deciding to race their cars, and whoever wins gets to keep both cars. The web sites differ in one crucial way though: the guy who's publishing the benchmark gets to drive both cars. By some strange chance, his car always wins, and everybody else has to settle for "trust me, I was really driving your car as fast as it would go."

It's easy to write a poor benchmark that produces results that mean next to nothing. Almost anybody with anywhere close to the skill necessary to design a benchmark that produces anything meaningful also has the skill to produce one that will give the results he's decided he wants. In fact it's probably easier to write code intended to produce a specific result than code that will really produce meaningful results.

As my friend James Kanze put it, "never trust a benchmark you didn't falsify yourself."

Conclusion

There is no simple answer. I'm reasonably certain I could flip a coin and roll a pair of dice to pick a winner and percentage by which it would win, and write a seemingly fair benchmark in which the chosen language won by the chosen percentage.

As others have pointed out, for most code, speed is almost irrelevant. The corollary to that (which is much more often ignored) is that in the little code where speed does matter, it usually matters a lot. At least in my experience, for the code where it really does matter, C++ is almost always the winner. There are definitely factors that favor C#, but in practice they seem to be outweighed by factors that favor C++. You can certainly find benchmarks that will indicate the outcome of your choice, but when you write real code, you can almost always make it faster in C++ than in C#. It might (or might not) take more skill and/or effort to write, but it's virtually always possible.

Is it better to use out for multiple output values or return a combined value type?

14 votes

For instance, along the lines of:

public bool Intersect (Ray ray, out float distance, out Vector3 normal)
{

}

vs

public IntersectResult Intersect (Ray ray)
{

}

public class IntersectResult
{
    public bool Intersects {get;set;}
    public float Distance {get;set;}
    public Vector3 Normal {get;set;}
}

Which is better for clarity, ease of use, and, most importantly, performance?

I would use a combined type.

With an object you can attach behaviour, and return an arbitrarily complex object. You may wish to refactor your method in the future, and change the return values. By wrapping them in a return object and adding behaviour to that object, this refactoring can become largely transparent.

It's tempting to use tuples and the like. However, the refactoring effort becomes a headache after a while (I'm speaking from experience here, having just made this mistake again).

Why don't I just build the whole web app in Javascript and Javascript HTML Templates?

12 votes

I'm getting to the point on an app where I need to start caching things, and it got me thinking...

  1. In some parts of the app, I render table rows (jqGrid, slickgrid, etc.) or fancy div rows (like in the New Twitter) by grabbing pure JSON and running it through something like Mustache, jquery.tmpl, etc.
  2. In other parts of the app, I just render the info in pure HTML (server-side HAML templates), and if there's searching/paginating, I just go to a new URL and load a new HTML page.
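
For anyone unfamiliar with approach 1, here is a stripped-down sketch of what "running JSON through a template" amounts to. It is not Mustache or jquery.tmpl; render, renderRows and escapeHtml are made-up helpers that only handle flat {{key}} placeholders.

```javascript
// Escape values so record data cannot inject markup into the page.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Replace each {{key}} with the escaped value from the record.
// Unknown keys render as an empty string.
function render(template, record) {
  return template.replace(/\{\{(\w+)\}\}/g, function (match, key) {
    return Object.prototype.hasOwnProperty.call(record, key)
      ? escapeHtml(record[key])
      : '';
  });
}

// Render one row per record, as a grid or list view would.
function renderRows(template, records) {
  return records.map(function (record) {
    return render(template, record);
  }).join('');
}

var rowTemplate = '<tr><td>{{name}}</td><td>{{votes}}</td></tr>';
console.log(renderRows(rowTemplate, [
  { name: 'CSS profilers', votes: 17 },
  { name: 'C# vs C++', votes: 17 }
]));
```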

Now the problem is in caching and maintainability.

On one hand I'm thinking, if everything was built using Javascript HTML Templates, then my app would serve just an HTML layout/shell, and a bunch of JSON. If you look at the Facebook and Twitter HTML source, that's basically what they're doing (95% json/javascript, 5% html). This would make it so my app only needed to cache JSON (pages, actions, and/or records). Which means you'd hit the cache no matter if you were some remote api developer accessing a JSON api, or the straight web app. That is, I don't need 2 caches, one for the JSON, one for the HTML. That seems like it'd cut my cache store in half, and streamline things a little bit.

On the other hand, I'm thinking, from what I've seen/experienced, generating static HTML server-side, and caching that, seems to be much better performance wise cross-browser; you get the graphics instantly and don't have to wait that split-second for javascript to render it. StackOverflow seems to do everything in plain HTML, so does Google, and you can tell... everything appears at once. Notice how though on twitter.com, the page is blank for .5-1 seconds, and the page chunks in: the javascript has to render the json. The downside with this is that, for anything dynamic (like endless scrolling, or grids), I'd have to create javascript templates anyway... so now I have server-side HAML templates, client-side javascript templates, and a lot more to cache.

My question is, is there any consensus on how to approach this? What are the benefits and drawbacks from your experience of mixing the two versus going 100% with one over the other?

Update:

Some reasons that factor into why I haven't yet made the decision to go with 100% javascript templating are:

  • Performance. Haven't formally tested, but from what I've seen, raw html renders faster and more fluidly than javascript-generated html cross-browser. Plus, I'm not sure how mobile devices handle dynamic html performance-wise.
  • Testing. I have a lot of integration tests that work well with static HTML, so switching to javascript-only would require 1) more focused pure-javascript testing (jasmine), and 2) integrating javascript into capybara integration tests. This is just a matter of time and work, but it's probably significant.
  • Maintenance. Getting rid of HAML. I love HAML, it's so easy to write, it prints pretty HTML... It makes code clean, it makes maintenance easy. Going with javascript, there's nothing as concise.
  • SEO. I know google handles the ajax /#!/path, but haven't grasped how this will affect other search engines and how older browsers handle it. Seems like it'd require a significant setup.

Persistent private data storage.

You need a server to store data with various levels of public/private access. You also need a server for secure closed source information. You need a server to do heavy lifting that you don't want to do on the client. Complex data querying is best left up to your database engine. Indexing and searching is not yet optimised for javascript.

Also you have the issue of older browsers being far slower. If you're not running FF4, Chrome or IE9, then there is a big difference between doing data manipulation and page construction on the client and doing it on the server.

I myself am going to try to build a web application made entirely using an MVC framework and templates, but still using the server to connect to a secure and optimised database.

But in general the application can indeed be built entirely in javascript and using templates. The various constructs and javascript engines have advanced enough to do this. There are enough popular frameworks out there to do this. Pure javascript web applications are no longer experiments and prototypes.

Oh, and if we're recommending frameworks for this, then take a look at backbone.js.


Security


Let's not forget that we do not trust the client. We need serverside validation. JavaScript is interpreted, dynamic and can be manipulated at run time. We never trust client input.

Visual Studio 2010 very slow, unusable

12 votes

Hi, I've searched for this topic but can't seem to find posts that relate exactly to what I am experiencing.

I have a Visual Studio solution that I need to work on; it's fairly large and contains 16 projects.

Everything is just so so slow and choppy (Except start-up which is actually quite fast).

Clicking a line in the text editor, it takes about 5 seconds just to move the cursor.

Switching between files ~1-2 minutes (if I'm lucky)

Clicking on 'Tools' ~ 2 minutes for the drop down menu to appear.

If I right click one of the projects then it's ~5-10 minutes before I get the drop down menu. During this time my entire PC locks up.

Closing Visual Studio (in rage) ~10-20mins

As for debugging and building.. well I've never managed to get that far.

Looking in task manager (opening this with visual studio going takes a long time) there is nothing running that is consuming a lot of memory/cpu.

I know Microsoft products are not renowned for being fast but this is ridiculous, there is no way I can code anything like this. Something must be wrong.

Any help would be so greatly appreciated, my head is ready to explode.

Visual Studio 2010 Ultimate SP1

Windows 7 x64

Intel i7 950 @ 3.07GHz

6GB RAM (Tri Channel)

2x nVidia GTX 470 (SLI)

Thanks for all the replies.

I've switched to using Visual C++ Express. It's much quicker, and now I can actually do some coding.

Best solution I have for now.

What blocks Ruby, Python to get Javascript V8 speed?

11 votes

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Python is co-developed by Google guys so it shouldn't be blocked by software patents.

Or is this rather a matter of resources put into the V8 project by Google?

What blocks Ruby, Python to get Javascript V8 speed?

Nothing.

Well, okay: money. (And time, people, resources, but if you have money, you can buy those.)

V8 has a team of brilliant, highly-specialized, highly-experienced (and thus highly-paid) engineers working on it, that have decades of experience (I'm talking individually – collectively it's more like centuries) in creating high-performance execution engines for dynamic OO languages. They are basically the same people who also created the Sun HotSpot JVM (among many others).

Lars Bak, the lead developer, has been literally working on V8 for 25 years, which is basically his entire (professional) life (and V8's, too). Some of the people writing Ruby VMs aren't even 25 years old.

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Given that at least IronRuby, JRuby, MagLev, MacRuby and Rubinius have either monomorphic (IronRuby) or polymorphic inline caching, the answer is obviously no.

Modern Ruby implementations already do a great deal of optimizations. For example, for certain operations, Rubinius's Hash class is faster than YARV's. Now, this doesn't sound terribly exciting until you realize that Rubinius's Hash class is implemented in 100% pure Ruby, while YARV's is implemented in 100% hand-optimized C.

So, at least in some cases, Rubinius can generate better code than GCC!

Or is this rather a matter of resources put into the V8 project by Google?

Yes. Not just Google. V8 is 25 years old now. The people who are working on V8 also created the Self VM (to this day one of the fastest dynamic OO language execution engines ever created), the Animorphic Smalltalk VM (to this day one of the fastest Smalltalk execution engines ever created), the HotSpot JVM (the fastest JVM ever created, probably the fastest VM period) and OOVM (one of the most efficient Smalltalk VMs ever created).

In fact, Lars Bak, the lead developer of V8, worked on every single one of those, plus a few others.

javascript functions and arguments object, is there a cost involved

10 votes

It is commonplace to see code like this around the web and in frameworks:

var args = Array.prototype.slice.call(arguments);

In doing so, you convert the arguments Object into a real Array (as much as JS has real arrays anyway) and it allows for whatever array methods you have in your Array prototypes to be applied to it, etc etc.

I remember reading somewhere that accessing the arguments Object directly can be significantly slower than an array version of it. Is there any truth to that and under what circumstances / browsers does it incur a performance penalty to do so? Any articles on the subject you know of?

Cheers

Update: an interesting find from http://bonsaiden.github.com/JavaScript-Garden/#function.arguments invalidates what I read previously. I'm hoping the question gets some more answers from the likes of @Ivo Wetzel, who wrote it.

At the bottom of that section it says:

Performance myths and truths

The arguments object is always created with the only two exceptions being the cases where it is declared as a name inside of a function or one of its formal parameters. It does not matter whether it is used or not.

This conflicts with http://www.jspatterns.com/arguments-considered-harmful/, which states:

However, it's not a good idea to use arguments for the reasons of :

  • performance
  • security

The arguments object is not automatically created every time the function is called, the JavaScript engine will only create it on-demand, if it's used. And that creation is not free in terms of performance. The difference between using arguments vs. not using it could be anywhere between 1.5 times to 4 times slower, depending on the browser

Clearly, they can't both be correct, so which one is it?

ECMA die-hard Dmitry Soshnikov said:

Which exactly “JavaScript engine” is meant? Where did you get this exact info? Although, it can be true in some implementations (yep, it’s the good optimization as all needed info about the context is available on parsing the code, so there’s no need to create arguments object if it was not found on parsing), but as you know ECMA-262-3 statements, that arguments object is created each time on entering the execution context.

Here's some quick-and-dirty testing. Using predefined (named) arguments seems to be the fastest, but it's not always feasible to do this. If the arity of the function is unknown beforehand (that is, if a function can or must receive a variable number of arguments), I think calling Array.prototype.slice once would be the most efficient way, because in that case the performance loss from using the arguments object is kept to a minimum.
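
A minimal sketch of the two styles being compared, in case you want to try it in your own browser. The helper names (toArray, sumAll, sumThree, timeCalls) are made up for this example, and the timer is deliberately crude; numbers will differ between engines and runs, so don't read too much into any single result.

```javascript
// Convert the array-like arguments object into a real Array, once.
function toArray(argsObject) {
  return Array.prototype.slice.call(argsObject);
}

// Variadic: arity is unknown up front, so convert once and use array methods.
function sumAll() {
  var args = toArray(arguments);
  return args.reduce(function (total, n) { return total + n; }, 0);
}

// Fixed arity: named parameters only, the arguments object is never referenced.
function sumThree(a, b, c) {
  return a + b + c;
}

// Crude wall-clock timer for a function called with three numbers.
function timeCalls(fn, iterations) {
  var start = Date.now();
  for (var i = 0; i < iterations; i++) {
    fn(1, 2, 3);
  }
  return Date.now() - start;
}

console.log('variadic (slice once):', timeCalls(sumAll, 1000000), 'ms');
console.log('fixed arity:', timeCalls(sumThree, 1000000), 'ms');
```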

Cost of raising an Intent in android

9 votes

How much performance does it cost to broadcast intents? Is it okay to broadcast several per second, or are intents expensive?

Intents are meant to launch different activities within the Android OS or to inform about basic actions. It seems like a bad design pattern to use them otherwise. As they travel between different processes and therefore implement the Parcelable interface, they are not the most light-weight.

If you are looking to update different activities at the same time you might consider using a common service.

According to this blog post, intents are 10 times slower than direct function calls http://andytsui.wordpress.com/2010/09/14/android-intent-performance/

Ways to improve WPF UI rendering speed

9 votes

When a screen of a WPF application contains lots of primitive controls, its rendering becomes sluggish. What are the recommended ways to improve the responsiveness of a WPF application in such a case, apart from adding fewer controls and using a more powerful video card?

Is there a way to somehow use offscreen buffering or something like that?

Hi. Our team faced rendering performance problems. In our case we have about 400 transport units, and we have to render a chart for every unit with a lot of detail (text labels, special marks, different geometries, etc.).

In our first implementations we split each chart into primitives and composed the whole unit's chart via Binding. It was a very sad experience. The UI reaction was extremely slow.

So we decided to create one UI element per unit and render the chart with DrawingContext. Although this was much better in terms of performance, we spent about one month improving the rendering.

Some advice:

  1. Cache everything. Brushes, Colors, Geometries, Formatted Texts, Glyphs. (For example, we have two classes: RenderTools and TextCache. The rendering process of each unit uses a shared instance of both classes, so if two charts have the same text, it is prepared just once.)
  2. Freeze every Freezable if you are planning to use it for a long time. Especially geometries. Complex unfrozen geometries execute HitTest extremely slowly.
  3. Choose the fastest way of rendering each primitive. For example, there are about 6 ways of rendering text, but the fastest is DrawingContext.DrawGlyphRun.
  4. Use a profiler to discover hot spots. For example, in our project we had a geometry cache and rendered the appropriate geometries on demand. It seemed that no further improvement was possible. But one day we thought: what if we render the geometries once and cache the finished visuals? In our case that approach turned out to be acceptable. Our unit's chart has just a few states. When the chart's data changes, we rebuild a DrawingVisual for each state and put them into the cache.

Of course, this approach needs some investment. It's dull and boring work, but the result is awesome.

By the way: when we turned on the WPF caching option (you can find a link in the other answers), our app hung.

Weird performance issue with Galaxy Tab

8 votes

I am working on a 2d tutorial and was able to test my current tutorial part on a Samsung Galaxy Tab.

The tutorial simply moves the default icon randomly over the screen. With a tap I create a new moving icon. Everything works fine (constantly 60fps) on the Galaxy as long as I have 25 elements or less on the screen.

With the 26th element the frame rate drops to 25fps.

When I change the size/dimension of the image to a much bigger one, I reach less than 25fps before the 26th element. That's OK. But at some not really reproducible number of elements the frame rate drops from (mostly more than) 10fps to 1fps.

On my Nexus One I can add 150 elements and still have 50fps.

What I have done: I changed the bitmap variable to a static one, so that not every element has its own image but all use the same one. That removed the behavior, but I doubt this solution is a good one. The magic number of 25 would suggest that I can use only 25 different images in that way.

Does someone have any idea what can cause this behavior? Is it a bug in the modified android version of Samsung?

My sample eclipse project is available. I would appreciate if some Samsung owner would check their performance with the sample.

edit

A colleague found a solution. He changed the way the bitmap is loaded from

mBitmap = BitmapFactory.decodeResource(res, R.drawable.icon);

to

mBitmap = BitmapFactory.decodeStream(new BufferedInputStream(res.openRawResource(R.drawable.icon)));

But we still don't really get why it works this way...

Well, I've been looking at your project and everything seems to be fine, but I have one idea about what's causing the frame rate drop.

You're allocating objects at runtime. If you use an object pool instead, all objects are created at startup, so you should notice the frame rate drop right away (if my solution doesn't solve your problem, that is).

That said, I'm not sure whether an object pool will solve your problem, but you can try. Initialize your objects in a constructor, and instead of making this call in onTouchEvent():

new Element(getResources(), (int) event.getX(), (int) event.getY())

You should have something like mElement.add(objectPool.allocate()), where the object pool hands out an unused object from the pool. The pool should also hold a fixed number of objects; from there you can check whether it is the allocation that is causing this problem or something else.
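
The project in question is Java, but the pattern itself is language-independent, so here is a bare-bones sketch of the idea in JavaScript. ObjectPool, allocate and release are illustrative names, not part of any Android API, and returning null when the pool is exhausted is just one possible policy.

```javascript
// Pre-allocate `size` objects up front using the supplied factory.
function ObjectPool(size, create) {
  this.free = [];
  for (var i = 0; i < size; i++) {
    this.free.push(create());
  }
}

// Hand out an unused object, or null if the pool is exhausted.
ObjectPool.prototype.allocate = function () {
  return this.free.length > 0 ? this.free.pop() : null;
};

// Return an object to the pool so it can be reused later.
ObjectPool.prototype.release = function (obj) {
  this.free.push(obj);
};

// All 100 elements are created here, not inside the touch handler.
var pool = new ObjectPool(100, function () {
  return { x: 0, y: 0 };
});

// Stand-in for a touch handler: reuse a pooled element instead of `new`.
function onTouch(x, y) {
  var element = pool.allocate();
  if (element !== null) {
    element.x = x;
    element.y = y;
  }
  return element;
}
```

Because every allocation happens in the constructor, any slowdown that remains after switching to the pool cannot be blamed on allocating during the touch event.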

With the 26th element the frame rate drops to 25fps.

When (or if) you implements this, you should see the frame rate drop directly (if this doesn't solve your problem though), since the object pool will make you allocating a fixed amount (e.g. maybe 100 elements?) at start (but you're not using them visually).

Also, I have used the memory pool pattern (object pool) in one of my sample applications for Android. In that sample, I add a line to the Canvas on an onTouchEvent() using an object pool (without allocating during runtime). In that source code you can easily change the total number of objects and try it out yourself. Write a comment if you want to look at my sample application (and source code) and I will gladly share it, since it's not public yet. My comments are in Swedish, but I think you should be able to understand, since the variables and methods are in English. :)

Side note: You wrote that you tried (and even succeeded in) removing the behaviour by making your Bitmap static. As it is right now, your elements have different instances of a Bitmap, which means you allocate a new Bitmap every time you construct a new object. Every object therefore points to a different Bitmap even though they all use the same resource. static is a fully valid solution (although a magic number of 25 seems weird).

This Bitmap case can be compared to the OpenGL system. If you have 20 objects which all should use the same resource, there are two possible solutions: they can all point to the same VRAM texture, or each can point to a different VRAM texture (like your case when you're not using static) that still holds the same resource.

EDIT:

Here is my sample application for Android that demonstrates the Memory Pool.

Regarding your solution with BitmapFactory, that probably depends on how that class works. I'm not sure, but I think that one of the decode...() methods generates a new Bitmap even if it is the same resource. It may be that new BufferedInputStream(res.openRawResource(R.drawable.icon)) is reusing the BufferedInputStream from memory, although that is a big guess.

What you should do (in that case) is decode the resource once, store a reference to it in the Panel class, and pass that reference into new Element(bitmapReference, ...) instead. That way, you allocate it only once and every element points to the same Bitmap in memory.
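A rough sketch of that "decode once, pass the reference" idea in plain Java. FakeBitmap is a made-up stand-in for android.graphics.Bitmap so the example runs off-device, and the Panel and Element classes here are simplified guesses at the shape of the sample project, not its actual code.

```java
import java.util.ArrayList;
import java.util.List;

class SharedBitmapSketch {
    // Hypothetical stand-in for android.graphics.Bitmap; counts how many were created.
    static final class FakeBitmap {
        static int decodeCount = 0;
        final byte[] pixels;

        FakeBitmap(int sizeInBytes) {
            pixels = new byte[sizeInBytes];
            decodeCount++;
        }
    }

    // The element receives the bitmap instead of decoding its own copy.
    static final class Element {
        final FakeBitmap bitmap;
        final int x, y;

        Element(FakeBitmap bitmap, int x, int y) {
            this.bitmap = bitmap;
            this.x = x;
            this.y = y;
        }
    }

    // Plays the role of the Panel: decodes once and hands the same reference to every element.
    static final class Panel {
        private final FakeBitmap icon = new FakeBitmap(48 * 48 * 4);
        private final List<Element> elements = new ArrayList<>();

        void addElement(int x, int y) {
            elements.add(new Element(icon, x, y));
        }

        List<Element> elements() {
            return elements;
        }
    }

    public static void main(String[] args) {
        Panel panel = new Panel();
        for (int i = 0; i < 30; i++) {
            panel.addElement(i, i);
        }
        System.out.println("elements: " + panel.elements().size()
                + ", bitmaps decoded: " + FakeBitmap.decodeCount);
    }
}
```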

Avoid creating a clustered index based on an incrementing key

7 votes

I got this hint from mssqlcity.com. However, I cannot understand its explanation.

Avoid creating a clustered index based on an incrementing key

For example, if a table has surrogate integer primary key declared as IDENTITY and the clustered index was created on this column, then every time data is inserted into this table, the rows will be added to the end of the table. When many rows will be added a "hot spot" can occur. A "hot spot" occurs when many queries try to read or write data in the same area at the same time. A "hot spot" results in I/O bottleneck. Note. By default, SQL Server creates clustered index for the primary key constraint. So, in this case, you should explicitly specify NONCLUSTERED keyword to indicate that a nonclustered index is created for the primary key constraint.

Before I read that, I thought that picking a column that is random in nature was not correct, because it would cause unnecessary page relocation when adding a new row. So I thought using a sorted column was preferable.

After reading this hint, I think it's trying to say that we don't really want to use a strictly sorted column as our clustered index either, because there is going to be an I/O bottleneck for write-intensive applications.

I don't really understand the cause of the I/O bottleneck that they are talking about. Are they saying that too many operations sharing the same page will slow down the disk operations? How does this happen? Can somebody explain it to me?

The Hot Spot they are referring to is not an issue in SQL Server 2005 and newer.

What USED to happen is that all your data was being written to the same area of the clustered index and the same sector(s) on the disk, which caused a lot of dirty pages to be created at once (dirty pages being data pages that have been altered in memory but not yet written to disk), and when a flush or checkpoint ran this could cause issues.

Newer versions do not experience this behavior due to changes in the IO architecture (from what I understand).
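To picture what the quoted tip means by rows being "added to the end of the table", here is a toy model in Java. It is only an illustration under heavy assumptions: pages are just fixed-size slices of a sorted key array, ROWS_PER_PAGE is an arbitrary number, and nothing here models SQL Server's real storage engine, page splits or its IO architecture.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

class InsertHotSpotSketch {
    static final int ROWS_PER_PAGE = 100; // arbitrary toy page capacity

    // Counts how many distinct existing pages the new keys would land on,
    // treating the sorted key array as consecutive pages of ROWS_PER_PAGE rows.
    static int pagesTouched(long[] sortedExisting, long[] newKeys) {
        Set<Integer> pages = new HashSet<>();
        for (long key : newKeys) {
            int pos = Arrays.binarySearch(sortedExisting, key);
            if (pos < 0) {
                pos = -pos - 1; // insertion point for a key that is not present
            }
            // Keys larger than every existing key go to the last page.
            pages.add(Math.min(pos, sortedExisting.length - 1) / ROWS_PER_PAGE);
        }
        return pages.size();
    }

    public static void main(String[] args) {
        long[] existing = new long[10_000];
        for (int i = 0; i < existing.length; i++) {
            existing[i] = i * 100L;
        }

        long[] incrementing = new long[50];
        long[] random = new long[50];
        Random rng = new Random(42);
        for (int i = 0; i < 50; i++) {
            incrementing[i] = existing[existing.length - 1] + 1 + i; // IDENTITY-like keys
            random[i] = rng.nextInt(1_000_000);                      // scattered keys
        }

        System.out.println("incrementing keys touch " + pagesTouched(existing, incrementing) + " page(s)");
        System.out.println("random keys touch " + pagesTouched(existing, random) + " page(s)");
    }
}
```

In this model every incrementing key targets the same last page, while random keys are spread over many pages; that concentration is all the "hot spot" wording refers to.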

Should I compress in-memory C# objects for better performance?

7 votes

I have an application (C#, WPF) that displays many financial charts with live data streaming from a server. The data collected in memory may grow fairly large, and I don't want to keep any data on disk.

Since the historical data itself doesn't change but is only added to, would it make sense to keep that data (which is stored in a collection object) in some compressed format?

Is it possible, and if so, can anyone recommend a good practice for it?

UPDATE

Some notes about performance and tradeoffs: I am aware that compression will add a delay when accessing the data, but the user only needs fast updates on new data arriving. When accessing data that was already rendered (for example, to study or re-render it), the user doesn't require a quick response.

Compressing and decompressing will make your application slower so for performance (speed) it is not a good option. Compression is only useful when you are worried about available memory. It might be easier to store/swap the data to a temp folder.

The key to performance is measuring. Only take action when you have crunched the numbers.
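One way to "crunch the numbers" is to compress a block of sealed historical values and measure both the size and the time. The sketch below uses Java's java.util.zip rather than the question's C# (where System.IO.Compression.GZipStream plays a similar role); the class name, the synthetic price series and the chunk size of 10,000 are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressedChunkSketch {
    // Packs a block of values (length prefix + raw doubles) into a gzip byte array.
    static byte[] compress(double[] values) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(new GZIPOutputStream(buffer))) {
            out.writeInt(values.length);
            for (double v : values) {
                out.writeDouble(v);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Restores the values written by compress().
    static double[] decompress(byte[] packed) {
        try (DataInputStream in = new DataInputStream(
                new GZIPInputStream(new ByteArrayInputStream(packed)))) {
            double[] values = new double[in.readInt()];
            for (int i = 0; i < values.length; i++) {
                values[i] = in.readDouble();
            }
            return values;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Synthetic, repetitive series; real market data may compress far less well.
        double[] prices = new double[10_000];
        for (int i = 0; i < prices.length; i++) {
            prices[i] = 100.0 + (i % 50) * 0.25;
        }

        long t0 = System.nanoTime();
        byte[] packed = compress(prices);
        long t1 = System.nanoTime();
        double[] restored = decompress(packed);
        long t2 = System.nanoTime();

        System.out.println("raw bytes: " + (prices.length * 8) + ", compressed bytes: " + packed.length);
        System.out.println("compress: " + (t1 - t0) / 1_000 + " us, decompress: " + (t2 - t1) / 1_000
                + " us, values restored: " + restored.length);
    }
}
```

A single run like this is only a rough indication; the ratio and timings depend heavily on the actual data, so they should be measured on real samples before deciding.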

What architecture to use to address this SystemOutOfMemoryException while allowing me to instantiate the cells of a sheet?

6 votes

Summary

This question is the follow-up to a desire to architect a simple spreadsheet API while keeping it user-friendly for those who know Excel well.

To sum it up, this question is related to these two:
1. How to implement column self-naming from its index?;
2. How to make this custom worksheet initialization faster?.

Objective

To provide a simplified Excel API used as a wrapper over the key components such as the Application, the Workbook, the Worksheet and the Range classes/interfaces, while exposing only the most commonly used object properties for each of these.

Usage example

This usage example is inspired from the unit tests that allowed me to bring this solution up to where it stands now.

Dim file as String = "C:\Temp\WriteTest.xls"

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb as IWorkbook = mgr.CreateWorkbook()
    wb.Sheets("Sheet1").Cells("A1").Value = 3.1415926
    wb.SaveAs(file)
End Using

And now we open it:

Dim file as String = "C:\Temp\WriteTest.xls"

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb as IWorkbook = mgr.OpenWorkbook(file)
    ' Working with workbook here...
End Using

Discussion

While instantiating an Excel Workbook:

  1. An instance of a Worksheet is automatically initialized in the Workbook.Sheets collection;
  2. Upon initialization, a Worksheet initializes its Cells through the Range object that can represent one or multiple cells.

These Cells are immediately accessible with all their properties as soon as the Worksheet exists.

My wish is to reproduce this behaviour so that

  1. The Workbook class constructor initializes the Workbook.Sheets collection property with the native sheets;
  2. The Worksheet class constructor initializes the Worksheet.Cells collection property with the native cells.

My problem comes from the Worksheet class constructor while initializing the Worksheet.Cells collection property illustrated at #2.

Thoughts

Following the issues encountered in the questions linked above, I wish to figure out another architecture that would allow me to:

  1. Access specific feature of a cell Range when required;
  2. Deliver most commonly used properties through my ICell interface;
  3. Have access to all of the Range cells of a worksheet from its initialization.

All this while keeping in mind that accessing a Range.Value property is the fastest interaction possible with the underlying Excel application instance using the Interop.

So, I thought of initializing my ReadOnlyDictionary(Of String, ICell) with the names of the cells without immediately wrapping an instance of the Range interface. I would simply generate the row and column indexes along with the cell's name to index my dictionary, and assign the Cell.NativeCell property only when one wants to access or format a specific cell or cell range.

That way, the data in the dictionary would be indexed with the name of the cells obtained from the column indexes generated in the Worksheet class constructor. Then, when one would do this:

Using mgr As ISpreadsheetManager = New SpreadsheetManager()
    Dim wb As IWorkbook = mgr.CreateWorkbook()
    wb.Sheet(1).Cells("A1").Value = 3.1415926 ' #1
End Using

#1: This would allow me to use the indexes from my Cell class to write the given value to the specific cell, which is faster than using its name directly against the Range.

Questions and Concerns

Besides, UsedRange.get_Value() and Cells.get_Value() return Object(,) arrays.

1. So should I just be happy with working with Object(,) arrays for cells, without having the possibility to format it somehow?

2. How to architect these Worksheet and Cell classes so that I have the best performance offered while working with Object(,) arrays, while keeping the possibility that a Cell instance may represent or wrap a single cell Range?

Thanks to any of you who takes the time to read my post and my sincerest thanks to those who answer.

The architecture I ended up with goes through a class that I named CellCollection. Here's what it does:

Based on these hypotheses:

  1. Given that an Excel worksheet has 256 columns and 65536 lines;

  2. Given that 16,777,216 (256 * 65536) cells would need to be instantiated at once;

  3. Given that the most common use of a worksheet takes fewer than 1,000 lines and fewer than 100 columns;

  4. Given that I needed to be able to refer to the cells by their addresses ("A1"); and

  5. Given that benchmarks show that reading all the values at once and loading them into an object[,] in memory is the fastest way to work with an underlying Excel worksheet,

I decided not to instantiate any of the cells up front, leaving the CellCollection property of my IWorksheet interface initialized but empty upon instantiation, except for an existing workbook. When opening a workbook, I check whether NativeSheet.UsedRange is empty or returns null (Nothing in Visual Basic); if it is not, the used "native cells" are already in memory, so all that remains is to add them to my internal CellCollection dictionary, indexed by their respective addresses.

Finally, Lazy Initialization Design Pattern to the rescue! =)

using System;
using System.Collections.Generic;
using System.Collections.ObjectModel; // ReadOnlyDictionary (.NET 4.5+; otherwise a custom wrapper)
using System.Text.RegularExpressions;
using Excel = Microsoft.Office.Interop.Excel;

public class Sheet : ISheet {
    public Sheet(Excel.Worksheet nativeSheet) {
        NativeSheet = nativeSheet;
        Cells = new CellCollection(this);
    }

    public Excel.Worksheet NativeSheet { get; private set; }

    public CellCollection Cells { get; private set; }
}

public sealed class CellCollection {
    private readonly IDictionary<string, ICell> _cells;
    private readonly ReadOnlyDictionary<string, ICell> _readonlyCells;

    public CellCollection(ISheet sheet) {
        _cells = new Dictionary<string, ICell>();
        _readonlyCells = new ReadOnlyDictionary<string, ICell>(_cells);
        Sheet = sheet;
    }

    // Indexer: wraps the requested cells on first access (lazy initialization),
    // then returns the read-only view of every cell wrapped so far.
    public ReadOnlyDictionary<string, ICell> this[string addresses] {
        get {
            if (string.IsNullOrEmpty(addresses) || addresses.Trim().Length == 0)
                throw new ArgumentNullException("addresses");

            if (!Regex.IsMatch(addresses, "(([A-Za-z]{1,3}[0-9]*)[:,]*)"))
                throw new FormatException("addresses");

            foreach (string address in addresses.Split(',')) {
                // Assumes ISheet exposes NativeSheet; indexed-property syntax needs C# 4 or later.
                Excel.Range range = Sheet.NativeSheet.Range[address];

                foreach (Excel.Range cell in range) {
                    ICell c;
                    if (!_cells.TryGetValue(cell.Address[false, false], out c)) {
                        c = new Cell(cell);
                        _cells.Add(c.Name, c);
                    }
                }
            }

            return _readonlyCells;
        }
    }

    public ISheet Sheet { get; private set; }
}

Obviously, this is a first attempt, and it works just fine so far, with more than acceptable performance. Still, I feel it could use some optimization; I will use it this way for now and optimize it later if needed.

After having written this collection, I was able to get the expected behaviour. Now I shall try to implement some of the .NET interfaces, such as IEnumerable, IEnumerable<T>, ICollection, ICollection<T>, etc., so that it may be considered a true .NET collection.

Feel free to comment and bring constructive alternatives and/or changes to this code so that it may become even greater than it currently is.

I DO hope this will serve one's purpose someday.

Thanks for reading! =)
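For readers who want the lazy-initialization idea from this answer without the Excel Interop dependency, here is a reduced sketch in Java. It is not the answer's CellCollection: the Cell class holds only an address and a value, and columnName is a hypothetical helper touching on the linked question about naming a column from its index.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class LazyCellsSketch {
    // Hypothetical minimal cell; the real Cell wraps an Excel Range.
    static final class Cell {
        final String address;
        Object value;

        Cell(String address) {
            this.address = address;
        }
    }

    static final class CellCollection {
        private final Map<String, Cell> cells = new HashMap<>();

        // Creates the cell on first access only; later calls return the same instance.
        Cell get(String address) {
            return cells.computeIfAbsent(address.toUpperCase(Locale.ROOT), Cell::new);
        }

        int instantiated() {
            return cells.size();
        }
    }

    // Converts a 1-based column index to Excel-style letters (1 = A, 26 = Z, 27 = AA).
    static String columnName(int index) {
        StringBuilder name = new StringBuilder();
        while (index > 0) {
            int remainder = (index - 1) % 26;
            name.insert(0, (char) ('A' + remainder));
            index = (index - 1) / 26;
        }
        return name.toString();
    }

    public static void main(String[] args) {
        CellCollection cells = new CellCollection();
        cells.get("A1").value = 3.1415926;
        cells.get(columnName(256) + "65536").value = "last cell";

        // Only the cells that were touched exist, not all 16,777,216.
        System.out.println("cells instantiated: " + cells.instantiated());
    }
}
```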

Rewriting jQuery to plain old JavaScript - are the performance gains worth it?

4 votes

Since jQuery is an incredibly easy and banal library, I've developed a rather complex project fairly quickly with it. The entire interface is jQuery based, and memory is cleaned regularly to maintain optimum performance. Everything works very well in Firefox, and exceptionally so in Chrome (other browsers are of no concern for me as this is not a commercial or publicly available product).

What I'm wondering now is - since pure plain old banal JavaScript is really not a complicated language to master, would it be performance enhancing to rewrite the whole thing in plain old JavaScript, and if so, how much of a boost would you expect to get from it?

If the answers prove positive enough, I'll go ahead and do it, run a benchmark and report back with the precise findings.

Edit: Thanks guys, valuable insight. The purpose was not to "re-invent the wheel" - it was just for experience and personal improvement. Just because something exists doesn't mean you shouldn't explore it in greater detail, learn how it works or try to recreate it. This is the same reason I seldom use frameworks: I would much rather use my own code, iron it out and gain massive experience doing it, than start off by using someone else's code, regardless of how ironed out it is. Anyway, I won't be doing it, thanks for saving me the effort :)

You say the site works from "very well" to "exceptionally" - then don't bother. It won't be worth the effort, and there is no guarantee that your end result will even be more optimal than with jQuery, as the jQuery team has had years to iron out many issues.

You also say that plain old JS is "really not a complicated language" - that's not the main issue. It's not JS itself that's difficult to master; what is difficult, and what jQuery makes up for, is all the various browser quirks.

In the end, even if you do create a site that is marginally faster without jQuery, you have one that is much harder to maintain.

In short, there are times when doing such an exercise is valuable, but not when your site already works "very well", and it's not necessarily as simple as you think.

Does the database (maximum) field length affect performance?

4 votes

At my company, we have a legacy database with various tables and therefore many, many fields. A lot of the fields seem to have large limits (e.g. NVARCHAR(MAX)) that are never reached. Does arbitrarily making the fields their maximum width, or 2 to 3 times larger than what is normally entered, negatively affect performance? How should one balance performance with field lengths? Is there a balance?

There are two parts to this question:

Does using NVARCHAR over VARCHAR hurt performance? Yes, storing data in Unicode fields doubles the storage requirements. Your data stored in those fields is 2x the size it needs to be (until SQL Server 2008 R2 came out, which includes Unicode compression). Your table scans will take twice as long and only half as much data can be stored in memory in the buffer cache.

Does using MAX hurt performance? Not directly, but when you use VARCHAR(MAX), NVARCHAR(MAX), and those kinds of fields, and if you need to index them, you won't be able to rebuild those indexes online.
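The 2x storage figure for NVARCHAR can be sanity-checked by encoding the same text both ways. The Java sketch below is only an approximation of what the database does: ISO-8859-1 stands in for a single-byte VARCHAR code page, UTF-16LE for the two-bytes-per-character NVARCHAR encoding, and per-row overhead and compression are ignored.

```java
import java.nio.charset.StandardCharsets;

class UnicodeStorageSketch {
    // Bytes needed with a single-byte encoding (stand-in for a VARCHAR code page).
    static int singleByteSize(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1).length;
    }

    // Bytes needed with UTF-16 little-endian, without a byte-order mark (stand-in for NVARCHAR).
    static int utf16Size(String text) {
        return text.getBytes(StandardCharsets.UTF_16LE).length;
    }

    public static void main(String[] args) {
        String value = "Hello, world";
        System.out.println("single-byte: " + singleByteSize(value) + " bytes");
        System.out.println("UTF-16:      " + utf16Size(value) + " bytes");
    }
}
```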