Best performance questions in June 2011

Why are Ruby method calls particularly slow (in comparison to other languages)?

19 votes

I'm trying to read about Ruby performance, and came across this SO thread, where one of the answers mentions that "method calls, one of the most common operations in Ruby, are particularly slow."

Another thread mentions that "It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember names per context to allow for eval, so its frames and method calls are slower."

Can someone explain in more detail why Ruby method calls are particularly slow, and elaborate on the second thread? I'm not totally sure what late lookup is or why it's slow, and I don't know what names per context means or how it relates to frames and method calls.

My (possibly incorrect) understanding is that since methods can be added or modified at runtime, the Ruby interpreter can never "remember" how to run a particular method, so it has to look up the method every time while the program is running, and this is what is meant by method calls being slow. But corrections and more technical explanations would be great.

Compiled languages often have fast method dispatch because the calling code knows an index into the class's vtable, which is an array of method pointers. After just a few pointer dereferences, the calling code can jump right into the method. The compiler creates the vtable, and replaces every method name in the source code with the numerical index of the method in the vtable.

Dynamic languages such as Ruby often have slow method dispatch because the calling code has a name for the method, not a pointer (nor an index into an array containing the pointers). The calling code has to ask the object for its class, then has to ask the class if it has a method by that name, and if not, go on up the chain of ancestors asking each ancestor if it has a method by that name (this is what the compiler does in a compiled language, which is why the compiling is slow and the method dispatch is fast). Rather than a few pointer dereferences costing just a few machine instructions to invoke a method, a dynamic language must execute dozens to hundreds of machine instructions to search the object's class and all the object's ancestor classes for the method. Each class has a HashTable of names -> methods, but HashTables with string keys are an order of magnitude slower than arrays with integer indexes.
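To make the contrast concrete, here is a minimal Python sketch of the two dispatch strategies described above. It is a toy model, not how any real Ruby implementation or C++ compiler is built; the names ToyClass, dispatch_by_name, build_vtable, and dispatch_by_index are made up for illustration.

```python
class ToyClass:
    """A toy class record: a name -> function table plus a link to its parent."""

    def __init__(self, name, parent=None, methods=None):
        self.name = name
        self.parent = parent
        self.methods = dict(methods or {})


def dispatch_by_name(cls, method_name):
    """Dynamic-style lookup: walk up the ancestor chain, probing each hash table."""
    probes = 0
    while cls is not None:
        probes += 1
        if method_name in cls.methods:
            return cls.methods[method_name], probes
        cls = cls.parent
    raise AttributeError(method_name)


def build_vtable(cls, slots):
    """Compile-time-style step: resolve every slot once, ahead of any call."""
    return [dispatch_by_name(cls, name)[0] for name in slots]


def dispatch_by_index(vtable, index):
    """Static-style lookup: one array access, no searching."""
    return vtable[index]


base = ToyClass("Base", methods={"to_s": lambda: "base"})
middle = ToyClass("Middle", parent=base)
leaf = ToyClass("Leaf", parent=middle, methods={"size": lambda: 3})

# Name-based dispatch has to probe Leaf, Middle, and Base before finding to_s.
method, probes = dispatch_by_name(leaf, "to_s")
print(method(), probes)

# Index-based dispatch pays for the search once, when the table is built.
slots = ["to_s", "size"]
leaf_vtable = build_vtable(leaf, slots)
print(dispatch_by_index(leaf_vtable, 0)(), dispatch_by_index(leaf_vtable, 1)())
```

The name-based path repeats its search on every call, while the index-based path does the same search only once up front. That one-time resolution is only safe if methods cannot be added or redefined afterwards, which is exactly what Ruby allows.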

There are ways to optimize method dispatch in dynamic languages, of course. In Ruby, that's what JRuby, Rubinius, and IronRuby are working on. But that's a subject for another question.

Why do .net languages vary in performance?

15 votes

I have heard that C++ .NET is fastest, C# is next, followed by VB .NET, and languages like IronPython and Boo come last in terms of performance. If all .NET languages compile to the same intermediate bytecode, why the difference in performance?

It is understandable for Boo and Python as all the types have to be evaluated at runtime. But why the difference between languages like C++ and C#?

Boo and Python perform worse because they are interpreted, not compiled. Instead of being converted to CIL (common intermediate language) before being run, they are converted at run time, which obviously will incur performance overhead. NOTE: Boo may not be interpreted, I'm not sure.

Also, since IronPython is dynamically-typed (apparently Boo is not), fewer optimizations can be made when compared to statically typed languages (which C++ and C# are).

You also have to consider the amount of effort put into making optimizations to each implementation. C# and C++.NET have huge teams at Microsoft working on making their compilers produce the fastest bytecode possible. IronPython and Boo are volunteer projects that don't have nearly as much manpower or resources, so they won't gain optimizations as quickly as something MS-funded.

Essentially, language features can have performance/memory costs at both compile time and runtime. That is why .NET languages vary in performance: they vary in features.

To ask permission or apologize?

12 votes

I come from a python background, where it's often said that it's easier to apologize than to ask permission. Specifically given the two snippets:

if type(A) == int:
  do_something(A)
else:
  do_something(int(A))

try:
  do_something(A)
except TypeError:
  do_something(int(A))

Then under most usage scenarios the second one will be faster when A is usually an integer (assuming do_something needs an integer as input and will raise its exception fairly swiftly) as you lose the logical test from every execution loop, at the expense of a more costly exception, but far less frequently.

What I wanted to check was whether this is true in C#, or whether logical tests are fast enough compared to exceptions to make this a small corner case?

Oh and I'm only interested in release performance, not debug.


OK, my example was too vague; try this one:

Naive solution:

return float(A) % 20 # coerce A to a float so it'll only fail if we actually don't
                     # have anything that can be represented as a real number.

Logic based solution:

from numbers import Number

if isinstance(A, Number): # This is cheaper because we're not creating a new
    return A % 20         # object unless we really have to.
else:
    return float(A) % 20

Exception based solution:

try: # Now we're not doing any logical tests in the 99% of cases where A is a number
  return A % 20
except TypeError:
  return float(A) % 20

Examples using FSOs, database connections, or stuff over a network are better but a bit long-winded for a question.

Probably not. .NET exceptions are relatively expensive.

Several .NET functions offer both variants for this reason. (int.TryParse, which returns a success code, is often recommended because it is faster than int.Parse, which throws an exception on failure.)

But the only answer that matters is what your own profiling data tells you. If you need performance, then you need to measure, measure, measure.

Because what was fastest on my computer, with my code, with my version of the .NET framework, at this time may not be the fastest on your computer, with your code, with your version of the .NET framework at the time when you read it.
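The "measure it" advice applies just as well to the Python snippets in the question. A rough sketch using the standard timeit module follows; the helper names lbyl, eafp, time_mix, and the Boxed class are made up for illustration, and no timings are shown because they depend on the machine and interpreter.

```python
import timeit
from numbers import Number


class Boxed:
    """Hypothetical non-numeric input that can still be coerced with float()."""

    def __init__(self, value):
        self.value = value

    def __float__(self):
        return float(self.value)


def lbyl(a):
    """Ask permission: test the type on every call."""
    if isinstance(a, Number):
        return a % 20
    return float(a) % 20


def eafp(a):
    """Apologize: pay for an exception only when the input is not a number."""
    try:
        return a % 20
    except TypeError:
        return float(a) % 20


def time_mix(func, rare_every, calls=20_000):
    """Time `func` over inputs where one in `rare_every` is not a number."""
    inputs = [Boxed(45) if i % rare_every == 0 else 45 for i in range(calls)]
    return timeit.timeit(lambda: [func(a) for a in inputs], number=5)


for rare_every in (2, 100, 10_000):
    print(rare_every, time_mix(lbyl, rare_every), time_mix(eafp, rare_every))
```

Varying rare_every shows how the balance shifts as the exceptional case becomes more or less common, which is the variable the question is really about.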

Is MySQL naturally slow at this kind of query, or do I have it misconfigured?

12 votes

The following query is intended to receive a list of unread messages by user. It involves 3 tables: recipients contains a relation of users to message IDs, messages contains the messages themselves, and message_readers contains a list of which users have read which messages.

The query reliably takes 4.9 seconds - this is seriously hurting our performance, and is especially worrisome since we hope the database will eventually be several orders of magnitude larger. Granted, it's an inherently heavy query, but the data set is tiny, and intuitively it seems that it should be much faster. The server has enough memory (32gb) that the entire database should be loaded in RAM at all times, and there's nothing else running on the box.

The tables are all tiny:

recipients: 23581
messages: 9679
message_readers: 2685

The query itself:

SELECT 
    m.*
FROM 
    messages m
INNER JOIN recipients r ON r.message_id = m.id
LEFT JOIN message_readers mr ON mr.message_id = m.id
WHERE
    r.id = $user_id
    AND (mr.read_by_id IS NULL OR mr.read_by_id <> $user_id)

The explain plan is pretty straightforward:

+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
| id | select_type | table | type   | possible_keys                     | key                               | key_len | ref                            | rows  | Extra       |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+
|  1 | SIMPLE      | r     | ref    | index_recipients_on_id            | index_recipients_on_id            | 768     | const                          | 11908 | Using where |
|  1 | SIMPLE      | m     | eq_ref | PRIMARY                           | PRIMARY                           | 4       | db.r.message_id                |     1 | Using index |
|  1 | SIMPLE      | mr    | ALL    | NULL                              | NULL                              | NULL    | NULL                           |  2498 | Using where |
+----+-------------+-------+--------+-----------------------------------+-----------------------------------+---------+--------------------------------+-------+-------------+

There IS an index on message_readers.read_by_id, but I guess it can't really use it because of the IS NULL condition.

I'm using all default settings except for the following:

key_buffer=4G
query_cache_limit = 256M
query_cache_size = 1G
innodb_buffer_pool_size=12G

Thanks!

Assuming that message_readers is a subset of recipients, I recommend making the following changes:

  1. Get rid of the message_readers table and replace it with a flag on the recipients table. This will eliminate the null check and remove a join.

  2. It probably already is, but make sure your clustered index for recipients is id, message_id rather than message_id, id, since nearly all searches for messages will be based on the recipients.

Here is the SELECT that results:

SELECT
    r.whatever,
    m.whatever,
    -- ...
FROM
    recipients r
    INNER JOIN messages m ON m.id = r.message_id
WHERE
    r.id = $user_id
    AND r.read_flag = 'N'

UPDATE

Here is the correct version of your query using the existing schema:

SELECT
    r.whatever,
    m.whatever,
    -- ...
FROM
    recipients r
    INNER JOIN messages m ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.read_by_id = r.id 
                                 AND mr.message_id = m.id
WHERE
    r.id = $user_id
    AND mr.read_by_id IS NULL

This assumes that your clustered indexes are what would be expected:

recipients: id, message_id
messages: id
message_readers: read_by_id, message_id
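The difference between the two queries is easier to see on a tiny data set. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with invented sample rows and assumed column types, so it illustrates the logic of the join only and says nothing about MySQL's performance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT);
    CREATE TABLE recipients (id TEXT, message_id INTEGER,
                             PRIMARY KEY (id, message_id));
    CREATE TABLE message_readers (read_by_id TEXT, message_id INTEGER,
                                  PRIMARY KEY (read_by_id, message_id));
    INSERT INTO messages VALUES (1, 'hello'), (2, 'status'), (3, 'lunch');
    -- alice and bob both receive all three messages
    INSERT INTO recipients VALUES ('alice', 1), ('alice', 2), ('alice', 3),
                                  ('bob', 1), ('bob', 2), ('bob', 3);
    -- alice has read message 1; bob has read messages 1 and 2
    INSERT INTO message_readers VALUES ('alice', 1), ('bob', 1), ('bob', 2);
""")

# The query from the question: joins readers on message_id only.
ORIGINAL = """
    SELECT m.id FROM messages m
    INNER JOIN recipients r ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.message_id = m.id
    WHERE r.id = ? AND (mr.read_by_id IS NULL OR mr.read_by_id <> ?)
    ORDER BY m.id
"""

# The anti-join from the answer: joins readers on both user and message.
FIXED = """
    SELECT m.id FROM recipients r
    INNER JOIN messages m ON r.message_id = m.id
    LEFT JOIN message_readers mr ON mr.read_by_id = r.id
                                 AND mr.message_id = m.id
    WHERE r.id = ? AND mr.read_by_id IS NULL
    ORDER BY m.id
"""

original = [row[0] for row in conn.execute(ORIGINAL, ("alice", "alice"))]
fixed = [row[0] for row in conn.execute(FIXED, ("alice",))]
print(original)  # message 1 slips through because bob also read it
print(fixed)     # only the messages alice has not read
```

With the original join condition, a message that alice has already read still appears whenever anyone else has read it too. Putting the user into the join condition removes that, and it lets the optimizer use an index on (read_by_id, message_id) instead of scanning message_readers.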

Is excessive nesting of WPF layout panels (e.g. Grid) computationally expensive?

11 votes

folks

I have heard from a coworker that I - as a designer using Microsoft Expression Blend - should avoid using excessive nesting of panel elements, because they are computationally expensive.

For example, I tend to create the mainwindow with header and custom statusbar with grid, and then take the top panel and put a grid inside it, and if I have a message inside a rectangle on the already gridded top panel I create yet another grid, etc.

As a very layout-oriented designer (who wants to use every screen most efficiently whatever the screen dimensions are) I know this is the best way to do it considering absolute control and flexibility, which prevents the window from resizing in "unpredictable" ways ;oP

BUT... ...this friend of mine said that, if you have, say, five grids nested inside one another, if you pass the mouse over them, you generate five mouse events, which is costly.

Also, if you have too many calculations due to the too many containers asking for children sizes before the actual rendering, it can also be costly.

I had some previous experience with PyGtk, and I must say I used A LOT of layout panels for all my scripts, and even the resizing of windows never seemed to me to be especially costly, except when I had some complex canvas drawing that needed to be recalculated.

Does anyone have any experience or know anything about it?

Thanks a lot for reading

There's no straightforward answer to this, but obviously the more elements you have participating in layout, the longer the measure and arrange phases are going to take for the window. Depending on which features of which Panel types you use, it could be more or less costly, but the more panels you use, the more overhead there will be during the layout calculations. You can learn more about how the layout system works by reading that entire MSDN article.

In the end this is something that, unless you've gone crazy, will not often be an issue. To find out if it is causing problems for your app I suggest using the WPF Performance Suite to do some performance testing.

SQL Server aggregate performance

9 votes

I am wondering whether SQL Server knows to "cache" aggregates, if you like, within a query when they are used again.

For example,

Select Sum(Field),
       Sum(Field) / 12
From   Table

Would SQL Server know that it has already calculated the Sum function on the first field and then just divide it by 12 for the second? Or would it run the Sum function again then divide it by 12?

Thanks

It calculates once

Select
   Sum(Price),
   Sum(Price) / 12
From
   MyTable

The plan gives:

|--Compute Scalar(DEFINE:([Expr1004]=[Expr1003]/(12.)))
  |--Compute Scalar(DEFINE:([Expr1003]=CASE WHEN [Expr1010]=(0) THEN NULL ELSE [Expr1011] END))
     |--Stream Aggregate(DEFINE:([Expr1010]=Count(*), [Expr1011]=SUM([myDB].[dbo].[MyTable].[Price])))
        |--Index Scan(OBJECT:([myDB].[dbo].[MyTable].[IX_SomeThing]))

This table has 1.35 million rows

Naming dict keys for fast lookup in python

6 votes

I'm going to have 1 small dictionary (between 5 and 20 keys) that will be referenced up to a hundred times or so for one page load in python 2.5.

I'm starting to name the keys which it will be looking up and I was wondering if there is a key naming convention I could follow to help dict lookup times.

I had to test ;-)

using

  • f1, integer key 1
  • f2 short string, "one"
  • f3 long string "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa"

as one of the keys into a dictionary of length 4. Iterating 10,000,000 times and measuring the times, I get this result:

<function f1 at 0xb779187c>
f1 3.64
<function f2 at 0xb7791bfc>
f2 3.48
<function f3 at 0xb7791bc4>
f3 3.65

I.e., no difference...
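A comparable test can be sketched with the standard timeit module. This is not the answerer's original code; it is a rough Python 3 reconstruction with a smaller iteration count, a 40-character stand-in for the long key, and made-up labels.

```python
import timeit

long_key = "a" * 40  # stand-in for the long string key
d = {1: "a", "one": "b", long_key: "c", "other": "d"}
keys = {"f1 (int)": 1, "f2 (short str)": "one", "f3 (long str)": long_key}

for label, key in keys.items():
    # Look the same key up repeatedly and report the total elapsed seconds.
    seconds = timeit.timeit(lambda: d[key], number=1_000_000)
    print(label, round(seconds, 3))
```

One reason the long string does not cost more in a loop like this: CPython caches a string's hash on the string object, so the full-length hashing is paid only on the first lookup with that object.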

My code

How to prevent rails (3.1) from firing 2 selects for same record?

6 votes

I have a CRUD users controller. When I open the "user edit" page in the browser, my log shows this:

Started GET "/users/1/edit" for 127.0.0.1 at 2011-06-21 20:09:37 +0200
  Processing by UsersController#edit as HTML
  Parameters: {"id"=>"1"}
  User Load (0.2ms)  SELECT `users`.* FROM `users` WHERE
   `users`.`id` = ? LIMIT 1  [["id", 1]]
  User Load (0.3ms)  SELECT `users`.* FROM `users` WHERE
   `users`.`id` = ? LIMIT 1  [["id", "1"]]

In the edit action, I simply call a private function user, which returns

@user ||= User.find(params[:id])

The view looks as follows:

<%= settings_title(@user.username) %>
<%= form_for @user, :html => { :multipart => true } do |f| %>
  <%= render "form", :user => @user %>
  <div class="action"><%= submit_tag t("users.edit.submit"), :class => "button" %></div>
<% end %>

The route is defined as resources :users do ...

Any idea how to prevent the second db access would be greatly appreciated!

Update:

It seems like the second DB SELECT can be prevented by calling

@user ||= User.find(params[:id].to_i) # notice the .to_i

in the edit action. I now get:

User Load (0.1ms)  SELECT `users`.* FROM `users` WHERE `users`.`id` = ? LIMIT 1  [["id", 1]]
CACHE (0.0ms)      SELECT `users`.* FROM `users` WHERE `users`.`id` = ? LIMIT 1

but is this the proper way to do it? Do you see any other side-effects of this solution?

Your #to_i workaround notwithstanding, if current_user is an admin and can edit any user record, then it would seem this is the correct behavior. It's just a coincidence that in this case current_user == user_to_be_edited and you're getting two db hits for the same data. In all the other cases where the current_user is editing someone else's user data, you will have to hit the database twice by necessity.

However, if current_user only ever edits his/her own data, then in your controller instead of:

@user ||= User.find(params[:id])

you would use:

@user ||= current_user

...under the assumption that user authentication has already occurred prior to getting to the action. In this manner, you will only have the one hit on the database that happens in authentication.

As a final note, in the former case, where a current_user admin can edit any user, if you really want to get rid of that one coincidental edge case where the database gets hit twice, you can do this:

@user ||= current_user.id == params[:id].to_i ? current_user : User.find(params[:id])

In this manner, you'll avoid the extra db hit when a user is editing his/her own data.

MySQL PRIMARY KEYs: UUID / GUID vs BIGINT (timestamp+random)

5 votes

tl;dr: Is assigning rows IDs of {unixtimestamp}{randomdigits} (such as 1308022796123456) as a BIGINT a good idea if I don't want to deal with UUIDs?

Just wondering if anyone has some insight into any performance or other technical considerations / limitations in regards to IDs / PRIMARY KEYs assigned to database records across multiple servers.

My PHP+MySQL application runs on multiple servers, and the data needs to be able to be merged. So I've outgrown the standard sequential / auto_increment integer method of identifying rows.

My research into a solution brought me to the concept of using UUIDs / GUIDs. However the need to alter my code to deal with converting UUID strings to binary values in MySQL seems like a bit of a pain/work. I don't want to store the UUIDs as VARCHAR for storage and performance reasons.

Another possible annoyance of UUIDs stored in a binary column is the fact that rows IDs aren't obvious when looking at the data in PhpMyAdmin - I could be wrong about this though - but straight numbers seem a lot simpler overall anyway and are universal across any kind of database system with no conversion required.

As a middle ground I came up with the idea of making my ID columns a BIGINT, and assigning IDs using the current unix timestamp followed by 6 random digits. So let's say my random number came out to be 123456, my generated ID today would come out as: 1308022796123456

A one in a million chance of a conflict between two rows created within the same second (6 random digits give 1,000,000 possible suffixes) is fine with me. I'm not doing any sort of mass row creation quickly.

One issue I've read about with randomly generated UUIDs is that they're bad for indexes, as the values are not sequential (they're spread out all over the place). The UUID() function in MySQL addresses this by generating the first part of the UUID from the current timestamp. Therefore I've copied that idea of having the unix timestamp at the start of my BIGINT. Will my indexes be slow?

Pros of my BIGINT idea:

  • Gives me the multi-server/merging advantages of UUIDs
  • Requires very little change to my application code (everything is already programmed to handle integers for IDs)
  • Half the storage of a UUID (8 bytes vs 16 bytes)

Cons:

  • ??? - Please let me know if you can think of any.

Some follow up questions to go along with this:

  1. Should I use more or less than 6 random digits at the end? Will it make a difference to index performance?

  2. Is one of these methods any "randomer" ?: Getting PHP to generate 6 digits and concatenating them together -VS- getting PHP to generate a number in the 1 - 999999 range and then zerofilling to ensure 6 digits.

Thanks for any tips. Sorry about the wall of text.

I have run into this very problem in my professional life. We used timestamp + random number and ran into serious issues when our applications scaled up (more clients, more servers, more requests). Granted, we (stupidly) used only 4 digits, and then changed to 6, but you would be surprised how often the errors still happen.

Over a long enough period of time, you are guaranteed to get duplicate key errors. Our application is mission critical, and therefore even the smallest chance it could fail due to inherently random behavior was unacceptable. We started using UUIDs to avoid this issue, and carefully managed their creation.
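The collision risk can be estimated with the usual birthday-problem calculation. The sketch below assumes the scheme from the question (unix timestamp followed by 6 random digits); the function names make_id and collision_probability are made up, and the question's application is PHP, so the Python here only illustrates the arithmetic.

```python
import random
import time


def make_id(now=None):
    """Build {unix timestamp}{6 random digits} as one integer."""
    seconds = int(time.time() if now is None else now)
    return seconds * 1_000_000 + random.randrange(1_000_000)


def collision_probability(rows_in_same_second, suffixes=1_000_000):
    """Chance that at least two rows created in the same second share an ID."""
    p_all_distinct = 1.0
    for i in range(rows_in_same_second):
        p_all_distinct *= (suffixes - i) / suffixes
    return 1.0 - p_all_distinct


print(make_id(1308022796))  # timestamp from the question, random suffix
for n in (2, 10, 100, 1000):
    print(n, collision_probability(n))
```

Two rows in the same second collide with probability one in a million, but the risk grows roughly with the square of the number of rows: at 1,000 rows in one second it is roughly 39%. That is why the scheme holds up at low write rates and fails once traffic scales.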

Using UUIDs, your index size will increase, and a larger index will result in poorer performance (perhaps unnoticeable, but poorer nonetheless). However, MySQL can store a UUID compactly in a BINARY(16) column (never use varchar as a primary key!!), and can handle indexing, searching, etc. pretty damn efficiently even compared to bigint. The biggest performance hit to your index is almost always the number of rows indexed, rather than the size of the item being indexed (unless you want to index on a longtext or something ridiculous like that).
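The string-to-binary conversion that the question calls a pain is small in most languages. Here is a sketch in Python (the application in the question is PHP, so this only illustrates the idea, and the sample UUID value is just an example): the 36-character text form and the 16-byte packed form carry the same value.

```python
import uuid

text_form = "6ccd780c-baba-1026-9564-0040f4311e29"  # example value only
as_uuid = uuid.UUID(text_form)

packed = as_uuid.bytes              # 16 raw bytes, the compact storage form
restored = uuid.UUID(bytes=packed)  # back to a UUID object for display

print(len(text_form), len(packed))
print(str(restored) == text_form)
```

Converting at the application boundary keeps the stored key at 16 bytes while still letting logs and admin screens show the familiar text form.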

To answer your question: Bigint (with random numbers attached) will be OK if you do not plan on scaling your application/service significantly. If your code can handle the change without much alteration and your application will not explode if a duplicate key error occurs, go with it. Otherwise, bite the bullet and go for the more substantial option.

You can always implement a larger change later, like switching to an entirely different backend (which we are now facing... :P)

Performance differences between '.find' and '.where' methods

5 votes

I am using Ruby on Rails 3.0.7 and I would like to know what the performance differences are between the User.find(<id>) method and the User.where(:id => <id>) method.

Under the hood, find does more or less what you're describing with your where. You can find the details in this post. That being said, if you're looking to grab a single record by id, then you might want to use find_one. That's what find winds up doing when you call it with a single argument of an id, but you'll skip past all the other code it needs to run to figure out that's what you wanted.

How to get the number of CPU cycles used by a process

4 votes

I need to get the number of CPU cycles used by a specific process using C# (or VB.Net). This information is available in the Process properties popup within Sysinternals' Process Explorer. For instance, the browser that I'm using to post this message has currently used 18,521,360,165 cycles (give or take a few hundred million). Does anyone know how to get this information from a .Net app? I know how to get the CPU usage (percentage), but this isn't what I'm looking for. I need a way to compare CPU usage between two different processes running at different times.

Thank you, Matt

Update:
Why do I need this? I'm the leader of the local .Net user group and we're running a code challenge where developers submit code to solve a problem. I need a way to measure the performance of one entry against another. Currently I'm using a timer to measure performance. The server is 100% dedicated to this, but that doesn't guarantee that nothing else is happening at the same time. Obviously, this is fraught with all kinds of potential issues, but generally speaking, it's fairly accurate. Measuring the number of CPU cycles used would be an almost foolproof way to measure how well someone's entry performs against another. I'm certain that someone can shoot holes all over this - no need to try at this point. ;-) I hope that helps explain the reason behind my question and why a timer is insufficient for solving my problem.

Process Explorer calls QueryProcessCycleTime to get that information. You will probably have to use P/Invoke to call it. I would expect its P/Invoke signature to look like:

[DllImport("kernel32.dll")]
[return: MarshalAs(UnmanagedType.Bool)]
static extern bool QueryProcessCycleTime(IntPtr ProcessHandle, out ulong CycleTime);