Best optimization questions in September 2011

Once upon a time, when > was faster than < ... Wait, what?

95 votes

I am reading a wonderful OpenGL tutorial. It's unbelievably great, trust me. The topic I am currently at is Z-buffer. Aside from explaining what's it all about, the author mentions that we can perform custom depth tests, such as GL_LESS, GL_ALWAYS, etc. He also explains that the actual meaning of depth values (which is top and which isn't) can also be customized. I understand so far. And then the author says something unbelievable:

The range zNear can be greater than the range zFar; if it is, then the window-space values will be reversed, in terms of what constitutes closest or farthest from the viewer.

Earlier, it was said that the window-space Z value of 0 is closest and 1 is farthest. However, if our clip-space Z values were negated, the depth of 1 would be closest to the view and the depth of 0 would be farthest. Yet, if we flip the direction of the depth test (GL_LESS to GL_GREATER, etc), we get the exact same result. So it's really just a convention. Indeed, flipping the sign of Z and the depth test was once a vital performance optimization for many games.

If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

Is the author making things up, am I misunderstanding something, or is it indeed the case that once < was slower (vitally, as the author says) than >?

Thanks for clarifying this quite curious matter!

Disclaimer: I am fully aware that algorithm complexity is the primary source for optimizations. Furthermore, I suspect that nowadays it definitely wouldn't make any difference and I am not asking this to optimize anything. I am just extremely, painfully, maybe prohibitively curious.

If I understand correctly, performance-wise, flipping the sign of Z and the depth test is nothing but changing a < comparison to a > comparison. So, if I understand correctly and the author isn't lying or making things up, then changing < to > used to be a vital optimization for many games.

I didn't explain that particularly well, because it wasn't important. I just felt it was an interesting bit of trivia to add. I didn't intend to go over the algorithm specifically.

However, context is key. I never said that a < comparison was faster than a > comparison. Remember: we're talking about graphics hardware depth tests, not your CPU. Not operator<.

What I was referring to was a specific old optimization where one frame you would use GL_LESS with a range of [0, 0.5]. Next frame, you render with GL_GREATER with a range of [1.0, 0.5]. You go back and forth, literally "flipping the sign of Z and the depth test" every frame.

This loses one bit of depth precision, but you didn't have to clear the depth buffer, which once upon a time was a rather slow operation. Since depth clearing is not only free these days but actually faster than this technique, people don't do it anymore.

Is this an g++ optimization bug?

58 votes

Below code works on Visual Studio 2008 with and without optimization. But it only works on g++ without optimization (O0).

#include <cstdlib>
#include <iostream>
#include <cmath>

double round(double v, double digit)
{
 double pow = std::pow(10.0, digit);
 double t = v * pow;
 //std::cout << "t:" << t << std::endl;
 double r = std::floor(t + 0.5);
 //std::cout << "r:" << r << std::endl;
 return r / pow;
}

int main(int argc, char *argv[])
{
 std::cout << round(4.45, 1) << std::endl;
 std::cout << round(4.55, 1) << std::endl;
}

Output should be: 4.5 4.6

But g++ with optimization (O1 - O3) will output 4.5 4.5

if I add volatile keyword before t, it works, so I think there might be some kind of optimization bug?

Test on g++ 4.1.2, and 4.4.4.

Edit:

Here is the result on ideone: http://ideone.com/Rz937

And the option I test on g++ is simple: g++ -O2 round.cpp

Edit2:

The more interesting result, even I turn on /fp:fast option on Visual Studio 2008, the result still is correct.

Further question:

I was wondering, should I always turn on -ffloat-store option?

Because the g++ version I tested is shipped with CentOS/Redhat 5 and CentOS/Redhat 6.

I compiled many my programs under these platforms, I am worry about that will cause unexpected bugs inside my programs. It seems a little difficult to investigate all my C++ code and used libraries whether have such problem. Any suggestion?

Edit3:

Is anyone interested in why even /fp:fast turned on, Visual Studio 2008 still works? It seems like VS2008 is more reliable at this problem than g++?

Intel x86 processors use 80-bit extended precision internally, whereas double is normally 64-bit wide. Different optimization levels affect how often floating point values from CPU get saved into memory and thus rounded from 80-bit precision to 64-bit precision.

Use the -ffloat-store gcc option to get the same floating point results with different optimization levels.

Alternatively, use the long double type, which is normally 80-bit wide on gcc to avoid rounding from 80-bit to 64-bit precision.

man gcc says it all:

   -ffloat-store
       Do not store floating point variables in registers, and inhibit
       other options that might change whether a floating point value is
       taken from a register or memory.

       This option prevents undesirable excess precision on machines such
       as the 68000 where the floating registers (of the 68881) keep more
       precision than a "double" is supposed to have.  Similarly for the
       x86 architecture.  For most programs, the excess precision does
       only good, but a few programs rely on the precise definition of
       IEEE floating point.  Use -ffloat-store for such programs, after
       modifying them to store all pertinent intermediate computations
       into variables.

Preserve code readability while optimising

15 votes

I am writing a scientific program in Python and C with some complex physical simulation algorithms. After implementing algorithm, I found that there are a lot of possible optimizations to improve performance. Common ones are precalculating values, getting calculations out of cycle, replacing simple matrix algorithms with more complex and other. But there arises a problem. Unoptimized algorithm is much slower, but its logic and connection with theory look much clearer and readable. Also, it's harder to extend and modify optimized algorithm.

So, the question is - what techniques should I use to keep readability while improving performance? Now I am trying to keep both fast and clear branches and develop them in parallel, but maybe there are better methods?

Just as a general remark (I'm not too familiar with Python): I would suggest you make sure that you can easily exchange the slow parts of the 'reference implementation' with the 'optimized' parts (e.g., use something like the Strategy pattern).

This will allow you to cross-validate the results of the more sophisticated algorithms (to ensure you did not mess up the results), and will keep the overall structure of the simulation algorithm clear (separation of concerns). You can place the optimized algorithms into separate source files / folders / packages and document them separately, in as much detail as necessary.

Apart from this, try to avoid the usual traps: don't do premature optimization (check if it is actually worth it, e.g. with a profiler), and don't re-invent the wheel (look for available libraries).

Is it *really* worth to use integer over varchar for a set of data?

6 votes

For example if I have a table User, I want to store gender or sex, I'll add a column like sex.

Is it really worth to use an integer and then map it in my favorite programming language?

Like 1 => 'Male' and 2 => 'Female'

Is there any performance reason to do that?

Or could I safely use a varchar which more meaning with 'female' or 'male' almost like I was using mysql ENUM ?

Edit: I here and there that it is sometimes better, sometimes it doesn't matter, so I more looking for benchmark or something over a "it is better" answer.

I mean I think using varchar is actually more meaningfull than an integer, and I would use an integer only if performance are more than 0.3% or something.

Ortiginal Answer:
I would suggest storing it in a CHAR(1) column as M or F
It is expressive enough for the specific purpose AND has the speed benefit of being a single character comparison

Update 4 (fixed benchmark):
All previous benchmarks had a fatal flaw that one (the CHAR(1)) table was MyISAM and all other were InnoDB. So I recreated the database with all tables using the MyISAM and the results make much more sense now.

The error creeped in as I used the MySQLWorkbench's wizard to create the tables and forgot to change the database engine in the other tables and it defaulted to InnoDB (I have MySQL 5.5)

So the corrected results are as follows, (I have removed all my previous benchmarks as they were invalid) :

// select queries
$query['char'] = "select count(*) from test_table where gender = 'M'";
$query['char_lower'] = "select count(*) from test_table where LOWER(gender) = 'm'";
$query['varchar'] = "select count(*) from test_table_2 where gender = 'Male'";
$query['varchar_lower'] = "select count(*) from test_table_2 where LOWER(gender) = 'male'";
$query['tinyint'] = "select count(*) from test_table_3 where gender = 1";

// benchmark result
array
  'char' => float 0.35457420349121
  'char_lower' => float 0.44702696800232
  'varchar' => float 0.50844311714172
  'varchar_lower' => float 0.64412498474121
  'tinyint' => float 0.26296806335449

New Conclusion : TINYINT Is fastest. But my recommendation would be still yo use CHAR(1) as it would be easier for future developers to understand the database.

If you do use TINYINT, my recommendation would be name the column ismale instead of sex and store 0 => Female and 1 => male thus making it a little more easy to understand in raw database.

The table structure for benchmark is this:

CREATE TABLE `test_table` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `gender` char(1) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=MyISAM

Only the type of the gender column is different in the 3 tables, the types are:

CHAR(1), VARCHAR(6), TINYINT

All 3 tables have 10000 entries.