I wanted to compare the performance of inline assembly and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times. Here's the code:
#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}
void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES
    start:
        mov esi,0
        mov ecx,lengthOfArray
    label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}
Here's main():
int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    time_t start = clock();
    calcuC(xC,yC,2000);
    // calcuAsm(xAsm,yAsm,2000);
    // for(int i = 0; i < 2000; i++)
    // {
    //     if(xC[i] != xAsm[i])
    //     {
    //         cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //         errorOccured = true;
    //         break;
    //     }
    // }
    // if(errorOccured)
    //     cout<<"Error occurs!"<<endl;
    // else
    //     cout<<"Works fine!"<<endl;
    time_t end = clock();
    // cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";
    cout<<"time = "<<end - start<<endl;
    return 0;
}
Then I ran the program five times to get the processor clock ticks reported by clock(), which can be read as time. Each run calls only one of the functions mentioned above.
And here comes the result.
Function of assembly version:
Debug   Release
---------------
732     668
733     680
659     672
667     675
684     694
Average (Release): 677
Function of C++ version:
Debug   Release
---------------
1068    168
999     166
1072    231
1002    166
1114    183
Average (Release): 182
The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?
I guess that the assembly code I wrote is not as efficient as the code generated by GCC. It's hard for an ordinary programmer like me to write code faster than what a compiler generates. Does that mean I should not trust the performance of hand-written assembly, focus on C++, and forget about assembly language?
Yes, most of the time.
Compilers can do optimizations that most people can't even imagine (see this short list).
They can take into account inter-procedural optimization and whole-program optimization. An assembly programmer has to write well-defined functions with a well-defined call interface. This prevents many of the optimization methods that compilers use, such as register allocation, constant propagation, common subexpression elimination across functions, scheduling across functions, and other complex, non-obvious optimizations (the polytope model, for example). That's not surprising: they can verify in one second what would take you two days to calculate. On RISC architectures people stopped worrying about this many years ago (instruction scheduling, for example, is very hard to tune by hand), and modern CISC CPUs have very long pipelines too.
For some complex microcontrollers, even system libraries are written in C instead of assembly because their compilers produce better (and easier to maintain) final code.
If you write something in assembly, I think you have to consider at least some simple optimizations (take a look at MasmCode). The textbook example for arrays is to unroll the loop (its size is known at compile time). Do it and run your test again. It could also demonstrate why your debug build is slower in pure C++ (no optimizations).
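To make the unrolling suggestion concrete, here is a minimal sketch in C++ rather than assembly. The function names addSimple and addUnrolled4 and the unroll factor of 4 are my own assumptions for illustration, not something from the question. Whether it is actually faster depends on the compiler and flags, since an optimizing compiler may already unroll the simple loop by itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain element-wise add, same operation as the inner loop in the question.
void addSimple(int *x, const int *y, std::size_t length)
{
    for (std::size_t j = 0; j < length; j++)
        x[j] += y[j];
}

// Hypothetical helper: the same add, manually unrolled by a factor of 4.
void addUnrolled4(int *x, const int *y, std::size_t length)
{
    assert(length == 0 || (x != nullptr && y != nullptr));
    std::size_t j = 0;
    // Main body: four adds per iteration, so one loop test per four elements.
    for (; j + 4 <= length; j += 4)
    {
        x[j]     += y[j];
        x[j + 1] += y[j + 1];
        x[j + 2] += y[j + 2];
        x[j + 3] += y[j + 3];
    }
    // Remainder loop: handles lengths that are not a multiple of 4.
    for (; j < length; j++)
        x[j] += y[j];
}
```

The remainder loop is what keeps the unrolled version correct for any length, not just for 2000.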
That said, modern compilers can sometimes automatically use some MMX/SIMD instructions by themselves, and if you don't use them you simply can't compare (I'm not an assembly guru, so I won't even try to talk about the code you wrote).
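As a rough illustration of what "using SIMD instructions" means for this array add, here is a sketch written with SSE2 intrinsics instead of assembly. It is only an example of the technique, not a claim about what any particular compiler emits for the code in the question. The name addSse2 and the HAVE_SSE2 macro are hypothetical, and the code falls back to a plain loop on targets without SSE2.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> // SSE2 intrinsics
#define HAVE_SSE2 1    // hypothetical macro, used only in this sketch
#else
#define HAVE_SSE2 0
#endif

// Hypothetical helper: x[j] += y[j], four ints per instruction where SSE2 is available.
void addSse2(int *x, const int *y, std::size_t length)
{
    assert(length == 0 || (x != nullptr && y != nullptr));
    std::size_t j = 0;
#if HAVE_SSE2
    for (; j + 4 <= length; j += 4)
    {
        // Unaligned loads and stores: the pointers are not assumed to be 16-byte aligned.
        __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i *>(x + j));
        __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i *>(y + j));
        _mm_storeu_si128(reinterpret_cast<__m128i *>(x + j), _mm_add_epi32(vx, vy));
    }
#endif
    // Scalar tail (and the whole loop on targets without SSE2).
    for (; j < length; j++)
        x[j] += y[j];
}
```

Each _mm_add_epi32 adds four 32-bit integers at once, which is the kind of work the scalar assembly loop in the question does one element at a time.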
Just for loops, this is a short list of loop optimizations that a compiler commonly checks for (do you think you could do them all yourself when your schedule has been decided for a C# program?)
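One entry commonly found on such lists is loop interchange. Here is a sketch of what it would mean for the doubly nested loop in the question. It assumes x and y do not overlap and that the sums do not overflow. The function names are hypothetical, and I am not claiming a given compiler performs exactly this transformation on the original code.

```cpp
#include <cassert>
#include <cstddef>

// Shape from the question: the repeat loop is outermost, so every pass
// walks both arrays through memory again.
void addRepeated(int *x, const int *y, std::size_t length, int times)
{
    for (int i = 0; i < times; i++)
        for (std::size_t j = 0; j < length; j++)
            x[j] += y[j];
}

// Hypothetical helper: interchanged loops. Valid only if x and y do not overlap.
void addRepeatedInterchanged(int *x, const int *y, std::size_t length, int times)
{
    assert(length == 0 || (x != nullptr && y != nullptr));
    for (std::size_t j = 0; j < length; j++)
    {
        int acc = x[j];        // running sum kept in a local variable
        const int step = y[j]; // loop-invariant load hoisted out of the inner loop
        for (int i = 0; i < times; i++)
            acc += step;
        x[j] = acc;            // one store per element instead of one per pass
    }
}
```

After the interchange, each element is loaded and stored once, instead of once per repetition.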
These days it's also really uncommon to need assembly language for another reason: the plethora of different CPUs! Do you want to support them all? Each has a specific micro-architecture and some specific instruction set. For small tasks (like this) the compiler usually does better, and for complex tasks the work usually doesn't pay off (and the compiler may do better anyway).
You can always produce an example where handmade assembly code is better than compiled code, but usually it's a fictional example or a single routine, not a true program of 200,000+ lines of C++ code. I think compilers will produce better assembly code 95% of the time (moreover, we must not forget that an assembler is a compiler too and may do some optimizations of its own), and only in some rare cases will you need to write assembly code for a few short, heavily used, performance-critical routines.

