Best c questions in October 2010

What do people find difficult about C pointers?

102 votes

From the number of questions posted here, it's clear that people have some pretty fundamental issues when getting their heads around pointers and pointer arithmetic.

I'm curious to know why. They've never really caused me major problems (although I first learned about them back in the Neolithic). In order to write better answers to these questions, I'd like to know what people find difficult.

So, if you're struggling with pointers, or you recently were but suddenly "got it", what were the aspects of pointers that caused you problems?

(I've tried to word this so that it's not argumentative or subjective. I'm genuinely interested in what people have problems with here)

I suspect people are going a bit too deep in their answers. An understanding of scheduling, actual CPU operations, or assembly-level memory management isn't really required.

When I was teaching, I found the following holes in students' understanding to be the most common source of problems:

  1. Heap vs Stack storage. It is simply stunning how many people do not understand this, even in a general sense.
  2. Stack frames. Just the general concept of a dedicated section of the stack for local variables, along with the reason it's a 'stack'... details such as stashing the return location, exception handler details, and previous registers can safely be left till someone tries to build a compiler.
  3. "Memory is memory is memory". Casting just changes which versions of the operators are used, or how much room the compiler gives for a particular chunk of memory. You know you're dealing with this problem when people talk about "what (primitive) variable X really is".

Most of my students were able to understand a simplified drawing of a chunk of memory, generally the local variables section of the stack at the current scope. Generally giving explicit fictional addresses to the various locations helped.

I guess in summary, I'm saying that if you want to understand pointers, you have to understand variables, and what they actually are in modern architectures.
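The "memory is memory" point can be made concrete by looking at the bytes behind a variable. The following is only a sketch; the helper name nonzero_bytes is made up for illustration, and it assumes an int has no padding bits (true on common platforms).

```c
#include <stddef.h>

/* Hypothetical helper: counts the nonzero bytes of any object.
 * The same memory is simply viewed as an array of unsigned char. */
static size_t nonzero_bytes(const void *object, size_t size)
{
    const unsigned char *bytes = object;  /* no copy, just another view */
    size_t n = 0;
    for (size_t i = 0; i < size; i++) {
        if (bytes[i] != 0)
            n++;
    }
    return n;
}
```

Passing the address of an int that holds 1 reports a single nonzero byte whatever the byte order is, because the pointer type only decides how the same storage is read.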

Moving from C++ to C

57 votes

After a few years coding in C++, I was recently offered a job coding in C, in the embedded field.

Putting aside the question of whether it's right or wrong to dismiss C++ in the embedded field, there are some features/idioms in C++ I would miss a lot. Just to name a few:

  • Generic, type-safe data structures (using templates).
  • RAII. Especially in functions with multiple return points, e.g. not having to remember to release the mutex on each return point.
  • Destructors in general. I.e. you write a d'tor once for MyClass, then if a MyClass instance is a member of MyOtherClass, MyOtherClass doesn't have to explicitly deinitialize the MyClass instance - its d'tor is called automatically.
  • Namespaces.

What are your experiences moving from C++ to C?
What C substitutes did you find for your favorite C++ features/idioms? Did you discover any C features you wish C++ had?

Working on an embedded project, I tried working in all C once, and just couldn't stand it. It was just so verbose that it made it hard to read anything. Also, I liked the optimized-for-embedded containers I had written, which had to turn into much less safe and harder to fix #define blocks.

Code that in C++ looked like:

if(uart[0]->Send(pktQueue.Top(), sizeof(Packet)))
    pktQueue.Dequeue(1);

turns into:

if(UART_uchar_SendBlock(uart[0], Queue_Packet_Top(pktQueue), sizeof(Packet)))
    Queue_Packet_Dequeue(pktQueue, 1);

which many people will probably say is fine but gets ridiculous if you have to do more than a couple "method" calls in a line. Two lines of C++ would turn into five of C (due to 80-char line length limits). Both would generate the same code, so it's not like the target processor cared!

One time (back in 1995), I tried writing a lot of C for a multiprocessor data-processing program. The kind where each processor has its own memory and program. The vendor-supplied compiler was a C compiler (some kind of HighC derivative), their libraries were closed source so I couldn't use GCC to build, and their APIs were designed with the mindset that your programs would primarily be the initialize/process/terminate variety, so inter-processor communication was rudimentary at best.

I got about a month in before I gave up, found a copy of cfront, and hacked it into the makefiles so I could use C++. Cfront didn't even support templates, but the C++ code was much, much clearer.

Generic, type-safe data structures (using templates).

The closest thing C has to templates is to declare a header file with a lot of code that looks like:

TYPE * Queue_##TYPE##_Top(Queue_##TYPE##* const this)
{ /* ... */ }

then pull it in with something like:

#define TYPE Packet
#include "Queue.h"
#undef TYPE

Note that this won't work for compound types (e.g. no queues of unsigned char) unless you make a typedef first.

Oh, and remember, if this code isn't actually used anywhere, then you don't even know if it's syntactically correct.
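Note that ## is only valid inside a macro's replacement list, so the snippet above is schematic. One way to make the idea compile, sketched here in a single file, is a pair of pasting macros; CAT, QUEUE and QUEUE_FN are names invented for this example, and int stands in for Packet.

```c
#include <stddef.h>

/* Two-level pasting so that TYPE is expanded before ## is applied. */
#define CAT_(a, b) a##b
#define CAT(a, b) CAT_(a, b)
#define QUEUE CAT(Queue_, TYPE)          /* e.g. Queue_int     */
#define QUEUE_FN(name) CAT(QUEUE, name)  /* e.g. Queue_int_Top */

#define TYPE int
/* Everything from here to #undef TYPE would live in "Queue.h". */
typedef struct {
    TYPE items[8];
    size_t head, tail;
} QUEUE;

/* Returns the front element, or NULL when the queue is empty. */
static TYPE *QUEUE_FN(_Top)(QUEUE *const self)
{
    return self->head == self->tail ? NULL : &self->items[self->head];
}
#undef TYPE
```

After the #undef, Queue_int and Queue_int_Top remain ordinary identifiers; repeating the block with a different TYPE produces a second, independent "instantiation".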

EDIT: One more thing: you'll need to manually manage instantiation of code. If your "template" code isn't all inline functions, then you'll have to put in some control to make sure that things get instantiated only once so your linker doesn't spit out a pile of "multiple instances of Foo" errors.

To do this, you'll have to put the non-inlined stuff in an "implementation" section in your header file:

#ifdef implementation_##TYPE

/* Non-inlines, "static members", global definitions, etc. go here. */

#endif

And then, in one place in all your code per template variant, you have to:

#define TYPE Packet
#define implementation_Packet
#include "Queue.h"
#undef TYPE

Also, this implementation section needs to be outside the standard #ifndef/#define/#endif litany, because you may include the template header file in another header file, but need to instantiate afterward in a .c file.

Yep, it gets ugly fast. Which is why most C programmers don't even try.

RAII.

Especially in functions with multiple return points, e.g. not having to remember to release the mutex on each return point.

Well, forget your pretty code and get used to all your return points (except the end of the function) being gotos:

TYPE * Queue_##TYPE##_Top(Queue_##TYPE##* const this)
{
    TYPE * result;
    Mutex_Lock(this->lock);
    if(this->head == this->tail)
    {
        result = 0;
        goto Queue_##TYPE##_Top_exit;
    }

    /* Figure out `result` for real, then fall through to... */

Queue_##TYPE##_Top_exit:
    Mutex_Unlock(this->lock);
    return result;
}
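A self-contained version of the same single-exit pattern, as a rough sketch: the Lock type below is a stand-in that only counts acquisitions so the pairing can be checked, and IntQueue is a hypothetical concrete queue rather than the generic one above.

```c
/* Hypothetical lock that only counts, so lock/unlock pairing is visible. */
typedef struct { int depth; } Lock;
static void Lock_Acquire(Lock *l) { l->depth++; }
static void Lock_Release(Lock *l) { l->depth--; }

typedef struct {
    Lock lock;
    int items[4];
    int head, tail;
} IntQueue;

/* Returns the front element, or a null pointer when the queue is empty. */
static int *IntQueue_Top(IntQueue *q)
{
    int *result = 0;
    Lock_Acquire(&q->lock);
    if (q->head == q->tail)
        goto done;                 /* early exit still runs the cleanup */
    result = &q->items[q->head];
done:
    Lock_Release(&q->lock);        /* the only place the lock is released */
    return result;
}
```

Every path through the function passes the done label, so the release cannot be forgotten when another early exit is added later.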

Destructors in general.

I.e. you write a d'tor once for MyClass, then if a MyClass instance is a member of MyOtherClass, MyOtherClass doesn't have to explicitly deinitialize the MyClass instance - its d'tor is called automatically.

Object construction has to be explicitly handled the same way.

Namespaces.

That's actually a simple one to fix: just tack a prefix onto every symbol. This is the primary cause of the source bloat that I talked about earlier (since classes are implicit namespaces). The C folks have been living this, well, forever, and probably won't see what the big deal is.

YMMV

Why do most C developers use define instead of const?

43 votes

In many programs a #define serves the same purpose as a constant. For example:

#define FIELD_WIDTH 10
const int fieldWidth = 10;

I commonly see the first form preferred over the other, relying on the pre-processor to handle what is basically an application decision. Is there a reason for this tradition?

There is a very solid reason for this: const in C does not mean something is constant. It just means a variable is read-only.

In places where the compiler requires a true constant (such as array sizes for non-VLA arrays or case labels), a const variable such as fieldWidth simply cannot be used.
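A short sketch of the difference, reusing FIELD_WIDTH and fieldWidth from the question; FIELD_COUNT, field, widths, classify and field_width are names added only for illustration. The commented-out lines are the ones a C compiler is entitled to reject.

```c
#define FIELD_WIDTH 10              /* preprocessor constant */
enum { FIELD_COUNT = 4 };           /* enumeration constant: also a constant expression */
static const int fieldWidth = 10;   /* read-only object, not a constant expression in C */

static char field[FIELD_WIDTH];     /* OK */
static int widths[FIELD_COUNT];     /* OK */
/* static char bad[fieldWidth];        rejected: file-scope array size must be constant */

static int classify(int n)
{
    switch (n) {
    case FIELD_WIDTH: return 1;     /* OK */
    /* case fieldWidth: return 2;      rejected: case label must be an integer constant */
    default: return 0;
    }
}

/* The const object is still perfectly usable as an ordinary value. */
static int field_width(void) { return fieldWidth; }
```

An enum is a common middle ground when a typed, scoped integer constant is wanted without the preprocessor.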

In C99, is f()+g() undefined or merely unspecified?

31 votes

I used to think that in C99, even if the side-effects of functions f and g interfered, and although the expression f() + g() does not contain a sequence point, f and g would contain some, so the behavior would be unspecified: either f() would be called before g(), or g() before f().

I am no longer so sure. What if the compiler inlines the functions (which the compiler may decide to do even if the functions are not declared inline) and then reorders instructions? May one get a result different from the above two? In other words, is this undefined behavior?

This is not because I intend to write this kind of thing, this is to choose the best label for such a statement in a static analyzer.

The expression f() + g() contains a minimum of 4 sequence points; one before the call to f() (after all zero of its arguments are evaluated); one before the call to g() (after all zero of its arguments are evaluated); one as the call to f() returns; and one as the call to g() returns. Further, the two sequence points associated with f() occur either both before or both after the two sequence points associated with g(). What you cannot tell is which order the sequence points will occur in - whether the f-points occur before the g-points or vice versa.

Even if the compiler inlined the code, it has to obey the 'as if' rule - the code must behave the same as if the functions were not interleaved. That limits the scope for damage (assuming a non-buggy compiler).

So the sequence in which f() and g() are evaluated is unspecified. But everything else is pretty clean.
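To illustrate, here is a small sketch in which f and g record the order in which they run; call_log, call_count and sum_fg are names invented for the example. The sum is always the same, while the log may legitimately read "fg" or "gf".

```c
#include <stddef.h>

static char call_log[8];     /* records which function ran, in order */
static size_t call_count;

static int f(void) { call_log[call_count++] = 'f'; return 1; }
static int g(void) { call_log[call_count++] = 'g'; return 2; }

/* The order of the two calls is unspecified, but they never interleave. */
static int sum_fg(void)
{
    call_count = 0;
    return f() + g();
}
```

A program that needs one particular order should call the functions in separate statements and add the stored results.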


In a comment, supercat asks:

I would expect function calls in the source code remain as sequence points even if a compiler decides on its own to inline them. Does that remain true of functions declared "inline", or does the compiler get extra latitude?

I believe the 'as if' rule applies and the compiler doesn't get extra latitude to omit sequence points because it uses an explicitly inline function. The main reason for thinking that (being too lazy to look for the exact wording in the standard) is that the compiler is allowed to inline or not inline a function according to its rules, but the behaviour of the program should not change (except for performance).

Also, what can be said about the sequencing of (a(),b()) + (c(),d())? Is it possible for c() and/or d() to execute between a() and b(), or for a() or b() to execute between c() and d()?

  • Clearly, a executes before b, and c executes before d. I believe it is possible for c and d to be executed between a and b, though it is fairly unlikely that the compiler would generate the code like that; similarly, a and b could be executed between c and d. And although I used 'and' in 'c and d', that could be an 'or' - that is, any of these sequences of operation meet the constraints:

    • abcd
    • acbd
    • acdb
    • cadb
    • cdab
    • cabd

    I'm not certain that's an exhaustive listing, but it covers most of the variants.

If such a thing would be possible, that would imply a significant difference between inline functions and macros.

There are significant differences between inline functions and macros, but I don't think the ordering in the expression is one of them. That is, any of the functions a, b, c or d could be replaced with a macro, and the same sequencing of the macro bodies could occur. The primary difference, it seems to me, is that with the inline functions, there are guaranteed sequence points at the function calls - as outlined in the main answer - as well as at the comma operators. With macros, you lose the function-related sequence points. (So, maybe that is a significant difference...) However, in so many ways the issue is rather like questions about how many angels can dance on the head of a pin - it isn't very important in practice. If someone presented me with the expression (a(),b()) + (c(),d()) in a code review, I would tell them to rewrite the code to make it clear:

a();
c();
x = b() + d();

And that assumes there is no critical sequencing requirement on b() vs d().

Do global variables mean faster code?

30 votes

I read recently, in an article on game programming written in 1996, that using global variables is faster than passing parameters.

Was this ever true, and if so, is this still true today?

Short answer - No, good programmers make code go faster by knowing and using the appropriate tools for the job, and then optimizing in a methodical way where their code does not meet their requirements.

Longer answer - This article, which in my opinion is not especially well-written, is not in any case general advice on program speedup but '15 ways to do faster blits'. Extrapolating this to the general case is missing the writer's point, whatever you think of the merits of the article.

If I was looking for performance advice, I would place zero credence in an article that does not identify or show a single concrete code change to support the assertions in the sample code, and without suggesting that measuring the code might be a good idea. If you are not going to show how to make the code better, why include it?

Some of the advice is years out of date - FAR pointers stopped being an issue on the PC a long time ago.

A serious game developer (or any other professional programmer, for that matter) would have a good laugh about advice like this:

You can either take out the assert's completely, or you can just add a #define NDEBUG when you compile the final version.

My advice to you, if you really wish to evaluate the merit of any of these 15 tips, and since the article is 14 years old, would be to compile the code in a modern compiler (Visual C++ 10 say) and try to identify any area where using a global variable (or any of the other tips) would make it faster.

[Just joking - my real advice would be to ignore this article completely and ask specific performance questions on Stack Overflow as you hit issues in your work that you cannot resolve. That way the answers you get will be peer reviewed, supported by example code or good external evidence, and current.]

Is i = 0, ++i defined ?

Asked on Mon, 18 Oct 2010 by ereOn c++ c
22 votes

I recently learned about the , operator and the fact that it introduces a sequence point.

I also learned that the following code led to undefined behavior:

i = ++i;

Because i was modified twice between two sequence points.

But what about the following code?

i = 0, ++i;
i = (0, ++i);

While I know the rules, I can't get to a conclusion. So is it defined behavior or not ?

edit: Just as @paxdiablo mentions, defined or not, this is really a bad practice which should be avoided. This question is asked solely for educational purposes and better understanding of the "rules".

Yes. = has higher precedence than ,, so this expression is equivalent to (i = 0), ++i. , is a sequence point, so it's guaranteed that the ++i occurs after the assignment.

I'm not sure whether i = (0, ++i) is defined though. My guess would be no; there's no sequence point between the increment and the assignment.
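The well-defined form can be checked directly. A minimal sketch (the function name is made up); it deliberately covers only i = 0, ++i and leaves the doubtful i = (0, ++i) out.

```c
/* Parsed as (i = 0), ++i: the comma operator's sequence point
 * guarantees the assignment is complete before the increment. */
static int comma_then_increment(void)
{
    int i = 5;
    i = 0, ++i;
    return i;   /* 1 */
}
```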

How to comment a few lines, with comments inside

21 votes

I have a program like this

int main(){ 

    char c;
    int i; /* counter */
    double d;

    return 0;
}

If I want to comment out char, int and double, and just have return uncommented, can I do it? The comment that's already there ends the outer comment early. Is there an easy/fast way to comment that out?

int main(){ 
#if 0
    char c;
    int i; /* counter */
    double d;
#endif
    return 0;
}

Not strictly a comment, but the effect is what you want and it's easy to revert.

This also scales well to larger code blocks, especially if you have an editor that can match the start and end of the #if..#endif.

Detecting signed overflow in C/C++

20 votes

At first glance, this question may seem like a duplicate of http://stackoverflow.com/questions/199333/best-way-to-detect-integer-overflow-in-c-c, however it is actually significantly different.

I've found that while detecting an unsigned integer overflow is pretty trivial, detecting a signed overflow in C/C++ is actually more difficult than most people think.

The most obvious, yet naive, way to do it would be something like:

int add(int lhs, int rhs)
{
 int sum = lhs + rhs;
 if ((lhs >= 0 && sum < rhs) || (lhs < 0 && sum > rhs)) {
  /* an overflow has occurred */
  abort();
 }
 return sum; 
}

The problem with this is that according to the C standard, signed integer overflow is undefined behavior. In other words, according to the standard, as soon as you even cause a signed overflow, your program is just as invalid as if you dereferenced a null pointer. So you can't cause undefined behavior, and then try to detect the overflow after the fact, as in the above post-condition check example.

Even though the above check is likely to work on many compilers, you can't count on it. In fact, because the C standard says signed integer overflow is undefined, some compilers (like GCC) will optimize away the above check when optimization flags are set, because the compiler assumes a signed overflow is impossible. This totally breaks the attempt to check for overflow.

So, another possible way to check for overflow would be:

#include <limits.h>  /* INT_MAX, INT_MIN */
#include <stdlib.h>  /* abort */

int add(int lhs, int rhs)
{
 if (lhs >= 0 && rhs >= 0) {
  if (INT_MAX - lhs < rhs) {
   /* the sum would exceed INT_MAX */
   abort();
  }
 }
 else if (lhs < 0 && rhs < 0) {
  if (lhs < INT_MIN - rhs) {
   /* the sum would fall below INT_MIN */
   abort();
  }
 }

 return lhs + rhs;
}

This seems more promising, since we don't actually add the two integers together until we make sure in advance that performing such an add will not result in overflow. Thus, we don't cause any undefined behavior. (The comparisons are strict: a sum that lands exactly on INT_MAX or INT_MIN is representable and is not an overflow.)

However, this solution is unfortunately a lot less efficient than the initial solution, since you have to perform a subtract operation just to test if your addition operation will work. And even if you don't care about this (small) performance hit, I'm still not entirely convinced this solution is adequate. The expression lhs < INT_MIN - rhs seems exactly like the sort of expression the compiler might optimize away, thinking that signed overflow is impossible.

So is there a better solution here? Something that is guaranteed to 1) not cause undefined behavior, and 2) not provide the compiler with an opportunity to optimize away overflow checks? I was thinking there might be some way to do it by casting both operands to unsigned, and performing checks by rolling your own two's-complement arithmetic, but I'm not really sure how to do that.

Your approach with subtraction is correct and well-defined. A compiler cannot optimize it away.
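The pre-check can also be packaged so that the caller decides what to do on failure. This is a sketch only; checked_add is a hypothetical helper name. Each subtraction is performed only on the branch where it cannot itself overflow, and the comparisons are strict so that a sum exactly equal to INT_MAX or INT_MIN is accepted.

```c
#include <limits.h>
#include <stdbool.h>

/* Stores lhs + rhs in *out and returns true if the sum fits in an int;
 * returns false (leaving *out untouched) if it would overflow. */
static bool checked_add(int lhs, int rhs, int *out)
{
    if (rhs > 0 && lhs > INT_MAX - rhs)
        return false;              /* would exceed INT_MAX */
    if (rhs < 0 && lhs < INT_MIN - rhs)
        return false;              /* would fall below INT_MIN */
    *out = lhs + rhs;              /* now known to be representable */
    return true;
}
```

Because no overflowing operation is ever evaluated, the compiler has no undefined behavior to reason from and cannot legitimately remove the checks.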

Another correct approach, if you have a larger integer type available, is to perform the arithmetic in the larger type and then check that the result fits in the smaller type when converting it back:

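#include <assert.h>  /* assert */
#include <limits.h>  /* INT_MIN, INT_MAX, LLONG_MAX */
#include <stdlib.h>  /* abort */

int sum(int a, int b)
{
    long long c;
    assert(LLONG_MAX>INT_MAX);   /* the wider type must really be wider */
    c = (long long)a + b;        /* cannot overflow in long long */
    if (c < INT_MIN || c > INT_MAX) abort();
    return (int)c;               /* safe: c is known to fit in int */
}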

A good compiler should convert the entire addition and if statement into an int-sized addition and a single conditional jump-on-overflow and never actually perform the larger addition.

Edit: As Stephen pointed out, I'm having trouble getting a (not-so-good) compiler, gcc, to generate the sane asm. The code it generates is not terribly slow, but certainly suboptimal. If anyone knows variants on this code that will get gcc to do the right thing, I'd love to see them.

Can sizeof(int) ever be 1 on a hosted implementation?

19 votes

My view is that a C implementation cannot satisfy the specification of certain stdio functions (particularly fputc/fgetc) if sizeof(int)==1, since the int needs to be able to hold any possible value of unsigned char or EOF (-1). Is this reasoning correct?

(Obviously sizeof(int) cannot be 1 if CHAR_BIT is 8, due to the minimum required range for int, so we're implicitly only talking about implementations with CHAR_BIT>=16, for instance DSPs, where typical implementations would be a freestanding implementation rather than a hosted implementation, and thus not required to provide stdio.)

Edit: After reading the answers and some linked references, some thoughts on ways it might be valid for a hosted implementation to have sizeof(int)==1:

First, some citations:

7.19.7.1(2-3):

If the end-of-file indicator for the input stream pointed to by stream is not set and a next character is present, the fgetc function obtains that character as an unsigned char converted to an int and advances the associated file position indicator for the stream (if defined).

If the end-of-file indicator for the stream is set, or if the stream is at end-of-file, the end-of-file indicator for the stream is set and the fgetc function returns EOF. Otherwise, the fgetc function returns the next character from the input stream pointed to by stream. If a read error occurs, the error indicator for the stream is set and the fgetc function returns EOF.

7.19.8.1(2):

The fread function reads, into the array pointed to by ptr, up to nmemb elements whose size is specified by size, from the stream pointed to by stream. For each object, size calls are made to the fgetc function and the results stored, in the order read, in an array of unsigned char exactly overlaying the object. The file position indicator for the stream (if defined) is advanced by the number of characters successfully read.

Thoughts:

  • Reading back unsigned char values outside the range of int could simply have implementation-defined behavior in the implementation. This is particularly unsettling, as it means that using fwrite and fread to store binary structures (which, while it results in nonportable files, is supposed to be an operation you can perform portably on any single implementation) could appear to work but silently fail. I accept that an implementation might not have a usable filesystem, but it's a lot harder to accept that an implementation could have a filesystem that automatically invokes nasal demons as soon as you try to use it, and no way to determine that it's unusable. Now that I realize the behavior is implementation-defined and not undefined, it's not quite so unsettling, and I think this might be a valid (although undesirable) implementation.

  • An implementation with sizeof(int)==1 could simply define the filesystem to be empty and read-only. Then there would be no way an application could read any data written by itself, only from an input device on stdin which could be implemented so as to only give positive char values which fit in int.

It is possible for an implementation to meet the interface requirements for fgetc and fputc even if sizeof(int) == 1.

The interface for fgetc says that it returns the character read as an unsigned char converted to an int. Nowhere does it say that this value cannot be EOF even though the expectation is clearly that valid reads "usually" return positive values. Of course, fgetc returns EOF on a read failure or end of stream but in these cases the file's error indicator or end-of-file indicator (respectively) is also set.

Similarly, nowhere does it say that you can't pass EOF to fputc so long as that happens to coincide with the value of an unsigned char converted to an int.

Obviously the programmer has to be very careful on such platforms. This might not do a full copy:

void Copy(FILE *out, FILE *in)
{
    int c;
    while((c = fgetc(in)) != EOF)
        fputc(c, out);
}

Instead, you would have to do something like (not tested!):

void Copy(FILE *out, FILE *in)
{
    int c;
    while((c = fgetc(in)) != EOF || (!feof(in) && !ferror(in)))
        fputc(c, out);
}

Of course, platforms where you will have real problems are those where sizeof(int) == 1 and the conversion from unsigned char to int is not an injection. I believe that this would necessarily be the case on platforms using sign and magnitude or ones' complement for representation of signed integers.

Why can we delete arrays, but not know the length in C/C++?

17 votes

Possible Duplicate:
C programming : How does free know how much to free?

How is it that it is possible for us to delete dynamically allocated arrays, but we can't find out how many elements they have? Can't we just divide the size of the memory location by the size of each object?

In C++, both...

  • the size (bytes) requested by a new, new[] or malloc call, and
  • the number of array elements requested in a new[] dynamic allocation

...are implementation details that the Standard doesn't require be made available programmatically, even though the memory allocation library must remember the former and the compiler the latter so it can invoke the destructor on the correct number of elements.

Sometimes the compiler may see there's a constant-sized allocation and be able to associate it reliably with the corresponding deallocation, so it could generate code customised for these compile-time-known values (e.g. inlining and loop unrolling), but in complex usage (and when handling external inputs) a compiler may need to store and retrieve the # elements at run-time: enough space for the #element counter might be put - for example - immediately before or after the address returned for the array content, with delete[] knowing about this convention. In practice, a compiler may choose to always handle this at run-time just for the simplicity that comes with consistency. Other run-time possibilities exist: e.g. the # elements might be derivable from some insight into the specific memory pool from which the allocation was satisfied combined with the object size.

The Standard doesn't provide programmatic access to ensure implementations are unfettered in the optimisations (in speed and/or space) they may use.

(The size of the memory location may be greater than the exact size required for the requested number of elements - that size is remembered by the memory allocation library, which may be a black-box library independent of the C++ compiler).
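The "counter stored just before the array" convention mentioned above can be sketched in plain C. This is an illustration of the idea only, not how any particular compiler or allocator does it; counted_header, counted_alloc, counted_length and counted_free are hypothetical names, and max_align_t assumes C11.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical header placed immediately before the caller's memory. */
typedef union {
    size_t count;
    max_align_t align;   /* keeps the payload suitably aligned (C11) */
} counted_header;

/* Allocates room for count elements and remembers count in the header. */
static void *counted_alloc(size_t count, size_t elem_size)
{
    if (elem_size != 0 &&
        count > (SIZE_MAX - sizeof(counted_header)) / elem_size)
        return NULL;                       /* size computation would overflow */
    counted_header *h = malloc(sizeof *h + count * elem_size);
    if (h == NULL)
        return NULL;
    h->count = count;
    return h + 1;                          /* caller sees only the payload */
}

/* Recovers the element count by stepping back to the header. */
static size_t counted_length(const void *p)
{
    return ((const counted_header *)p - 1)->count;
}

static void counted_free(void *p)
{
    if (p != NULL)
        free((counted_header *)p - 1);     /* free the original block */
}
```

The price is one header per allocation, which is exactly the kind of cost the Standard leaves implementations free to avoid.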

Where can I find a C99 front end with good error message for students

16 votes

I'm teaching a course in which students get their first experience programming in C. We're using gcc on Linux and (for beginning students) the user experience is terrible. I'm looking for a C front end that will do one or more of the following:

  1. If it cannot find a .h file, announce the fact and shut up. Don't spray screenfuls of garbage so that the information about the missing .h file is lost.

  2. If an unknown name appears in a type-like position, it should trigger a sane, sensible error message. For example, the source code

    typedef struct Queue {
        Array_T elements;
        int first, last, length;
    } Queue;
    

    Should trigger a message like

    Line 2: I saw `Array_T` used as a type, but it 
    has never been declared as a type.
    

    In this same situation, or especially in function prototypes, gcc produces incomprehensible gobbledegook.

  3. In an ideal world, the tool would also detect attempts to link libraries out of dependency order. After all, we know how to do topological sort now.

There is no requirement that the tool be able to compile any code. I am looking only for a way to get better error messages. The only hard constraint is that it has to be able to run on 64-bit Red Hat Enterprise Linux, because that's our teaching platform. Also, it has to understand C99.

I'm aware of research front ends like CIL and an older effort at AT&T, but I don't know how they are on error messages. Similar comments on the EDG front end, which is free for educational use. I'm also aware that Gimpel Software may have some commercial offerings that might fit the bill. I don't know what budget we have, but I'm willing to go to my department head and ask for money.

I ask the collective wisdom of StackOverflow: what C99 front ends are available which issue error messages that beginners can understand?

Clang generates far better error messages. Get it and try compiling

#include <bogus.h>

It gives

aho.c:1:10: fatal error: 'bogus.h' file not found
#include <bogus.h>
         ^
1 error generated.

It even uses colors on ANSI terminals.

Try

typedef struct Queue {
    Array_T elements;
    int first, last, length;
} Queue;

which gives

aho.c:2:5: error: unknown type name 'Array_T'
    Array_T elements;
    ^
1 error generated.

These features are not just good for beginners. They're good for every programmer!

Was there a specific reason garbage collection was not designed for C?

15 votes

I have heard that it was suboptimal for C to automatically collect garbage -- is there any truth to this?

Was there a specific reason garbage collection was not implemented for C?

Don't listen to the "C is old and that's why it doesn't have GC" folks. There are fundamental problems with GC that cannot be overcome which make it incompatible with C.

The biggest problem is that accurate garbage collection requires the ability to scan memory and identify any pointers encountered. Some higher level languages limit integers not to use all the bits available, so that high bits can be used to distinguish object references from integers. Such languages may then store strings (which could contain arbitrary octet sequences) in a special string zone where they can't be confused with pointers, and all is well. A C implementation, however, cannot do this because bytes, larger integers, pointers, and everything else can be stored together in structures, unions, or as part of chunks returned by malloc.

What if you throw away the accuracy requirement and decide you're okay with a few objects never getting freed because some non-pointer data in the program has the same bit pattern as these objects' addresses? Now suppose your program receives data from the outside world (network/files/etc.). I claim I can make your program leak an arbitrary amount of memory, and eventually run out of memory, as long as I can guess enough pointers and emulate them in the strings I feed your program. This gets a lot easier if you apply De Bruijn Sequences.

Aside from that, garbage collection is just plain slow. You can find hundreds of academics who like to claim otherwise, but that won't change the reality. The performance issues of GC can be broken down into 3 main categories:

  • Unpredictability
  • Cache pollution
  • Time spent walking all memory

The people who will claim GC is fast these days are simply comparing it to the wrong thing: poorly written C and C++ programs which allocate and free thousands or millions of objects per second. Yes, these will also be slow, but at least predictably slow in a way you can measure and fix if necessary. A well-written C program will spend so little time in malloc/free that the overhead is not even measurable.

Why does man 2 open say this?

14 votes

I ran into this question while typing man 2 open. It says that there are two kinds of open, one with two args, and one with three! Last time I checked, we could not overload functions in C. How did they do this? Did they write it in C++?

int open(const char * pathname, int flags);
int open(const char * pathname, int flags, mode_t mode);

No, they just used a variadic function.

int open(const char * pathname, int flags, ...);

This makes the last argument mode optional. The prototypes only show how the function should be used, not the actual interface.

Of course, unlike real overloading, the compiler cannot type-check the mode argument, so the user has to be extra careful to ensure only 2 or 3 arguments are passed, and the 3rd argument must be a mode_t.
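How such a function can tell whether the third argument is present is sketched below. This is not the libc implementation: my_open and MY_CREAT are hypothetical stand-ins for open and O_CREAT, and the mode is read as an int for simplicity.

```c
#include <stdarg.h>

#define MY_CREAT 0x1   /* hypothetical flag standing in for O_CREAT */

/* Returns the mode it was given, or 0 when no mode was expected.
 * The optional argument is read only when the flags say it is there. */
static int my_open(const char *pathname, int flags, ...)
{
    int mode = 0;
    (void)pathname;                 /* unused in this sketch */
    if (flags & MY_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, int);     /* undefined behavior if the caller omitted it */
        va_end(ap);
    }
    return mode;
}
```

The function has no way to count its arguments; it trusts the flags, which is why passing the creation flag without a mode is a bug the compiler will not catch.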


BTW, if you check the man 2 open for BSD (including OS X) it shows the correct prototype as above.

How does C-- compare to LLVM ?

13 votes

After learning a bit about how LLVM works, I'm really excited about how portable low-level code can be generated and how modular this 'thing' is.

But I discovered today the existence of C-- that seems to share some concepts with LLVM.

So I'm looking for some information to help me understand the main differences between these two projects... and why both exist.

To me, LLVM looks a bit like the ultimate Swiss Army knife for compiler infrastructure, and C-- looks far less advanced.

They differ in how expressive the low level machine type system is.

The LLVM machine is pretty expressive, but the C-- machine puts a lot of responsibility on the language front end, quoting from the C-- home page: "simply, C-- has no high-level types---it does not even distinguish floating-point variables from integer variables. This model gives the front end total control of representation and type system"

Also visually they look a lot different. C-- looks a lot like C, LLVM looks a lot like assembler.

Pragmatically, LLVM has a lot more momentum right now. It has a JIT compiler, Apple is using it for 3D pipeline things and people are using it to connect to GCC and all sorts of weird and wonderful things. Someone called it "almost absurdly easy to work with".

On the other hand C-- is much smaller and probably easier to entirely comprehend.

Is there any alternative for printf ?

13 votes

I have to create software that must work on several *nix platforms (Linux, AIX, ...).

I need to handle internationalization and my translation strings are in the following form:

"Hi %1, you are %2." // English
"Vous êtes %2, bonjour %1 !" // French

Here %1 stands for the name and %2 for another word. I may change the format; that's not an issue.

I tried to use printf() but you cannot specify the order of the parameters, you just specify their types.

"Hi %s, you are %s"
"Vous êtes %s, bonjour %s !"

Now there is no way to know which parameter to use for replacement of %s: printf() just uses the first one, then the next.

Is there any alternative to printf() that deals with this ?

Note: gettext() is not an option.

POSIX printf() supports positional arguments.

printf("Hi %1$s, you are %2$s.", name, status);
printf("Vous êtes %2$s, bonjour %1$s !", name, status);
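A minimal sketch of how this could be wrapped so that the translated format string chooses the argument order. The helper name greet is hypothetical, and the n$ syntax is a POSIX extension, so this assumes a POSIX-conforming C library rather than plain ISO C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: formats a greeting into buf using a translated
 * format string. The format may refer to name as %1$s and status as %2$s
 * in any order. Returns 0 on success, -1 on error or truncation. */
int greet(char *buf, size_t size, const char *fmt,
          const char *name, const char *status)
{
    int n;

    assert(buf != NULL && fmt != NULL);

    n = snprintf(buf, size, fmt, name, status);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```

With the format "You are %2$s, hello %1$s!" and the arguments "Anna" and "root", the buffer ends up holding "You are root, hello Anna!". Note that a format string should use either positional or non-positional conversions throughout, not a mix of both.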

Is rebasing DLLs (or providing an appropriate default load address) worth the trouble?

13 votes

Rebasing a DLL means fixing up the DLL so that its preferred load address is the address at which the loader is actually able to load it.

This can either be achieved by a tool such as Rebase.exe or by specifying default load addresses for all your (own) dlls so that they "fit" in your executable process.

The whole point of managing the DLL base addresses this way is to speed up application loads. (Or so I understand.)

The question is now: Is it worth the trouble?

I have the book Windows via C/C++ by Richter/Nasarre and they strongly recommend[a] making sure that the load addresses all match up so that the Loader doesn't have to rebase the loaded DLLs.

However, they do not show whether this speeds up application load times by any significant amount.

Also, with ASLR it seems dubious that this has any value at all, since the load addresses will be randomized anyway.

Are there any hard facts on the pro/cons of this?

[a]: In my WvC++/5th ed it is in the sections titled Rebasing Modules and Binding Modules on pages 568ff. in Chapter 20, DLL Advanced Techniques.

I'd like to provide one answer myself, although the answers of Hans Passant and others are describing the tradeoffs already pretty well.

After recently fiddling with DLL base addresses in our application, here is my conclusion:

I think that, unless you can prove otherwise, providing DLLs with a non-default Base Address is an exercise in futility. This includes rebasing my DLLs.

  • For the DLLs I control, given the average application, each DLL will be loaded into memory only once anyway, so the load on the paging file should be minimal. (But see the comment of Michal Burr in another answer about Terminal Server environment.)

  • If DLLs are provided with a fixed base address (without rebasing) it will actually increase address space fragmentation, as sooner or later these addresses won't match anymore. In our app we had given all DLLs a fixed base address (for other legacy reasons, and not because of address space fragmentation) without using rebase.exe and this significantly increased address space fragmentation for us because you really can't get this right manually.

  • Rebasing (via rebase.exe) is not cheap. It is another step in the build process that has to be maintained and checked, so it has to have some benefit.

  • A large application will always have some DLLs loaded where the base address does not match, because of some hook DLLs (AV) and because you don't rebase 3rd party DLLs (or at least I wouldn't).

  • If you're using a RAM disk for the paging file, you might actually be better off if loaded DLLs get paged out :-)

So to sum up, I think that rebasing isn't worth the trouble except for special cases like the system DLLs.

First programming language to be taught - C or Python?

11 votes

I know that there is a long debate regarding this matter. I also understand that this is strictly not a programming question. But I am asking here as this platform contains wide range of experts from different realms.

When we were admitted to a Computer Science and Engineering (CSE) course at university, we were first taught C. The course was actually on structured programming, but we used C as the language. In the next semester we were taught C++ and Java for OOP. Recently I have heard that the department is going to introduce Python as the first language. I strongly oppose the idea for the following reasons:

  1. Python is a very high-level language. In the first course the students should become familiar with basic programming concepts like data types, pointers, passing by value or by reference, etc. You can write lots of things in Python without understanding these in detail.

  2. Python has a wide range of built-in data structures and libraries. In a first language students should become familiar with basic algorithms like sorting or searching. I know there is a sorting library in C too, but it is not as widely used as Python's sorting methods.

  3. Python is OOP. How can you teach someone OOP when (s)he does not have basic knowledge of structured programming? If Python is the first language, they might not be able to distinguish OOP from non-OOP concepts.

  4. Memory is crucial. If you allocate memory, you need to release it. These concepts are not necessary in a language with a garbage collector.

So what is your opinion? What do you prefer as the first teaching language?

Please don't start a flamewar or something similar. Whatever you suggest, please explain why you think so. Also, please keep in mind that the course is at university level. It's not for kids, so trying to make things simple is not much help.

And also I know that Python is a great language. I am personally a fan of it. But the question is whether Python should be first teaching language instead of C.

Thanks in advance.

EDIT :

  1. When I asked this, I was not aware of programmers.stackexchange.com. It can be moved there if that is better.

  2. The question contains my opinion. That does not mean I don't wanna hear others. In fact that is exactly what I want. Please don't get me wrong. I am not designing the curriculum. So my opinion has no effect on it. The thing is I think this and this, and I want to hear what others think.

  3. I am well aware that this is not a question in that sense. My first para tells that.

Bottom-up learning is often considered the "better" way to learn. Start from first principles and make your way up to more advanced ideas. The problem with this approach is that much of what we humans learn in life doesn't follow that model at all. Children, in fact, are the fastest learners and they do so by pattern-matching, extrapolation, interpolation, etc., all of which would be thoroughly frowned upon by anyone promoting the classical bottom up system. And yet somehow they run rings around adults in learning, say, the language in a new country, not by reading the bottom-up text books faster than the grown-ups, but by talking to other kids.

I don't know whether Python is the best language to use, but I do believe that any language that gets people writing code and solving interesting problems quickly can't be too bad a choice.

In regards to for(), why use i++ rather than ++i?

10 votes

Perhaps it doesn't matter to the compiler once it optimizes, but in C/C++, I see most people make a for loop in the form of:

for (i = 0; i < arr.length; i++)

where the incrementing is done with the post fix ++. I get the difference between the two forms. i++ returns the current value of i, but then adds 1 to i on the quiet. ++i first adds 1 to i, and returns the new value (being 1 more than i was).

I would think that i++ takes a little more work, since a previous value needs to be stored in addition to a next value: Push *(&i) to stack (or load to register); increment *(&i). Versus ++i: Increment *(&i); then use *(&i) as needed.

(I get that the "Increment *(&i)" operation may involve a register load, depending on CPU design. In which case, i++ would need either another register or a stack push.)
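The difference in returned values described above can be checked with a small sketch. The function names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Postfix: yields the old value of *i, then *i is one larger. */
int use_postfix(int *i)
{
    assert(i != NULL);
    return (*i)++;
}

/* Prefix: increments *i first, then yields the new value. */
int use_prefix(int *i)
{
    assert(i != NULL);
    return ++(*i);
}
```

Starting from 5, use_postfix returns 5 and use_prefix returns 6; in both cases the variable holds 6 afterwards. In a for loop the returned value is discarded, so for a plain int an optimizing compiler has no reason to keep the old copy around.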

Anyway, at what point, and why, did i++ become more fashionable?


I'm inclined to believe azheglov: It's a pedagogic thing, and since most of us do C/C++ on a Windows or *nix system where the compilers are of high quality, nobody gets hurt.

If you're using a low quality compiler or an interpreted environment, you may need to be sensitive to this. Certainly, if you're doing advanced C++ or device driver or embedded work, hopefully you're well seasoned enough for this to be not a big deal at all. (Do dogs have Buddha-nature? Who really needs to know?)

My theory (why i++ is more fashionable) is that when people learn C (or C++) they eventually learn to code iterations like this:

while( *p++ ) {
    ...
}

Note that the postfix form is important here (using the prefix form would create an off-by-one type of bug).

When the time comes to write a for loop where ++i or i++ doesn't really matter, it may feel more natural to use the postfix form.
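To make the off-by-one point about the while( *p++ ) idiom concrete, here is a small sketch that counts characters in a string both ways. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Postfix form: tests the current character, then advances. */
size_t count_postfix(const char *p)
{
    size_t n = 0;
    assert(p != NULL);
    while (*p++)
        n++;
    return n;
}

/* Prefix form: advances first, so the first character is never tested.
 * It must not be called on an empty string, because it would step past
 * the terminator before looking at anything. */
size_t count_prefix_buggy(const char *p)
{
    size_t n = 0;
    assert(p != NULL && *p != '\0');
    while (*++p)
        n++;
    return n;
}
```

For the string "abc", the postfix version counts 3 characters while the prefix version counts only 2.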

ADDED: What I wrote above applies to primitive types, really. When coding something with primitive types, you tend to do things quickly and do what comes naturally. That's the important caveat that I need to attach to my theory.

If ++ is an overloaded operator on a C++ class (the possibility Rich K. suggested in the comments) then of course you need to code loops involving such classes with extreme care as opposed to doing simple things that come naturally.

print call stack in C or C++

9 votes

Is there any way to dump the call stack in a running process in C or C++ every time a certain function is called? What I have in mind is something like this:

void foo()
{
   print_stack_trace();

   // foo's body

   return;
}

Where print_stack_trace works similarly to caller in Perl.

Or something like this:

int main (void)
{
    // will print out debug info every time foo() is called
    register_stack_trace_function(foo); 

    // etc...
}

where register_stack_trace_function puts some sort of internal breakpoint that will cause a stack trace to be printed whenever foo is called.

Does anything like this exist in some standard C library?

I am working on Linux, using GCC.


Background

I have a test run that behaves differently based on some commandline switches that shouldn't affect this behavior. My code has a pseudo-random number generator that I assume is being called differently based on these switches. I want to be able to run the test with each set of switches and see if the random number generator is called differently for each one.

For a Linux-only solution you can use backtrace(3), which simply returns an array of void * (in fact each of these points to the return address from the corresponding stack frame). To translate these to something of use, there's backtrace_symbols(3).

Pay attention to the notes section in backtrace(3):

The symbol names may be unavailable without the use of special linker options. For systems using the GNU linker, it is necessary to use the -rdynamic linker option. Note that names of "static" functions are not exposed, and won't be available in the backtrace.
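A minimal sketch of a print_stack_trace built on these two calls, assuming glibc's <execinfo.h>. The MAX_FRAMES limit and the choice to write to stderr are arbitrary assumptions made for this example.

```c
#include <assert.h>
#include <execinfo.h>
#include <stdio.h>
#include <stdlib.h>

/* Arbitrary upper bound on the number of frames captured. */
#define MAX_FRAMES 64

/* Prints the current call stack to stderr, one frame per line.
 * Returns the number of frames printed, or -1 if symbol lookup failed. */
int print_stack_trace(void)
{
    void *frames[MAX_FRAMES];
    int count = backtrace(frames, MAX_FRAMES);
    char **symbols;
    int i;

    assert(count <= MAX_FRAMES);

    symbols = backtrace_symbols(frames, count);
    if (symbols == NULL)
        return -1;

    for (i = 0; i < count; i++)
        fprintf(stderr, "%s\n", symbols[i]);

    /* backtrace_symbols() allocates one block; only the array is freed. */
    free(symbols);
    return count;
}
```

Calling print_stack_trace() as the first statement of foo() gives the behavior asked for in the question. Building with -rdynamic, as the man page notes, is what makes function names appear instead of bare addresses.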

I want to learn Objective-c, but...

6 votes

...I am not sure which book to go with.

I have narrowed it down to these two: [Programming in Objective-C 2.0] vs. [Learn Objective-C on the Mac].

My programming experience is fairly limited, as I just know a little C. I know some control structures (if, if-else, while, for, do, case) and how algorithms work. That's pretty much it; I have no experience with pointers, I/O or more advanced stuff. I read nearly half of the book [Learn C on the Mac], up to Chapter 7: Pointers and Parameters. I liked the book; it gave me a push into the world of programming and taught me the basics, and I thought that was enough to go on with Objective-C.

But I am not sure what to do from here: which book to continue reading. I have started reading a little of [Programming in Objective-C 2.0], up to chapter 4, but I have had a hard time following it. I am not sure if I get the syntax correctly and if I understand the different classes, objects, methods and instances properly. But that's not the question for now, sorry for the off-topic :P

What would you recommend for me to read, based on the knowledge I have in programming (with only a procedural language like C)? Would you even recommend that I read a book, or should I start somewhere else? I read that Kochan's book is the single best, but I am not sure if it's too advanced for me. What is your impression of the book?

Thank you all so very much in advance!

  • P.S. My native language is Danish, so sorry for typos and grammar mistakes
  • P.P.S. This is my first post to this board, I just signed up. Please correct me if I did something wrong. Thank you.

Cocoa Programming for Mac OS X, by Aaron Hillegass.