## Which is better option to use for dividing an integer number by 2?

Which of the following techniques is the best option for dividing an integer by 2 and why?

Technique 1:

``````x = x >> 1;
``````

Technique 2:

``````x = x / 2;
``````

Here `x` is an integer.

Use the operation that best describes what you are trying to do.

• If you are treating the number as a sequence of bits, use bitshift.
• If you are treating it as a numerical value, use division.

Note that they are not exactly equivalent. They can give different results for negative integers. For example:

``````-5 / 2  = -2
-5 >> 1 = -3
``````


## Why would code actively try to prevent tail-call optimization?

The title of the question might be a bit strange, but as far as I know, nothing speaks against tail-call optimization in general. However, while browsing open source projects, I have come across a few functions that actively try to stop the compiler from performing a tail-call optimization, for example the implementation of CFRunLoopRef, which is full of such hacks. For example:

``````static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__() __attribute__((noinline));
static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__(CFRunLoopObserverCallBack func, CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) {
    if (func) {
        func(observer, activity, info);
    }
    getpid(); // thwart tail-call optimization
}
``````

I would love to know why this is seemingly so important. Are there any cases where I, as a normal developer, should keep this in mind too? E.g., are there common pitfalls with tail-call optimization?

This is only a guess, but maybe it is to avoid an infinite loop, in favor of bombing out with a stack-overflow error.

Since the function in question doesn't put anything on the stack, it seems possible that the tail-call optimization would produce code that enters an infinite loop, whereas the non-optimized code would push the return address onto the stack, which would eventually overflow in the event of misuse.

The only other thought I have is related to preserving the calls on the stack for debugging and stack-trace printing.

## What's the result of += in C and C++?

I've got the following code:

``````#include <stdio.h>
int main(int argc, char **argv) {
    int i = 0;
    (i+=10)+=10;
    printf("i = %d\n", i);
    return 0;
}
``````

If I try to compile it as a C source using gcc I get an error:

``````error: lvalue required as left operand of assignment
``````

But if I compile it as a `C++` source using `g++`, I get no error, and when I run the executable:

``````i = 20
``````

Why the different behaviour?

The semantics of the compound assignment operators differ between C and C++:

C99 standard, 6.5.16, part 3:

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

In C++ 5.17.1:

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue with the type and value of the left operand after the assignment has taken place.

EDIT: The behavior of `(i+=10)+=10` in C++ is undefined in C++98, but well defined in C++11. See this answer to the question by aix for the relevant portions of the standards.

## Single and double quotes in C/C++

I was looking at the question Single quotes vs. double quotes in C. I couldn't completely understand the explanation given, so I wrote a program:

``````#include <stdio.h>
int main(void)
{
    char ch = 'a';
    printf("sizeof(ch)   : %zu\n", sizeof(ch));
    printf("sizeof('a')  : %zu\n", sizeof('a'));
    printf("sizeof(\"a\") : %zu\n", sizeof("a"));
    printf("sizeof(char) : %zu\n", sizeof(char));
    printf("sizeof(int)  : %zu\n", sizeof(int));
    return 0;
}
``````

I compiled it using both gcc and g++, and these are my outputs:

gcc:

``````sizeof(ch)   : 1
sizeof('a')  : 4
sizeof("a")  : 2
sizeof(char) : 1
sizeof(int)  : 4
``````

g++:

``````sizeof(ch)   : 1
sizeof('a')  : 1
sizeof("a")  : 2
sizeof(char) : 1
sizeof(int)  : 4
``````

The g++ output makes sense to me and I don't have any doubts regarding that. In gcc, why does `sizeof('a')` differ from `sizeof(char)`? Is there some actual reason behind it, or is it just historical?

Also, in C, if `char` and `'a'` have different sizes, does that mean that when we write `char ch = 'a';` we are doing an implicit type conversion?

In C, character constants such as `'a'` have type `int`; in C++ they have type `char`.

Regarding the last question, yes,

``````char ch = 'a';
``````

causes an implicit conversion of the `int` to `char`.

## What are the incompatible differences between C(99) and C++(11)?

This question was triggered by replies to a post by Herb Sutter in which he explained MS's decision not to support/make a C99 compiler, but instead to just go with the C(99) features that are in the C++(11) standard anyway.

(...) C is important and deserves at least a little bit of attention.

There is a LOT of existing code out there that is valid C but is not valid C++. That code is not likely to be rewritten (...)

Since I only program in MS C++, I really don't know "pure" C that well, i.e. I have no clear picture of which details of the C++ language I'm using are not in C(99), and I have few clues as to where some C99 code would not work as-is with a C++ compiler.

Note that I know about the C99-only `restrict` keyword, which to me seems to have a very narrow application, and about variable-length arrays (of which I'm not sure how widespread or important they are).

Also, I'm very interested in whether there are any important semantic differences or gotchas, that is, C(99) code that will compile under C++(11) but do something different with the C++ compiler than with the C compiler.

If you start from the common subset of C and C++, sometimes called clean C (which is not quite C90), you have to consider 3 types of incompatibilities:

1. Additional C++ features which make legal C illegal C++

Examples of this are C++ keywords which can be used as identifiers in C, or conversions which are implicit in C but require an explicit cast in C++.

This is probably the main reason why Microsoft still ships a C frontend at all: otherwise, legacy code that doesn't compile as C++ would have to be rewritten.

2. Additional C features which aren't part of C++

The C language did not stop evolving after C++ was forked. Some examples are variable-length arrays, designated initializers and `restrict`. These features can be quite handy, but aren't part of any C++ standard, and some of them will probably never make it in.

3. Features which are available in both C and C++, but have different semantics

An example of this is the linkage of `const` objects or of `inline` functions.

A list of incompatibilities between C99 and C++98 can be found here (which has already been mentioned by Mat).

While C++11 and C11 have gotten closer on some fronts (variadic macros are now available in C++, variable-length arrays are now an optional C language feature), the list of incompatibilities has grown as well (e.g. generic selections in C and the `auto` type-specifier in C++).

As an aside, while Microsoft has taken some heat for the decision to abandon C (which is not a recent one), as far as I know no one in the open source community has actually taken steps to do something about it: It would be quite possible to provide many features of modern C via a C-to-C++ compiler, especially if you consider that some of them are trivial to implement. This is actually possible right now using Comeau C/C++, which does support C99.

However, it's not really a pressing issue: personally, I'm quite comfortable with using GCC and Clang on Windows, and there are proprietary alternatives to MSVC as well, e.g. Pelles C or Intel's compiler.

## Is function call an effective memory barrier for modern platforms?

In a codebase I reviewed, I found the following idiom.

void notify(struct actor_t act) {
    write(act.pipe, "M", 1);
}
void send(byte *data) {
    global.data = data;
}
// in thread B event loop
switch (cmd) {
case 'M': use_data(global.data); break;
...
}
``````

"Hold it", I said to the author, a senior member of my team, "there's no memory barrier here! You don't guarantee that `global.data` will be flushed from the cache to main memory. If thread A and thread B will run in two different processors - this scheme might fail".

The senior programmer grinned and explained slowly, as if explaining to his five-year-old boy how to tie his shoelaces: "Listen, young boy, we've seen many thread-related bugs here, in high-load testing and at real clients," he paused to scratch his longish beard, "but we've never had a bug with this idiom."

"but, it says in the book..."

"Quiet!", he hushed me promptly,

"Maybe theoretically it's not guaranteed, but in practice, the fact that you used a function call is effectively a memory barrier. The compiler will not reorder the instruction `global.data = data`, since it can't know whether anyone uses it in the function call, and the x86 architecture will ensure that the other CPUs see this piece of global data by the time thread B reads the command from the pipe. Rest assured, we have plenty of real-world problems to worry about; we don't need to invest extra effort in bogus theoretical problems.

Rest assured, my boy, in time you'll understand how to separate the real problems from the I-need-to-get-a-PhD non-problems."

Is he correct? Is that really a non-issue in practice (say on x86, x64 and ARM)?

It's against everything I learned, but he does have a long beard and a really smart look!

Extra points if you can show me a piece of code proving him wrong!

Memory barriers aren't just there to prevent instruction reordering. Even when instructions aren't reordered, there can still be problems with cache coherence. As for the reordering: it depends on your compiler and settings. ICC is particularly aggressive with reordering. MSVC with whole-program optimization can be, too.

If your shared data variable is declared `volatile`, most compilers will generate a memory barrier around reads and writes of the variable and prevent reordering, even though that's not in the spec. This is not the correct way of using `volatile`, nor what it was meant for.
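For completeness, the portable fix is an explicit release/acquire pair rather than hoping the call acts as a barrier; here is a sketch using C11 atomics (the function names are hypothetical stand-ins for the original `send` and the thread-B read):

```c
#include <stdatomic.h>
#include <stddef.h>

static _Atomic(unsigned char *) shared_data;

// Thread A: publish the buffer, then signal via the pipe as before.
void send_data(unsigned char *data) {
    // Release store: all writes made before this line become visible to
    // any thread that later acquire-loads shared_data.
    atomic_store_explicit(&shared_data, data, memory_order_release);
    // ... then write 'M' to the pipe
}

// Thread B: called after reading 'M' from the pipe.
unsigned char *receive_data(void) {
    return atomic_load_explicit(&shared_data, memory_order_acquire);
}
```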

## Cast performance from size_t to double

TL;DR: Why is multiplying/casting data in `size_t` slow and why does this vary per platform?

I'm having some performance issues that I don't fully understand. The context is a camera frame grabber where a 128x128 uint16_t image is read and post-processed at a rate of several hundred Hz.

In the post-processing I generate a histogram `frame->histo`, which is of type `uint32_t` and has `thismaxval` = 2^16 elements; basically, I tally all intensity values. Using this histogram I calculate the sum and squared sum:

``````double sum=0, sumsquared=0;
size_t thismaxval = 1 << 16;

for (size_t i = 0; i < thismaxval; i++) {
    sum += (double)i * frame->histo[i];
    sumsquared += (double)(i * i) * frame->histo[i];
}
``````

Profiling the code, I got the following (samples, percentage, code):

`````` 58228 32.1263 :  sum += (double)i * frame->histo[i];
116760 64.4204 :  sumsquared += (double)(i * i) * frame->histo[i];
``````

or, the first line takes up 32% of CPU time, the second line 64%.

I did some benchmarking and it seems to be the datatype/casting that's problematic. When I change the code to

``````uint_fast64_t isum=0, isumsquared=0;

for (uint_fast32_t i = 0; i < thismaxval; i++) {
    isum += i * frame->histo[i];
    isumsquared += (i * i) * frame->histo[i];
}
``````

it runs ~10x faster. However, this performance difference also varies per platform. On the workstation, a Core i7 950 @ 3.07 GHz, the code is 10x faster. On my MacBook8,1, which has an Intel Core i7 Sandy Bridge 2.7 GHz (2620M), the code is only 2x faster.

Now I am wondering:

1. Why is the original code so slow and easily sped up?
2. Why does this vary per platform so much?

Update:

I compiled the above code with

``````g++ -O3  -Wall cast_test.cc -o cast_test
``````

Update2:

I ran the optimized codes through a profiler (Instruments on Mac, like Shark) and found two things:

1) The looping itself takes a considerable amount of time in some cases. `thismaxval` is of type `size_t`.

1. `for(size_t i = 0; i < thismaxval; i++)` takes 17% of my total runtime
2. `for(uint_fast32_t i = 0; i < thismaxval; i++)` takes 3.5%
3. `for(int i = 0; i < thismaxval; i++)` does not show up in the profiler, I assume it's less than 0.1%

2) The datatypes and casting matter as follows:

1. `sumsquared += (double)(i * i) * histo[i];` 15% (with `size_t i`)
2. `sumsquared += (double)(i * i) * histo[i];` 36% (with `uint_fast32_t i`)
3. `isumsquared += (i * i) * histo[i];` 13% (with `uint_fast32_t i`, `uint_fast64_t isumsquared`)
4. `isumsquared += (i * i) * histo[i];` 11% (with `int i`, `uint_fast64_t isumsquared`)

Surprisingly, `int` is faster than `uint_fast32_t`?

Update4:

I ran some more tests with different datatypes and different compilers, on one machine. The results are as follows.

For tests 0--2 the relevant code is

for (loop_t i = 0; i < thismaxval; i++)
    sumsquared += (double)(i * i) * histo[i];
``````

with `sumsquared` a double, and `loop_t` `size_t`, `uint_fast32_t` and `int` for tests 0, 1 and 2.

For tests 3--5 the code is

for (loop_t i = 0; i < thismaxval; i++)
    isumsquared += (i * i) * histo[i];
``````

with `isumsquared` of type `uint_fast64_t` and `loop_t` again `size_t`, `uint_fast32_t` and `int` for tests 3, 4 and 5.

The compilers I used are gcc 4.2.1, gcc 4.4.7, gcc 4.6.3 and gcc 4.7.0. The timings are in percentages of total CPU time of the code, so they show relative, not absolute, performance (although the runtime was quite constant at 21 s). The CPU time is for both lines together, because I'm not quite sure the profiler correctly separated the two lines of code.

```gcc:    4.2.1  4.4.7  4.6.3  4.7.0
----------------------------------
test 0: 21.85  25.15  22.05  21.85
test 1: 21.9   25.05  22     22
test 2: 26.35  25.1   21.95  19.2
test 3: 7.15   8.35   18.55  19.95
test 4: 11.1   8.45   7.35   7.1
test 5: 7.1    7.8    6.9    7.05
```


Based on this, it seems that casting is expensive, regardless of what integer type I use.

Also, it seems gcc 4.6 and 4.7 are not able to optimize loop 3 (size_t and uint_fast64_t) properly.

1. The code is slow because it involves conversion from integer to floating-point data types. That's why it's easily sped up when you also use an integer datatype for the sum variables, because no float conversion is required anymore.
2. The difference is the result of several factors. For example, it depends on how efficiently a platform can perform an int->float conversion. Furthermore, this conversion can also disturb processor-internal optimizations in the program flow and prediction engine, caches, ..., and the internal parallelizing features of the processor can also have a huge influence on such calculations.

• "Surprisingly, int is faster than uint_fast32_t"? What are sizeof(size_t) and sizeof(int) on your platform? One guess I can make is that both are probably 64-bit, and therefore a cast to 32-bit not only can give you calculation errors but also incurs a different-size-casting penalty.

In general, try to avoid visible and hidden casts as much as possible if they aren't really necessary. For example, try to find out which real datatype is hidden behind `size_t` in your environment (gcc) and use that one for the loop variable. In your example, the square of the uints cannot be a float datatype, so it makes no sense to use double here. Stick to integer types to achieve maximum performance.

## C and C++ : Partial initialization of automatic structure

For example, if `somestruct` has three integer members, I had always thought that it was OK to do this in C (or C++) function:

``````somestruct s = {123,};
``````

The first member would be initialized to 123 and the last two would be initialized to 0. I often do the same thing with automatic arrays, writing `int arr[100] = {0,};` so that all integers in an array are initialized to zero.

Recently I read in the GNU C Reference Manual that:

If you do not initialize a structure variable, the effect depends on whether it has static storage (see Storage Class Specifiers) or not. If it does, members with integral types are initialized with 0 and pointer members are initialized to NULL; otherwise, the value of the structure's members is indeterminate.

Can someone please tell me what the C and C++ standards say regarding partial automatic structure and automatic array initialization? I do the above in Visual Studio without a problem, but I want to be compatible with gcc/g++ and maybe other compilers as well. Thanks.

The linked gcc documentation does not talk about partial initialization; it only talks about (complete) initialization or no initialization.

What is partial initialization?

Partial initialization occurs when you provide some initializers but not all, i.e. fewer initializers than the size of the array or the number of structure members being initialized.

Example:

``````int array[10] = {1,2};                    //Case 1:Partial Initialization
``````

What is (complete) initialization or no initialization?

Initialization means providing an initial value to the variable at the same time it is created, i.e. in the same code statement.

Example:

``````int array[10] = {0,1,2,3,4,5,6,7,8,9};    //Case 2:Complete Initialization
int array[10];                            //Case 3:No Initialization
``````

The quoted paragraph describes the behavior for `Case 3`.

The rules regarding partial initialization (`Case 1`) are well defined by the standard, and these rules do not depend on the storage duration of the variable being initialized.
AFAIK, all mainstream compilers comply fully with these rules.

Can someone please tell me what the C and C++ standards say regarding partial automatic structure and automatic array initialization?

The C and C++ standards guarantee that even if an integer array has automatic storage and there are fewer initializers in a brace-enclosed list than elements, the uninitialized elements must be initialized to `0`.

C99 Standard 6.7.8.21

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.

In C++ the rules are stated slightly differently.

C++03 Standard 8.5.1 Aggregates
Para 7:

If there are fewer initializers in the list than there are members in the aggregate, then each member not explicitly initialized shall be value-initialized (8.5). [Example:

`````` struct S { int a; char* b; int c; };
S ss = { 1, "asdf" };
``````

initializes `ss.a` with `1`, `ss.b` with `"asdf"`, and `ss.c` with the value of an expression of the form `int()`, that is,`0`. ]

While Value Initialization is defined in,
C++03 8.5 Initializers
Para 5:

To value-initialize an object of type T means:
— if T is a class type (clause 9) with a user-declared constructor (12.1), then the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
— if T is a non-union class type without a user-declared constructor, then every non-static data member and base-class component of T is value-initialized;
— if T is an array type, then each element is value-initialized;
— otherwise, the object is zero-initialized

## Do C & C++ compilers optimize comparisons with function calls?

Do C and C++ compilers generally optimize comparisons with functions?

For example, this page suggests that the `size` function on `std::list` in C++ can have linear complexity O(N) in some standard library implementations (which makes sense for a linked list).

But in that case, if `myList` is a huge list, what would something like this do?

if (myList.size() < 5) return 1;
else return 2;
``````

Would the size() function find and count all N list members, or would it be optimized to short circuit after finding 5 members?

Theoretically the possibility exists if `size()` was inlined, but to perform the optimization the compiler would have to

1. Detect that you are testing specifically a "less than" condition
2. Prove that the loop (assume one exists for the purposes of this discussion) results in a variable increasing monotonically
3. Prove that there are no observable side effects from the loop body

That's a big bunch of things to count on IMHO, and it includes features which are not "obviously useful" in other contexts as well. Keep in mind that compiler vendors have limited resources so there has to be really good justification for implementing these prerequisites and having the compiler bring all the parts together to optimize this case.

Seeing as, even if this is a performance issue for someone, the problem can easily be solved in code, I don't feel there would be such justification. So no, generally you should not expect cases like this to be optimized.

## How to write your own code generator backend for gcc?

I have created my very own (very simple) byte code language, and a virtual machine to execute it. It works fine, but now I'd like to use gcc (or any other freely available compiler) to generate byte code for this machine from a normal c program. So the question is, how do I modify or extend gcc so that it can output my own byte code? Note that I do NOT want to compile my byte code to machine code, I want to "compile" c-code to (my own) byte code.

I realize that this is a potentially large question, and it is possible that the best answer is "go look at the gcc source code". I just need some help with how to get started with this. I figure that there must be some articles or books on this subject that could describe the process to add a custom generator to gcc, but I haven't found anything by googling.

It is hard work.

For example, I also designed my own "architecture" with my own byte code and wanted to compile C/C++ code for it with GCC. This is how I did it:

1. First, read everything about porting in the GCC manual.
2. Don't forget to read GCC Internals too.
3. Also look at this question and the answers here.
4. Be sure to have a very good coffee machine... you will need it.
5. Start adding machine-dependent files to gcc.
6. Compile gcc as a cross compiler (host != target).
7. Check the resulting code in a hex editor.
8. Do more tests.
9. Now have fun with your own architecture :D

When you are finished you can use C or C++, but only without OS-dependent libraries (you currently have no running OS on your architecture), and you should then (if you need it) compile many other libraries with your cross compiler to have a good framework.

PS: LLVM (Clang) is easier to port... maybe you want to start there?

## What would be an ideal buffer size?

Possible Duplicate:
How do you determine the ideal buffer size when using FileInputStream?

When reading raw data from a file (or any input stream) using either the C++ `istream` family's `read()` or C's `fread()`, a buffer has to be supplied along with the amount of data to read. Most programs I have seen seem to arbitrarily choose a power of 2 between 512 and 4096.

1. Is there a reason it has to/should be a power of 2, or is this just programmers' natural inclination toward powers of 2?
2. What would be the "ideal" number? By "ideal" I mean that it would be the fastest. I assume it would have to be a multiple of the underlying device's buffer size? Or maybe of the underlying stream object's buffer? How would I determine what the size of those buffers is, anyway? And once I do, would using a multiple of it give any speed increase over just using the exact size?

EDIT
Most answers seem to say that it can't be determined at compile time. I am fine with finding it at runtime.

Optimum buffer size is related to a number of things: file system block size, CPU cache size and cache latency.

Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations on the file system can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, then you pay the price of disk -> RAM latency as well.

This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.

Ensuring this also typically results in other performance friendly parameters affecting both reading and subsequent processing: data bus width alignment, DMA alignment, memory cache line alignment, whole number of virtual memory pages.

## Initializing circular data in C. Is this valid C code according to any standard?

I wanted to see if I could initialize a global variable to point to itself:

``````#include <stdio.h>
struct foo { struct foo *a, *b; } x = { &x, &x };
int main(void)
{
    printf("&x = %p, x.a = %p, x.b = %p\n", (void *)&x, (void *)x.a, (void *)x.b);
    return 0;
}
``````

This code compiles and runs as expected with `gcc` (all three pointers print identically).

I want to know:

1. Is this reliable?
2. Is this standard?
3. Is this portable?

EDIT: Just to clarify, I am questioning the availability of the address of `x` in its own initializer.

This is standard C code.

This paragraph of the mighty Standard permits it (emphasis mine):

(C99, 6.2.1p7) "Structure, union, and enumeration tags have scope that begins just after the appearance of the tag in a type specifier that declares the tag. Each enumeration constant has scope that begins just after the appearance of its defining enumerator in an enumerator list. Any other identifier has scope that begins just after the completion of its declarator."

For information, note that to illustrate the last sentence of 6.2.1p7, the book "The New C Standard" by Derek M. Jones uses an example similar to yours:

``````struct T {struct T *m;} x = /* declarator complete here. */ {&x};
``````

## "int" really required to be at least as large as "short" in C?

I've read a couple of times in different sources (e.g. Wikipedia: http://en.wikipedia.org/wiki/C_variable_types_and_declarations#Size), that in C, a long long is not smaller than a long, which is not smaller than an int, which is not smaller than a short.

However, I've looked this up in the C90 and C99 standards and haven't found a corresponding clause. I've found only that C90 and C99 specify the minimum ranges of the types (Section 5.2.4.2.1 in both standards), but not their sizes in relation to each other. Have I missed something in the standards?

6.3.1.1 defines the relative integer conversion ranks of any two integer types. This is an abstract concept that's meant only to define the relationship between two types; there is no value defined as the rank of any type.

6.2.5p8 says:

For any two integer types with the same signedness and different integer conversion rank (see 6.3.1.1), the range of values of the type with smaller integer conversion rank is a subrange of the values of the other type.

It doesn't say anything about their relative sizes; in fact it's theoretically possible for a conforming (but deliberately perverse) implementation to have `sizeof (short) > sizeof (int)`. This is possible only if `short` has more padding bits (bits that don't contribute to the value) than `int`. It is very unlikely; most implementations don't have padding bits at all, and I know of no implementation where the relationships of the ranges of the integer types differ from the relationships of their sizes.

Reference: either N1256, the latest C99 draft, or N1570, the latest C2011 draft.

## safe C programming

I've noticed that my C compiler (gcc) will let me do stuff like:

``````#include <stdio.h>
main(){
    short m[32768];
    short y = -1;
    short z = -1;
    printf("%u\n", y);
    m[y] = 12;
    printf("%d\n%d\n", y, m[z]);
}
``````

When I run it it spits out:

``````4294967295
12
12
``````

Which seems a little baffling to me.

First of all, is it safe for me to run programs like this? Is there any chance I might accidentally write over the operating system (I'm running OS X in case it's relevant)?

Also, I had expected at least some kind of segfault error like I have encountered in the past, but quietly ignoring an error like this really scares me. How come this program doesn't segfault on me?

And finally, out of curiosity (this might be the silliest question), is there a method to the madness? Can I expect all ANSI C compilers to work this way? How about gcc on different platforms? Is the layout of memory well-defined enough to be exploitable (perhaps if you were out to write cross-platform obfuscated code)?

The C language defines the behavior of certain programs as "undefined". They can do anything. We'll call such programs erroneous.

One of them is a program that accesses outside the declared/allocated bounds of an array, which your program very carefully does.

Your program is erroneous; the thing your erroneous program happens to do is what you see :-} It could "overwrite the OS"; as a practical matter, most modern OSes prevent you from doing that, but you can overwrite critical values in your own process space, and your process could crash, die or hang.

The simple response is, "don't write erroneous programs". Then the behavior you see will make "C" sense.

In this particular case, with your particular compiler, the array indexing "sort of" works: you index outside the array and it picks up some value. The space allocated to `m` is in the stack frame; `m[0]` is at some location in that frame, and so is `m[-1]` based on the machine arithmetic combining the array address and the index, so no segfault occurs and a memory location is accessed. This lets the compiled program read and write that memory location... as an erroneous program. Basically, compiled C programs don't check whether your array accesses are in bounds.

Our CheckPointer tool when applied to this program will tell you the array index is illegal at execution time. So, you can either eyeball the program yourself to see if you've made a mistake, or let CheckPointer tell you when you make a mistake. I strongly suggest you do the eyeballing in any case.

## Is it possible to call a non-exported function that resides in an exe?

I'd like to call a function that resides in a 3rd-party .exe and obtain its result. It seems like there should be a way, as long as I know the function address, calling-convention, etc... but I don't know how.

Does anyone know how I would do this?

I realize that any solution would be a non-standard hack, but there must be a way!

My non-nefarious use-case: I'm reverse engineering a file-format for my software. The calculations in this function are too complex for my tiny brain to figure out; I've been able to pull the assembly-code directly into my own DLL for testing, but of course I can't release that, as that would be stealing. I will be assuming users already have this particular application pre-installed so my software will run.

OK, I've put together a prototype.

This program creates another instance of itself as a debugged child process.

An automatic breakpoint will be encountered before main() and CRT initialization code. This is when we can change the memory and registers of the debugged process to make it execute a function of interest. And that's what the program does.

It tries to catch and handle all the bad situations (e.g. unexpected exceptions) and reports them as errors.

One bad situation is actually a good one. It's the #UD exception from the UD2 instruction that the program places into the debugged process. It uses this #UD to stop the process execution after the function of interest has returned.

A few more notes:

1. This code is 32-bit only. I didn't even try to make it 64-bit compilable or support 64-bit child processes.

2. This code will likely leak handles. See the Windows Debug API function descriptions on MSDN to find out where they need to be closed.

3. This code is a proof of concept only and does not support passing and returning data via pointers or registers other than EAX, ECX and EDX. You'll have to extend it as necessary.

4. This code requires some privileges in order to be able to create and fully debug a process. You may have to worry about this if your program's users aren't admins.

Enjoy.

Code:

``````// file: unexported.c
//
// compile with Open Watcom C/C++: wcl386 /q /wx /we /s unexported.c
//   (Note: "/s" is needed to avoid stack check calls from the "unexported"
//    functions, these calls are through a pointer, and it'll be
//    uninitialized in our case.)
//
// compile with MinGW gcc 4.6.2: gcc unexported.c -o unexported.exe
#include <windows.h>
#include <stdio.h>
#include <string.h>
#include <stdarg.h>
#include <limits.h>

#ifndef C_ASSERT
#define C_ASSERT(expr) extern char CAssertExtern[(expr)?1:-1]
#endif

// Compile as a 32-bit app only.
C_ASSERT(sizeof(void*) * CHAR_BIT == 32);

#define EXC_CODE_AND_NAME(X) { X, #X }

const struct
{
DWORD Code;
PCSTR Name;
} ExcCodesAndNames[] =
{
EXC_CODE_AND_NAME(EXCEPTION_ACCESS_VIOLATION),
EXC_CODE_AND_NAME(EXCEPTION_ARRAY_BOUNDS_EXCEEDED),
EXC_CODE_AND_NAME(EXCEPTION_BREAKPOINT),
EXC_CODE_AND_NAME(EXCEPTION_DATATYPE_MISALIGNMENT),
EXC_CODE_AND_NAME(EXCEPTION_FLT_DENORMAL_OPERAND),
EXC_CODE_AND_NAME(EXCEPTION_FLT_DIVIDE_BY_ZERO),
EXC_CODE_AND_NAME(EXCEPTION_FLT_INEXACT_RESULT),
EXC_CODE_AND_NAME(EXCEPTION_FLT_INVALID_OPERATION),
EXC_CODE_AND_NAME(EXCEPTION_FLT_OVERFLOW),
EXC_CODE_AND_NAME(EXCEPTION_FLT_STACK_CHECK),
EXC_CODE_AND_NAME(EXCEPTION_FLT_UNDERFLOW),
EXC_CODE_AND_NAME(EXCEPTION_ILLEGAL_INSTRUCTION),
EXC_CODE_AND_NAME(EXCEPTION_IN_PAGE_ERROR),
EXC_CODE_AND_NAME(EXCEPTION_INT_DIVIDE_BY_ZERO),
EXC_CODE_AND_NAME(EXCEPTION_INT_OVERFLOW),
EXC_CODE_AND_NAME(EXCEPTION_INVALID_DISPOSITION),
EXC_CODE_AND_NAME(EXCEPTION_NONCONTINUABLE_EXCEPTION),
EXC_CODE_AND_NAME(EXCEPTION_PRIV_INSTRUCTION),
EXC_CODE_AND_NAME(EXCEPTION_SINGLE_STEP),
EXC_CODE_AND_NAME(EXCEPTION_STACK_OVERFLOW),
EXC_CODE_AND_NAME(EXCEPTION_GUARD_PAGE),
EXC_CODE_AND_NAME(DBG_CONTROL_C),
{ 0xE06D7363, "C++ EH exception" }
};

PCSTR GetExceptionName(DWORD code)
{
DWORD i;

for (i = 0; i < sizeof(ExcCodesAndNames) / sizeof(ExcCodesAndNames[0]); i++)
{
if (ExcCodesAndNames[i].Code == code)
{
return ExcCodesAndNames[i].Name;
}
}

return "?";
}

typedef enum tCallConv
{
CallConvCdecl,    // Params on stack; caller removes params
CallConvStdCall,  // Params on stack; callee removes params
CallConvFastCall  // Params in ECX, EDX and on stack; callee removes params
} tCallConv;

DWORD Execute32bitFunctionFromExe(PCSTR ExeName,
DWORD FunctionOffset, // function's offset from the module base
tCallConv CallConvention,
DWORD CodeDataStackSize,
ULONG64* ResultEdxEax,
DWORD DwordParamsCount,
.../* DWORD params */)
{
STARTUPINFO startupInfo;
PROCESS_INFORMATION processInfo;
DWORD dwContinueStatus = DBG_CONTINUE; // exception continuation
DEBUG_EVENT dbgEvt;
UCHAR* procMem = NULL;
DWORD breakPointCount = 0;
DWORD err = ERROR_SUCCESS;
DWORD ecxEdxParams[2] = { 0, 0 };
DWORD imageBase = 0;
CONTEXT ctx;
va_list ap;

va_start(ap, DwordParamsCount);

*ResultEdxEax = 0;

memset(&startupInfo, 0, sizeof(startupInfo));
startupInfo.cb = sizeof(startupInfo);
memset(&processInfo, 0, sizeof(processInfo));

if (!CreateProcess(
NULL,
(LPSTR)ExeName,
NULL,
NULL,
FALSE,
DEBUG_ONLY_THIS_PROCESS, // DEBUG_PROCESS,
NULL,
NULL,
&startupInfo,
&processInfo))
{
printf("CreateProcess() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}

printf("Process 0x%08X (0x%08X) \"%s\" created,\n",
processInfo.dwProcessId,
processInfo.hProcess,
ExeName);

procMem = VirtualAllocEx(
processInfo.hProcess,
NULL,
CodeDataStackSize,
MEM_COMMIT | MEM_RESERVE,
PAGE_EXECUTE_READWRITE);

if (procMem == NULL)
{
printf("VirtualAllocEx() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}

printf("Allocated RWX memory in process 0x%08X (0x%08X) "
"at address 0x%08X\n",
processInfo.dwProcessId,
processInfo.hProcess,
procMem);

while (dwContinueStatus)
{
// Wait for a debugging event to occur. The second parameter indicates
// that the function does not return until a debugging event occurs.
if (!WaitForDebugEvent(&dbgEvt, INFINITE))
{
printf("WaitForDebugEvent() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}

// Process the debugging event code.
switch (dbgEvt.dwDebugEventCode)
{
case EXCEPTION_DEBUG_EVENT:
// Process the exception code. When handling
// exceptions, remember to set the continuation
// status parameter (dwContinueStatus). This value
// is used by the ContinueDebugEvent function.

printf("%s (%s) Exception in process 0x%08X, thread 0x%08X\n"
"  Exc. Code = 0x%08X (%s), Instr. Address = 0x%08X",
dbgEvt.u.Exception.dwFirstChance ?
"First Chance" : "Last Chance",
dbgEvt.u.Exception.ExceptionRecord.ExceptionFlags ?
"non-continuable" : "continuable",
dbgEvt.dwProcessId,
dbgEvt.dwThreadId,
dbgEvt.u.Exception.ExceptionRecord.ExceptionCode,
GetExceptionName(dbgEvt.u.Exception.ExceptionRecord.ExceptionCode),
(DWORD)dbgEvt.u.Exception.ExceptionRecord.ExceptionAddress);

if (dbgEvt.u.Exception.ExceptionRecord.ExceptionCode ==
EXCEPTION_ACCESS_VIOLATION)
{
ULONG_PTR* info = dbgEvt.u.Exception.ExceptionRecord.ExceptionInformation;
printf(",\n  Access Address = 0x%08X, Access = 0x%08X (%s)",
(DWORD)info[1],
(DWORD)info[0],
(info[0] == 0) ?
"read" : ((info[0] == 1) ? "write" : "execute")); // 8 = DEP
}

printf("\n");

// Get the thread context (register state).
// We'll need to either display it (in case of unexpected exceptions) or
// modify it (to execute our code) or read it (to get the results of
// execution).
memset(&ctx, 0, sizeof(ctx));
ctx.ContextFlags = CONTEXT_INTEGER | CONTEXT_CONTROL;
if (!GetThreadContext(processInfo.hThread, &ctx))
{
printf("GetThreadContext() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}

#if 0
printf("  EAX=0x%08X EBX=0x%08X ECX=0x%08X EDX=0x%08X EFLAGS=0x%08X\n"
"  ESI=0x%08X EDI=0x%08X EBP=0x%08X ESP=0x%08X EIP=0x%08X\n",
ctx.Eax, ctx.Ebx, ctx.Ecx, ctx.Edx, ctx.EFlags,
ctx.Esi, ctx.Edi, ctx.Ebp, ctx.Esp, ctx.Eip);
#endif

if (dbgEvt.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_BREAKPOINT &&
breakPointCount == 0)
{
// Update the context so our code can be executed
DWORD mem, i, data;
SIZE_T numberOfBytesCopied;

mem = (DWORD)procMem + CodeDataStackSize;

// Child process memory layout (inside the procMem[] buffer):
//
//    higher
//      .
//      .     UD2 instruction (causes #UD, indicator of successful
//      .     completion of the called function)
//      .
//      .     last on-stack parameter for FunctionAddress()
//      .     ...
//      .     first on-stack parameter for FunctionAddress()
//      .
//      .     return address for FunctionAddress() (FunctionAddress() is
//      .     executed just before it and is going to return to UD2)
//      .     (ESP will point here)
//      .
//    lower

mem -= 2;
data = 0x0B0F; // 0x0F, 0x0B = UD2 instruction
if (!WriteProcessMemory(processInfo.hProcess,
(PVOID)mem,
&data,
2,
&numberOfBytesCopied))
{
ErrWriteMem1:
printf("WriteProcessMemory() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}
else if (numberOfBytesCopied != 2)
{
ErrWriteMem2:
printf("WriteProcessMemory() copied fewer bytes than requested\n");
err = ERROR_BAD_LENGTH;
goto Cleanup;
}

// Copy function parameters.

mem &= 0xFFFFFFFC; // align the address for the stack

for (i = 0; i < DwordParamsCount; i++)
{
if (CallConvention == CallConvFastCall && i < 2)
{
ecxEdxParams[i] = va_arg(ap, DWORD);
}
else
{
data = va_arg(ap, DWORD);
if (!WriteProcessMemory(processInfo.hProcess,
(DWORD*)mem - DwordParamsCount + i,
&data,
sizeof(data),
&numberOfBytesCopied))
{
goto ErrWriteMem1;
}
else if (numberOfBytesCopied != sizeof(data))
{
goto ErrWriteMem2;
}
}
}

// Adjust what will become ESP according to the number of on-stack parameters.
for (i = 0; i < DwordParamsCount; i++)
{
if (CallConvention != CallConvFastCall || i >= 2)
{
mem -= 4;
}
}

// Store the function return address.
mem -= 4;
data = (DWORD)procMem + CodeDataStackSize - 2; // address of UD2
if (!WriteProcessMemory(processInfo.hProcess,
(PVOID)mem,
&data,
sizeof(data),
&numberOfBytesCopied))
{
goto ErrWriteMem1;
}
else if (numberOfBytesCopied != sizeof(data))
{
goto ErrWriteMem2;
}

// Last-minute preparations for execution...
// Set up the registers (ECX, EDX, EFLAGS, EIP, ESP).

if (CallConvention == CallConvFastCall)
{
if (DwordParamsCount >= 1) ctx.Ecx = ecxEdxParams[0];
if (DwordParamsCount >= 2) ctx.Edx = ecxEdxParams[1];
}

ctx.EFlags &= ~(1 << 10); // clear DF for string instructions
ctx.Eip = imageBase + FunctionOffset; // function to execute
ctx.Esp = mem;

if (!SetThreadContext(processInfo.hThread, &ctx))
{
printf("SetThreadContext() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}

printf("Copied code/data to the process\n");

#if 0
for (i = ctx.Esp; i < (DWORD)procMem + CodeDataStackSize; i++)
{
data = 0;
ReadProcessMemory(processInfo.hProcess,
(void*)i,
&data,
1,
&numberOfBytesCopied);
printf("E[SI]P = 0x%08X: 0x%02X\n", i, data);
}
#endif

breakPointCount++;
dwContinueStatus = DBG_CONTINUE; // continue execution of our code
}
else if (dbgEvt.u.Exception.ExceptionRecord.ExceptionCode == EXCEPTION_ILLEGAL_INSTRUCTION &&
breakPointCount == 1 &&
ctx.Eip == (DWORD)procMem + CodeDataStackSize - 2/*UD2 size*/)
{
// The code has finished execution as expected.
// Collect the results.

*ResultEdxEax = ((ULONG64)ctx.Edx << 32) | ctx.Eax;

printf("Copied code/data from the process\n");

dwContinueStatus = 0; // stop debugging
}
else
{
// Unexpected event. Do not continue execution.

printf("  EAX=0x%08X EBX=0x%08X ECX=0x%08X EDX=0x%08X EFLAGS=0x%08X\n"
"  ESI=0x%08X EDI=0x%08X EBP=0x%08X ESP=0x%08X EIP=0x%08X\n",
ctx.Eax, ctx.Ebx, ctx.Ecx, ctx.Edx, ctx.EFlags,
ctx.Esi, ctx.Edi, ctx.Ebp, ctx.Esp, ctx.Eip);

err = dbgEvt.u.Exception.ExceptionRecord.ExceptionCode;
goto Cleanup;
}
break; // case EXCEPTION_DEBUG_EVENT:

case CREATE_PROCESS_DEBUG_EVENT:
// As needed, examine or change the process's virtual
// memory with the ReadProcessMemory and
// WriteProcessMemory functions; and suspend and resume
// thread execution with the SuspendThread and
// ResumeThread functions. Be sure to close the handle
// to the process image file with CloseHandle.
printf("Process 0x%08X (0x%08X) "
"created, base = 0x%08X,\n"
"  Thread 0x%08X (0x%08X) created, start = 0x%08X\n",
dbgEvt.dwProcessId,
dbgEvt.u.CreateProcessInfo.hProcess,
dbgEvt.u.CreateProcessInfo.lpBaseOfImage,
dbgEvt.dwThreadId,
dbgEvt.u.CreateProcessInfo.hThread,
dbgEvt.u.CreateProcessInfo.lpStartAddress);
// Found image base!
imageBase = (DWORD)dbgEvt.u.CreateProcessInfo.lpBaseOfImage;
dwContinueStatus = DBG_CONTINUE;
break;

case EXIT_PROCESS_DEBUG_EVENT:
// Display the process's exit code.
printf("Process 0x%08X exited, exit code = 0x%08X\n",
dbgEvt.dwProcessId,
dbgEvt.u.ExitProcess.dwExitCode);
// Unexpected event. Do not continue execution.
err = ERROR_PROC_NOT_FOUND;
goto Cleanup;

case OUTPUT_DEBUG_STRING_EVENT:
dwContinueStatus = DBG_CONTINUE;
break;

case RIP_EVENT:
printf("RIP: Error = 0x%08X, Type = 0x%08X\n",
dbgEvt.u.RipInfo.dwError,
dbgEvt.u.RipInfo.dwType);
// Unexpected event. Do not continue execution.
err = dbgEvt.u.RipInfo.dwError;
goto Cleanup;
} // end of switch (dbgEvt.dwDebugEventCode)

// Resume executing the thread that reported the debugging event.
if (dwContinueStatus)
{
if (!ContinueDebugEvent(dbgEvt.dwProcessId,
dwContinueStatus))
{
printf("ContinueDebugEvent() failed with error 0x%08X\n",
err = GetLastError());
goto Cleanup;
}
}
} // end of while (dwContinueStatus)

err = ERROR_SUCCESS;

Cleanup:

if (processInfo.hProcess != NULL)
{
if (procMem != NULL)
{
VirtualFreeEx(processInfo.hProcess, procMem, 0, MEM_RELEASE);
}
TerminateProcess(processInfo.hProcess, 0);
CloseHandle(processInfo.hProcess);
}

va_end(ap);

return err;
}

int __cdecl FunctionCdecl(int x, int y, int z)
{
return x + y + z;
}

int __stdcall FunctionStdCall(int x, int y, int z)
{
return x * y * z;
}

ULONG64 __fastcall FunctionFastCall(DWORD x, DWORD y, DWORD z)
{
return (ULONG64)x * y + z;
}

int main(int argc, char** argv)
{
DWORD err;
ULONG64 resultEdxEax;

err = Execute32bitFunctionFromExe(argv[0]/*ExeName*/,
(DWORD)&FunctionCdecl -
(DWORD)GetModuleHandle(NULL),
CallConvCdecl,
4096/*CodeDataStackSize*/,
&resultEdxEax,
3/*DwordParamsCount*/,
2, 3, 4);
if (err == ERROR_SUCCESS)
printf("2 + 3 + 4 = %d\n", (int)resultEdxEax);

err = Execute32bitFunctionFromExe(argv[0]/*ExeName*/,
(DWORD)&FunctionStdCall -
(DWORD)GetModuleHandle(NULL),
CallConvStdCall,
4096/*CodeDataStackSize*/,
&resultEdxEax,
3/*DwordParamsCount*/,
-2, 3, 4);
if (err == ERROR_SUCCESS)
printf("-2 * 3 * 4 = %d\n", (int)resultEdxEax);

err = Execute32bitFunctionFromExe(argv[0]/*ExeName*/,
(DWORD)&FunctionFastCall -
(DWORD)GetModuleHandle(NULL),
CallConvFastCall,
4096/*CodeDataStackSize*/,
&resultEdxEax,
3/*DwordParamsCount*/,
-1, -1, -1);
if (err == ERROR_SUCCESS)
printf("0xFFFFFFFF * 0xFFFFFFFF + 0xFFFFFFFF = 0x%llX\n",
(unsigned long long)resultEdxEax);

return 0;
}
``````

Output:

``````Process 0x00001514 (0x00000040) "C:\MinGW\msys\1.0\home\Alex\unexported.exe" cre
ated,
Allocated RWX memory in process 0x00001514 (0x00000040) at address 0x002B0000
Process 0x00001514 (0x00000044) created, base = 0x00400000,
Thread 0x00000CB0 (0x00000048) created, start = 0x0040126C
First Chance (continuable) Exception in process 0x00001514, thread 0x00000CB0
Exc. Code = 0x80000003 (EXCEPTION_BREAKPOINT), Instr. Address = 0x77090FAB
Copied code/data to the process
First Chance (continuable) Exception in process 0x00001514, thread 0x00000CB0
Exc. Code = 0xC000001D (EXCEPTION_ILLEGAL_INSTRUCTION), Instr. Address = 0x002
B0FFE
Copied code/data from the process
2 + 3 + 4 = 9
Process 0x00001828 (0x0000003C) "C:\MinGW\msys\1.0\home\Alex\unexported.exe" cre
ated,
Allocated RWX memory in process 0x00001828 (0x0000003C) at address 0x002B0000
Process 0x00001828 (0x0000006C) created, base = 0x00400000,
Thread 0x00001690 (0x00000074) created, start = 0x0040126C
First Chance (continuable) Exception in process 0x00001828, thread 0x00001690
Exc. Code = 0x80000003 (EXCEPTION_BREAKPOINT), Instr. Address = 0x77090FAB
Copied code/data to the process
First Chance (continuable) Exception in process 0x00001828, thread 0x00001690
Exc. Code = 0xC000001D (EXCEPTION_ILLEGAL_INSTRUCTION), Instr. Address = 0x002
B0FFE
Copied code/data from the process
-2 * 3 * 4 = -24
Process 0x00001388 (0x00000040) "C:\MinGW\msys\1.0\home\Alex\unexported.exe" cre
ated,
Allocated RWX memory in process 0x00001388 (0x00000040) at address 0x002B0000
Process 0x00001388 (0x0000008C) created, base = 0x00400000,
Thread 0x00001098 (0x00000090) created, start = 0x0040126C
First Chance (continuable) Exception in process 0x00001388, thread 0x00001098
Exc. Code = 0x80000003 (EXCEPTION_BREAKPOINT), Instr. Address = 0x77090FAB
Copied code/data to the process
First Chance (continuable) Exception in process 0x00001388, thread 0x00001098
Exc. Code = 0xC000001D (EXCEPTION_ILLEGAL_INSTRUCTION), Instr. Address = 0x002
B0FFE
Copied code/data from the process
0xFFFFFFFF * 0xFFFFFFFF + 0xFFFFFFFF = 0xFFFFFFFF00000000
``````

## Benefits of compiling C code with gcc's C++ front-end

I am puzzled and perplexed by this commit on Android's Dalvik platform, pushed a year ago.

File extensions were changed to C++ extensions in order to "move the interpreter into C++" - use the compiler's C++ front-end.

What could be the benefits of this change? The Dalvik platform is a 100% C & asm project, and no C++ features are used.

I can only speculate, but considering how the Android system has grown in complexity, the scoping features of C++ (classes and namespaces) might make the code base more manageable.

EDIT

Even if the project doesn't currently make use of any C++ features, they may simply be planning ahead.

Apart from some minor differences (namely some parameter conventions most people avoid anyway), most C source code compiles as C++ without modification. That said, in some areas C++ is stricter than C (for example, C allows you to assign a void pointer to another pointer type without a cast; in C++ this is an error), and enforcing this strictness can avoid problems down the road.

One further reason for the change may be that because most modern development favors C++ over C, a richer set of tools is available.

Speculating again, but at the birth of Android C may have been the only viable option for embedded device development, and now that restriction is no longer an issue.

## How does PHP handle variables?

I've been a PHP developer for many years, but there's one detail I don't know about how PHP handles variables and their types behind the scenes. I mean: in PHP - in theory - I could use the same variable to store an integer, then a string, then a boolean, then an array... etc...

Personally, I loathe this way of "poorly-casted" programming, but I'm wondering how PHP can store and manage variables and their types as I described. I imagine the interpreter creates and handles C variables behind the scenes, but I can't figure out how.

Thank you.

Behind the scenes, PHP variables are stored in a "zval" structure, which consists of a union of all the types of data the variable could store (e.g., a long, a double, a string pointer/length, an object pointer...), plus a couple of fields outside the union that indicate which type is currently stored and keep track of a reference count.

There's some further discussion of this at:

http://devzone.zend.com/317/extension-writing-part-ii-parameters-arrays-and-zvals/

## How can I replay a multithreaded application?

I want to record the synchronization operations, such as locks, semaphores, and barriers, of a multithreaded application, so that I can replay the recorded application later on, for the purpose of debugging.

One way is to supply your own lock, semaphore, and condition variable functions that also do logging, but I think that is overkill, because underneath they must be using some common synchronization operations.

So my question is: which synchronization operations should I log so that I need only minimal modifications to my program? In other words, which functions or macros in glibc, and which system calls, are all these synchronization operations built on? Then I would only have to modify those for logging and replaying.

The best I can think of is debugging with gdb in 'record' mode:

According to this page (GDB Process Record), threading support is underway, but it might not be complete yet.

On other platforms, several other threading checkers exist, but I haven't got much experience with them.

## Which function in glibc calls the main function

I am trying to understand how Linux launches a program. I read somewhere that some function in glibc calls the main function. Profiling with callgrind and looking at the call graphs in KCachegrind, I see `below main`, which calls main. But I don't understand this; a function can't have such a name. So my question is: which function in glibc actually calls the main function?

Following valgrind's own help you'll find this explanation for the option --show-below-main:

By default, stack traces for errors do not show any functions that appear beneath main because most of the time it's uninteresting C library stuff and/or gobbledygook. Alternatively, if main is not present in the stack trace, stack traces will not show any functions below main-like functions such as glibc's __libc_start_main. Furthermore, if main-like functions are present in the trace, they are normalised as (below main), in order to make the output more deterministic.

As such, `below main` is not the name of the function that calls main; the actual caller is glibc's `__libc_start_main`.

## XOPEN_SOURCE and signal handling

In the following program, if I uncomment the `_XOPEN_SOURCE` line, my program terminates when I hit `C-c`; the same program doesn't terminate if I leave that line commented out. Does anyone know in what ways `_XOPEN_SOURCE` affects signal handling? I am on Linux with gcc (4.6.3) and glibc (2.15).

``````/* #define _XOPEN_SOURCE 700 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <signal.h>

typedef void (*sighandler_t)(int);

void handle_signal(int signo)
{
printf("\n[MY_SHELL] ");
fflush(stdout);
}

int main()
{
int c;
signal(SIGINT, SIG_IGN);
signal(SIGINT, handle_signal);
printf("[MY_SHELL] ");
while ((c = getchar()) != EOF) {
if (c == '\n')
printf("[MY_SHELL] ");
}
printf("\n");
return 0;
}
``````

The problem is that the `signal()` function can have two different forms of behaviour when installing a signal handling function:

• System V semantics, where the signal handler is "one-shot" - that is, after the signal handling function is called, the signal's disposition is reset to `SIG_DFL` - and system calls that are interrupted by the signal are not restarted; or
• BSD semantics, where the signal handler is not reset when the signal fires, the signal is blocked whilst the signal handler is executing, and most interrupted system calls are automatically restarted.

On Linux with glibc, you get the BSD semantics if `_BSD_SOURCE` is defined, and the System V semantics if it is not. The `_BSD_SOURCE` macro is defined by default, but this default definition is suppressed if you define `_XOPEN_SOURCE` (or a few other macros too, like `_POSIX_SOURCE` and `_SVID_SOURCE`).

Under System V semantics, if the `read()` system call underlying `getchar()` is interrupted by `SIGINT` then `getchar()` will return `EOF` with `errno` set to `EINTR` (this will cause your program to exit normally). In addition, after the first `SIGINT` the disposition of this signal is reset to the default, and the default action for `SIGINT` is to terminate the process (so even if your program survived the first `SIGINT`, the second would cause it to exit abnormally).

The solution is not to use `signal()` at all for installing signal-handling functions; instead, use `sigaction()`, which is portable - it gives the same semantics everywhere. With `sa_flags` set to `SA_RESTART`, `sigaction()` will give you the BSD semantics, which is what you want here.