Best c++ questions in May 2012

Which is better option to use for dividing an integer number by 2?

151 votes

Which of the following techniques is the best option for dividing an integer by 2 and why?

Technique 1:

x = x >> 1;

Technique 2:

x = x / 2;

Here x is an integer.

Use the operation that best describes what you are trying to do.

  • If you are treating the number as a sequence of bits, use bitshift.
  • If you are treating it as a numerical value, use division.

Note that they are not exactly equivalent. They can give different results for negative integers. For example:

-5 / 2  = -2
-5 >> 1 = -3

(ideone)

What is the correct answer for cout << c++ << c;?

Asked on Mon, 28 May 2012 by pravs c++
65 votes

Recently in an interview there was a following objective type question.

int c = 0;
cout << c++ << c;

Answers:

a. 10
b. 01
c. undefined behavior

I answered choice b, i.e. output would be "01".

But to my surprise later I was told by an interviewer that the correct answer is option c: undefined.

Now, I do know the concept of sequence points in C++. The behavior is undefined for the following statement:

int i = 0;
i += i++ + i++;

but as per my understanding for the statement cout << c++ << c , the ostream.operator<<() would be called twice, first with ostream.operator<<(c++) and later ostream.operator<<(c).

I also checked the result on VS2010 compiler and its output is also '01'.

You can think of:

cout<<c++<<c;

As:

std::operator<<(std::operator<<(std::cout, c++), c);

C++ guarantees that all side effects of previous evaluations will have been performed at sequence points. There are no sequence points in between function arguments evaluation which means that argument c can be evaluated before argument std::operator<<(std::cout, c++) or after. So the result of the above is undefined.

Is this a known pitfall of C++11 for loops?

64 votes

Let's imagine we have a struct for holding 3 doubles with some member functions:

struct Vector {
  double x, y, z;
  // ...
  Vector &negate() {
    x = -x; y = -y; z = -z;
    return *this;
  }
  Vector &normalize() {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  // ...
};

This is a little contrived for simplicity, but I'm sure you agree that similar code is out there. The methods allow you to conveniently chain, for example:

Vector v = ...;
v.normalize().negate();

Or even:

Vector v = Vector{1., 2., 3.}.normalize().negate();

Now if we provided begin() and end() functions, we could use our Vector in a new-style for loop, say to loop over the 3 coordinates x, y, and z (you can no doubt construct more "useful" examples by replacing Vector with e.g. String):

Vector v = ...;
for (double x : v) { ... }

We can even do:

Vector v = ...;
for (double x : v.normalize().negate()) { ... }

and also:

for (double x : Vector{1., 2., 3.}) { ... }

However, the following (it seems to me) is broken:

for (double x : Vector{1., 2., 3.}.normalize()) { ... }

While it seems like a logical combination of the previous two usages, I think this last usage creates a dangling reference while the previous two are completely fine.

  • Is this correct and Widely appreciated?
  • Which part of the above is the "bad" part, that should be avoided?
  • Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?

Is this correct and Widely appreciated?

Yes, your understanding of things is correct.

Which part of the above is the "bad" part, that should be avoided?

The bad part is taking an l-value reference to a temporary returned from a function, and binding it to an r-value reference. It is just as bad as this:

auto &&t = Vector{1., 2., 3.}.normalize();

The temporary Vector{1., 2., 3.}'s lifetime cannot be extended because the compiler has no idea that the return value from normalize references it.

Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?

That would be highly inconsistent with how C++ works.

Would it prevent certain gotchas made by people using chained expressions on temporaries or various lazy-evaluation methods for expressions? Yes. But it would also be require special-case compiler code, as well as be confusing as to why it doesn't work with other expression constructs.

A much more reasonable solution would be some way to inform the compiler that the return value of a function is always a reference to this, and therefore if the return value is bound to a temporary-extending construct, then it would extend the correct temporary. That's a language-level solution though.

Presently (if the compiler supports it), you can make it so that normalize cannot be called on a temporary:

struct Vector {
  double x, y, z;
  // ...
  Vector &normalize() & {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  Vector &normalize() && = delete;
};

This will cause Vector{1., 2., 3.}.normalize() to give a compile error, while v.normalize() will work fine. Obviously you won't be able to do correct things like this:

Vector t = `Vector{1., 2., 3.}.normalize()`;

But you also won't be able to do incorrect things.

Why would code actively try to prevent tail-call optimization?

58 votes

The title of the question might be a bit strange, but the thing is that, as far as I know, there is nothing that speaks against tail call optimization at all. However, while browsing open source projects, I already came across a few functions that actively try to stop the compiler from doing a tail call optimization, for example the implementation of CFRunLoopRef which is full of such hacks. For example:

static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__() __attribute__((noinline));
static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__(CFRunLoopObserverCallBack func, CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) {
    if (func) {
        func(observer, activity, info);
    }
    getpid(); // thwart tail-call optimization
}

I would love to know why this is seemingly so important, and are there any cases were I as a normal developer should keep this is mind too? Eg. are there common pitfalls with tail call optimization?

This is only a guess, but maybe to avoid an infinite loop vs bombing out with a stack overflow error.

Since the method in question doesn't put anything on the stack it would seem possible for the tail-call recursion optimization to produce code that would enter an infinite loop as opposed to the non-optimized code which would put the return address on the stack which would eventually overflow in the event of misuse.

The only other thought I have is related to preserving the calls on the stack for debugging and stacktrace printing.

What's the result of += in C and C++?

54 votes

I've got the following code:

#include <stdio.h>
int main(int argc, char **argv) {
    int i = 0;
    (i+=10)+=10;
    printf("i = %d\n", i);
    return 0;
}

If I try to compile it as a C source using gcc I get an error:

error: lvalue required as left operand of assignment

But if I compile it as a C++ source using g++ I get no error and when i run the executable:

i = 20

Why the different behaviour?

Semantics of the add-assign operators is different in C and C++:

C99 standard, 6.5.16, part 3:

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

In C++ 5.17.1:

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue with the type and value of the left operand after the assignment has taken place.

EDIT : The behavior of (i+=10)+=10 in C++ is undefined in C++98, but well defined in C++11. See this answer to the question by aix for the relevant portions of the standards.

Single and double quotes in C/C++

42 votes

I was looking at the question Single quotes vs. double quotes in C. I couldn't completely understand the explanation given so I wrote a program

#include <stdio.h>
int main()
{
  char ch = 'a';
  printf("sizeof(ch) :%d\n", sizeof(ch));
  printf("sizeof(\'a\') :%d\n", sizeof('a'));
  printf("sizeof(\"a\") :%d\n", sizeof("a"));
  printf("sizeof(char) :%d\n", sizeof(char));
  printf("sizeof(int) :%d\n", sizeof(int));
  return 0;
}

I compiled them using both gcc and g++ and these are my outputs

gcc:

sizeof(ch)   : 1  
sizeof('a')  : 4  
sizeof("a")  : 2  
sizeof(char) : 1  
sizeof(int)  : 4  

g++:

sizeof(ch)   : 1  
sizeof('a')  : 1  
sizeof("a")  : 2  
sizeof(char) : 1  
sizeof(int)  : 4  

The g++ output makes sense to me and I don't have any doubt regarding that. In gcc what is the need to have sizeof('a') to be different from sizeof(char). Is there some actual reason behind it or is it just historical?

Also in C if char and 'a' have different size does that mean when we are doing char ch = 'a'; we are doing implicit type-conversion?

In C, character constants such as 'a' have type int, in C++ it's char.

Regarding the last question, yes,

char ch = 'a';

causes an implicit conversion of the int to char.

What are the incompatible differences betweeen C(99) and C++(11)?

36 votes

This question was triggered by replie(s) to a post by Herb Sutter where he explained MS's decision to not support/make a C99 compiler but just go with the C(99) features that are in the C++(11) standard anyway.

One commenter replied:

(...) C is important and deserves at least a little bit of attention.

There is a LOT of existing code out there that is valid C but is not valid C++. That code is not likely to be rewritten (...)

Since I only program in MS C++, I really don't know "pure" C that well, i.e. I have no ready picture of what details of the C++-language I'm using are not in C(99) and I have little clues where some C99 code would not work as-is in a C++ compiler.

Note that I know about the C99 only restrict keyword which to me seems to have very narrow application and about variable-length-arrays (of which I'm not sure how widespread or important they are).

Also, I'm very interested whether there are any important semantic differences or gotchas, that is, C(99) code that will compiler under C++(11) but do something differently with the C++ compiler than with the C compiler.


Quick links: External resources from the answers:

If you start from the common subset of C and C++, sometimes called clean C (which is not quite C90), you have to consider 3 types of incompatibilities:

  1. Additional C++ featues which make legal C illegal C++

    Examples for this are C++ keywords which can be used as identifiers in C or conversions which are implicit in C but require an explicit cast in C++.

    This is probably the main reason why Microsoft still ships a C frontend at all: otherwise, legacy code that doesn't compile as C++ would have to be rewritten.

  2. Additional C features which aren't part of C++

    The C language did not stop evolving after C++ was forked. Some examples are variable-length arrays, designated initializers and restrict. These features can be quite handy, but aren't part of any C++ standard, and some of them will probably never make it in.

  3. Features which are available in both C and C++, but have different semantics

    An example for this would be the linkage of const objects or inline functions.

A list of incompatibilities between C99 and C++98 can be found here (which has already been mentioned by Mat).

While C++11 and C11 got closer on some fronts (variadic macros are now available in C++, variable-length arrays are now an optional C language feature), the list of incompatibilities has grown as well (eg generic selections in C and the auto type-specifier in C++).

As an aside, while Microsoft has taken some heat for the decision to abandon C (which is not a recent one), as far as I know no one in the open source community has actually taken steps to do something about it: It would be quite possible to provide many features of modern C via a C-to-C++ compiler, especially if you consider that some of them are trivial to implement. This is actually possible right now using Comeau C/C++, which does support C99.

However, it's not really a pressing issue: Personally, I'm quite comfortable with using GCC and Clang on Windows, and there are proprietary alternatives to MSVC as well, eg Pelles C or Intel's compiler.

Does new char actually guarantee aligned memory for a class type?

36 votes

Is allocating a buffer via new char[sizeof(T)] guaranteed to allocate memory which is properly aligned for the type T, where all members of T has their natural, implementation defined, alignment (that is, you have not used the alignas keyword to modify their alignment).

I have seen this guarantee made in a few answers around here but I'm not entirely clear how the standard arrives at this guarantee. 5.3.4-10 of the standard gives the basic requirement: essentially new char[] must be aligned to max_align_t.

What I'm missing is the bit which says alignof(T) will always be a valid alignment with a maximum value of max_align_t. I mean, it seems obvious, but must the resulting alignment of a structure be at most max_align_t? Even point 3.11-3 says extended alignments may be supported, so may the compiler decide on its own a class is an over-aligned type?

What I'm missing is the bit which says alignof(T) will always be a valid alignment with a maximum value of max_align_t. I mean, it seems obvious, but must the resulting alignment of a structure be at most max_align_t ? Even point 3.11-3 says extended alignments may be supported, so may the compiler decide on its own a class is an over-aligned type ?

As noted by Mankarse, the best quote I could get is from [basic.align]/3:

A type having an extended alignment requirement is an over-aligned type. [ Note: every over-aligned type is or contains a class type to which extended alignment applies (possibly through a non-static data member). —end note ]

which seems to imply that extended alignment must be explicitly required (and then propagates) but cannot

I would have prefer a clearer mention; the intent is obvious for a compiler-writer, and any other behavior would be insane, still...

Why am I observing multiple inheritance to be faster than single?

32 votes

I have the following two files :-

single.cpp :-

#include <iostream>
#include <stdlib.h>

using namespace std;

unsigned long a=0;

class A {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; } 
};

class B : public A {                                                                              
  public:                                                                                                                                                                        
    virtual int f() __attribute__ ((noinline)) { return a; }                                      
    void g() __attribute__ ((noinline)) { return; }                                               
};                                                                                                

int main() {                                                                                      
  cin>>a;                                                                                         
  A* obj;                                                                                         
  if (a>3)                                                                                        
    obj = new B();
  else
    obj = new A();                                                                                

  unsigned long result=0;                                                                         

  for (int i=0; i<65535; i++) {                                                                   
    for (int j=0; j<65535; j++) {                                                                 
      result+=obj->f();                                                                           
    }                                                                                             
  }                                                                                               

  cout<<result<<"\n";                                                                             
}

And

multiple.cpp :-

#include <iostream>
#include <stdlib.h>

using namespace std;

unsigned long a=0;

class A {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; }
};

class dummy {
  public:
    virtual void g() __attribute__ ((noinline)) { return; }
};

class B : public A, public dummy {
  public:
    virtual int f() __attribute__ ((noinline)) { return a; }
    virtual void g() __attribute__ ((noinline)) { return; }
};


int main() {
  cin>>a;
  A* obj;
  if (a>3)
    obj = new B();
  else
    obj = new A();

  unsigned long result=0;

  for (int i=0; i<65535; i++) {
    for (int j=0; j<65535; j++) {
      result+=obj->f();
    }
  }

  cout<<result<<"\n";
}

I am using gcc version 3.4.6 with flags -O2

And this is the timings results I get :-

multiple :-

real    0m8.635s
user    0m8.608s
sys 0m0.003s

single :-

real    0m10.072s
user    0m10.045s
sys 0m0.001s

On the other hand, if in multiple.cpp I invert the order of class derivation thus :-

class B : public dummy, public A {

Then I get the following timings (which is slightly slower than that for single inheritance as one might expect thanks to 'thunk' adjustments to the this pointer that the code would need to do) :-

real    0m11.516s
user    0m11.479s
sys 0m0.002s

Any idea why this may be happening? There doesn't seem to be any difference in the assembly generated for all three cases as far as the loop is concerned. Is there some other place that I need to look at?

Also, I have bound the process to a specific cpu core and I am running it on a real-time priority with SCHED_RR.

EDIT:- This was noticed by Mysticial and reproduced by me. Doing a

cout << "vtable: " << *(void**)obj << endl;

just before the loop in single.cpp leads to single also being as fast as multiple clocking in at 8.4 s just like public A, public dummy.

I think I got at least some further lead on why this may be happening. The assembly for the loops is exactly identical but the object files are not!

For the loop with the cout at first (i.e.)

cout << "vtable: " << *(void**)obj << endl;

for (int i=0; i<65535; i++) {
  for (int j=0; j<65535; j++) {
    result+=obj->f();
  }
}

I get the following in the object file :-

40092d:       bb fe ff 00 00          mov    $0xfffe,%ebx                                       
400932:       48 8b 45 00             mov    0x0(%rbp),%rax                                     
400936:       48 89 ef                mov    %rbp,%rdi                                          
400939:       ff 10                   callq  *(%rax)                                            
40093b:       48 98                   cltq                                                      
40093d:       49 01 c4                add    %rax,%r12                                          
400940:       ff cb                   dec    %ebx                                               
400942:       79 ee                   jns    400932 <main+0x42>                                 
400944:       41 ff c5                inc    %r13d                                              
400947:       41 81 fd fe ff 00 00    cmp    $0xfffe,%r13d                                      
40094e:       7e dd                   jle    40092d <main+0x3d>                                 

However, without the cout, the loops become :- (.cpp first)

for (int i=0; i<65535; i++) {
  for (int j=0; j<65535; j++) {
    result+=obj->f();
  }
}

Now, .obj :-

400a54:       bb fe ff 00 00          mov    $0xfffe,%ebx
400a59:       66                      data16                                                    
400a5a:       66                      data16 
400a5b:       66                      data16                                                    
400a5c:       90                      nop                                                       
400a5d:       66                      data16                                                    
400a5e:       66                      data16                                                    
400a5f:       90                      nop                                                       
400a60:       48 8b 45 00             mov    0x0(%rbp),%rax                                     
400a64:       48 89 ef                mov    %rbp,%rdi                                          
400a67:       ff 10                   callq  *(%rax)
400a69:       48 98                   cltq   
400a6b:       49 01 c4                add    %rax,%r12                                          
400a6e:       ff cb                   dec    %ebx                                               
400a70:       79 ee                   jns    400a60 <main+0x70>                                 
400a72:       41 ff c5                inc    %r13d                                              
400a75:       41 81 fd fe ff 00 00    cmp    $0xfffe,%r13d
400a7c:       7e d6                   jle    400a54 <main+0x64>                          

So I'd have to say it's not really due to false aliasing as Mysticial points out but simply due to these NOPs that the compiler/linker is emitting.

The assembly in both cases is :-

.L30:
        movl    $65534, %ebx
        .p2align 4,,7                   
.L29:
        movq    (%rbp), %rax            
        movq    %rbp, %rdi
        call    *(%rax)
        cltq    
        addq    %rax, %r12                                                                        
        decl    %ebx
        jns     .L29
        incl    %r13d 
        cmpl    $65534, %r13d
        jle     .L30

Now, .p2align 4,,7 will insert data/NOPs until the instruction counter for the next instruction has the last four bits 0's for a maximum of 7 NOPs. Now the address of the instruction just after p2align in the case without cout and before padding would be

0x400a59 = 0b101001011001

And since it takes <=7 NOPs to align the next instruction, it will in fact do so in the object file.

On the other hand, for the case with the cout, the instruction just after .p2align lands up at

0x400932 = 0b100100110010

and it would take > 7 NOPs to pad it to a divisible by 16 boundary. Hence, it doesn't do that.

So the extra time taken is simply due to the NOPs that the compiler pads the code with (for better cache alignment) when compiling with the -O2 flag and not really due to false aliasing.

I think this resolves the issue. I am using http://sourceware.org/binutils/docs/as/P2align.html as my reference for what .p2align actually does.

How can i efficiently select a Standard library container in C++11?

27 votes

There's a well known image (cheat sheet) called "C++ Container choice". It's a flow chart to choose the best container for the wanted usage.

Does anybody know if there's already a C++11 version of it?

This is the previous one: C++ Container choice

Not that I know of, however it can be done textually I guess. Also, the chart is slightly off, because list is not such a good container in general, and neither is forward_list; both lists are very specialized containers for niche applications.

To build such a chart, you just need two simple guidelines:

  • Choose for semantics first
  • When several choices are available, go for the simplest

Worrying about performance is usually useless at first, the big O only really kick in when you start handling a few thousands (or more) of items.

There are two big categories of containers:

  • Associative containers: they have a find operation
  • Simple Sequence containers

and then you can build several adapters on top of them: stack, queue, priority_queue. I will leave the adapters out here, they are sufficiently specialized to be recognizable.


Question 1: Associative ?

  • If you need to easily search by one key, then you need an associative container
  • If you need to have the elements sorted, then you need an ordered associative container
  • Otherwise, jump to the question 2.

Question 1.1: Ordered ?

  • If you do not need a specific order, use an unordered_ container, otherwise use its traditional ordered counterpart.

Question 1.2: Separate Key ?

  • If the key is separate from the value, use a map, otherwise use a set

Question 1.3: Duplicates ?

  • If you want to keep duplicates, use a multi, otherwise do not.

Example:

Suppose that I have several persons with a unique ID associated to them, and I would like to retrieve a person data from its ID as simply as possible.

  1. I want a find function, thus an associative container

    1.1. I could care less about order, thus an unordered_ container

    1.2. My key (ID) is separate from the value it is associated with, thus a map

    1.3. The ID is unique, thus no duplicate should creep in.

The final answer is: std::unordered_map<ID, PersonData>.


Question 2: Memory stable ?

  • If the elements should be stable in memory (ie, they should not move around when the container itself is modified), then use some list
  • Otherwise, jump to question 3.

Question 2.1: Which ?

  • Settle for a list; a forward_list is only useful for lesser memory footprint.

Question 3: Dynamically sized ?

  • If the container has a known size (at compilation time), and this size will not be altered during the course of the program, and the elements are default constructible or you can provide a full initialization list (using the { ... } syntax), then use an array. It replaces the traditional C-array, but with convenient functions.
  • Otherwise, jump to question 4.

Question 4: Double-ended ?

  • If you wish to be able to remove items from both the front and back, then use a deque, otherwise use a vector.

You will note that, by default, unless you need an associative container, your choice will be a vector.

When Should I Really Use `noexcept`?

26 votes

The noexcept keyword can be appropriately applied to many function signatures, but I am unsure as to when I should consider using it in practice. Based on what I have read so far, the last-minute addition of noexcept seems to address some important issues that arise when move constructors throw. However, I am still unable to provide satisfactory answers some practical questions that led me to read more about noexcept in the first place.

  1. There are many examples of functions that I know will never throw, but for which the compiler cannot determine so on its own. Should I append noexcept to the function declaration in all such cases?

    Having to think about whether or not I need to append noexcept after every function declaration would greatly reduce programmer productivity (and frankly, would be a pain in the ass). For which situations should I be more careful about the use of noexcept, and for which situations can I get away with the implied noexcept(false)?

  2. When can I realistically except to observe a performance improvement after using noexcept? In particular, give an example of code for which a C++ compiler is able to generate better machine code after the addition of noexcept.

    Personally, I care about noexcept because the of increased freedom provided to the compiler to safely apply certain kinds of optimizations. Do modern compilers take advantage of noexcept in this way? If not, can I excect some of them to do so in the near future?

I think it is too early to give a "best practices" answer for this as there hasn't been enough time to use it in practice. If this was asked about throw specifiers right after they came out then the answers would be very different to now.

Having to think about whether or not I need to append noexcept after every function declaration would greatly reduce programmer productivity (and frankly, would be a pain in the ass).

Well then use it when it's obvious that the function will never throw.

When can I realistically except to observe a performance improvement after using noexcept? ... Personally, I care about noexcept because the of increased freedom provided to the compiler to safely apply certain kinds of optimizations.

It seems like the biggest optimization gains are from user optimizations, not compiler ones due to possibility of checking noexcept and overloading on it. Most compilers follow a no-penalty-if-you-don't-throw exception handling method so I doubt it would change much (or anything) on the machine code level of your code, although perhaps reduce the binary size by removing the handling code.

Using noexcept in the big 4 (constructors, assignment, not destructors ar they'll already noexcept) will likely cause the best improvements as noexcept checks are 'common' in template code such as in std containers. For instance, std::vector won't use your class's move unless it's marked noexcept (or the compiler can deduce it otherwise).

What is the difference between typedef and using in C++11?

23 votes

I know that in C++11 we can now use using to write type alias, like typedefs:

typedef int MyInt;

Is, from what I understand, equivalent to:

using MyInt = int;

And that new syntax emerged from the effort to have a way to express "template typedef":

template< class T > using MyType = AnotherType< T, MyAllocatorType >;

But, with the first two non-template exemples, are there any other subtile differences in the standard? For example, typedefs does aliasing in "weak" way, that is it don't create a new type but only a new name (conversions are implicit between those names).

Is it the same with using or does it generate a new type?
Are there any differences?

They are equivalent, from the standard (emphasis mine) (7.1.3.2):

A typedef-name can also be introduced by an alias-declaration. The identifier following the using keyword becomes a typedef-name and the optional attribute-specifier-seq following the identifier appertains to that typedef-name. It has the same semantics as if it were introduced by the typedef specifier. In particular, it does not define a new type and it shall not appear in the type-id.

C and C++ : Partial initialization of automatic structure

Asked on Thu, 31 May 2012 by test c++ c
21 votes

For example, if somestruct has three integer members, I had always thought that it was OK to do this in C (or C++) function:

somestruct s = {123,};

The first member would be initialized to 123 and the last two would be initialized to 0. I often do the same thing with automatic arrays, writing int arr[100] = {0,}; so that all integers in an array are initialized to zero.


Recently I read in the GNU C Reference Manual that:

If you do not initialize a structure variable, the effect depends on whether it is has static storage (see Storage Class Specifiers) or not. If it is, members with integral types are initialized with 0 and pointer members are initialized to NULL; otherwise, the value of the structure's members is indeterminate.


Can someone please tell me what the C and C++ standards say regarding partial automatic structure and automatic array initialization? I do the above code in Visual Studio without a problem but I want to be compatible with gcc/g++, and maybe other compilers as well. Thanks

The linked gcc documentation does not talk of Partial Initialization it just talks of (Complete)Initialization or No Initialization.

What is partial Initialization?

Partial Initialization occurs when you provide some initializers but not all i.e: Fewer initializers than the size of the array or the number of structure elements being initialized.

Example:

int array[10] = {1,2};                    //Case 1:Partial Initialization

What is (Complete)Initialization or No Initialization?

Initialization means providing some initial value to the variable being created at the same time when it is being created. ie: in the same code statement.

Example:

int array[10] = {0,1,2,3,4,5,6,7,8,9};    //Case 2:Complete Initialization
int array[10];                            //Case 3:No Initialization

The quoted paragraph describes the behavior for Case 3.

The rules regarding Partial Initialization(Case 1) are well defined by the standard and these rules do not depend on the storage type of the variable being initialized.
AFAIK, All mainstream compilers have 100% compliance to these rules.


Can someone please tell me what the C and C++ standards say regarding partial automatic structure and automatic array initialization?

The C and C++ standards guarantee that even if an integer array is located on automatic storage and if there are fewer initializers in a brace-enclosed list then the uninitialized elements must be initialized to 0.

C99 Standard 6.7.8.21

If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration.


In C++ the rules are stated with a little difference.

C++03 Standard 8.5.1 Aggregates
Para 7:

If there are fewer initializers in the list than there are members in the aggregate, then each member not explicitly initialized shall be value-initialized (8.5). [Example:

 struct S { int a; char* b; int c; };
 S ss = { 1, "asdf" };

initializes ss.a with 1, ss.b with "asdf", and ss.c with the value of an expression of the form int(), that is,0. ]

While Value Initialization is defined in,
C++03 8.5 Initializers
Para 5:

To value-initialize an object of type T means:
— if T is a class type (clause 9) with a user-declared constructor (12.1), then the default constructor for T is called (and the initialization is ill-formed if T has no accessible default constructor);
— if T is a non-union class type without a user-declared constructor, then every non-static data member and base-class component of T is value-initialized;
— if T is an array type, then each element is value-initialized;
— otherwise, the object is zero-initialized

Do C & C++ compilers optimize comparisons with function calls?

20 votes

Do C and C++ compilers generally optimize comparisons with functions?

For example, this page suggests that the size function on std::lists in C++ can have a linear complexity O(N) in some standard library implementations (which makes sense for a linked list).

But in that case, if myList is a huge list, what would something like this do?

    if (myList.size() < 5) return 1;
    else return 2;

Would the size() function find and count all N list members, or would it be optimized to short circuit after finding 5 members?

Theoretically the possibility exists if size() was inlined, but to perform the optimization the compiler would have to

  1. Detect that you are testing specifically a "less than" condition
  2. Prove that the loop (assume one exists for the purposes of this discussion) results in a variable increasing monotonically
  3. Prove that there are no observable side effects from the loop body

That's a big bunch of things to count on IMHO, and it includes features which are not "obviously useful" in other contexts as well. Keep in mind that compiler vendors have limited resources so there has to be really good justification for implementing these prerequisites and having the compiler bring all the parts together to optimize this case.

Seeing as even if this is a perf issue for someone the problem can be easily solved in code, I don't feel that there would be such justification. So no, generally you should not expect cases like this to be optimized.

C++ Boolean evaluation

20 votes

So I'm curious as to why this happens.

int main()
{
   bool answer = true;
   while(answer)
   {
      cout << "\nInput?\n";
      cin >> answer;
   }
return 0;
}

Expected behavior: 0 - Exits program, 1 - Prompts again, Any non-zero integer other than 1 - Prompts again

Actual behavior: 0 - As expected, 1 - As expected, Any non-zero integer other than 1 - Infinite loop

From http://www.learncpp.com/cpp-tutorial/26-boolean-values/

One additional note: when converting integers to booleans, 
the integer zero resolves to boolean false, 
whereas non-zero integers all resolve to true.

Why does the program go into an infinite loop?

In effect, the operator>> overload used for reading a bool only allows a value of 0 or 1 as valid input. The operator overload defers to the num_get class template, which reads the next number from the input stream and then behaves as follows (C++11 §22.4.2.1/6):

  • If the value to be stored is 0 then false is stored.

  • If the value is 1 then true is stored.

  • Otherwise true is stored and ios_base::failbit is assigned to err.

(err here is the error state of the stream from which you are reading; cin in this case. Note that there is additional language specifying the behavior when the boolalpha manipulator is used, which allows booleans to be inserted and extracted using their names, true and false; I have omitted these other details for brevity.)

When you input a value other than zero or one, the fail state gets set on the stream, which causes further extractions to fail. answer is set to true and remains true forever, causing the infinite loop.

You must test the state of the stream after every extraction, to see whether the extraction succeeded and whether the stream is still in a good state. For example, you might rewrite your loop as:

bool answer = true;
while (std::cin && answer)
{
    std::cout << "\nInput?\n";
    std::cin >> answer;
}

Does this C++ code leak memory?

19 votes
struct Foo
{
    Foo(int i)
    {
        ptr = new int(i);
    }
    ~Foo()
    {
        delete ptr;
    }
    int* ptr;
};

int main()
{
    {
        Foo a(8);
        Foo b(7);
        a = b;
    }
    //Do other stuff
}

If I understand correctly, the compiler will automatically create an assignment operator member function for Foo. However, that just takes the value of ptr in b and puts it in a. The memory allocated by a originally seems lost. I could do a call a.~Foo(); before making the assignment, but I heard somewhere that you should rarely need to explicitly call a destructor. So let's say instead I write an assignment operator for Foo that deletes the int pointer of the left operand before assigning the r-value to the l-value. Like so:

Foo& operator=(const Foo& other) 
{
    //To handle self-assignment:
    if (this != &other) {
        delete this->ptr;
        this->ptr = other.ptr;
    }
    return *this;
}

But if I do that, then when Foo a and Foo b go out of scope, don't both their destructors run, deleting the same pointer twice (since they both point to the same thing now)?

Edit:

If I understand Anders K correctly, this is the proper way to do it:

Foo& operator=(const Foo& other) 
{
    //To handle self-assignment:
    if (this != &other) {
        delete this->ptr;
        //Clones the int
        this->ptr = new int(*other.ptr);
    }
    return *this;
}

Now, a cloned the int that b pointed to, and sets its own pointer to it. Perhaps in this situation, the delete and new were not necessary because it just involves ints, but if the data member was not an int* but rather a Bar* or whatnot, a reallocation could be necessary.

Edit 2: The best solution appears to be the copy-and-swap idiom.

Whether this leaks memory?
No it doesn't.

It seems most of the people have missed the point here. So here is a bit of clarification.

The initial response of "No it doesn't leak" in this answer was Incorrect but the solution that was and is suggested here is the only and the most appropriate solution to the problem.


The solution to your woes is:

Not use a pointer to integer member(int *) but to use just an integer (int), You don't really need dynamically allocated pointer member here. You can achieve the same functionality using an int as member.
Note that in C++ You should use new as little as possible.

If for some reason(which I can't see in the code sample) You can't do without dynamically allocated pointer member read on:

You need to follow the Rule of Three!


Why do you need to follow Rule of Three?

The Rule of Three states:

If your class needs either

a copy constructor,
an assignment operator,
or a destructor,

then it is likely to need all three of them.

Your class needs an explicit destructor of its own so it also needs an explicit copy constructor and copy assignment operator.
Since copy constructor and copy assignment operator for your class are implicit, they are implicitly public as well, Which means the class design allows to copy or assign objects of this class. The implicitly generated versions of these functions will only make a shallow copy of the dynamically allocated pointer member, this exposes your class to:

  • Memory Leaks &
  • Dangling pointers &
  • Potential Undefined Behavior of double deallocation

Which basically means you cannot make do with the implicitly generated versions, You need to provide your own overloaded versions and this is what Rule of Three says to begin with.

The explicitly provided overloads should make a deep copy of the allocated member and it thus prevents all your problems.

How to implement the Copy assignment operator correctly?

In this case the most efficient and optimized way of providing a copy assignment operator is by using:
copy-and-swap Idiom
@GManNickG's famous answer provides enough detail to explain the advantages it provides.


Suggestion:

Also, You are much better off using smart pointer as an class member rather than a raw pointer which burdens you with explicit memory management. A smart pointer will implicitly manage the memory for you. What kind of smart pointer to use depends on lifetime and ownership semantics intended for your member and you need to choose an appropriate smart pointer as per your requirement.

What would be an ideal buffer size?

15 votes

Possible Duplicate:
How do you determine the ideal buffer size when using FileInputStream?

When reading raw data from a file (or any input stream) using either the C++'s istream family's read() or C's fread(), a buffer has to be supplied, and a number of how much data to read. Most programs I have seen seem to arbitrarily chose a power of 2 between 512 and 4096.

  1. Is there a reason it has to/should be a power of 2, or this just programer's natural inclination to powers of 2?
  2. What would be the "ideal" number? By "ideal" I mean that it would be the fastest. I assume it would have to be a multiple of the underlying device's buffer size? Or maybe of the underlying stream object's buffer? How would I determine what the size of those buffers is, anyway? And once I do, would using a multiple of it give any speed increase over just using the exact size?

EDIT
Most answers seem to be that it can't be determined at compile time. I am fine with finding it at runtime.

SOURCE:
How do you determine the ideal buffer size when using FileInputStream?

Optimum buffer size is related to a number of things: file system block size, CPU cache size and cache latency.

Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, the you pay the price of the disk->RAM latency as well.

This is why you see most buffers sized as a power of 2, and generally larger than (or equal to) the disk block size. This means that one of your stream reads could result in multiple disk block reads - but those reads will always use a full block - no wasted reads.

Ensuring this also typically results in other performance friendly parameters affecting both reading and subsequent processing: data bus width alignment, DMA alignment, memory cache line alignment, whole number of virtual memory pages.

Benefits of compiling C code with gcc's C++ front-end

13 votes

I am very interrogative and perplexed by this commit on android's dalvik platform pushed a year ago.

File extensions were changed to C++ extensions in order to "move the interpreter into C++" - use the compiler's C++ front-end.

What could be the benefits of this change ? Dalvik Platform is a 100% C & asm project and not any C++ feature is used.

I can only speculate, but considering how the Android system has grown in complexity, the scoping features of C++ (classes and namespaces) might make the code base more manageable.

EDIT

Even if the project doesn't currently make use of any C++ features, they may simply be planning ahead.

Apart from some minor differences (namely some parameter conventions most people avoid anyway), C source code compiles as C++ without modification. That being said, in some areas C++ syntax is stricter than C (C allows you to assign a void pointer to another pointer type without a cast; in C++, this is an error), and enforcing this strictness avoids problems down the road. *

*) (That's an overly simplistic view, see comment)

One further reason for the change may be that because most modern development favors C++ over C, a richer set of tools is available.

Speculating again, but at the birth of Android C may have been the only viable option for embedded device development, and now that restriction is no longer an issue.

WaitForSingleObject - do threads waiting form a queue?

6 votes

If I set 3 threads to wait for a mutex to be release, do they form a queue based on the order they requested it in or is it undefined behaviour (i.e. we don't know which one will pick it up first)?

It is explicitly documented in the SDK article:

If more than one thread is waiting on a mutex, a waiting thread is selected. Do not assume a first-in, first-out (FIFO) order. External events such as kernel-mode APCs can change the wait order.

These kind of events are entirely out of your control. So "undefined behavior" is an appropriate way to describe it.

Use memory region as stack space?

6 votes

In Linux is it possible to start a process (e.g. with execve) and make it use a particular memory region as stack space?

Background:

I have a C++ program and a fast allocator that gives me "fast memory". I can use it for objects that make use of the heap and create them in fast memory. Fine. But I also have a lot of variable living on the stack. How can I make them use the fast memory as well?

Idea: Implement a "program wrapper" that allocates fast memory and then starts the actual main program, passing a pointer to the fast memory and the program uses it as stack. Is that possible?

[Update]

The pthread setup seems to work.

With pthreads, you could use a secondary thread for your program logic, and set its stack address using pthread_attr_setstack():

NAME
       pthread_attr_setstack,  pthread_attr_getstack  -  set/get stack
       attributes in thread attributes object

SYNOPSIS
       #include <pthread.h>

       int pthread_attr_setstack(pthread_attr_t *attr,
                                 void *stackaddr, size_t stacksize);

DESCRIPTION
       The pthread_attr_setstack() function sets the stack address and
       stack  size attributes of the thread attributes object referred
       to by attr to the values specified in stackaddr and  stacksize,
       respectively.   These  attributes specify the location and size
       of the stack that should be used by a thread  that  is  created
       using the thread attributes object attr.

       stackaddr should point to the lowest addressable byte of a buf‐
       fer of stacksize bytes that was allocated by the  caller.   The
       pages  of  the  allocated  buffer  should  be both readable and
       writable.

What I don't follow is how you're expecting to get any performance improvements out of doing something like this (I assume the purpose of your "fast" memory is better performance).