Best c++ questions in August 2011

int a[] = {1,2,}; Weird comma allowed. Any particular reason?

126 votes

Maybe I am not from this planet, but it would seem to me that the following should be a syntax error:

int a[] = {1,2,}; //extra comma in the end

But it's not. I was surprised when this code compiled on Visual Studio, but I have learnt not to trust MSVC compiler as far as C++ rules are concerned, so I checked the standard and it is allowed by the standard as well. You can see 8.5.1 for the grammar rules if you don't believe me.

Why is this allowed? This may be a stupid useless question but I want you to understand why I am asking. If it were a sub-case of a general grammar rule, I would understand - they decided not to make the general grammar any more difficult just to disallow a redundant comma at the end of an initializer list. But no, the additional comma is explicitly allowed. For example, it isn't allowed to have a redundant comma in the end of a function-call argument list (when the function takes ...), which is normal.

So, again, is there any particular reason this redundant comma is explicitly allowed?

It makes it easier to generate source code, and also to write code which can be easily extended at a later date. Consider what's required to add an extra entry to:

int a[] = {
   1,
   2,
   3
};

... you have to add the comma to the existing line and add a new line. Compare that with the case where the three already has a comma after it, where you just have to add a line. Likewise if you want to remove a line you can do so without worrying about whether it's the last line or not, and you can reorder lines without fiddling about with commas. Basically it means there's a uniformity in how you treat the lines.

Now think about generating code. Something like (pseudo-code):

output("int a[] = {");
for (int i = 0; i < items.length; i++) {
    output("%s, ", items[i]);
}
output("};");

No need to worry about whether the current item you're writing out is the first or the last. Much simpler.
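
As a hedged illustration of the generator idea, here is a minimal C++ sketch of the pseudo-code above. The function name emit_int_array and the choice of std::ostringstream are assumptions made for this example, not part of the original answer.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical generator: writes every item followed by ", " so the loop
// needs no special case for the first or the last element.
std::string emit_int_array(const std::string& name, const std::vector<int>& items)
{
    std::ostringstream out;
    out << "int " << name << "[] = {";
    for (std::size_t i = 0; i < items.size(); ++i) {
        out << items[i] << ", ";
    }
    out << "};";
    return out.str();
}

// Usage: three items produce a declaration that ends in a trailing comma.
inline void demo()
{
    std::vector<int> items;
    items.push_back(1);
    items.push_back(2);
    items.push_back(3);
    assert(emit_int_array("a", items) == "int a[] = {1, 2, 3, };");
}
```

The emitted text is only valid C++ because the grammar accepts the comma after the last initializer.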

Why does this code crash?

59 votes

I went to a job interview today and was given this interesting question.

Besides the memory leak and the fact there is no virtual dtor, why does this code crash?

#include <iostream>

//besides the obvious mem leak, why does this code crash?

class Shape
{
public:
    virtual void draw() const = 0;
};

class Circle : public Shape
{
public:
    virtual void draw() const { }

    int radius;
};

class Rectangle : public Shape
{
public:
    virtual void draw() const { }

    int height;
    int width;
};

int main()
{
    Shape * shapes = new Rectangle[10];
    for (int i = 0; i < 10; ++i)
        shapes[i].draw();
}

You cannot index like that. You have allocated an array of Rectangles and stored a pointer to the first in shapes. When you do shapes[1] you're dereferencing (shapes + 1). This will not give you a pointer to the next Rectangle, but a pointer to what would be the next Shape in a presumed array of Shape. Of course, this is undefined behaviour. In your case, you're being lucky and getting a crash.

Using a pointer to Rectangle makes the indexing work correctly.

int main()
{
   Rectangle * shapes = new Rectangle[10];
   for (int i = 0; i < 10; ++i) shapes[i].draw();
}

If you want to have different kinds of Shapes in the array and use them polymorphically you need an array of pointers to Shape.
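
A minimal sketch of that last suggestion, assuming C++11 and std::unique_ptr for ownership. The name() member and the helper functions make_shapes and draw_all are hypothetical additions so that the dispatch can be observed; they are not in the interview code.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

class Shape
{
public:
    virtual ~Shape() {}                       // virtual destructor: safe deletion through Shape*
    virtual void draw() const = 0;
    virtual std::string name() const = 0;     // hypothetical, only used to observe dispatch
};

class Circle : public Shape
{
public:
    void draw() const {}
    std::string name() const { return "Circle"; }
    int radius = 0;
};

class Rectangle : public Shape
{
public:
    void draw() const {}
    std::string name() const { return "Rectangle"; }
    int height = 0;
    int width = 0;
};

// Builds an array of pointers to Shape holding different dynamic types.
inline std::vector<std::unique_ptr<Shape>> make_shapes()
{
    std::vector<std::unique_ptr<Shape>> shapes;
    shapes.push_back(std::unique_ptr<Shape>(new Rectangle));
    shapes.push_back(std::unique_ptr<Shape>(new Circle));
    return shapes;
}

inline void draw_all(const std::vector<std::unique_ptr<Shape>>& shapes)
{
    // Indexing steps over pointers, so the stride no longer depends on sizeof(Rectangle).
    for (std::size_t i = 0; i < shapes.size(); ++i)
        shapes[i]->draw();
}

inline void demo()
{
    std::vector<std::unique_ptr<Shape>> shapes = make_shapes();
    draw_all(shapes);
    assert(shapes[0]->name() == "Rectangle");
    assert(shapes[1]->name() == "Circle");
}
```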

When is an integer<->pointer cast actually correct?

51 votes

The common folklore says that:

  • The type system exists for a reason. Integers and pointers are distinct types, casting between them is a malpractice in the majority of cases, may indicate a design error and should be avoided.

  • Even when such a cast is performed, no assumptions shall be made about the size of integers and pointers (casting void* to int is the simplest way to make the code fail on x64), and instead of int one should use intptr_t or uintptr_t from stdint.h.

Knowing that, when is it actually useful to perform such casts?

(Note: having a bit shorter code for the price of portability doesn't count as "actually useful".)


One case I know:

  • Some lock-free multiprocessor algorithms exploit the fact that a pointer aligned to 2 or more bytes has some redundancy. They then use the lowest bits of the pointer as boolean flags, for instance. With a processor having an appropriate instruction set, this may eliminate the need for a locking mechanism (which would be necessary if the pointer and the boolean flag were separate).
    (Note: This practice is even possible to do safely in Java via java.util.concurrent.atomic.AtomicMarkableReference)

Anything more?

I sometimes cast pointers to integers when they need to be part of a hash sum. I also cast them to integers to do some bit-fiddling on certain implementations where pointers are guaranteed to have one or two spare bits, so that I can encode AVL or red-black tree information in the left/right pointers instead of having an additional member. But this is all so implementation-specific that I recommend never thinking of it as any kind of common solution. I have also heard that hazard pointers can sometimes be implemented this way.

In some situations I need a unique ID per object that I pass along to e.g. servers as my request id. Depending on the context when I need to save some memory, and it is worth it, I use the address of my object as such an id, and usually have to cast it to an integer.

When working with embedded systems (such as in Canon cameras, see CHDK) there are often magic addresses, so a (void*)0xFFBC5235 or similar is often found there too.

edit:

Just stumbled (in my mind) over pthread_self(), which returns a pthread_t that is usually a typedef to an unsigned integer. Internally, though, it is a pointer to some thread struct representing the thread in question. In general, the same approach might be used elsewhere for an opaque handle.
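
To make the spare-bit trick concrete, here is a small sketch using std::uintptr_t. The Node type and the helper names pack, pointer_of and flag_of are hypothetical, and the code assumes an implementation where the integer value of a suitably aligned pointer has a zero low bit, which the standard does not promise.

```cpp
#include <cassert>
#include <cstdint>

struct Node { int value; };   // hypothetical node type

// The trick needs at least one spare low bit, i.e. an alignment of 2 or more.
static_assert(alignof(Node) >= 2, "Node must be at least 2-byte aligned");

// Stores a boolean flag in the lowest bit of the pointer's integer value.
inline std::uintptr_t pack(Node* p, bool flag)
{
    std::uintptr_t bits = reinterpret_cast<std::uintptr_t>(p);
    assert((bits & 1u) == 0);   // implementation-specific assumption, checked at run time
    return bits | (flag ? 1u : 0u);
}

// Clears the flag bit and converts back to the original pointer.
inline Node* pointer_of(std::uintptr_t packed)
{
    return reinterpret_cast<Node*>(packed & ~static_cast<std::uintptr_t>(1));
}

inline bool flag_of(std::uintptr_t packed)
{
    return (packed & 1u) != 0;
}
```

Because pointer and flag live in one word, a single atomic compare-and-swap could update both, which is what the lock-free algorithms mentioned in the question rely on.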

Automatically pick a variable type big enough to hold a specified number

38 votes

Is there any way in C++ to define a type that is big enough to hold at most a specific number, presumably using some clever template code? For example I want to be able to write:

Integer<10000>::type dataItem;

And have that type resolve to the smallest type that is big enough to hold the specified value?

Background: I need to generate some variable definitions using a script from an external data file. I guess I could make the script look at the values and then use uint8_t, uint16_t, uint32_t, etc. depending on the value, but it seems more elegant to build the size into the generated C++ code.

I can't see any way to make a template that can do this, but knowing C++ templates, I'm sure there is a way. Any ideas?

Boost.Integer already has facilities for Integer Type Selection:

boost::int_max_value_t<V>::least

The smallest, built-in, signed integral type that can hold all the values in the inclusive range 0 - V. The parameter should be a positive number.

boost::uint_value_t<V>::least

The smallest, built-in, unsigned integral type that can hold all positive values up to and including V. The parameter should be a positive number.
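
If Boost is not available, a similar selection could be sketched with std::conditional from C++11's <type_traits>. Integer is the hypothetical name from the question; this version only considers the fixed-width unsigned types and is an illustration, not how Boost.Integer is implemented.

```cpp
#include <cstdint>
#include <type_traits>

// Hypothetical Integer<N>: picks the smallest fixed-width unsigned type
// whose maximum value is at least N.
template <std::uintmax_t N>
struct Integer
{
    typedef typename std::conditional<
        (N <= 0xFFu), std::uint8_t,
        typename std::conditional<
            (N <= 0xFFFFu), std::uint16_t,
            typename std::conditional<
                (N <= 0xFFFFFFFFu), std::uint32_t, std::uint64_t
            >::type
        >::type
    >::type type;
};

// The example from the question: 10000 needs more than 8 bits but fits in 16.
static_assert(std::is_same<Integer<10000>::type, std::uint16_t>::value,
              "10000 should select a 16-bit type");

Integer<10000>::type dataItem = 10000;
```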

Are there cases where a typedef is absolutely necessary?

33 votes

Consider the following excerpt from the safe bool idiom:

typedef void (Testable::*bool_type)() const;
operator bool_type() const;

Is it possible to declare the conversion function without the typedef? The following does not compile:

operator (void (Testable::*)() const)() const;

Ah, I just remembered the identity meta-function. It is possible to write

operator typename identity<void (Testable::*)() const>::type() const;

with the following definition of identity:

template <typename T>
struct identity
{
    typedef T type;
};

You could argue that identity still uses a typedef, but this solution is "good" enough for me.
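
For context, a compact sketch of how the identity trick might look inside a complete class. The members ok_ and true_tag are hypothetical; the typename keyword is left out here because Testable is not a template, so the nested type is not a dependent name.

```cpp
#include <cassert>

template <typename T>
struct identity
{
    typedef T type;
};

class Testable
{
    bool ok_;
    void true_tag() const {}   // hypothetical member whose address stands for "true"

public:
    explicit Testable(bool ok) : ok_(ok) {}

    // Conversion function to a pointer-to-member type, declared without
    // naming the type through a typedef inside Testable.
    operator identity<void (Testable::*)() const>::type() const
    {
        return ok_ ? &Testable::true_tag : 0;
    }
};

inline void demo()
{
    Testable yes(true);
    Testable no(false);
    if (yes) { /* taken: the member pointer is non-null */ }
    assert((yes ? 1 : 0) == 1);
    assert((no ? 1 : 0) == 0);
}
```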

When does Endianness become a factor?

31 votes

Endianness, from what I understand, is when the bytes that compose a multibyte word differ in their order, at least in the most typical case, so that a 16-bit integer may be stored as either 0xHHLL or 0xLLHH.

Assuming I don't have that wrong, what I would like to know is when does Endianness become a major factor when sending information between two computers where the Endian may or may not be different.

  • If I transmit a short integer of 1, in the form of a char array and with no correction, is it received and interpreted as 256?

  • If I decompose and recompose the short integer using the following code, will endianness no longer be a factor?

    // Sender:
    for (n = 0; n < sizeof(uint16)*8; ++n) {
        stl_bitset[n] = (value >> n) & 1;
    }

    // Receiver:
    for (n = 0; n < sizeof(uint16)*8; ++n) {
        value |= uint16(stl_bitset[n] & 1) << n;
    }
    
  • Is there a standard way of compensating for endianness?

Thanks in advance!

Very abstractly speaking, endianness is a property of the reinterpretation of a variable as a char-array.

Practically, this matters precisely when you read() from and write() to an external byte stream (like a file or a socket). Or, speaking abstractly again, endianness matters when you serialize data (essentially because serialized data has no type system and just consists of dumb bytes); and endianness does not matter within your programming language, because the language only operates on values, not on representations. Going from one to the other is where you need to dig into the details.

To wit - writing:

uint32_t n = get_number();

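// explicit casts: brace initialization rejects the narrowing from uint32_t in C++11
unsigned char bytesLE[4] = { uint8_t(n), uint8_t(n >> 8), uint8_t(n >> 16), uint8_t(n >> 24) };  // little-endian order
unsigned char bytesBE[4] = { uint8_t(n >> 24), uint8_t(n >> 16), uint8_t(n >> 8), uint8_t(n) };  // big-endian order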

write(bytes..., 4);

Here we could just have said, reinterpret_cast<unsigned char *>(&n), and the result would have depended on the endianness of the system.

And reading:

unsigned char buf[4] = read_data();

// parentheses are required: + binds tighter than <<
uint32_t n_LE = buf[0] | (buf[1] << 8) | (buf[2] << 16) | (uint32_t(buf[3]) << 24); // little-endian
uint32_t n_BE = buf[3] | (buf[2] << 8) | (buf[1] << 16) | (uint32_t(buf[0]) << 24); // big-endian

Again, here we could have said, uint32_t n = *reinterpret_cast<uint32_t*>(buf), and the result would have depended on the machine endianness.


As you can see, with integral types you never have to know the endianness of your own system, only of the data stream, if you use algebraic input and output operations. With other data types such as double, the issue is more complicated.
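
The writing and reading snippets above can be collected into a self-contained sketch. The helper names store_le32, load_le32, store_be32 and load_be32 are made up for this example; the point is that only shifts and masks on values are used, so the host's own byte order never enters the picture.

```cpp
#include <cassert>
#include <cstdint>

// Writes n into out[0..3] with the least significant byte first.
inline void store_le32(std::uint32_t n, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>( n        & 0xFFu);
    out[1] = static_cast<unsigned char>((n >> 8)  & 0xFFu);
    out[2] = static_cast<unsigned char>((n >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((n >> 24) & 0xFFu);
}

// Writes n into out[0..3] with the most significant byte first.
inline void store_be32(std::uint32_t n, unsigned char out[4])
{
    out[0] = static_cast<unsigned char>((n >> 24) & 0xFFu);
    out[1] = static_cast<unsigned char>((n >> 16) & 0xFFu);
    out[2] = static_cast<unsigned char>((n >> 8)  & 0xFFu);
    out[3] = static_cast<unsigned char>( n        & 0xFFu);
}

// Rebuilds the value from a little-endian byte sequence.
inline std::uint32_t load_le32(const unsigned char in[4])
{
    return  static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) << 8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}

// Rebuilds the value from a big-endian byte sequence.
inline std::uint32_t load_be32(const unsigned char in[4])
{
    return  static_cast<std::uint32_t>(in[3])
         | (static_cast<std::uint32_t>(in[2]) << 8)
         | (static_cast<std::uint32_t>(in[1]) << 16)
         | (static_cast<std::uint32_t>(in[0]) << 24);
}

inline void demo()
{
    unsigned char bytes[4];
    store_be32(0x11223344u, bytes);          // bytes are now 11 22 33 44
    assert(bytes[0] == 0x11 && bytes[3] == 0x44);
    assert(load_be32(bytes) == 0x11223344u);
}
```

Sender and receiver only have to agree on which pair of functions matches the stream format.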

What is this crazy C++0x syntax?

30 votes

What could this possibly mean in C++0x?

struct : bar {} foo {};

First, we'll take a bog-standard abstract UDT (User-Defined Type):

struct foo { virtual void f() = 0; }; // normal abstract type
foo obj;
// error: cannot declare variable 'obj' to be of abstract type 'foo'

Let's also recall that we can instantiate the UDT at the same time that we define it:

struct foo { foo() { cout << "!"; } };          // just a definition

struct foo { foo() { cout << "!"; } } instance; // so much more
// Output: "!"

Let's combine the examples, and recall that we can define a UDT that has no name:

struct { virtual void f() = 0; } instance; // unnamed abstract type
// error: cannot declare variable 'instance' to be of abstract type '<anonymous struct>'

We don't need the proof about the anonymous UDT any more, so we can lose the pure virtual function. Also renaming instance to foo, we're left with:

struct {} foo;

Getting close.


Now, what if this anonymous UDT were to derive from some base?

struct bar {};       // base UDT
struct : bar {} foo; // anonymous derived UDT, and instance thereof

Finally, C++0x introduces extended initialisers, such that we can do confusing things like this:

int x{0};

And this:

int x{};

And, finally, this:

struct : bar {} foo {};

This is an unnamed struct deriving from bar, instantiated as foo with a blank initializer.
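
A quick way to convince yourself of that reading is to ask the compiler about the type of foo. The sketch below assumes C++11 (decltype, <type_traits>); the wrapper function foo_derives_from_bar is hypothetical and exists only so the declaration has somewhere to live.

```cpp
#include <cassert>
#include <type_traits>

struct bar {};   // base UDT, as in the answer

// Declares the puzzling object locally and reports on its type.
inline bool foo_derives_from_bar()
{
    struct : bar {} foo {};   // unnamed type derived from bar, object foo, empty initializer
    (void)foo;                // silence unused-variable warnings

    return std::is_base_of<bar, decltype(foo)>::value   // the unnamed type derives from bar...
        && !std::is_same<bar, decltype(foo)>::value;    // ...and is a distinct type, not bar itself
}

inline void demo()
{
    assert(foo_derives_from_bar());
}
```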

Why should one replace default new and delete operators?

29 votes

Why should one replace the default operator new and delete with custom new and delete operators?

This is in continuation of Overloading new and delete in the immensely illuminating C++ FAQ:
Operator overloading.

A follow-up entry to this FAQ is:
How should I write ISO C++ standard conformant custom new and delete operators?

Note: The answer is based on learnings from Scott Meyers' More Effective C++.
(Note: This is meant to be an entry to Stack Overflow's C++ FAQ. If you want to critique the idea of providing an FAQ in this form, then the posting on meta that started all this would be the place to do that. Answers to that question are monitored in the C++ chatroom, where the FAQ idea started out in the first place, so your answer is very likely to get read by those who came up with the idea.)

One may try to replace new and delete operators for a number of reasons, namely:

To Detect Usage Errors:

There are a number of ways in which incorrect usage of new and delete may lead to the dreaded beasts of undefined behavior and memory leaks. Respective examples of each are:
calling delete more than once on memory allocated with new, and not calling delete on memory allocated with new.
An overloaded operator new can keep a list of allocated addresses and the overloaded operator delete can remove addresses from that list, which makes such usage errors easy to detect.

Similarly, a variety of programming mistakes can lead to data overruns (writing beyond the end of an allocated block) and underruns (writing prior to the beginning of an allocated block).
An overloaded operator new can over-allocate blocks and put known byte patterns ("signatures") before and after the memory made available to clients. The overloaded operator delete can check whether the signatures are still intact. If they are not, an overrun or underrun occurred sometime during the life of the allocated block, and operator delete can log that fact, along with the value of the offending pointer, which helps provide good diagnostic information.


To Improve Efficiency(speed & memory):

The default new and delete operators work reasonably well for everybody, but optimally for nobody. This is because they are designed for general-purpose use. They have to accommodate allocation patterns ranging from the dynamic allocation of a few blocks that exist for the duration of the program to constant allocation and deallocation of a large number of short-lived objects. As a result, the operator new and operator delete that ship with compilers take a middle-of-the-road strategy.

If you have a good understanding of your program's dynamic memory usage patterns, you can often find that custom versions of operator new and operator delete outperform the default ones (they run faster, or require up to 50% less memory). Of course, unless you are sure of what you are doing it is not a good idea to do this (don't even try it if you don't understand the intricacies involved).


To Collect Usage Statistics:

Before thinking of replacing new and delete to improve efficiency as described in the previous section, you should gather information about how your application uses dynamic allocation. You may want to collect information about:
Distribution of allocation block sizes,
Distribution of lifetimes,
Order of allocations (FIFO, LIFO or random),
How usage patterns change over time, the maximum amount of dynamic memory used, etc.

Also, sometimes you may need to collect usage information such as:
Count the number of dynamically allocated objects of a class,
Restrict the number of objects being created using dynamic allocation, etc.

All this information can be collected by replacing the default new and delete and adding the diagnostic collection mechanism to the replacement versions.
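
As a rough illustration of the counting idea, here is a sketch that uses class-specific operator new and operator delete rather than replacing the global ones. The Widget class and its counters are hypothetical, and the counters are not thread-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class whose allocation functions keep simple statistics.
class Widget
{
public:
    static std::size_t live;    // objects currently allocated with new
    static std::size_t total;   // allocations made so far
    static std::size_t peak;    // highest value live has reached

    int payload;

    static void* operator new(std::size_t size)
    {
        void* p = ::operator new(size);   // the global allocator still does the real work
        ++live;
        ++total;
        if (live > peak) peak = live;
        return p;
    }

    static void operator delete(void* p)
    {
        if (p) --live;
        ::operator delete(p);
    }
};

std::size_t Widget::live = 0;
std::size_t Widget::total = 0;
std::size_t Widget::peak = 0;

inline void demo()
{
    Widget* a = new Widget;
    Widget* b = new Widget;
    assert(Widget::live == 2);
    delete a;
    delete b;
    assert(Widget::live == 0 && Widget::peak == 2);
}
```

A limit on the number of dynamically created objects could be enforced in the same place, for example by throwing std::bad_alloc once live reaches a chosen maximum.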


To compensate for suboptimal memory alignment in new:

Many computer architectures require that data of particular types be placed in memory at particular kinds of addresses. For example, an architecture might require that pointers occur at addresses that are a multiple of four (i.e., be four-byte aligned) or that doubles must occur at addresses that are a multiple of eight (i.e., be eight-byte aligned). Failure to follow such constraints can lead to hardware exceptions at run-time. Other architectures are more forgiving and allow misaligned access to work, though at reduced performance. The operator new that ships with some compilers doesn't guarantee eight-byte alignment for dynamic allocations of doubles. In such cases, replacing the default operator new with one that guarantees eight-byte alignment could yield big increases in program performance and can be a good reason to replace the new and delete operators.


To cluster related objects near one another:

If you know that particular data structures are generally used together and you'd like to minimize the frequency of page faults when working on the data, it can make sense to create a separate heap for the data structures so they are clustered together on as few pages as possible. Custom placement versions of new and delete can make it possible to achieve such clustering.


To obtain unconventional behavior:

Sometimes you want operators new and delete to do something that the compiler-provided versions don't offer.
For example: You might write a custom operator delete that overwrites deallocated memory with zeros in order to increase the security of application data.

Exotic architectures the standard committee cares about

28 votes

I know that the C++ standard leaves many aspects of the language implementation-defined just because if there had been an architecture with other characteristics, it would be impossible to write a standard conforming compiler for it.

I know that 40 years ago every computer had its own unique specification. However, I don't know of any architectures used today where:

  • CHAR_BIT != 8
  • signed is not two's complement (I heard Java had problems with this one).
  • Floating point is not IEEE 754 compliant.

The reason I'm asking is that I often explain to people that it's good that C++ doesn't mandate any other low-level aspects like fixed sized types. It's good because unlike 'other languages' it makes your code portable when used correctly. But I feel bad that I cannot point to any specific architecture myself.

So the question is: what architectures exhibit the above properties?

uint*_ts are optional.

Take a look at this one

Unisys ClearPath Dorado Servers

offering backward compatibility for people who have not yet migrated all their Univac software.

Key points:

  • 36 bit words
  • CHAR_BIT == 9
  • ones complement
  • 72 bit non-IEEE floating point
  • separate address space for code and data
  • sizeof(char*) != sizeof(int*)[maybe not]
  • word addressed

Don't know if they offer a C++ compiler though, but they could.

C++: When to use References vs. Pointers

27 votes

I understand the syntax and general semantics of pointers versus references, but what I can't decide is when it is more or less appropriate to use references or pointers in an API.

Naturally some situations need one or the other (operator++ needs a reference argument), but in general I'm finding I prefer to use pointers (and const pointers) as the syntax is clear that the variables are being passed destructively.

E.g. in the following code:

void add_one(int& n) { n += 1; }
void add_one(int* const n) { *n += 1; }
int main() {
  int a = 0;
  add_one(a); // not clear that a may be modified
  add_one(&a); // a is clearly being passed destructively
}

With the pointer, it's always (more) obvious what's going on, so for APIs and the like where clarity is a big concern are pointers not more appropriate than references? Does that mean references should only be used when necessary (e.g. operator++)? Are there any performance concerns with one or the other?

EDIT (OUTDATED):

Besides allowing NULL values and dealing with raw arrays, it seems the choice comes down to personal preference. I've accepted the answer below that references Google's C++ Style Guide, as they present the view that "References can be confusing, as they have value syntax but pointer semantics.".

EDIT:

Due to the additional work required to sanitise pointer arguments that should not be NULL (e.g. add_one(0) will call the pointer version and break during runtime), it makes sense from a maintainability perspective to use references where an object MUST be present, though it is a shame to lose the syntactic clarity.

Use reference wherever you can, pointers wherever you must.

Avoid pointers until you can't.

The reason is that pointers make things harder to follow and read, and allow less safe and far more dangerous manipulations than any other construct.

So the rule of thumb is to use pointers only if there is no other choice.

For example, returning a pointer to an object is a valid option when the function can return nullptr in some cases and callers are expected to handle that. That said, a better option would be to use something similar to boost::optional.

Another example is to use pointers to raw memory for specific memory manipulations. That should be hidden and localized in very narrow parts of the code, to help limit the dangerous part of the whole code base.

In your example, there is no point in using a pointer as the parameter because:

  1. if you provide nullptr as the argument, you're going into undefined-behaviour-land;
  2. the reference parameter version doesn't allow (without easy-to-spot tricks) the problem in 1;
  3. the reference parameter version is simpler for the user to understand: you have to provide a valid object, not something that could be null.

If the function has to work with or without a given object, then using a pointer as the parameter suggests that you can pass nullptr as the argument and that this is fine for the function. That's a kind of contract between the user and the implementation code.
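
The two contracts could be sketched side by side as follows. add_one is taken from the question; try_add_one is a hypothetical name for the variant in which "no object" is a documented, valid input.

```cpp
#include <cassert>

// Reference parameter: the caller must supply an object, so no null check is needed.
inline void add_one(int& n)
{
    n += 1;
}

// Pointer parameter: nullptr is part of the contract and is handled explicitly.
// Returns true if an object was supplied and incremented.
inline bool try_add_one(int* n)
{
    if (n == nullptr)
        return false;
    *n += 1;
    return true;
}

inline void demo()
{
    int a = 0;
    add_one(a);
    assert(a == 1);
    assert(try_add_one(&a) && a == 2);
    assert(!try_add_one(nullptr));   // allowed by the contract, nothing is modified
}
```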

Dual emission of constructor symbols

26 votes

Today, I discovered a rather interesting thing about either g++ or nm...constructor definitions appear to have two entries in libraries.

I have a header thing.hpp:

class Thing
{
    Thing();

    Thing(int x);

    void foo();
};

And thing.cpp:

#include "thing.hpp"

Thing::Thing()
{ }

Thing::Thing(int x)
{ }

void Thing::foo()
{ }

I compile this with:

g++ thing.cpp -c -o libthing.a

Then, I run nm on it:

%> nm -gC libthing.a
0000000000000030 T Thing::foo()
0000000000000022 T Thing::Thing(int)
000000000000000a T Thing::Thing()
0000000000000014 T Thing::Thing(int)
0000000000000000 T Thing::Thing()
                 U __gxx_personality_v0

As you can see, both of the constructors for Thing are listed with two entries in the generated static library. My g++ is 4.4.3, but the same behavior happens in clang, so it isn't just a gcc issue.

This doesn't cause any apparent problems, but I was wondering:

  • Why are defined constructors listed twice?
  • Why doesn't this cause "multiple definition of symbol __" problems?

EDIT: For Carl, the output without the C argument:

%> nm -g libthing.a
0000000000000030 T _ZN5Thing3fooEv
0000000000000022 T _ZN5ThingC1Ei
000000000000000a T _ZN5ThingC1Ev
0000000000000014 T _ZN5ThingC2Ei
0000000000000000 T _ZN5ThingC2Ev
                 U __gxx_personality_v0

As you can see...the same function is generating multiple symbols, which is still quite curious.

And while we're at it, here is a section of generated assembly:

.globl _ZN5ThingC2Ev
        .type   _ZN5ThingC2Ev, @function
_ZN5ThingC2Ev:
.LFB1:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        pushq   %rbp
        .cfi_def_cfa_offset 16
        movq    %rsp, %rbp
        .cfi_offset 6, -16
        .cfi_def_cfa_register 6
        movq    %rdi, -8(%rbp)
        leave
        ret
        .cfi_endproc
.LFE1:
        .size   _ZN5ThingC2Ev, .-_ZN5ThingC2Ev
        .align 2
.globl _ZN5ThingC1Ev
        .type   _ZN5ThingC1Ev, @function
_ZN5ThingC1Ev:
.LFB2:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        pushq   %rbp
        .cfi_def_cfa_offset 16
        movq    %rsp, %rbp
        .cfi_offset 6, -16
        .cfi_def_cfa_register 6
        movq    %rdi, -8(%rbp)
        leave
        ret
        .cfi_endproc

So the generated code is...well...the same.


EDIT: To see what constructor actually gets called, I changed Thing::foo() to this:

void Thing::foo()
{
    Thing t;
}

The generated assembly is:

.globl _ZN5Thing3fooEv
        .type   _ZN5Thing3fooEv, @function
_ZN5Thing3fooEv:
.LFB550:
        .cfi_startproc
        .cfi_personality 0x3,__gxx_personality_v0
        pushq   %rbp
        .cfi_def_cfa_offset 16
        movq    %rsp, %rbp
        .cfi_offset 6, -16
        .cfi_def_cfa_register 6
        subq    $48, %rsp
        movq    %rdi, -40(%rbp)
        leaq    -32(%rbp), %rax
        movq    %rax, %rdi
        call    _ZN5ThingC1Ev
        leaq    -32(%rbp), %rax
        movq    %rax, %rdi
        call    _ZN5ThingD1Ev
        leave
        ret
        .cfi_endproc

So it is invoking the complete object constructor.

We'll start by declaring that GCC follows the Itanium C++ ABI.


According to the ABI, the mangled name for your Thing::foo() is easily parsed:

_Z     | N      | 5Thing  | 3foo | E          | v
prefix | nested | `Thing` | `foo`| end nested | parameters: `void`

You can read the constructor names similarly, as below. Notice how the constructor "name" isn't given, but instead a C clause:

_Z     | N      | 5Thing  | C1          | E          | i
prefix | nested | `Thing` | Constructor | end nested | parameters: `int`

But what's this C1? Your duplicate has C2. What does this mean?

Well, this is quite simple too:

  <ctor-dtor-name> ::= C1   # complete object constructor
                   ::= C2   # base object constructor
                   ::= C3   # complete object allocating constructor
                   ::= D0   # deleting destructor
                   ::= D1   # complete object destructor
                   ::= D2   # base object destructor

Wait, why is this simple? This class has no base. Why does it have a "complete object constructor" and a "base object constructor" for each?

  • This Q&A implies to me that this is simply a by-product of polymorphism support, even though it's not actually required in this case.

  • Note that c++filt used to include this information in its demangled output, but doesn't any more.

  • This forum post asks the same question, and the only response doesn't do any better at answering it, except for the implication that GCC could avoid emitting two constructors when polymorphism is not involved, and that this behaviour ought to be improved in the future.

  • This newsgroup posting describes a problem with setting breakpoints in constructors due to this dual-emission. It's stated again that the root of the issue is support for polymorphism.

In fact, this is listed as a GCC "known issue":

G++ emits two copies of constructors and destructors.

In general there are three types of constructors (and destructors).

  • The complete object constructor/destructor.
  • The base object constructor/destructor.
  • The allocating constructor/deallocating destructor.

The first two are different, when virtual base classes are involved.


The meaning of these different constructors seems to be as follows:

  • The "complete object constructor". It additionally constructs virtual base classes.

  • The "base object constructor". It creates the object itself, as well as data members and non-virtual base classes.

  • The "allocating object constructor". It does everything the complete object constructor does, plus it calls operator new to actually allocate the memory... but apparently this is not usually seen.

If you have no virtual base classes, [the first two] are identical; GCC will, on sufficient optimization levels, actually alias the symbols to the same code for both.
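
A small sketch of why the two variants can differ once a virtual base is present. The types Base, Middle and Derived are hypothetical. Under the model described above, constructing a standalone Middle would use its complete object constructor, which initialises the virtual base, while constructing a Derived would use Middle's base object constructor, which leaves the virtual base to Derived.

```cpp
#include <cassert>

// Hypothetical virtual base that records who initialised it.
struct Base
{
    int tag;
    explicit Base(int t) : tag(t) {}
};

struct Middle : virtual Base
{
    // Base(1) takes effect only when Middle is the most derived object.
    Middle() : Base(1) {}
};

struct Derived : Middle
{
    // The most derived class initialises the virtual base; Middle's Base(1) is ignored here.
    Derived() : Base(2), Middle() {}
};

inline void demo()
{
    Middle m;    // Middle is the complete object: its own initialiser for Base runs
    Derived d;   // Middle is only a base subobject: Derived's initialiser for Base runs
    assert(m.tag == 1);
    assert(d.tag == 2);
}
```

Without the virtual keyword both constructor variants would do the same work, which matches the remark that the two symbols can then share code.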

At what moment is memory typically allocated for local variables in C++?

22 votes

I'm debugging a rather weird stack overflow supposedly caused by allocating too large variables on stack and I'd like to clarify the following.

Suppose I have the following function:

void function()
{
    char buffer[1 * 1024];
    if( condition ) {
       char buffer[1 * 1024];
       doSomething( buffer, sizeof( buffer ) );
    } else {
       char buffer[512 * 1024];
       doSomething( buffer, sizeof( buffer ) );
    }
 }

I understand, that it's compiler-dependent and also depends on what optimizer decides, but what is the typical strategy for allocating memory for those local variables?

Will the worst case (1 + 512 kilobytes) be allocated immediately once function is entered or will 1 kilobyte be allocated first, then depending on condition either 1 or 512 kilobytes be additionally allocated?

Your local (stack) variables are allocated in the same space as stack frames. When the function is called, the stack pointer is changed to "make room" for the stack frame. This is typically done in a single step on function entry. If you consume the stack with local variables, you'll encounter a stack overflow.

~512 kbytes is really too large for the stack in any case; you should allocate this on the heap using std::vector.
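
A minimal sketch of the std::vector suggestion applied to the function from the question. doSomething is a hypothetical stand-in whose body is invented for the example, and condition is turned into a parameter so the sketch is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the question's doSomething(): touches the buffer
// and reports how many bytes it was given.
inline std::size_t doSomething(char* buffer, std::size_t size)
{
    if (size > 0)
        buffer[size - 1] = 'x';
    return size;
}

inline std::size_t function(bool condition)
{
    const std::size_t size = condition ? 1 * 1024 : 512 * 1024;

    // The elements live on the heap; only the small vector object itself
    // occupies space in this function's stack frame.
    std::vector<char> buffer(size);
    return doSomething(buffer.data(), buffer.size());
}

inline void demo()
{
    assert(function(true) == 1024);
    assert(function(false) == 512 * 1024);
}
```

The memory is released automatically when buffer goes out of scope, so no explicit delete is needed.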

Mystical restriction on std::binary_search

21 votes

Problem description:
Consider some structure having a std::string name member. For clarity, let's suppose that it's a struct Human, representing information about people. Besides the name it can also have many other data members.
Let there be a container std::vector<Human> vec, where the objects are already sorted by name. Also for clarity suppose that all the names are unique.
The problem is: given some string nameToFind, find out whether there exists an element in the array having that name.

Solution and my progress:
The obvious and natural solution seems to perform a binary search using the std::binary_search function. But there is a problem: the type of the element being searched (std::string) is different from the type of the elements in the container (Human), and std::binary_search needs a rule to compare these elements. I tried to solve this in three ways, described below. First two are provided just to illustrate the evolution of my solution and the problems which I came across. My main question refers to the third one.

Attempt 1: convert std::string to Human.

Write a comparing function:

bool compareHumansByNames( const Human& lhs, const Human& rhs )
{
   return lhs.name < rhs.name;
}

Then add a constructor which constructs a Human object from std::string:

struct Human
{
   Human( const std::string& s );
   //... other methods

   std::string name;
   //... other members
};

and use the binary_search in following form:

std::binary_search( vec.begin(), vec.end(), nameToFind, compareHumansByNames );

This seems to work, but turns up two big problems:
First, how do we initialize the data members other than Human::name, especially when they don't have a default constructor? Setting magic values may lead to the creation of an object which is semantically illegal.
Second, we have to declare this constructor as non-explicit to allow implicit conversions during the algorithm. The bad consequences of this are well known.
Also, such a temporary Human object will be constructed at each iteration, which can turn out to be quite expensive.

Attempt 2: convert Human to std::string.

We can try to add an operator string () to the Human class which returns its name, and then use the comparison for two std::strings. However, this approach is also inconvenient for the following reasons:

First, the code will not compile at once because of the problem discussed here. We will have to work a bit more to make the compiler use the appropriate operator <.
Second, what does it mean to "convert a Human to string"? The existence of such a conversion can lead to semantically wrong usage of class Human, which is undesirable.

Attempt 3: compare without conversions.

The best solution I got so far is to create a

struct Comparator
{
   // const-qualified so the comparator can also be called through a const object
   bool operator() ( const Human& lhs, const std::string& rhs ) const
   {
      return lhs.name < rhs;
   }
   bool operator() ( const std::string& lhs, const Human& rhs ) const
   {
      return lhs < rhs.name;
   }
};

and use binary search as

binary_search( vec.begin(), vec.end(), nameToFind, Comparator() );

This compiles and executes correctly, everything seems to be ok, but here is where the interesting part begins:

Have a look at http://www.sgi.com/tech/stl/binary_search.html. It says there that "ForwardIterator's value type is the same type as T." This is quite a confusing restriction, and my last solution breaks it. Let's see what the C++ standard says about it:


25.3.3.4 binary_search

template<class ForwardIterator, class T>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value);

template<class ForwardIterator, class T, class Compare>
bool binary_search(ForwardIterator first, ForwardIterator last,
const T& value, Compare comp);

Requires: Type T is LessThanComparable (20.1.2).


Nothing is explicitly said about ForwardIterator's value type. But the definition of LessThanComparable given in 20.1.2 talks about the comparison of two elements of the same type. And here is what I do not understand. Does it indeed mean that the type of the object being searched for and the type of the container's objects must be the same, so that my solution breaks this restriction? Or does it not apply when the comp comparator is used, and only concern the case when the default operator < is used for comparison? In the first case, I'm confused about how to use std::binary_search to solve this without running into the problems mentioned above.

Thanks in advance for help and finding time to read my question.

Note: I understand that writing a binary search by hand takes no time and will solve the problem instantly, but to avoid re-inventing a wheel I want to use the std::binary_search. Also it's very interesting to me to find out about existence of such restriction according to standard.

If your goal is to find if there is a Human with a given name, then the following should work for sure:

const std::string& get_name(const Human& h)
{
    return h.name;
}

...

bool result = std::binary_search(
    boost::make_transform_iterator(v.begin(), &get_name),
    boost::make_transform_iterator(v.end(), &get_name),
    name_to_check_against);
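For comparison with the transform-iterator approach, here is a minimal self-contained sketch of Attempt 3 from the question. The names CompareHumanToName, containsName and samplePeople, and the age member, are hypothetical and only serve the illustration. It assumes the vector is already sorted by name and a C++11 compiler; it does not settle what the C++03 wording quoted in the question permits.

```cpp
#include <algorithm>
#include <string>
#include <vector>

struct Human {
    std::string name;
    int age; // hypothetical extra member, not used for ordering
};

// Heterogeneous comparator: orders a Human against a bare name in both
// directions, so no conversion between the two types is needed.
struct CompareHumanToName {
    bool operator()(const Human& lhs, const std::string& rhs) const {
        return lhs.name < rhs;
    }
    bool operator()(const std::string& lhs, const Human& rhs) const {
        return lhs < rhs.name;
    }
};

// Precondition (assumed): `people` is sorted by name in ascending order.
bool containsName(const std::vector<Human>& people, const std::string& name) {
    return std::binary_search(people.begin(), people.end(), name,
                              CompareHumanToName());
}

// Hypothetical sample data, already sorted by name.
std::vector<Human> samplePeople() {
    return {{"Alice", 30}, {"Bob", 25}, {"Carol", 41}};
}
```

With this data, containsName(samplePeople(), "Bob") is true and containsName(samplePeople(), "Dave") is false.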

Floating point comparison

20 votes
#include <stdio.h>

int main()
{
    float a = 0.7;
    float b = 0.5;
    if (a < 0.7)
    {
       if (b < 0.5) printf("2 are right");
       else         printf("1 is right");
    }
    else printf("0 are right");
}

I would have expected the output of this code to be 0 are right. But to my dismay the output is 1 is right. Why?

int main()
{
    float a = 0.7, b = 0.5; // These are FLOATS
    if(a < .7)              // This is a DOUBLE
    {
      if(b < .5)            // This is a DOUBLE
        printf("2 are right");
      else
        printf("1 is right");
    }
    else
      printf("0 are right");
}

Floats get promoted to doubles during comparison, and since floats are less precise than doubles, 0.7 as float is not the same as 0.7 as double. In this case, 0.7 as float compares less than 0.7 as double once it has been promoted. And as Christian said, 0.5 being a power of 2 is always represented exactly, so the test works as expected: 0.5 < 0.5 is false.

So either change float to double or .7 and .5 to .7f and .5f and you will get the expected behavior.
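A small sketch can make the difference visible. It assumes IEEE 754 float and double, which is what mainstream platforms use; the function names are hypothetical.

```cpp
#include <cstdio>

// True when the float nearest to 0.7 compares less than the double nearest
// to 0.7. The float operand is converted to double before the comparison.
bool floatSevenTenthsIsLessThanDouble() {
    float a = 0.7f;
    return a < 0.7;
}

// 0.5 is a power of two, so float and double hold exactly the same value.
bool floatHalfIsLessThanDouble() {
    float b = 0.5f;
    return b < 0.5;
}

// With matching literal types the comparison behaves as the asker expected.
bool floatSevenTenthsIsLessThanFloat() {
    float a = 0.7f;
    return a < 0.7f;
}

// Prints both approximations of 0.7 with enough digits to tell them apart.
void printBothApproximations() {
    std::printf("0.7f as double: %.17g\n", static_cast<double>(0.7f));
    std::printf("0.7  as double: %.17g\n", 0.7);
}
```

Under that assumption the first function returns true and the other two return false, which matches the "1 is right" output in the question.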

What can I do with a moved-from object?

20 votes

Does the standard define precisely what I can do with an object once it has been moved from? I used to think that all you can do with a moved-from object is destruct it, but that would not be sufficient.

For example, take the function template swap as defined in the standard library:

template <typename T>
void swap(T& a, T& b)
{
    T c = std::move(a); // line 1
    a = std::move(b);   // line 2: assignment to moved-from object!
    b = std::move(c);   // line 3: assignment to moved-from object!
}

Obviously, it must be possible to assign to moved-from objects, otherwise lines 2 and 3 would fail. So what else can I do with moved-from objects? Where exactly can I find these details in the standard?

(By the way, why is it T c = std::move(a); instead of T c(std::move(a)); in line 1?)

Moved-from objects exist in an unspecified, but valid, state. That suggests that whilst the object might not be capable of doing much anymore, its member functions that have no preconditions should still exhibit defined behaviour (including operator=), all its members are in a defined state, and it still requires destruction. The Standard gives no specific definitions because it would be unique to each UDT, but you might be able to find specifications for Standard types. Some, like containers, are relatively obvious: they just move their contents around, and an empty container is a well-defined valid state. Primitives don't modify the moved-from object.

Side note: I believe it's T c = std::move(a) so that if the move constructor (or copy constructor if no move is provided) is explicit the function will fail.
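As a hedged illustration for standard library types, the sketch below only calls operations that have no preconditions on a moved-from object (clear, push_back, assignment), so it does not depend on what the unspecified state actually is. The function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Moves the contents of `source` away, then reuses `source`.
// Returns the size of `source` after it has been refilled.
std::size_t reuseAfterMove(std::vector<int>& source) {
    std::vector<int> sink = std::move(source); // source: valid but unspecified
    source.clear();       // no precondition, brings it to a known empty state
    source.push_back(42); // safe once the state is known
    return source.size();
}

// Assigning to a moved-from string is fine: assignment has no precondition.
std::string reassignAfterMove() {
    std::string s = "hello";
    std::string t = std::move(s); // t now holds "hello"
    s = "world";                  // s gets a fresh, known value
    return s + " " + t;
}
```

Reading from the moved-from object before resetting it (for example calling front() on the vector) would rely on a state the standard leaves unspecified, so the sketch avoids it.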

Why can't arrays be passed as function arguments?

20 votes

Why can't you pass arrays as function arguments?

I have been reading this C++ book that says 'you can't pass arrays as function arguments', but it never explains why. Also, when I looked it up online I found comments like 'why would you do that anyway?' It's not that I would do it, I just want to know why you can't.

Why can't arrays be passed as function arguments?

They can:

void foo(const int (&myArray)[5]) {
   // `myArray` is the original array of five integers
}

In technical terms, the type of the argument to foo is "reference to array of 5 const ints"; with references, we can pass the actual object around (disclaimer: terminology varies by abstraction level).

What you can't do is pass by value, because for historical reasons we shall not copy arrays. Instead, attempting to pass an array by value into a function (or, to pass a copy of an array) leads its name to decay into a pointer. (some resources get this wrong!)


Array names decay to pointers for pass-by-value

This means:

void foo(int* ptr);

int ar[10]; // an array
foo(ar);    // automatically passing ptr to first element of ar (i.e. &ar[0])

There's also the hugely misleading "syntactic sugar" that looks like you can pass an array of arbitrary length by value:

void foo(int ptr[]);

int ar[10]; // an array
foo(ar);

But, actually, you're still just passing a pointer (to the first element of ar). foo is the same as it was above!

Whilst we're at it, the following function also doesn't really have the signature that it seems to. Look what happens when we try to call this function without defining it:

void foo(int ar[5]);
int main() {
   int ar[5];
   foo(ar);
}

// error: undefined reference to `foo(int*)'

So foo takes int* in fact, not int[5]!

(Live demo.)


But you can work-around it!

You can hack around this by wrapping the array in a struct or class, because the implicitly generated copy constructor will copy the array:

struct Array_by_val
{
  int my_array[10];
};

void func (Array_by_val x) {}

int main() {
   Array_by_val x;
   func(x);
}

This is somewhat confusing behaviour.


Or, better, a generic pass-by-reference approach

In C++, with some template magic, we can make a function both re-usable and able to receive an array:

template <typename T, size_t N>
void foo(const T (&myArray)[N]) {
   // `myArray` is the original array of N Ts
}

But we still can't pass one by value. Something to remember.
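A short usage sketch of the same pattern, with hypothetical function names elementCount and sum: because the parameter is a reference to an array, the size N is deduced at the call site and the array does not decay to a pointer.

```cpp
#include <cstddef>

// N is deduced from the argument, so the element count comes for free.
template <typename T, std::size_t N>
std::size_t elementCount(const T (&)[N]) {
    return N;
}

// Sums the elements of the original array; nothing is copied.
template <typename T, std::size_t N>
T sum(const T (&values)[N]) {
    T total = T();
    for (std::size_t i = 0; i < N; ++i) {
        total += values[i];
    }
    return total;
}
```

For int a[4] = {1, 2, 3, 4}, elementCount(a) yields 4 and sum(a) yields 10. Passing a plain int* does not compile, since there is no array type to deduce N from.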


The future...

And since C++11 is just over the horizon, and C++0x support is coming along nicely in the mainstream toolchains, you can use the lovely std::array inherited from Boost! I'll leave researching that as an exercise to the reader.
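As a starting point for that exercise, here is a minimal sketch assuming a C++11 compiler with the <array> header. The function name doubled is hypothetical. Unlike a built-in array, a std::array is an ordinary object, so passing it by value copies it.

```cpp
#include <array>

// Takes the array by value: the caller's array is copied, not aliased.
std::array<int, 3> doubled(std::array<int, 3> values) {
    for (int& v : values) {
        v *= 2; // modifies the local copy only
    }
    return values;
}
```

Calling doubled on an array holding 1, 2, 3 returns 2, 4, 6 and leaves the caller's array unchanged.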

Why doesn't Java have initializer lists like in C++?

16 votes

In C++, you can use an initializer list to initialize the class's fields before the constructor begins running. For example:

Foo::Foo(string s, double d, int n) : name(s), weight(d), age(n) {
    // Empty; already handled!
}

I am curious why Java does not have a similar feature. According to Core Java: Volume 1:

C++ uses this special syntax to call field constructors. In Java, there is no need for it because objects have no subobjects, only pointers to other objects.

Here are my questions:

  1. What do they mean by "because objects have no subobjects?" I don't understand what a subobject is (I tried looking it up); do they mean an instantiation of a subclass which extends a superclass?

  2. As for why Java does not have initializer lists like C++, I would assume that the reason is because all fields are already initialized by default in Java and also because Java uses the super keyword to call the super(or base in C++ lingo)-class constructor. Is this correct?

In C++, initializer lists are necessary because of a few language features that are either not present in Java or work differently in Java:

  1. const: In C++, you can define fields that are marked const, which cannot be assigned to and must be initialized in the initializer list. Java does have final fields, but you can assign to final fields in the body of a constructor. In C++, assigning to a const field in the constructor is illegal.

  2. References: In C++, references (as opposed to pointers) must be initialized to bind to some object. It is illegal to create a reference without an initializer. In C++, the way that you specify this is with the initializer list, since if you were to refer to the reference in the body of the constructor without first initializing it you would be using an uninitialized reference. In Java, object references behave like C++ pointers and can be assigned to after they are created. They just default to null otherwise.

  3. Direct subobjects. In C++, an object can contain other objects directly as fields, whereas in Java objects can only hold references to those objects. That is, in C++, if you declare an object that has a string as a member, the storage space for that string is built directly into the space for the object itself, while in Java you just get space for a reference to some other String object stored elsewhere. Consequently, C++ needs to provide a way for you to give those subobjects initial values, since otherwise they'd just stay uninitialized. By default it uses the default constructor for those types, but if you want to use a different constructor or no default constructor is available the initializer list gives you a way to bypass this. In Java, you don't need to worry about this because the references will default to null, and you can then assign them to refer to the objects you actually want them to refer to. If you want to use a non-default constructor, then you don't need any special syntax for it; just set the reference to a new object initialized via the appropriate constructor.
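The three cases in the list above can be seen together in one small C++ class. This is only an illustrative sketch: Engine, Car and their members are hypothetical names. Removing any one entry from the initializer list makes the constructor fail to compile.

```cpp
#include <string>

// A type without a default constructor.
struct Engine {
    explicit Engine(int hp) : horsepower(hp) {}
    int horsepower;
};

class Car {
public:
    Car(int serialNumber, std::string& sharedLog, int hp)
        : serial(serialNumber), // const member: cannot be assigned later
          log(sharedLog),       // reference member: must be bound here
          engine(hp)            // subobject with no default constructor
    {
        // Empty; all members are already initialized.
    }

    void start() { log += "started;"; }

    const int serial;
    std::string& log; // refers to a string owned by the caller
    Engine engine;    // stored directly inside the Car object
};
```

The caller must keep the std::string passed as sharedLog alive for as long as the Car uses it, since the Car only holds a reference.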

In the few cases where Java might want initializer lists (for example, to call superclass constructors or give default values to its fields), this is handled through two other language features: the super keyword to invoke superclass constructors, and the fact that Java objects can give their fields default values at the point at which they're declared. C++ doesn't support the latter of these (but in C++0x they're changing this; see Bjarne's description of the change), and it can't support the former because with multiple inheritance you may need to initialize several base objects and the term super wouldn't unambiguously refer to a single base class.

Hope this helps!

Is it possible to write a program without using main() function?

15 votes

I keep getting this question asked in interviews:

Write a program without using main() function?

One of my friends showed me some code using macros, but I could not understand it.

So the question is:

Is it really possible to write and compile a program without main()?

Within standard C++ a main function is required, so the question does not make sense for standard C++.

Outside of standard C++ you can, for example, write a Windows-specific program and use one of Microsoft's custom startup functions (wmain, WinMain, wWinMain). In Windows you can also write the program as a DLL and use rundll32 to run it.

Apart from that you can make your own little runtime library. At one time that was a common sport.

Finally, you can get clever and retort that according to the standard's ODR rule main isn't "used", so any program qualifies. Bah! Although unless the interviewers have an unusually good sense of humor (and they wouldn't have asked the question if they had) they'll not think that that's a good answer.

Cheers & hth.,

Why is memset() incorrectly initializing int?

14 votes

Why is the output of the following program 84215045?

#include <stdio.h>
#include <string.h>

int grid[110];
int main()
{
    memset(grid, 5, 100 * sizeof(int));
    printf("%d", grid[0]);
    return 0;
}

memset sets each byte of the destination buffer to the specified value. On your system, an int is four bytes, each of which is 5 after the call to memset. Thus, grid[0] has the value 0x05050505 (hexadecimal), which is 84215045 in decimal.

Some platforms provide alternative APIs to memset that write wider patterns to the destination buffer; for example, on OS X or iOS, you could use:

int pattern = 5; // the 4-byte value to repeat across the buffer
memset_pattern4(grid, &pattern, sizeof grid);

to get the behavior that you seem to expect. What platform are you targeting?

In C++, you should just use std::fill_n:

std::fill_n(grid, 100, 5);
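The two behaviours can be compared side by side in a short sketch. The function names are hypothetical, and the decimal value 84215045 applies only where int is four bytes wide.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

// Value of an unsigned int whose every byte equals `byte`.
unsigned int repeatedByte(unsigned char byte) {
    unsigned int value = 0;
    for (std::size_t i = 0; i < sizeof(unsigned int); ++i) {
        value = (value << 8) | byte;
    }
    return value;
}

// memset writes the byte 5 into every byte of every element.
int firstElementAfterMemset() {
    int grid[4];
    std::memset(grid, 5, sizeof grid);
    return grid[0];
}

// fill_n assigns the int value 5 to each element.
int firstElementAfterFill() {
    int grid[4];
    std::fill_n(grid, 4, 5);
    return grid[0];
}
```

With a 4-byte int, firstElementAfterMemset() returns 0x05050505, that is 84215045, while firstElementAfterFill() returns 5.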

C++ Analogue for WPF

9 votes

So I've fooled around with WPF a bit recently, and I must say that I really like the idea. I love the framework as a whole, from the GUI to the plumbing.

However, as much as I love managed land, I love my native code just as much. So I'm wondering what sort of libraries exist for C++ which capture the essence of various parts of WPF. I'm not looking for an interop solution, nor do I want Managed C++ or C++/CLI solutions, but pure C++ solutions.

Now, I'm not expecting to find a "copy" of WPF for C++ - I wouldn't expect that to exist, nor would I need it to. Instead, I would expect that different libraries might capture a subset of the desired concepts. My particular interests are

  1. Hardware-accelerated graphics for widget-based GUIs (via DirectX or OpenGL, preferably the latter)

  2. Declarative language for GUI design (preferably an XML dialect)

  3. Data binding

  4. Resolution independence (less important)

To say a little about my reasoning, I would like to implement such a library myself, which captures a specific model that I have begun working out. I am in the process of finding some more inspiration and helpful resources before locking down my design. The library is intended to be cross-platform, so references to cross-platform ideas would be great, but not strictly necessary as I am usually capable of translating things into cross-platform solutions.

Lastly, although I am writing a C++ library, and C++ ideas would be great, I am open to ideas from any native language.

Thanks in advance for any help.

  1. There isn't really anything like this. Not cross-platform at any rate. Direct2D works reasonably well, but is obviously Windows-only. And NVIDIA recently released this "path" extension of OpenGL that is similar in basic functionality, but it is NVIDIA-only (and not available on Mac OS X). Cairo has an OpenGL backend, but I have no idea how good it is. It can't be that good if Mozilla dumped Cairo in favor of D2D on Windows.
  2. Many GUI toolkits have some form of language for making a GUI. Qt has one that is pre-compiled into C++.
  3. Not that I know of. Data binding requires some form of reflection (WPF-style data binding does), and C++ has no native support for reflection. So you would need to implement reflection of some sort before you can even begin to make WPF-style data binding work.
  4. That comes with #1. More or less, as any GPU-based renderer will be able to operate at arbitrary resolutions.

I love C++, but honestly, this sort of thing is best implemented for a higher-level language. The lack of language-based reflection support will make implementing data binding a huge pain. Instead, you could implement the low-level "render stuff to area" and basic window/event management in C++, then expose it to a scripting language where data binding and such work. That way, you have native code speed where you need it, but the versatility and reflection of a scripting language for dealing with the GUI and its associated data.