Best questions in June 2011

Why doesn't GCC optimize a*a*a*a*a*a to (a*a*a)*(a*a*a)?

389 votes

I am doing some numerical optimization on a scientific application. One thing I noticed is that, GCC will not recognize pow(a,6), and will call the library function pow, (although it recognizes pow(a,2)), which slows down the performance extremely. (Intel C++ Compiler, executable icc, will recognize it anyway.) So I replaced it by a*a*a*a*a*a. With GCC 4.5.1 and options "-O3 -lm -funroll-loops -msse4" it will do 5 mulsd:

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13

while if I write (a*a*a)*(a*a*a), it will have

movapd  %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm14, %xmm13
mulsd   %xmm13, %xmm13

which reduce the multiple to 3. icc has similar behavior.

Why cannot compilers recognize such an optimization trick?

Because Floating Point Math is not Associative. The way you group the operands in floating point multiplication has an effect on the numerical accuracy of the answer.

As a result, most compilers are very conservative about reordering floating point calculations unless they can be sure that the answer will stay the same, or unless you tell them you don't care about numerical accuracy. For example: the -ffast-math option of gcc.

Can a local variable's memory be accessed outside its scope?!

281 votes

Possible Duplicate:
Returning the address of local or temporary variable

I have the following code.

int * foo()
{
    int a = 5;
    return &a;
}

int main()
{
    int* p = foo();
    cout << *p;
    *p = 8;
    cout << *p;
}

And the code is just running with no runtime exceptions!

The output was 5 8

How can it be? Isn't the memory of a local variable inaccessible outside its function?

How can it be? Isn't the memory of a local variable inaccessible outside its function?

You rent a hotel room. You put a book in the top drawer of the bedside table and go to sleep. You check out the next morning, but "forget" to give back your key. You steal the key!

A week later, you return to the hotel, do not check in, sneak into your old room with your stolen key, and look in the drawer. Your book is still there. Astonishing!

How can that be? Isn't the contents of a hotel room drawer inaccessible if you haven't rented the room?

Well, obviously that scenario can happen in the real world no problem. There is no mysterious force that causes your book to disappear when you are no longer authorized to be in the room. Nor is there a mysterious force that prevents you from entering a room with a stolen key.

The hotel management is not required to remove your book. You didn't make a contract with them that said that if you leave stuff behind, they'll shred it for you. If you illegally re-enter your room with a stolen key to get it back, the hotel security staff is not required to catch you sneaking in. You didn't make a contract with them that said "if I try to sneak back into my room later, you are required to stop me." Rather, you signed a contract with them that said "I promise not to sneak back into my room later", a contract which you broke.

In this situation anything can happen. The book can be there -- you got lucky. Someone else's book can be there and yours could be in the hotel's furnace. Someone could be there right when you come in, tearing your book to pieces. The hotel could have removed the table and book entirely and replaced it with a wardrobe. The entire hotel could be just about to be torn down and replaced with a football stadium, and you are going to die in an explosion while you are sneaking around.

You don't know what is going to happen; when you checked out of the hotel and stole a key to illegaly use later, you gave up the right to live in a predictable, safe world because you chose to break the rules of the system.

C++ is not a safe language. It will cheerfully allow you to break the rules of the system. If you try to do something illegal and foolish like going back into a room you're not authorized to be in and rummaging through a desk that might not even be there anymore, C++ is not going to stop you. Safer languages than C++ solve this problem by restricting your power -- by having much stricter control over keys, for example.

UPDATE

Holy goodness, this answer is getting a lot of attention. (I'm not sure why -- I considered it to be just a "fun" little analogy, but whatever.)

I thought it might be germane to update this a bit with a few more technical thoughts.

Compilers are in the business of generating code which manages the storage of the data manipulated by that program. There are lots of different ways of generating code to manage memory, but over time two basic techniques have become entrenched.

The first is to have some sort of "long lived" storage area where the "lifetime" of each byte in the storage -- that is, the period of time when it is validly associated with some program variable -- cannot be easily predicted ahead of time. The compiler generates calls into a "heap manager" that knows how to dynamically allocate storage when it is needed and reclaim it when it is no longer needed.

The second is to have some sort of "short lived" storage area where the lifetime of each byte in the storage is well known, and, in particular, lifetimes of storages follow a "nesting" pattern. That is, the allocation of the longest-lived of the short-lived variables strictly overlaps the allocations of shorter-lived variables that come after it.

Local variables follow the latter pattern; when a method is entered, its local variables come alive. When that method calls another method, the new method's local variables come alive. They'll be dead before the first method's local variables are dead. The relative order of the beginnings and endings of lifetimes of storages associated with local variables can be worked out ahead of time.

For this reason, local variables are usually generated as storage on a "stack" data structure, because a stack has the property that the first thing pushed on it is going to be the last thing popped off.

It's like the hotel decides to only rent out rooms sequentially, and you can't check out until everyone with a room number higher than you has checked out.

So let's think about the stack. In many operating systems you get one stack per thread and the stack is allocated to be a certain fixed size. When you call a method, stuff is pushed onto the stack. If you then pass a pointer to the stack back out of your method, as the original poster does here, that's just a pointer to the middle of some entirely valid million-byte memory block. In our analogy, you check out of the hotel; when you do, you just checked out of the highest-numbered occupied room. If no one else checks in after you, and you go back to your room illegally, all your stuff is guaranteed to still be there in this particular hotel.

We use stacks for temporary stores because they are really cheap and easy. An implementation of C++ is not required to use a stack for storage of locals; it could use the heap. It doesn't, because that would make the program slower.

An implementation of C++ is not required to leave the garbage you left on the stack untouched so that you can come back for it later illegally; it is perfectly legal for the compiler to generate code that turns back to zero everything in the "room" that you just vacated. It doesn't because again, that would be expensive.

An implementation of C++ is not required to ensure that when the stack logically shrinks, the addresses that used to be valid are still mapped into memory. The implementation is allowed to tell the operating system "we're done using this page of stack now. Until I say otherwise, issue an exception that destroys the process if anyone touches the previously-valid stack page". Again, implementations do not actually do that because it is slow and unnecessary.

Instead, implementations let you make mistakes and get away with it. Most of the time. Until one day something truly awful goes wrong and the process explodes.

This is problematic. There are a lot of rules and it is very easy to break them accidentally. I certainly have many times. And worse, the problem often only surfaces when memory is detected to be corrupt billions of nanoseconds after the corruption happened, when it is very hard to figure out who messed it up.

More memory-safe languages solve this problem by restricting your power. In "normal" C# there simply is no way to take the address of a local and return it or store it for later. You can take the address of a local, but the language is cleverly designed so that it is impossible to use it after the lifetime of the local ends. In order to take the address of a local and pass it back, you have to put the compiler in a special "unsafe" mode, and put the word "unsafe" in your program, to call attention to the fact that you are probably doing something dangerous that could be breaking the rules.

For further reading:

  • What if C# did allow returning references? Coincidentally that is the subject of today's blog post:

http://blogs.msdn.com/b/ericlippert/archive/2011/06/23/ref-returns-and-ref-locals.aspx

  • Why do we use stacks to manage memory? Are value types in C# always stored on the stack? How does virtual memory work? And many more topics in how the C# memory manager works. Many of these articles are also germane to C++ programmers:

http://blogs.msdn.com/b/ericlippert/archive/tags/memory+management/

C++0x introduces a standardized memory model. What does it mean? And how is it going to affect C++ programming?

186 votes

As we know C++0x introduces a standardized memory model, but I don't know what does it exactly mean? And how is it going to affect C++ programming?

Herb Sutter says here that,

The memory model means that C++ code now has a standardized library to call regardless of who made the compiler and on what platform it's running. There's a standard way to control how different threads talk to the processor's memory.

"When you are talking about splitting [code] across different cores that's in the standard, we are talking about the memory model. We are going to optimize it without breaking the following assumptions people are going to make in the code," Sutter said.

Well, I can memorize this and similar paragraphs available online (as I've my own memory model since birth :P) and can even post as answer to questions asked by others, but to be honest, I don't exactly understand this.

So, what I basically want to know is, C++ programmers used to develop multi-threaded applications even before, so how does it matter if its POSIX threads, or Windows threads, or C++0x threads? What are the benefits? I want to understand few low-level details.

I also get this feeling that C++0x memory model is somehow related to C++0x multi-threading, as I often see these two together. If it is, how exactly? Why should they be related?

As I don't know how internals of multi-threading works, and what memory model means in general , so please help me understanding these concepts. :-)

First, you have to learn to think like a Language Lawyer.

The C++ specification does not make reference to any particular compiler, operating system, or CPU. It makes reference to an abstract machine that is a generalization of actual systems. In the Language Lawyer world, the job of the programmer is to write code for the abstract machine; the job of the compiler is to actualize that code on a concrete machine. By coding rigidly to the spec, you can be certain that your code will compile and run without modification on any system with a compliant C++ compiler, whether today or 50 years from now.

The abstract machine in the C++98/C++03 specification is fundamentally single-threaded. So it is not possible to write multi-threaded C++ code that is "fully portable" with respect to the spec. The spec does not even say anything about the atomicity of memory loads and stores or the order in which loads and stores might happen, never mind things like mutexes.

Of course, you can write multi-threaded code in practice for particular concrete systems -- like pthreads or Windows. But there is no standard way to write multi-threaded code for C++98/C++03.

The abstract machine in C++0x is multi-threaded by design. It also has a well-defined memory model; that is, it says what the compiler may and may not do when it comes to accessing memory.

Consider the following example, where a pair of global variables are accessed concurrently by two threads:

           Global
           int x, y;

Thread 1            Thread 2
x = 17;             cout << y << " ";
y = 37;             cout << x << endl;

What might Thread 2 output?

Under C++98/C++03, this is not even Undefined Behavior; the question itself is meaningless because the standard does not contemplate anything called a "thread".

Under C++0x, the result is Undefined Behavior, because loads and stores need not be atomic in general. Which may not seem like much of an improvement... And by itself, it's not.

But with C++0x, you can write this:

           Global
           atomic<int> x, y;

Thread 1                 Thread 2
x.store(17);             cout << y.load() << " ";
y.store(37);             cout << x.load() << endl;

Now things get much more interesting. First of all, the behavior here is defined. Thread 2 could now print 0 0 (if it runs before Thread 1), 37 17 (if it runs after Thread 1), or 0 17 (if it runs after Thread 1 assigns to x but before it assigns to y).

What it cannot print is 37 0, because the default mode for atomic loads/stores in C++0x is to enforce sequential consistency. This just means all loads and stores must be "as if" they happened in the order you wrote them within each thread, while operations among threads can be interleaved however the system likes. So the default behavior of atomics provides both atomicity and ordering for loads and stores.

Now, on a modern CPU, ensuring sequential consistency can be expensive. In particular, the compiler is likely to emit full-blown memory barriers between every access here. But if your algorithm can tolerate out-of-order loads and stores; i.e., if it requires atomicity but not ordering; i.e., if it can tolerate 37 0 as output from this program, then you can write this:

           Global
           atomic<int> x, y;

Thread 1                            Thread 2
x.store(17,memory_order_relaxed);   cout << y.load(memory_order_relaxed) << " ";
y.store(37,memory_order_relaxed);   cout << x.load(memory_order_relaxed) << endl;

The more modern the CPU, the more likely this is to be faster than the previous example.

Finally, if you just need to keep particular loads and stores in order, you can write:

           Global
           atomic<int> x, y;

Thread 1                            Thread 2
x.store(17,memory_order_release);   cout << y.load(memory_order_acquire) << " ";
y.store(37,memory_order_release);   cout << x.load(memory_order_acquire) << endl;

This takes us back to the ordered loads and stores -- so 37 0 is no longer a possible output -- but it does so with minimal overhead. (In this trivial example, the result is the same as full-blown sequential consistency; in a larger program, it would not be.)

Of course, if the only outputs you want to see are 0 0 or 37 17, you can just wrap a mutex around the original code. But if you have read this far, I bet you already know how that works, and this answer is already longer than I intended :-).

So, bottom line. Mutexes are great, and C++0x standardizes them. But sometimes for performance reasons you want lower-level primitives (e.g., the classic double-checked locking pattern). The new standard provides high-level gadgets like mutexes and condition variables, and it also provides low-level gadgets like atomic types and the various flavors of memory barrier. So now you can write sophisticated, high-performance concurrent routines entirely within the language specified by the standard, and you can be certain your code will compile and run unchanged on both today's systems and tomorrow's.

Although to be frank, unless you are an expert and working on some serious low-level code, you should probably stick to mutexes and condition variables. That's what I intend to do.

For more on this stuff, see this blog post.

In C++, why should `new` be used as little as possible?

128 votes

I stumbled upon the following question on Stack Overflow. One of the first posters says:

Stop using new so much. I can't see any reason you used new anywhere you did. You can create objects by value in C++ and it's one of the huge advantages to using the language. You do not have to allocate everything on the heap. Stop thinking like a Java programmer

I'm not really sure what he means by that. Could anyone clarify why objects should be created by value in C++ as often as possible, and what difference it makes internally? (or if I misinterpreted the answer, feel free to clarify what was meant)

There are 2 widely used memory allocation techniques: automatic allocation and dynamic allocation. There are corresponding regions of memory for each: the stack and the heap.

Stack

The stack always allocates memory in a sequential fashion. It can do so because it requires you to release the memory in the reverse order (First-In, Last-Out: FILO). This is the memory allocation technique for local variables in many programming languages. It is very, very fast because it requires minimal book keeping and the next address to allocate is implicit.

In C++, this is also called automatic storage because the storage is claimed automatically at the end of scope. As soon as execution of current code block (delimited using {}) is completed, memory for all variables in that block is automatically collected. This is also the moment where destructors are invoked to clean up resources.

Heap

The heap allows for a more flexible memory allocation mode. Bookkeeping is more complex and allocation is slower. Because there is no implicit release point, you must release the memory manually (using delete or delete[], free in C, etc.). However, the absence of an implicit release point is the key to the heap's flexibility.

Reasons to use the heap

Even if using the heap is slower and potentially leads to memory leaks or memory fragmentation, there are perfectly good use cases for the heap, as it's less limited.

Two key reasons to use the heap:

  • You don't know how much memory you need at compile-time. For instance, when reading a text file into a string, you usually don't know what size the file has, so you can't decide how much memory to allocate until you run the program.

  • You want to allocate memory which will persist after leaving the current block. For instance, you may want to write a function string readfile(string path) that returns the contents of a file. In this case, even if the stack could hold the entire file contents, you could not return from a function and keep the allocated memory block.

Why using the heap is often unnecessary.

In C++, there's a neat construct called a destructor. This mechanism allows you to manage resources by aligning the lifetime of the resource with the lifetime of a variable. It is commonly used to "wrap" resources into another object. std::string is a perfect example. This snippet:

int main ( int argc, char* argv[] )
{
    std::string program(argv[0]);
}

actually allocates a variable amount of memory. The std::string object allocates memory using the heap and releases it in its destructor. In this case, you did not need to manually manage any resources and still get the benefits of the dynamic memory allocation.

In particular, it implies that in this snippet:

int main ( int argc, char* argv[] )
{
    std::string * program = new std::string(argv[0]);
    delete program;
}

there is unneeded dynamic memory allocation. The program requires more typing (!) and introduces the risk of forgetting to deallocate the memory. It does this with no apparent benefit.

Why you should use the stack as often as possible

Basically, the last paragraph sums it up. Using the stack as often as possible makes your programs:

  • faster to type;
  • faster when run;
  • less prone to memory/resource leaks.

Bonus points

In the referenced question, there are additional concerns. In particular, the following class:

class Line {
public:
    Line();
    ~Line();
    std::string* mString;
};

Line::Line() {
    mString = new std::string("XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX");
}

Line::~Line() {
   //mString->clear(); // should not be neccessary
    delete mString;
}

Is actually a lot more risky to use than the following one:

class Line {
public:
    Line();
    std::string mString;
};

Line::Line() {
    mString = "XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX";
    // note: there is a cleaner way to write this.
}

The reason is that std::string properly defines a copy constructor. Consider the following program:

int main ()
{
    Line l1;
    Line l2 = l1;
}

Using the original version, this program will likely crash, as it uses delete on the same string twice. Using the modified version, each Line instance will own it's own string instance, each with it's own memory and both will be released at the end of the program.

Other notes

Extensive use of RAII is considered a best practice in C++ because of all the reasons above. However, there is an additional benefit which is not immediately obvious. Basically, it's better than the sum of its parts. The whole mechanism composes. It scales.

If you use the Line class as a building block:

 class Table
 {
      Line borders[4];
 };

Then

 int main ()
 {
     Table table;
 }

allocates 4 std::string instances, 4 Line instances, 1 Table instance and all the strings contents and *everything is freed automagically.

parseInt(null, 24) === 23... wait, what?

115 votes

Alright, so I was messing around with parseInt to see how it handles values not yet initialized and I stumbled upon this gem. The below happens for any radix 24 or above.

parseInt(null, 24) === 23 // true

I tested it in IE, Chrome and Firefox and they all alert true, so I'm thinking it must be in the specification somewhere. A quick Google search didn't give me any results so here I am, hoping someone can explain.

I remember listening to a Crockford speech where he was saying typeof null === "object" because of an oversight causing Object and Null to have a near identical type identifier in memory or something along those lines, but I can't find that video now.

Try it: http://jsfiddle.net/robert/txjwP/

Edit Correction, a higher radix returns diffferent results, 32 returns 785077
Edit 2 From zzzzBov: [24...30]:23, 31:714695, 32:785077, 33:859935, 34:939407, 35:1023631, 36:1112745


tl;dr

Explain why parseInt( null, 24 ) === 23 is a true statement.

It's converting null to the string "null" and trying to convert it. For radixes 0 through 23, there are no numerals it can convert, so it returns NaN. At 24, "n", the 14th letter, is added to the numeral system. At 31, "u", the 21st letter, is added and the entire string can be decoded. At 37 on there is no longer any valid numeral set that can be generated and NaN is returned.

js> parseInt(null, 36)
1112745

>>> reduce(lambda x, y: x * 36 + y, [(string.digits + string.lowercase).index(x) for x in 'null'])
1112745

Curious null-coalescing operator custom implicit conversion behaviour

83 votes

This question arose when writing my answer to this one, which talks about the associativity of the null-coalescing operator.

Just as a reminder, the idea of the null-coalescing operator is that an expression of the form

x ?? y

first evaluates x, then:

  • If the value of x is null, y is evaluated and that is the end result of the expression
  • If the value of x is non-null, y is not evaluated, and the value of x is the end result of the expression, after a conversion to the compile-time type of y if necessary

Now usually there's no need for a conversion, or it's just from a nullable type to a non-nullable one - usually the types are the same, or just from (say) int? to int. However, you can create your own implicit conversion operators, and those are used where necessary.

For the simple case of x ?? y, I haven't seen any odd behaviour. However, with (x ?? y) ?? z I see some confusing behaviour.

Here's a short but complete test program - the results are in the comments:

using System;

public struct A
{
    public static implicit operator B(A input)
    {
        Console.WriteLine("A to B");
        return new B();
    }

    public static implicit operator C(A input)
    {
        Console.WriteLine("A to C");
        return new C();
    }
}

public struct B
{
    public static implicit operator C(B input)
    {
        Console.WriteLine("B to C");
        return new C();
    }
}

public struct C {}

class Test
{
    static void Main()
    {
        A? x = new A();
        B? y = new B();
        C? z = new C();
        C zNotNull = new C();

        Console.WriteLine("First case");
        // This prints
        // A to B
        // A to B
        // B to C
        C? first = (x ?? y) ?? z;

        Console.WriteLine("Second case");
        // This prints
        // A to B
        // B to C
        var tmp = x ?? y;
        C? second = tmp ?? z;

        Console.WriteLine("Third case");
        // This prints
        // A to B
        // B to C
        C? third = (x ?? y) ?? zNotNull;
    }
}

So we have three custom value types, A, B and C, with conversions from A to B, A to C, and B to C.

I can understand both the second case and the third case... but why is there an extra A to B conversion in the first case? In particular, I'd really have expected the first case and second case to be the same thing - it's just extracting an expression into a local variable, after all.

Any takers on what's going on? I'm extremely hesistant to cry "bug" when it comes to the C# compiler, but I'm stumped as to what's going on...

EDIT: Okay, here's a nastier example of what's going on, thanks to configurator's answer, which gives me further reason to think it's a bug. EDIT: The sample doesn't even need two null-coalescing operators now...

using System;

public struct A
{
    public static implicit operator int(A input)
    {
        Console.WriteLine("A to int");
        return 10;
    }
}

class Test
{
    static A? Foo()
    {
        Console.WriteLine("Foo() called");
        return new A();
    }

    static void Main()
    {
        int? y = 10;

        int? result = Foo() ?? y;
    }
}

The output of this is:

Foo() called
Foo() called
A to int

The fact that Foo() gets called twice here is hugely surprising to me - I can't see any reason for the expression to be evaluated twice.

Thanks to everyone who contributed to analyzing this issue. It is clearly a compiler bug. It appears to only happen when there is a lifted conversion involving two nullable types on the left-hand side of the coalescing operator.

I have not yet identified where precisely things go wrong, but at some point during the "nullable lowering" phase of compilation -- after initial analysis but before code generation -- we reduce the expression

result = Foo() ?? y;

from the example above to the moral equivalent of:

A? temp = Foo();
result = temp.HasValue ? 
    new int?(A.op_implicit(Foo().Value)) : 
    y;

Clearly that is incorrect; the correct lowering is

result = temp.HasValue ? 
    new int?(A.op_implicit(temp.Value)) : 
    y;

My best guess based on my analysis so far is that the nullable optimizer is going off the rails here. We have a nullable optimizer that looks for situations where we know that a particular expression of nullable type cannot possibly be null. Consider the following naive analysis: we might first say that

result = Foo() ?? y;

is the same as

A? temp = Foo();
result = temp.HasValue ? 
    (int?) temp : 
    y;

and then we might say that

conversionResult = (int?) temp 

is the same as

A? temp2 = temp;
conversionResult = temp2.HasValue ? 
    new int?(op_Implicit(temp2.Value)) : 
    (int?) null

But the optimizer can step in and say "whoa, wait a minute, we already checked that temp is not null; there's no need to check it for null a second time just because we are calling a lifted conversion operator". We'd them optimize it away to just

new int?(op_Implicit(temp2.Value)) 

My guess is that we are somewhere caching the fact that the optimized form of (int?)Foo() is new int?(op_implicit(Foo().Value)) but that is not actually the optimized form we want; we want the optimized form of Foo()-replaced-with-temporary-and-then-converted.

Many bugs in the C# compiler are a result of bad caching decisions. A word to the wise: every time you cache a fact for use later, you are potentially creating an inconsistency should something relevant change. In this case the relevant thing that has changed post initial analysis is that the call to Foo() should always be realized as a fetch of a temporary.

We did a lot of reorganization of the nullable rewriting pass in C# 3.0. The bug reproduces in C# 3.0 and 4.0 but not in C# 2.0, which means that the bug was probably my bad. Sorry!

I'll get a bug entered into the database and we'll see if we can get this fixed up for a future version of the language. Thanks again everyone for your analysis; it was very helpful!

Is multiplication and division using shift operators in C actually faster?

74 votes

Multiplication and division can be achieved using bit operators, for example

i*2 = i<<1
i*3 = (i<<1) + i;
i*10 = (i<<3) + (i<<1)

and so on.

Is it actually faster to use say (i<<3)+(i<<1) to multiply with 10 than using i*10 directly? Is there any sort of input that can't be multiplied or divided in this way?

Short answer: Not likely.

Long answer: Your compiler has an optimizer in it that knows how to multiply as quickly as your target processor architecture is capable. Your best bet is to tell the compiler your intent clearly (i.e. i*2 rather than i << 1) and let it decide what the fastest assembly/machine code sequence is. It's even possible that the processor itself has implemented the multiply instruction as a sequence of shifts & adds in microcode.

Bottom line--don't spend a lot of time worrying about this. If you mean to shift, shift. If you mean to multiply, multiply. Do what is semantically clearest--your coworkers will thank you later. Or, more likely, curse you later if you do otherwise.

Protecting executable from reverse engineering?

68 votes

I've been contemplating how to protect my C/C++ code from disassembly and reverse engineering. Normally I would never condone this behavior myself in my code; however the current protocol I've been working on must not ever be inspected or understandable, for the security of various people.

Now this is a new subject to me, and the internet is not really resourceful for prevention against reverse engineering but rather depicts tons of information on how to reverse engineer

Some of the things I've thought of so far are:

  • Code injection (calling dummy functions before and after actual function calls)
  • Code obfustication (mangles the disassembly of the binary)
  • Write my own startup routines (harder for debuggers to bind to)

    void startup();  
    int _start()   
    {  
        startup( );  
        exit   (0)   
    }  
    void startup()  
    {  
        /* code here */  
    }
    
  • Runtime check for debuggers (and force exit if detected)

  • Function trampolines

     void trampoline(void (*fnptr)(), bool ping = false)  
     {  
       if(ping)  
         fnptr();  
       else  
         trampoline(fnptr, true);  
     }
    
  • Pointless allocations and deallocations (stack changes a lot)

  • Pointless dummy calls and trampolines (tons of jumping in disassembly output)
  • Tons of casting (for obfuscated disassembly)

I mean these are some of the things I've thought of but they can all be worked around and or figured out by code analysts given the right time frame. Is there anything else alternative I have?

What Amber said is exactly right. You can make reverse engineering harder, but you can never prevent it. You should never trust "security" that relies on the prevention of reverse engineering.

That said, the best anti-reverse-engineering techniques that I've seen focused not on obfuscating the code, but instead on breaking the tools that people usually use to understand how code works. Finding creative ways to break disassemblers, debuggers, etc is both likely to be more effective and also more intellectually satisfying than just generating reams of horrible spaghetti code. This does nothing to block a determined attacker, but it does increase the likelihood that J Random Cracker will wander off and work on something easier instead.

gcc compile error with >2GB of code

61 votes

I have a huge number of functions totaling around 2.8 GB of object code (unfortunately there's no way around, scientific computing ...)

When I try to link them, I get (expected) relocation truncated to fit: R_X86_64_32S errors, that I hoped to circumvent by specifing the compiler flag -mcmodel=medium. All libraries that are linked in addition that I have control of are compiled with the -fpic flag.

Still, the error persists and I assume that some libraries I link to are not compiled with PIC.

Here's the error:

/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x12): relocation truncated to fit: R_X86_64_32S against symbol `__libc_csu_fini'     defined in .text section in /usr/lib64/libc_nonshared.a(elf-init.oS)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x19): relocation truncated to fit: R_X86_64_32S against symbol `__libc_csu_init'    defined in .text section in /usr/lib64/libc_nonshared.a(elf-init.oS)
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crt1.o: In function `_start':
(.text+0x20): undefined reference to `main'
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/../../../../lib64/crti.o: In function    `call_gmon_start':
(.text+0x7): relocation truncated to fit: R_X86_64_GOTPCREL against undefined symbol      `__gmon_start__'
/usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtbegin.o: In function `__do_global_dtors_aux':
crtstuff.c:(.text+0xb): relocation truncated to fit: R_X86_64_PC32 against `.bss' 
crtstuff.c:(.text+0x13): relocation truncated to fit: R_X86_64_32 against symbol `__DTOR_END__' defined in .dtors section in /usr/lib/gcc/x86_64-redhat-linux/4.1.2/crtend.o
crtstuff.c:(.text+0x19): relocation truncated to fit: R_X86_64_32S against `.dtors'
crtstuff.c:(.text+0x28): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x38): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x3f): relocation truncated to fit: R_X86_64_32S against `.dtors'
crtstuff.c:(.text+0x46): relocation truncated to fit: R_X86_64_PC32 against `.bss'
crtstuff.c:(.text+0x51): additional relocation overflows omitted from the output
collect2: ld returned 1 exit status
make: *** [testsme] Error 1

and those are system libraries I link against: -lgfortran -lm -lrt -lpthread

Any clues where to look for the problem??

EDIT: First of all, thank you for the discussion ... To clarify a bit, I have hundreds of functions (each approx 1~MB in size in separate object files) like this:

double func1(std::tr1::unordered_map<int, double> & csc, 
             std::vector<EvaluationNode::Ptr> & ti, 
             ProcessVars & s)
{
    double sum, prefactor, expr;

    prefactor = +s.ds8*s.ds10*ti[0]->value();
    expr =       ( - 5/243.*(s.x14*s.x15*csc[49300] + 9/10.*s.x14*s.x15*csc[49301] +
           1/10.*s.x14*s.x15*csc[49302] - 3/5.*s.x14*s.x15*csc[49303] -
           27/10.*s.x14*s.x15*csc[49304] + 12/5.*s.x14*s.x15*csc[49305] -
           3/10.*s.x14*s.x15*csc[49306] - 4/5.*s.x14*s.x15*csc[49307] +
           21/10.*s.x14*s.x15*csc[49308] + 1/10.*s.x14*s.x15*csc[49309] -
           s.x14*s.x15*csc[51370] - 9/10.*s.x14*s.x15*csc[51371] -
           1/10.*s.x14*s.x15*csc[51372] + 3/5.*s.x14*s.x15*csc[51373] +
           27/10.*s.x14*s.x15*csc[51374] - 12/5.*s.x14*s.x15*csc[51375] +
           3/10.*s.x14*s.x15*csc[51376] + 4/5.*s.x14*s.x15*csc[51377] -
           21/10.*s.x14*s.x15*csc[51378] - 1/10.*s.x14*s.x15*csc[51379] -
           2*s.x14*s.x15*csc[55100] - 9/5.*s.x14*s.x15*csc[55101] -
           1/5.*s.x14*s.x15*csc[55102] + 6/5.*s.x14*s.x15*csc[55103] +
           27/5.*s.x14*s.x15*csc[55104] - 24/5.*s.x14*s.x15*csc[55105] +
           3/5.*s.x14*s.x15*csc[55106] + 8/5.*s.x14*s.x15*csc[55107] -
           21/5.*s.x14*s.x15*csc[55108] - 1/5.*s.x14*s.x15*csc[55109] -
           2*s.x14*s.x15*csc[55170] - 9/5.*s.x14*s.x15*csc[55171] -
           1/5.*s.x14*s.x15*csc[55172] + 6/5.*s.x14*s.x15*csc[55173] +
           27/5.*s.x14*s.x15*csc[55174] - 24/5.*s.x14*s.x15*csc[55175] +
           // ...
           ;

        sum += prefactor*expr;
    // ...
    return sum;
}

The object s is relatively small and keeps the needed constants x14, x15, ..., ds0, ..., etc while ti just returns a double from an external library. As you can see, csc[] is a precomputed map of values which is also evaluated in separate object files (again hundreds with about ~1 MB of size each) of the following form:

void cscs132(std::tr1::unordered_map<int,double> & csc, ProcessVars & s)
{
    {
    double csc19295 =       + s.ds0*s.ds1*s.ds2 * ( -
           32*s.x12pow2*s.x15*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x15*s.x35*s.x45*s.mWpowinv2 -
           32*s.x12pow2*s.x25*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x25*s.x35*s.mbpow2*s.mWpowinv2 -
           32*s.x12pow2*s.x25*s.x35*s.x45*s.mWpowinv2 +
           32*s.x12pow2*s.x34*s.mbpow4*s.mWpowinv2 +
           32*s.x12pow2*s.x34*s.x35*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x34*s.x45*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x35*s.mbpow4*s.mWpowinv2 +
           32*s.x12pow2*s.x35pow2*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x35pow2*s.x45*s.mWpowinv2 +
           64*s.x12pow2*s.x35*s.x45*s.mbpow2*s.mWpowinv2 +
           32*s.x12pow2*s.x35*s.x45pow2*s.mWpowinv2 -
           64*s.x12*s.p1p3*s.x15*s.mbpow4*s.mWpowinv2 +
           64*s.x12*s.p1p3*s.x15pow2*s.mbpow2*s.mWpowinv2 +
           96*s.x12*s.p1p3*s.x15*s.x25*s.mbpow2*s.mWpowinv2 -
           64*s.x12*s.p1p3*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
           64*s.x12*s.p1p3*s.x15*s.x45*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x25*s.mbpow4*s.mWpowinv2 +
           32*s.x12*s.p1p3*s.x25pow2*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x25*s.x35*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x25*s.x45*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.p1p3*s.x45*s.mbpow2 +
           64*s.x12*s.x14*s.x15pow2*s.x35*s.mWpowinv2 +
           96*s.x12*s.x14*s.x15*s.x25*s.x35*s.mWpowinv2 +
           32*s.x12*s.x14*s.x15*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.x14*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
           64*s.x12*s.x14*s.x15*s.x35pow2*s.mWpowinv2 -
           32*s.x12*s.x14*s.x15*s.x35*s.x45*s.mWpowinv2 +
           32*s.x12*s.x14*s.x25pow2*s.x35*s.mWpowinv2 +
           32*s.x12*s.x14*s.x25*s.x34*s.mbpow2*s.mWpowinv2 -
           32*s.x12*s.x14*s.x25*s.x35pow2*s.mWpowinv2 -
           // ...

       csc.insert(cscMap::value_type(192953, csc19295));
    }

    {
       double csc19296 =      // ... ;

       csc.insert(cscMap::value_type(192956, csc19296));
    }

    // ...
}

and that's about it. The final step then just consists in calling all those func[i] and summing the result up.

Concerning the fact that this is a rather special and unusual case: Yes it is. This is what people have to cope with when trying to do high precision computations for particle physics.

EDIT2: I should also add that x12, x13, etc are not really constants. They are set to specific values, all those functions are run and the result returned, and then a new set of x12, x13, etc is chosen to produce the next value. And this has to be done 10^5 to 10^6 times...

EDIT3: Thank you for the suggestions and the discussion so far... I'll try to roll the loops up upon code generation somehow, not sure how to this exactly, to be honest, but this is the best bet.

Btw, I didn't try to hide behind "this is scientific computing -- no way to optimize". It's just that the basis for this code is something that comes out of a "black box" where I have no real access to and, moreover, the whole thing worked great with simple examples and I mainly feel overwhelmed with what happens in a real world application ...

EDIT4: So, I have managed to reduce the code size of the csc definitions by about one forth by simplifying expressions in a computer algebra system (Mathematica). I see now also some way to reduce it by another order of magnitude or so by applying some other tricks before generating the code (which would bring this part down to about 100MB) and I hope this idea works.

Now related to your answers: I'm trying to roll the loops back up again in the funcs, where a CAS won't help much, but I have already some ideas. For instance, sorting the expressions by the variables like x12, x13,..., parse the cscs with Python and generate tables that relate them to each other. Then I can at least generate these parts as loops. As this seems to be the best solution so far, I mark this as the best answer.

However, I'd like to also give credit to VJo. GCC 4.6 indeed works much better, produces smaller code and is faster. Using the large model works at the code as-is. So technically this is the correct answer, but changing the whole concept is a much better approach.

Thank you all for your suggestions and help. If anyone is interested, I'm going to post the final outcome as soon as I am ready.

REMARKS: Just some remarks to some other answers: The code I'm trying to run does not originate in an expansion of simple functions/algorithms and stupid unnecessary unrolling. What actually happens is that the stuff we start with are pretty complicated mathematical objects and bringing them to a numerically computable form generates these expressions. The problem lies actually in the underlying physical theory. Complexity of intermediate expressions scales factorially, which is well known, but when combining all of this stuff to something physically measureable -- an observable -- it just boils down to only a handful of very small functions that form the basis of the expressions. (There is definitely something "wrong" in this respect with the general and only available ansatz which is called "perturbation theory") We try to bring this ansatz to another level, which is not feasible analytically anymore and where the basis of needed functions is not known. So we try to brute-force it like this. Not the best way, but hopefully one that helps with our understanding of the physics at hand in the end...

LAST EDIT: Thanks to all your suggestions, I've managed to reduce the code size considerably, using Mathematica and a modification of the code generator for the funcs somewhat along the lines of the top answer :)

I have simplified the csc functions with Mathematica, bringing it down to 92MB. This is the irreducible part. The first attempts took forever, but after some optimizations this now runs through in about 10mins on a single CPU.

The effect on the funcs was dramatic: The whole code size for them is down to approximately 9 MB, so the code now totals in the 100 MB range. Now it makes sense to turn optimizations on and the execution is quite fast.

Again, thank you all for your suggestions, I've learned a lot.

So, you already have a program that produces this text:

prefactor = +s.ds8*s.ds10*ti[0]->value();
expr = ( - 5/243.*(s.x14*s.x15*csc[49300] + 9/10.*s.x14*s.x15*csc[49301] +
       1/10.*s.x14*s.x15*csc[49302] - 3/5.*s.x14*s.x15*csc[49303] -...

and

double csc19295 =       + s.ds0*s.ds1*s.ds2 * ( -
       32*s.x12pow2*s.x15*s.x34*s.mbpow2*s.mWpowinv2 -
       32*s.x12pow2*s.x15*s.x35*s.mbpow2*s.mWpowinv2 -
       32*s.x12pow2*s.x15*s.x35*s.x45*s.mWpowinv2 -...

right?

If all your functions have a similar "format" (multiply n numbers m times and add the results - or something similar) then I think you can do this:

  • change the generator program to output offsets instead of strings (i.e. instead of the string "s.ds0" it will produce offsetof(ProcessVars, ds0)
  • create an array of such offsets
  • write an evaluator which accepts the array above and the base addresses of the structure pointers and produces an result

The array+evaluator will represent the same logic as one of your functions, but only the evaluator will be code. The array is "data" and can be either generated at runtime or saved on disk and read i chunks or with a memory mapped file.

For your particular example in func1 imagine how you would rewrite the function via an evaluator if you had access to the base address of s and csc and also a vector like representation of the constants and the offsets you need to add to the base addresses to get to x14, ds8 and csc[51370]

You need to create a new form of "data" that will describe how to process the actual data you pass to your huge number of functions.

Do browsers send "\r\n" or "\n" or does it depend on the browser?

60 votes

This question has bothered me for a million years... whenever I create a website with a textarea that allows multi-line (such as a "Bio" for a user's profile) I always end up writing the following paranoid code:

// C# code sample...
bio = bio.Replace("\r\n", "\n").Replace("\r", "\n");
bio = Regex.Replace(@"\n{2,}", "\n\n");

So, what do browsers send up for a <textarea name="Bio"></textarea> if it has multiple lines?

The HTTP and MIME specs specify that header lines must end with \r\n, but they aren't clear (some would argue that it isn't clear if they are clear) about what to do with the contents of a TEXTAREA. (See, for instance, this thread from an HTML working group about the issue.)

Here's a quote from the HTTP/1.1 spec about message headers:

The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.

I think that is a good strategy in general: be strict about what you produce but liberal in what you accept. You should assume that you will receive all sorts of line terminators. (Note that in addition to CRLF and LF, Mac OS-9 used CR alone, and there are still a few of those around. The Unicode standard (section 5.8) specifies a wide range of character sequences that should be recognized as line terminators; there's a list of them here.)

Why doesn't C# support the return of references?

60 votes

I have read that .NET supports return of references, but C# doesn't. Is there a special reason?

UPDATE: This question was the subject of my blog on June 23rd 2011. There is a lot of good additional conversation on this topic there in the comments. Thanks for the great question!


You are correct; .NET does support methods that return managed references to variables. .NET also supports local variables that contain managed references to other variables. (Note however that .NET does not support fields or arrays that contain managed references to other variables because that overly complicates the garbage collection story. Also the "managed reference to variable" types are not convertible to object, and therefore may not be used as type arguments to generic types or methods.)

Commenter "RPM1984" for some reason asked for a citation for this fact. RPM1984 I encourage you to read the CLI specification Partition I Section 8.2.1.1, "Managed pointers and related types" for information about this feature of .NET.

It is entirely possible to create a version of C# which supports both these features. You could then do things like

static ref int Max(ref int x, ref int y) 
{ 
  if (x > y) 
    return ref x; 
  else 
    return ref y; 
} 

and then call it with

int a = 123;
int b = 456; 
ref int c = ref Max(ref a, ref b); 
c += 100;
Console.WriteLine(b); // 556!

I know empirically that it is possible to build a version of C# that supports these features because I have done so. Advanced programmers, particularly people porting unmanaged C++ code, often ask us for more C++-like ability to do things with references without having to get out the big hammer of actually using pointers and pinning memory all over the place. By using managed references you get these benefits without paying the cost of screwing up your garbage collection performance.

We have considered this feature, and actually implemented enough of it to show to other internal teams to get their feedback. However at this time based on our research we believe that the feature does not have broad enough appeal or compelling usage cases to make it into a real supported language feature. We have other higher priorities and a limited amount of time and effort available, so we're not going to do this feature any time soon.

Also, doing it properly would require some changes to the CLR. Right now the CLR treats ref-returning methods as legal but unverifiable because we do not have a detector that detects this situation:

ref int M1(ref int x)
{
    return ref x;
}

ref int M2()
{
    int y = 123;
    return ref M1(ref y); // Trouble!
}

int M3()
{
    ref int z = ref M2();
    return z;
}

M3 returns the contents of M2's local variable, but the lifetime of that variable has ended! It is possible to write a detector that determines uses of ref-returns that clearly do not violate stack safety. What we would do is write such a detector, and if the detector could not prove stack safety, then we would not allow the usage of ref returns in that part of the program. It is not a huge amount of dev work to do so, but it is a lot of burden on the testing teams to make sure that we've really got all the cases. It's just another thing that increases the cost of the feature to the point where right now the benefits do not outweigh the costs.

If you can describe for me why it is you want this feature, I would really appreciate that. The more information we have from real customers about why they want it, the more likely it will make it into the product someday. It's a cute little feature and I'd like to be able to get it to customers somehow if there is sufficient interest.

(See also related questions Is it Possible to Return a Reference to a Variable in C#? and Can I use a reference inside a C# function like C++?)

How can I reliably get the address of an object?

57 votes

Consider the following program:

struct ghost
{
    // ghosts like to pretend that they don't exist
    ghost* operator&() const volatile { return 0; }
};

int main()
{
    ghost clyde;
    ghost* clydes_address = &clyde; // darn; that's not clyde's address :'( 
}

How do I get clyde's address?

I'm looking for a solution that will work equally well for all types of objects. A C++03 solution would be nice, but I'm interested in C++0x solutions too. If possible, let's avoid any implementation-specific behavior.

I am aware of C++0x's std::addressof function template, but am not interested in using it here: I'd like to understand how a Standard Library implementer might implement this function template.

Let us first copy the code from Boost, minus the compiler work around bits:

template<class T>
struct addr_impl_ref
{
  T & v_;

  inline addr_impl_ref( T & v ): v_( v ) {}
  inline operator T& () const { return v_; }

private:
  addr_impl_ref & operator=(const addr_impl_ref &);
};

template<class T>
struct addressof_impl
{
  static inline T * f( T & v, long ) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char &>(v)));
  }

  static inline T * f( T * v, int ) { return v; }
};

template<class T>
T * addressof( T & v ) {
  return addressof_impl<T>::f( addr_impl_ref<T>( v ), 0 );
}

What happens if we pass a reference to function ?

Note: addressof cannot be used with a pointer to function

In C++ if void func(); is declared, then func is a reference to a function taking no argument and return no result. This reference to a function can be trivially converted into a pointer to function -- from @Konstantin: According to 13.3.3.2 both T & and T * are indistinguishable for functions. The 1st one is an Identity conversion and the 2nd one is Function-to-Pointer conversion both having "Exact Match" rank (13.3.3.1.1 table 9).

The reference to function pass through addr_impl_ref, there is an ambiguity in the overload resolution for the choice of f, which is solved thanks to the dummy argument 0, which is an int first and could be promoted to a long (Integral Promotion).

Thus we simply returns the pointer.

What happens if we pass a type with a conversion operator ?

If the conversion operator yields a T* then we have an ambiguity: for f(T&,long) an Integral Promotion is required for the second argument while for f(T*,int) the conversion operator is called on the first (thanks to @litb)

That's when addr_impl_ref kicks in. The C++ Standard mandates that a conversion sequence may contain at most one user-defined conversion. By wrapping the type in addr_impl_ref and forcing the use of a conversion sequence already, we "disable" any conversion operator that the type comes with.

Thus the f(T&,long) overload is selected (and the Integral Promotion performed).

What happens for any other type ?

Thus the f(T&,long) overload is selected, because there the type does not match the T* parameter.

Note: from the remarks in the file regarding Borland compatibility, arrays do not decay to pointers, but are passed by reference.

What happens in this overload ?

We want to avoid applying operator& to the type, as it may have been overloaded.

The Standard guarantees that reinterpret_cast may be used for this work (see @Matteo Italia's answer: 5.2.10/10).

Boost adds some niceties with const and volatile qualifiers to avoid compiler warnings (and properly use a const_cast to remove them).

  • Cast T& to char const volatile&
  • Stip the const and volatile
  • Apply the & operator to take the address
  • Cast back to a T*

The const/volatile juggling is a bit of black magic, but it does simplify the work (rather than providing 4 overloads). Note that since T is unqualified, if we pass a ghost const&, then T* is ghost const*, thus the qualifiers have not really been lost.

EDIT: the pointer overload is used for pointer to functions, I amended the above explanation somewhat. I still do not understand why it is necessary though.

The following ideone output sums this up, somewhat.

Weird C++ Syntax

52 votes

I came across this weird C++ program.

#include <iostream>
using namespace std;
int main()
{
  int a = ({int x; cin >> x; x;});
  cout << a;
}

Can anyone explain what is going on? What is this construct called?

Assigns user input value to a and prints it out. it is done by using a Statement Expression.

Statement Expressions:
A compound statement is a sequence of statements enclosed by braces. In GNU C, a compound statement inside parentheses may appear as an expression in what is called a Statement expression.

         .--------------.
         V              |
>>-(--{----statement--;-+--}--)--------------------------------><

The value of a statement expression is the value of the last simple expression to appear in the entire construct. If the last statement is not an expression, then the construct is of type void and has no value.

Note: This excerpt is taken from IBM XL C/C++ v7.0 doccumentation.

This construct is an orthogonal extension of GNU C and is supported only in IBM C++.

Javascript Shorthand for getElementById

49 votes

Is there any shorthand for the JavaScript document.getElementById? Or is there any way I can define one? It gets repetitive retyping that over and over.

var $ = function( id ) { return document.getElementById( id ); };

$( 'someID' )

Here I used $, but you can use any valid variable name.

var byId = function( id ) { return document.getElementById( id ); };

byId( 'someID' )

Evil code confusion, how does it even compile?

47 votes

I stumbled upon this code:

static void Main()
{
    typeof(string).GetField("Empty").SetValue(null, "evil");//from DailyWTF

    Console.WriteLine(String.Empty);//check

    //how does it behave?
    if ("evil" == String.Empty) Console.WriteLine("equal"); 

    //output: 
    //evil 
    //equal

 }

and I wonder how is it even possible to compile this piece of code. My reasoning is:

According to MSDN String.Empty is read-only therefore changing it should be impossible and compiling should end with "A static readonly field cannot be assigned to" or similar error.

I thought Base Class Library assemblies are somehow protected and signed and whatnot to prevent exactly this kind of attack. Next time someone may change System.Security.Cryptography or another critical class.

I thought Base Class Library assemblies are compiled by NGEN after .NET installation therefore changing fields of String class should require advanced hacking and be much harder.

And yet this code compiles and works. Can somebody please explain what is wrong with my reasoning?

A static readonly field cannot be assigned to

You're not assigning to it. You're calling public functions in the System.Reflection namespace. No reason for the compiler to complain about that.

Besides, typeof(string).GetField("Empty") could use variables entered in by the user instead, there's no sure way for the compiler to tell in all cases whether the argument to GetField will end up being "Empty".

I think you're wanting Reflection to see that the field is marked initonly and throw an error at runtime. I can see why you would expect that, yet for white-box testing, even writing to initonly fields has some application.

The reason NGEN has no effect is that you're not modifying any code here, only data. Data is stored in memory with .NET just as with any other language. Native programs may use readonly memory sections for things like string constants, but the pointer to the string is generally still writable and that is what is happening here.

Note that your code must be running with full-trust to use reflection in this questionable way. Also, the change only affect one program, this isn't any sort of a security vulnerability as you seem to think (if you're running malicious code inside your process with full trust, that design decision is the security problem, not reflection).

How to avoid "too many parameters" problem in API design?

46 votes

I have this API function:

public ResultEnum DoSomeAction(string a, string b, DateTime c, OtherEnum d, 
     string e, string f, out Guid code)

I don't like it. Because parameter order becomes unnecessarily significant. It becomes harder to add new fields. It's harder to see what's being passed around. It's harder to refactor method into smaller parts because it creates another overhead of passing all the parameters in sub functions. Code is harder to read.

I came up with the most obvious idea: have an object encapsulating the data and pass it around instead of passing each parameter one by one. Here is what I came up with:

public class DoSomeActionParameters
{
    public string A;
    public string B;
    public DateTime C;
    public OtherEnum D;
    public string E;
    public string F;        
}

That reduced my API declaration to:

public ResultEnum DoSomeAction(DoSomeActionParameters parameters, out Guid code)

Nice. Looks very innocent but we actually introduced a huge change: we introduced mutability. Because what we previously had been doing was actually to pass an anonymous immutable object: function parameters on stack. Now we created a new class which is very mutable. We created the ability to manipulate the state of the caller. That sucks. Now I want my object immutable, what do I do?

public class DoSomeActionParameters
{
    public string A { get; private set; }
    public string B { get; private set; }
    public DateTime C { get; private set; }
    public OtherEnum D { get; private set; }
    public string E { get; private set; }
    public string F { get; private set; }        

    public DoSomeActionParameters(string a, string b, DateTime c, OtherEnum d, 
     string e, string f)
    {
        this.A = a;
        this.B = b;
        // ... tears erased the text here
    }
}

As you can see I actually re-created my original problem: too many parameters. It's obvious that that's not the way to go. What am I going to do? The last option to achieve such immutability is to use a "readonly" struct like this:

public struct DoSomeActionParameters
{
    public readonly string A;
    public readonly string B;
    public readonly DateTime C;
    public readonly OtherEnum D;
    public readonly string E;
    public readonly string F;        
}

That allows us to avoid constructors with too many parameters and achieve immutability. Actually it fixes all the problems (parameter ordering etc). Yet:

That's when I got confused and decided to write this question: What's the most straightforward way in C# to avoid "too many parameters" problem without introducing mutability? Is it possible to use a readonly struct for that purpose and yet not have a bad API design?

CLARIFICATIONS:

  • Please assume there is no violation of single responsibiltiy principle. In my original case the function just writes given parameters to a single DB record.
  • I'm not seeking a specific solution to the given function. I'm seeking a generalized approach to such problems. I'm specifically interested in solving "too many parameters" problem without introducing mutability or a terrible design.

UPDATE

The answers provided here have different advantages/disadvantages. Therefore I'd like to convert this to a community wiki. I think each answer with code sample and Pros/Cons would make a good guide for similar problems in the future. I'm now trying to find out how to do it.

You could use a combination of builder and domain-specific-language style API or Fluent Interface. The API is a little more verbose but with intellisense it's very quick to type out and easy to understand.

public class Param
{
    public string A { get; private set; }
    public string B { get; private set; }
    public string C { get; private set; }

    public class Builder
    {
        private string a;
        private string b;
        private string c;

        public Builder WithA(string value)
        {
            a = value;
            return this;
        }

        public Builder WithB(string value)
        {
            b = value;
            return this;
        }

        public Builder WithC(string value)
        {
            c = value;
            return this;
        }

        public Param Build()
        {
            return new Param { A = a, B = b, C = c };
        }
    }


    DoSomeAction(new Param.Builder()
        .WithA("a")
        .WithB("b")
        .WithC("c")
        .Build());

Should css class names like 'floatleft' that directly describe the attached style be avoided?

40 votes

Lots of websites use class names like floatleft, clearfloat, alignright, small, center etc that describe the style that is attached to the class. This seems to make sense so when writing new content you can easily wrap (for example) <div class="clearfloat">...</div> around your element to make it behave the way you want.

My question is, doesn't this style of naming classes go against the idea of separating content from presentation? Putting class="floatleft" on an element is clearly putting presentation information into the HTML document.

Should class names like this that directly describe the attached style be avoided, and if so what alternative is there?


To clarify, this isn't just a question of what to name classes. For example a semantically accurate document might look something like:

<div class="foo">Some info about foo</div>
...
<div class="bar">Info about unrelated topic bar</div>
...
<div class="foobar">Another unrelated topic</div>

Say all these divs need to clear floats, the css would look something like:

div.foo, div.bar, div.foobar {
    clear:both;
}

This starts to get ugly as the number of these clearing elements increases - whereas a single class="clearfloat" would serve the same purpose. Is it recommended to group elements based on the attached styles to avoid repetition in the CSS, even if this means presentational information creeps into the HTML?


Update: Thanks for all the answers. The general consensus seems to be to avoid these class names in favour of semantic names, or at least use them sparingly provided they don't hinder maintenance. I think the important thing is that changes in the layout should not require excessive changes to the markup (although a few people said minor changes are okay if it makes overall maintenance easier). Thanks to those who suggested other methods to keep CSS code smaller as well.

It's great until you re-design, and narrow is highlighted yellow, center converts better left-justified, and the image you called floatleft now belongs on the right.

I'll admit to the sin of using floatleft and clear as CSS class names, but it is much easier to maintain your CSS if you choose names that relate to the semantic meaning of the content, like feedback and heroimage.

Is main() overloaded in C++ ?

39 votes

There are 2 valid versions of main() exist in C++:

int main()  // version 1
int main(int argc, char **argv)  // version 2

(Even old gcc allows replacing char **argv with int **argv !!)

But both the versions cannot co-exist at the same time ! (use case can be like: while running the binary from command prompt, if you pass no argument then 1st version should be called else the 2nd version).

Is there a special compiler check to just allow only one version in one binary ?

§3.6.1/2 (C++03) says

An implementation shall not predefine the main function. This function shall not be overloaded. It shall have a return type of type int, but otherwise its type is implementation-defined. All implementations shall allow both of the following definitions of main:

   int main() { /* ... */ }
   int main(int argc, char* argv[]) { /* ... */ }

You can use either of them. Both are standard compliant.

Also, since char *argv[] is equivalent to char **argv, replacing char *argv[] with char **argv doesn't make any difference.


But both the versions cannot co-exist at the same time ! (use case can be like: while running the binary from command prompt, if you pass no argument then 1st version should be called else the 2nd version).

No. Both versions cannot co-exist at the same time. One program can have exactly one main function. Which one, depends on your choice. If you want to process command-line argument, then you've to choose the second version, or else first version is enough. Also note that if you use second version, and don't pass any command line argument, then there is no harm in it. It will not cause any error. You just have to interpret argc and argv accordingly, and based on their value, you've to write the logic and flow of your program.

Does const-correctness give the compiler more room for optimization?

33 votes

I know that it improves readability and makes the program less error-prone, but how much does it improve the performance?

And on a side note, what's the major difference between a reference and a const pointer? I would assume they're stored in the memory differently, but how so?

Thanks.

[Edit: OK so this question is more subtle than I thought at first.]

Declaring a pointer-to-const or reference-of-const never helps any compiler to optimize anything. (Although see the Update at the bottom of this answer.)

The const declaration only indicates how an identifier will be used within the scope of its declaration; it does not say that the underlying object can not change.

Example:

int foo(const int *p) {
    int x = *p;
    bar(x);
    x = *p;
    return x;
}

The compiler cannot assume that *p is not modified by the call to bar(), because p could be (e.g.) a pointer to a global int and bar() might modify it.

If the compiler knows enough about the caller of foo() and the contents of bar() that it can prove bar() does not modify *p, then it can also perform that proof without the const declaration.

But this is true in general. Because const only has an effect within the scope of the declaration, the compiler can already see how you are treating the pointer or reference within that scope; it already knows that you are not modifying the underlying object.

So in short, all const does in this context is prevent you from making mistakes. It does not tell the compiler anything it does not already know, and therefore it is irrelevant for optimization.

What about functions that call foo()? Like:

int x = 37;
foo(&x);
printf("%d\n", x);

Can the compiler prove that this prints 37, since foo() takes a const int *?

No. Even though foo() takes a pointer-to-const, it might cast the const-ness away and modify the int. (This is not undefined behavior.) Here again, the compiler cannot make any assumptions in general; and if it knows enough about foo() to make such an optimization, it will know that even without the const.

The only time const might allow optimizations is cases like this:

const int x = 37;
foo(&x);
printf("%d\n", x);

Here, to modify x through any mechanism whatsoever (e.g., by taking a pointer to it and casting away the const) is to invoke Undefined Behavior. So the compiler is free to assume you do not do that, and it can propagate the constant 37 into the printf(). This sort of optimization is legal for any object you declare const. (In practice, a local variable to which you never take a reference will not benefit, because the compiler can already see whether you modify it within its scope.)

To answer your "side note" question, (a) a const pointer is a pointer; and (b) a const pointer can equal NULL. You are correct that the internal representation (i.e. an address) is most likely the same.

[update]

As Christoph points out in the comments, my answer is incomplete because it does not mention restrict.

Section 6.7.3.1 (4) of the C99 standard says:

During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. ...

(Here B is a basic block over which P, a restrict-pointer-to-T, is in scope.)

So if a C function foo() is declared like this:

foo(const int * restrict p)

...then the compiler may assume that no modifications to *p occur during the lifetime of p -- i.e., during the execution of foo() -- because otherwise the Behavior would be Undefined.

So in principle, combining restrict with a pointer-to-const could enable both of the optimizations that are dismissed above. Do any compilers actually implement such an optimization, I wonder? (GCC 4.5.2, at least, does not.)

Note that restrict only exists in C, not C++ (not even C++0x), except as a compiler-specific extension.

How much is too much with C++0x auto keyword

33 votes

I've been using the new auto keyword available in the C++0x standard for complicated templated types which is what I believe it was designed for. But I'm also using it for things like:

auto foo = std::make_shared<Foo>();

And more skeptically for:

auto foo = bla(); // where bla() return a shared_ptr<Foo>

I haven't seen much discussion on this topic. It seems that auto could be overused, since a type is often a form of documentation and sanity checks. Where do you draw the line in using auto and what are the recommended use cases for this new feature?

To clarify: I'm not asking for a philosophical opinion; I'm asking for the intended use of this keyword by the standard committee, possibly with comments on how that intended use is realized in practice.

Side note: This question was moved to SE.Programmers and then back to Stack Overflow. Discussion about this can be found in this meta question.

I think that one should use the auto keyword whenever it's hard to say how to write the type at first sight, but the type of the right hand side of an expression is obvious. For example, using:

my_multi_type::nth_index<2>::type::key_type::composite_key_type::key_extractor_tuple::tail_type::head_type::result_type

to get the composite key type in boost::multi_index, even though you know that it is int. You can't just write int because it could be changed in the future. I would write auto in this case.

So if the auto keyword improves readability in a particular case then use it. You can write auto when it is obvious to the reader what type auto represents.

Here are some examples:

auto foo = std::make_shared<Foo>(); // obvious
auto foo = bla(); // unclear. don't know which type `foo` has

const size_t max_size = 100;
for ( auto x = max_size; x > 0; --x ) // unclear. could lead to the errors
                                      // since max_size is unsigned

std::vector<some_class> v;
for ( auto it = v.begin(); it != v.end(); ++it ) // ok, since I know
// that `it` has an iterator type (don't really care which one in this context)