Best compiler questions in January 2012

How to drive C#, C++ or Java compiler to compute 1+2+3+...+1000?

69 votes

In a recent interview, I was asked a really strange question. The interviewer asked me how can I compute 1+2+3+...+1000 just using compiler features. This means that I am not allowed to write a program and execute it, but I should just write a program that could drive the compiler to compute this sum while compilation and print the result when compilation completes. As a hint, he told me that I may use generics and pre-processor features of the compiler. It is possible to use C++, C# or Java compiler. Any ideas???

This question is not related to computing the sum without any loops asked here. In addition, It should be noted that the sum SHOULD be calculated during compilation. Printing just the result using C++ compiler directives is not acceptable.

Edit

Reading more about the posted answers, I found that solving problems during compilation using C++ templates is called metaprogramming. This is a technique that was discovered accidentally by Dr. Erwin Unruh, during the process of standardizing the C++ language. You may read more about this topic on wiki page of meta-programming. It seems that it is possible to write the program in Java using java annotations. Take look at maress's answer below.

Updated Now with improved recursion depth! Works on MSVC10 and GCC without increased depth. :)


Simple compile-time recursion + addition:

template<unsigned Cur, unsigned Goal>
struct adder{
  static unsigned const sub_goal = (Cur + Goal) / 2;
  static unsigned const tmp = adder<Cur, sub_goal>::value;
  static unsigned const value = tmp + adder<sub_goal+1, Goal>::value;
};

template<unsigned Goal>
struct adder<Goal, Goal>{
  static unsigned const value = Goal;
};

Testcode:

template<unsigned Start>
struct sum_from{
  template<unsigned Goal>
  struct to{
    template<unsigned N>
    struct equals;

    typedef equals<adder<Start, Goal>::value> result;
  };
};

int main(){
  sum_from<1>::to<1000>::result();
}

Output for GCC:

error: declaration of ‘struct sum_from<1u>::to<1000u>::equals<500500u>’

Live example on Ideone.

Output for MSVC10:

error C2514: 'sum_from<Start>::to<Goal>::equals<Result>' : class has no constructors
      with
      [
          Start=1,
          Goal=1000,
          Result=500500
      ]

Can a compiler automatically detect pure functions without the type information about purity?

27 votes

So I'm arguing with my friend who claims that a compiler like GCC can detect a pure function automatically without any type information. I doubt that.

Languages like D or Haskell have purity in their type systems and a programmer explicitly defines what function is pure or not. A pure function has no side effects and can therefore very easily be parallelized.

So the question is: Is this all necessary or not? Could a compiler detect purity, without any meta or type information, just by assuming that anything that does IO or accesses global variables automatically is not pure?

Sure, you can detect pure functions in some cases. For instance,

int f(int x)
{
    return x*2;
}

can be detected as pure with simple static analysis. The difficulty is doing this in general, and detecting interfaces which use "internal" state but are externally pure is basically impossible.

GCC does have the warning options -Wsuggest-attribute=pure and -Wsuggest-attribute=const, which suggest functions that might be candidates for the pure and const attributes. I'm not sure whether it opts to be conservative (i.e. missing many pure functions, but never suggesting it for a non-pure function) or lets the user decide.

Note that GCC's definition of pure is "depending only on arguments and global variables":

Many functions have no effects except the return value and their return value depends only on the parameters and/or global variables. Such a function can be subject to common subexpression elimination and loop optimization just as an arithmetic operator would be. These functions should be declared with the attribute pure.

GCC manual

Strict purity, i.e. the same results for the same arguments in all circumstances, is represented by the const attribute, but such a function cannot even dereference a pointer passed to it. So the parallelisation opportunities for pure functions are limited, but much fewer functions can be const compared to the pure functions you can write in a language like Haskell.

By the way, automatically parallelising pure functions is not as easy as you might think; the hard part becomes deciding what to parallelise. Parallelise computations that are too cheap, and overhead makes it pointless. Don't parallelise enough, and you don't reap the benefits. I don't know of any practical functional language implementation that does automatic parallelisation for this reason, although libraries like repa parallelise many operations behind the scenes without explicit parallelism in the user code.

Is it a good idea to compile a language to C?

27 votes

All over the web, I am getting the feeling that writing a C backend for a compiler is not such a good idea anymore. GHC's C backend is not being actively developed anymore (this is my unsupported feeling). Compilers are targeting C-- or LLVM.

Normally, I would think that GCC is a good old mature compiler that does performs well at optimizing code, therefore compiling to C will use the maturity of GCC to yield better and faster code. Is this not true?

I realize that the question greatly depends on the nature of the language being compiled and on other factors such that getting more maintainable code. I am looking for a rather more general answer (w.r.t. the compiled language) that focuses solely on performance (disregarding code quality, ..etc.). I would be also really glad if the answer includes an explanation on why GHC is drifting away from C and why LLVM performs better as a backend (see this) or any other examples of compilers doing the same that I am not aware of.

While I'm not a compiler expert, I believe that it boils down to the fact that you lose something in translation to C as opposed to translating to e.g. LLVM's intermediate language.

If you think about the process of compiling to C, you create a compiler that translates to C code, then the C compiler translates to an intermediate representation (the in-memory AST), then translates that to machine code. The creators of the C compiler have probably spent a lot of time optimizing certain human-made patterns in the language, but you're not likely to be able to create a fancy enough compiler from a source language to C to emulate the way humans write code. There is a loss of fidelity going to C - the C compiler doesn't have any knowledge about your original code's structure. To get those optimizations, you're essentially back-fitting your compiler to try to generate C code that the C compiler knows how to optimize when it's building its AST. Messy.

If, however, you translate directly to LLVM's intermediate language, that's like compiling your code to a machine-independent high-level bytecode, which is akin to the C compiler giving you access to specify exactly what its AST should contain. Essentially, you cut out the middleman that parses the C code and go directly to the high-level representation, which preserves more of the characteristics of your code by requiring less translation.

Also related to performance, LLVM can do some really tricky stuff for dynamic languages like generating binary code at runtime. This is the "cool" part of just-in-time compilation: it is writing binary code to be executed at runtime, instead of being stuck with what was created at compile time.