Best questions in May 2012

Which is the better option for dividing an integer by 2?

151 votes

Which of the following techniques is the best option for dividing an integer by 2 and why?

Technique 1:

x = x >> 1;

Technique 2:

x = x / 2;

Here x is an integer.

Use the operation that best describes what you are trying to do.

  • If you are treating the number as a sequence of bits, use bitshift.
  • If you are treating it as a numerical value, use division.

Note that they are not exactly equivalent. They can give different results for negative integers. For example:

-5 / 2  = -2
-5 >> 1 = -3
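
The same difference can be reproduced in a short Python sketch. Python's own `//` operator floors rather than truncates, so the truncating division shown above is emulated by hand here; the function names are made up for illustration.

```python
def halve_by_division(x: int) -> int:
    """Divide by 2, truncating toward zero (as in the -5 / 2 example)."""
    magnitude = abs(x) // 2
    return magnitude if x >= 0 else -magnitude


def halve_by_shift(x: int) -> int:
    """Shift right by one bit; for negative values this rounds toward negative infinity."""
    return x >> 1


# Print each value with its two "halves"; only odd negative values differ.
for value in (5, 4, -4, -5):
    print(value, halve_by_division(value), halve_by_shift(value))
```

For non-negative values the two functions agree, which is why the distinction is easy to overlook.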


Why would iterating over a List be faster than indexing through it?

92 votes

The Java documentation for the List interface says:

The List interface provides four methods for positional (indexed) access to list elements. Lists (like Java arrays) are zero based. Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example). Thus, iterating over the elements in a list is typically preferable to indexing through it if the caller does not know the implementation.

What exactly does this mean? I don't understand the conclusion which is drawn.

In a linked list, each element has a pointer to the next element:

head -> item1 -> item2 -> item3 -> etc.

To access item3, you can see clearly that you need to walk from the head through every node until you reach item3, since you cannot jump directly.

Thus, if I wanted to print the value of each element and wrote this:

for(int i = 0; i < 4; i++) {
    System.out.println(list.get(i));
}

what happens is this:

head -> print head
head -> item1 -> print item1
head -> item1 -> item2 -> print item2
head -> item1 -> item2 -> item3 -> print item3

This is horribly inefficient because every indexed access restarts from the beginning of the list and walks through every item up to that index. This means that your complexity is effectively O(N^2) just to traverse the list!

If instead I did this:

for(String s: list) {
    System.out.println(s);
}

then what happens is this:

head -> print head -> item1 -> print item1 -> item2 -> print item2 etc.

all in a single traversal, which is O(N).

Now consider the other common implementation of List, ArrayList, which is backed by a simple array. In that case both of the above traversals are equivalent, since an array is contiguous and allows random jumps to arbitrary positions.
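
The cost difference can be made visible with a toy singly linked list that counts how many nodes it visits. This is a minimal Python sketch, not Java's LinkedList; the class and function names are invented for illustration.

```python
class Node:
    def __init__(self, value, next_node=None):
        self.value = value
        self.next = next_node


class SinglyLinkedList:
    """Toy list that counts how many nodes are visited."""

    def __init__(self, values):
        self.head = None
        self.size = 0
        self.steps = 0
        # Build back to front so the first value ends up at the head.
        for value in reversed(list(values)):
            self.head = Node(value, self.head)
            self.size += 1

    def get(self, index):
        """Indexed access: always walks from the head."""
        node = self.head
        self.steps += 1
        for _ in range(index):
            node = node.next
            self.steps += 1
        return node.value

    def __iter__(self):
        """Iteration: remembers its position, so each node is visited once."""
        node = self.head
        while node is not None:
            self.steps += 1
            yield node.value
            node = node.next


def count_steps(n):
    """Return (node visits when indexing, node visits when iterating)."""
    indexed = SinglyLinkedList(range(n))
    for i in range(indexed.size):
        indexed.get(i)
    iterated = SinglyLinkedList(range(n))
    for _ in iterated:
        pass
    return indexed.steps, iterated.steps


print(count_steps(4))  # (10, 4)
```

With four elements the indexed loop visits 1 + 2 + 3 + 4 = 10 nodes, matching the diagram above, while the iterator visits 4.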

C#: Why can I create a class named "var"?

77 votes

Isn't var a keyword in C#? But why can I do this:

public class var { }

public class Program
{
    public static void Main(string[] args)
    {
        var testVar = new var();
    }
}

The var that is used in the code is the var class that is declared before the Program class. And the compiler doesn't even complain.

While when I do this:

public class int { }

or this:

public class true { }

The compiler said that int or true is a keyword and cannot be used like that. Why is it not the same with var?

var is not a keyword according to this list.

It is a contextual keyword, so from the context the compiler is able to decide which is your class and which is the contextual keyword, and no confusion arises.

A contextual keyword is:

used to provide a specific meaning in the code, but it is not a reserved word in C#.

So as it's not reserved, you can use it.

As pointed out in the comments above, there is a discussion of the differences, as well as a list of the various keywords and contextual keywords added in each version of C#, on Eric Lippert's blog.

It is interesting to note that since the set of keywords was decided upon in C# 1.0 there have been no additions, so as to preserve backwards compatibility.

str performance in python

68 votes

While profiling a piece of Python code (Python 2.6 up to 3.2), I discovered that using str to convert an object (in my case an integer) to a string is almost an order of magnitude slower than using string formatting.

Here is the benchmark:

>>> from timeit import Timer
>>> Timer('str(100000)').timeit()
0.3145311339386332
>>> Timer('"%s"%100000').timeit()
0.03803517023435887

Does anyone know why this is the case? Am I missing something?

'%s' % 100000 is evaluated by the compiler and is equivalent to a constant at run-time.

>>> import dis
>>> dis.dis(lambda: str(100000))
  8           0 LOAD_GLOBAL              0 (str)
              3 LOAD_CONST               1 (100000)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE        
>>> dis.dis(lambda: '%s' % 100000)
  9           0 LOAD_CONST               3 ('100000')
              3 RETURN_VALUE        

% with a run-time expression is not (significantly) faster than str:

>>> Timer('str(x)', 'x=100').timeit()
0.25641703605651855
>>> Timer('"%s" % x', 'x=100').timeit()
0.2169809341430664

Do note that str is still slightly slower. As @DietrichEpp said, this is because str involves lookup and function call operations, while % compiles to a single immediate bytecode:

>>> dis.dis(lambda x: str(x))
  9           0 LOAD_GLOBAL              0 (str)
              3 LOAD_FAST                0 (x)
              6 CALL_FUNCTION            1
              9 RETURN_VALUE        
>>> dis.dis(lambda x: '%s' % x)
 10           0 LOAD_CONST               1 ('%s')
              3 LOAD_FAST                0 (x)
              6 BINARY_MODULO       
              7 RETURN_VALUE        

Of course the above is true for the system I tested on (CPython 2.7); other implementations may differ.

Are Roslyn SyntaxNodes reused?

67 votes

I've been taking a look at the Roslyn CTP. It solves a similar problem to the Expression tree API, and both are immutable, but Roslyn does so in quite a different way:

  • Expression nodes have no reference to the parent node and are modified using an ExpressionVisitor, which is why big parts can be reused.

  • Roslyn's SyntaxNode, on the other hand, has a reference to its parent, so all the nodes effectively become a block that's impossible to reuse. Methods like Update, ReplaceNode, etc., are provided to make modifications.

Where does this end? Document? Project? ISolution? The API promotes a step-by-step change of the tree (instead of a bottom-up one), but does each step make a full copy?

Why did they make such a choice? Is there some interesting trick I'm missing?

Great question. We debated the issues you raise for a long, long time.

We would like to have a data structure that has the following characteristics:

  • Immutable.
  • The form of a tree.
  • Cheap access to parent nodes from child nodes.
  • Possible to map from a node in the tree to a character offset in the text.
  • Persistent.

By persistence I mean the ability to reuse most of the existing nodes in the tree when an edit is made to the text buffer. Since the nodes are immutable, there's no barrier to reusing them. We need this for performance; we cannot be re-parsing huge wodges of the file every time you hit a key. We need to re-lex and re-parse only the portions of the tree that were affected by the edit.

Now when you try to put all five of those things into one data structure you immediately run into problems:

  • How do you build a node in the first place? The parent and the child both refer to each other, and are immutable, so which one gets built first?
  • Supposing you manage to solve that problem: how do you make it persistent? You cannot re-use a child node in a different parent because that would involve telling the child that it has a new parent. But the child is immutable.
  • Supposing you manage to solve that problem: when you insert a new character into the edit buffer, the absolute position of every node that is mapped to a position after that point changes. This makes it very difficult to make a persistent data structure, because any edit can change the spans of most of the nodes!

But on the Roslyn team we routinely do impossible things. We actually do the impossible by keeping two parse trees. The "green" tree is immutable, persistent, has no parent references, is built "bottom-up", and every node tracks its width but not its absolute position. When an edit happens we rebuild only the portions of the green tree that were affected by the edit, which is typically about O(log n) of the total parse nodes in the tree.

The "red" tree is an immutable facade that is built around the green tree; it is built "top-down" on demand and thrown away on every edit. It computes parent references by manufacturing them on demand as you descend through the tree from the top. It manufactures absolute positions by computing them from the widths, again, as you descend.

You, the user, only ever see the red tree; the green tree is an implementation detail. If you peer into the internal state of a parse node you'll in fact see that there is a reference to another parse node in there of a different type; that's the green tree node.

Incidentally, these are called "red/green trees" because those were the whiteboard marker colours we used to draw the data structure in the design meeting. There's no other meaning to the colours.

The benefit of this strategy is that we get all those great things: immutability, persistence, parent references, and so on. The cost is that this system is complex and can consume a lot of memory if the "red" facades get large. We are at present doing experiments to see if we can reduce some of the costs without losing the benefits.
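
The two-tree idea can be sketched in a few lines of Python. This is only an illustration of the description above, not Roslyn's actual implementation: the names GreenNode, RedNode and replace_child are invented here, and the width is recomputed on every call where a real implementation would presumably cache it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GreenNode:
    """Immutable node with no parent reference; knows its width, not its position."""
    kind: str
    text: str = ""                            # only leaves (tokens) carry text
    children: Tuple["GreenNode", ...] = ()

    @property
    def width(self) -> int:
        if self.children:
            return sum(child.width for child in self.children)
        return len(self.text)


class RedNode:
    """Facade built top-down on demand; supplies parent and absolute position."""

    def __init__(self, green, parent=None, position=0):
        self.green = green
        self.parent = parent
        self.position = position

    def child(self, index):
        # The child's position is this node's position plus the widths
        # of all siblings that come before it.
        offset = self.position
        for sibling in self.green.children[:index]:
            offset += sibling.width
        return RedNode(self.green.children[index], self, offset)


def replace_child(green, index, new_child):
    """Return a new green node that reuses every untouched child."""
    children = green.children[:index] + (new_child,) + green.children[index + 1:]
    return GreenNode(green.kind, green.text, children)


x, plus, y = GreenNode("token", "x"), GreenNode("token", "+"), GreenNode("token", "y")
expr = GreenNode("binary", children=(x, plus, y))
edited = replace_child(expr, 0, GreenNode("token", "long_name"))

print(RedNode(expr).child(2).position)    # 2
print(RedNode(edited).child(2).position)  # 10
print(edited.children[2] is y)            # True: the green node for "y" is shared
```

After the edit, the green node for "y" is the very same object in both trees; only the red facade, which is rebuilt on demand, reports a different absolute position.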

Is there a setting on Google Analytics to suppress use of cookies for users who have not yet given consent?

65 votes

According to Article 5(3) of the EU E-Privacy Directive (a.k.a. 'The Cookie Laws'), web sites that target EU users have to gain opt-in consent from users before they set a cookie.

See ICO Guidance

I am trying to square this with Google Analytics on my web site.

I would imagine that Google Analytics (GA) can do a certain level of analytic data gathering without requiring the use of cookies.

However, I cannot find any info (on the Google sites or settings panels) about how to relay information about the 'state of consent' back to Google during a page request. So my only option seems to be not to embed the Google tag code at all if the user has not explicitly given consent, which seems a bit drastic.

Letting my serverside script set a 'hasConsentedToCookies=FALSE' flag in the JavaScript tags would allow me to instruct Google's services to run in a gracefully degraded fashion.

Is there a setting on Google Analytics to suppress use of cookies for users that have not yet given consent?

If so, where can I find info on this?

Google Analytics has a new set of APIs to assist with compliance with a cookie opt-out. Here's the documentation, and here's their help docs.

(There has been some ambiguity as to whether the EU Cookie Regulations, as implemented in member countries, require opt-in mechanisms for passive web analytics tracking. If you're concerned one way or another, consult an attorney. Google is empowering you to make the decision as to how you want to proceed.)

Basically, they leave the implementation details to you. The idea is that once you've determined whether or not to track the user in Google Analytics, and the answer is to not track, you set the following property to true before Google Analytics runs:

window['ga-disable-UA-XXXXXX-Y'] = true;

where UA-XXXXXX-Y is your account ID in Google Analytics.

As the other posters have noted, Google Analytics relies on cookies. So, you're not able to do any kind of tracking without cookies. If you've determined that someone is not to be cookied for tracking, you'll need to implement something like this:

if(doNotCookie()){
   window['ga-disable-UA-XXXXXX-Y'] = true;
}

Opt In

This does require a little bit of jujitsu when you first load Google Analytics, since the property must be set before Google Analytics runs to prevent tracking from ever happening. For an "opt in to tracking" approach, you'd probably need a mechanism where, on the first visit, Google Analytics is automatically disabled in the absence of an opt-in cookie (cookies that determine cookie preferences are explicitly allowed) and then, if an opt-in happens, Google Analytics is re-run. On subsequent pageviews, all would run smoothly.

It could look something like this (pseudo-code):

if( hasOptedOut() || hasNotExpressedCookiePreferenceYet() ){ //functions you've defined elsewhere
    window['ga-disable-UA-XXXXXX-Y'] = true;
}

var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-XXXXXX-Y']);
_gaq.push(['_trackPageview']);

function onOptIn(){ //have this run when/if they opt-in.
    window['ga-disable-UA-XXXXXX-Y'] = false;
    //...snip...
    //set a cookie to express that the user has opted-in to tracking, for future pageviews
    _gaq.push(['_trackPageview']); // now run the pageview that you 'missed'
}

Opt Out

With this approach, you'd allow the user to opt out of tracking, which would mean you'd set the 'ga-disable-UA-XXXXXX-Y' property when they opt out and use a cookie to manage it in the future:

if( hasOptedOut() ){ // function you've defined elsewhere 
     window['ga-disable-UA-XXXXXX-Y'] = true;
}

  var _gaq = _gaq || [];
  _gaq.push(['_setAccount', 'UA-XXXXXX-Y']);
  _gaq.push(['_trackPageview']);

What is the correct answer for cout << c++ << c;?

Asked on Mon, 28 May 2012 by pravs c++
65 votes

Recently, in an interview, I was asked the following multiple-choice question.

int c = 0;
cout << c++ << c;

Answers:

a. 10
b. 01
c. undefined behavior

I answered choice b, i.e. output would be "01".

But to my surprise later I was told by an interviewer that the correct answer is option c: undefined.

Now, I do know the concept of sequence points in C++. The behavior is undefined for the following statement:

int i = 0;
i += i++ + i++;

but as per my understanding for the statement cout << c++ << c , the ostream.operator<<() would be called twice, first with ostream.operator<<(c++) and later ostream.operator<<(c).

I also checked the result with the VS2010 compiler, and its output is also '01'.

You can think of:

cout<<c++<<c;

As:

std::operator<<(std::operator<<(std::cout, c++), c);

C++ guarantees that all side effects of previous evaluations will have been performed at sequence points. There are no sequence points between the evaluations of function arguments, which means that the argument c can be evaluated either before or after the argument std::operator<<(std::cout, c++). So the result of the above is undefined.

Is this a known pitfall of C++11 for loops?

64 votes

Let's imagine we have a struct for holding 3 doubles with some member functions:

struct Vector {
  double x, y, z;
  // ...
  Vector &negate() {
    x = -x; y = -y; z = -z;
    return *this;
  }
  Vector &normalize() {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  // ...
};

This is a little contrived for simplicity, but I'm sure you agree that similar code is out there. The methods allow you to conveniently chain, for example:

Vector v = ...;
v.normalize().negate();

Or even:

Vector v = Vector{1., 2., 3.}.normalize().negate();

Now if we provided begin() and end() functions, we could use our Vector in a new-style for loop, say to loop over the 3 coordinates x, y, and z (you can no doubt construct more "useful" examples by replacing Vector with e.g. String):

Vector v = ...;
for (double x : v) { ... }

We can even do:

Vector v = ...;
for (double x : v.normalize().negate()) { ... }

and also:

for (double x : Vector{1., 2., 3.}) { ... }

However, the following (it seems to me) is broken:

for (double x : Vector{1., 2., 3.}.normalize()) { ... }

While it seems like a logical combination of the previous two usages, I think this last usage creates a dangling reference while the previous two are completely fine.

  • Is this correct and widely appreciated?
  • Which part of the above is the "bad" part, that should be avoided?
  • Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?

Is this correct and widely appreciated?

Yes, your understanding of things is correct.

Which part of the above is the "bad" part, that should be avoided?

The bad part is taking an l-value reference to a temporary returned from a function, and binding it to an r-value reference. It is just as bad as this:

auto &&t = Vector{1., 2., 3.}.normalize();

The temporary Vector{1., 2., 3.}'s lifetime cannot be extended because the compiler has no idea that the return value from normalize references it.

Would the language be improved by changing the definition of the range-based for loop such that temporaries constructed in the for-expression exist for the duration of the loop?

That would be highly inconsistent with how C++ works.

Would it prevent certain gotchas made by people using chained expressions on temporaries or various lazy-evaluation methods for expressions? Yes. But it would also require special-case compiler code, and it would be confusing as to why the same thing doesn't work with other expression constructs.

A much more reasonable solution would be some way to inform the compiler that the return value of a function is always a reference to this, and therefore if the return value is bound to a temporary-extending construct, then it would extend the correct temporary. That's a language-level solution though.

Presently (if the compiler supports it), you can make it so that normalize cannot be called on a temporary:

struct Vector {
  double x, y, z;
  // ...
  Vector &normalize() & {
     double s = 1./sqrt(x*x+y*y+z*z);
     x *= s; y *= s; z *= s;
     return *this;
  }
  Vector &normalize() && = delete;
};

This will cause Vector{1., 2., 3.}.normalize() to give a compile error, while v.normalize() will work fine. Obviously you won't be able to do correct things like this:

Vector t = Vector{1., 2., 3.}.normalize();

But you also won't be able to do incorrect things.

Why would code actively try to prevent tail-call optimization?

58 votes

The title of the question might be a bit strange, but as far as I know, there is nothing that speaks against tail-call optimization at all. However, while browsing open source projects, I have come across a few functions that actively try to stop the compiler from doing a tail-call optimization, for example in the implementation of CFRunLoopRef, which is full of such hacks. For example:

static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__() __attribute__((noinline));
static void __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__(CFRunLoopObserverCallBack func, CFRunLoopObserverRef observer, CFRunLoopActivity activity, void *info) {
    if (func) {
        func(observer, activity, info);
    }
    getpid(); // thwart tail-call optimization
}

I would love to know why this is seemingly so important. Are there any cases where I, as a normal developer, should keep this in mind too? E.g., are there common pitfalls with tail-call optimization?

This is only a guess, but maybe to avoid an infinite loop vs bombing out with a stack overflow error.

Since the method in question doesn't put anything on the stack, it would seem possible for the tail-call optimization to produce code that enters an infinite loop. The non-optimized code, by contrast, would push the return address onto the stack, which would eventually overflow in the event of misuse.

The only other thought I have is related to preserving the calls on the stack for debugging and stacktrace printing.

Why is "$().ready(handler)" not recommended?

57 votes

From the jQuery API docs site for ready

All three of the following syntaxes are equivalent:

  • $(document).ready(handler)
  • $().ready(handler) (this is not recommended)
  • $(handler)

After doing my homework (reading and playing with the source code), I have no idea why

$().ready(handler) 

is not recommended. The first and third ways are exactly the same: the third option calls the ready function on a cached jQuery object with document:

rootjQuery = jQuery(document);
...
...

// HANDLE: $(function)
// Shortcut for document ready
} else if ( jQuery.isFunction( selector ) ) {
    return rootjQuery.ready( selector );
}

But the ready function has no interaction with the selector or the selected elements. The ready source code:

ready: function( fn ) {
    // Attach the listeners
    jQuery.bindReady();

    // Add the callback
    readyList.add( fn );

    return this;
},

As you can see, it just adds the callback to an internal queue (readyList) and doesn't change or use the elements in the set. This lets you call the ready function on any jQuery object.

Like:

  • Regular selector: $('a').ready(handler)
  • Nonsense selector: $('fdhjhjkdafdsjkjriohfjdnfj').ready(handler)
  • Undefined selector: $().ready(handler)

Finally... to my question: Why is $().ready(handler) not recommended?

I got an official answer from one of the jQuery developers:

$().ready(fn) only works because $() used to be a shortcut to $(document) (jQuery <1.4)
So $().ready(fn) was a readable code.

But people used to do things like $().mouseover() and all sorts of other madness.
and people had to do $([]) to get an empty jQuery object

So in 1.4 we changed it so $() gives an empty jQuery and we just made $().ready(fn) work so as not to break a lot of code

$().ready(fn) is literally now just patched in core to make it work properly for the legacy case.

The best place for the ready function is $.ready(fn), but it's a really old design decision and that is what we have now.


I asked him:

Do you think $(fn) is more readable than $().ready(fn)?!

His answer was:

I always do $(document).ready(fn) in actual apps and typically there's only one doc ready block in the app it's not exactly like a maintenance thing.

I think $(fn) is pretty unreadable too, it's just A Thing That You Have To Know Works™...

What's the result of += in C and C++?

54 votes

I've got the following code:

#include <stdio.h>
int main(int argc, char **argv) {
    int i = 0;
    (i+=10)+=10;
    printf("i = %d\n", i);
    return 0;
}

If I try to compile it as a C source using gcc I get an error:

error: lvalue required as left operand of assignment

But if I compile it as a C++ source using g++, I get no error, and when I run the executable I get:

i = 20

Why the different behaviour?

The semantics of the compound assignment operators differ between C and C++:

C99 standard, 6.5.16, part 3:

An assignment operator stores a value in the object designated by the left operand. An assignment expression has the value of the left operand after the assignment, but is not an lvalue.

In C++ 5.17.1:

The assignment operator (=) and the compound assignment operators all group right-to-left. All require a modifiable lvalue as their left operand and return an lvalue with the type and value of the left operand after the assignment has taken place.

EDIT: The behavior of (i+=10)+=10 is undefined in C++98, but well defined in C++11. See this answer to the question by aix for the relevant portions of the standards.

Why is this valid C# code?

54 votes

This is valid C# code

var bob = "abc" + null + null + null + "123";  // abc123

This is not valid C# code

var wtf = null.ToString(); // compiler error

Why is the first statement valid?

The reason for first one working:

From MSDN:

In string concatenation operations, the C# compiler treats a null string the same as an empty string, but it does not convert the value of the original null string.

More information on the + binary operator:

The binary + operator performs string concatenation when one or both operands are of type string.

If an operand of string concatenation is null, an empty string is substituted. Otherwise, any non-string argument is converted to its string representation by invoking the virtual ToString method inherited from type object.

If ToString returns null, an empty string is substituted.

The reason for the error in the second one is:

null (C# Reference) - The null keyword is a literal that represents a null reference, one that does not refer to any object. null is the default value of reference-type variables.

Creating an object in a static way

53 votes

Could anyone explain how Java executes this code? I mean the order of executing each statement.

public class Foo
{
    boolean flag = sFlag;
    static Foo foo = new Foo();
    static boolean sFlag = true;

    public static void main(String[] args)
    {
        System.out.println(foo.flag);
    }
}

OUTPUT:

false

  • Class initialization starts. Initially, foo is null and sFlag is false
  • The first static variable initializer (foo) runs:
    • A new instance of Foo is created
    • The instance variable initializer for flag executes - currently sFlag is false, so the value of flag is false
  • The second static variable initializer (sFlag) executes, setting the value to true
  • Class initialization completes
  • main runs, printing out foo.flag, which is false

Note that if sFlag were declared to be final it would be treated as a compile-time constant, at which point all references to it would basically be inlined to true, so foo.flag would be true too.

What is the scope of a lambda variable in C#?

51 votes

I'm confused about the scope of a lambda variable. Take, for instance, the following:

var query = 
    from customer in clist
    from order in olist
    .Where(o => o.CustomerID == customer.CustomerID && o.OrderDate ==  // line 1
        olist.Where(o1 => o1.CustomerID == customer.CustomerID)        // line 2
             .Max(o1 => o1.OrderDate)                                  // line 3
    )
    select new {
        customer.CustomerID,
        customer.Name,
        customer.Address,
        order.Product,
        order.OrderDate
    };

In line 1 I have declared a lambda variable 'o', which means I cannot declare it again in line 2 (or at least the compiler complains if I try to). But it doesn't complain about line 3, even though 'o1' already exists?

What is the scope of a lambda variable?

The brackets give the clue - the lambda variable is captured in the scope of where it's declared:

.Where(o => ... olist.Where(o1 => ...).Max(o1 => ...))
  //  |----------------------------------------------| scope of o
  //                       |---------|                 scope of first o1
  //                                      |---------|  scope of second o1

Note that there's no overlap for the two o1 variables, but they both overlap (or shadow) the o variable and hence can't use the same name.

String.Format - how it works and how to implement custom formatstrings

45 votes

With String.Format() it is possible to format, for example, DateTime objects in many different ways. Every time I am looking for a desired format I need to search around on the Internet. Almost always I find an example I can use. For example:

String.Format("{0:MM/dd/yyyy}", DateTime.Now);          // "09/05/2012"

But I don't have any clue how it works and which classes support these 'magic' additional strings.

So my questions are:

  1. How does String.Format map the additional information MM/dd/yyyy to a string result?
  2. Do all Microsoft objects support this feature?
    Is this documented somewhere?
  3. Is it possible to do something like this:
    String.Format("{0:MyCustomFormat}", new MyOwnClass())

String.Format matches each of the tokens inside the string ({0} etc) against the corresponding object: http://msdn.microsoft.com/en-us/library/system.string.format.aspx

A format string is optionally provided:

{ index[,alignment][ : formatString] }

If formatString is provided, the corresponding object must implement IFormattable and specifically the ToString method that accepts formatString and returns the corresponding formatted string: http://msdn.microsoft.com/en-us/library/system.iformattable.tostring.aspx

An IFormatProvider may also be used to capture basic formatting standards, defaults, etc. Examples here and here.

So the answers to your questions in order:

  1. It uses the IFormattable interface's ToString() method on the DateTime object and passes that the MM/dd/yyyy format string. It is that implementation which returns the correct string.

  2. Any object that implements IFormattable supports this feature. You can even write your own!

  3. Yes, see above.
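
For readers who want to see the dispatch idea in runnable form, here is a rough Python analogue rather than .NET code: Python's str.format() hands whatever follows the colon to the object's `__format__` method, which plays a role similar to the one described above for IFormattable's ToString. The Temperature class and its "C"/"F" format strings are hypothetical.

```python
from datetime import datetime


class Temperature:
    """Hypothetical type with its own format strings, 'C' and 'F'."""

    def __init__(self, celsius: float):
        self.celsius = celsius

    def __format__(self, spec: str) -> str:
        # str.format() passes in whatever follows the colon in "{0:...}".
        if spec in ("", "C"):
            return f"{self.celsius:.1f} C"
        if spec == "F":
            return f"{self.celsius * 9 / 5 + 32:.1f} F"
        raise ValueError(f"unknown format string: {spec!r}")


# A built-in type interpreting its own format string:
print("{0:%m/%d/%Y}".format(datetime(2012, 9, 5)))  # 09/05/2012
# A user-defined type doing the same:
print("{0:F}".format(Temperature(100)))             # 212.0 F
```

In both cases the formatting function itself knows nothing about the format string; it only forwards it to the object being formatted.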

How to choose between two methods of the same name in Java

44 votes

I'm trying to access a method in a class I made, but since it is similar in name and in number of arguments my IDE says the method is ambiguous. Here's a mock-up of what the two methods look like:

methodName(X, Y, Z)
methodName(A, Y, Z)

I called the method and passed in the value null for the first argument for the purpose of my test. Unfortunately I cannot rename the methods, change the order of the arguments or modify the structure of the method in any way. Is there a way I can differentiate between these two methods?

Cast the first argument to the type of the first parameter of the method you want to call, for example:

methodName((A) null, y, z);

Single and double quotes in C/C++

42 votes

I was looking at the question Single quotes vs. double quotes in C. I couldn't completely understand the explanation given, so I wrote a program:

#include <stdio.h>
int main()
{
  char ch = 'a';
  /* sizeof yields a size_t, so print it with %zu rather than %d */
  printf("sizeof(ch) :%zu\n", sizeof(ch));
  printf("sizeof(\'a\') :%zu\n", sizeof('a'));
  printf("sizeof(\"a\") :%zu\n", sizeof("a"));
  printf("sizeof(char) :%zu\n", sizeof(char));
  printf("sizeof(int) :%zu\n", sizeof(int));
  return 0;
}

I compiled it using both gcc and g++, and these are my outputs:

gcc:

sizeof(ch)   : 1  
sizeof('a')  : 4  
sizeof("a")  : 2  
sizeof(char) : 1  
sizeof(int)  : 4  

g++:

sizeof(ch)   : 1  
sizeof('a')  : 1  
sizeof("a")  : 2  
sizeof(char) : 1  
sizeof(int)  : 4  

The g++ output makes sense to me and I don't have any doubt regarding that. In gcc, why does sizeof('a') need to be different from sizeof(char)? Is there some actual reason behind it, or is it just historical?

Also, in C, if char and 'a' have different sizes, does that mean that when we write char ch = 'a'; we are doing an implicit type conversion?

In C, character constants such as 'a' have type int; in C++ their type is char.

Regarding the last question, yes,

char ch = 'a';

causes an implicit conversion of the int to char.

Why does in_array() wrongly return true with these (large numeric) strings?

37 votes

I am not getting what is wrong with this code. It's returning "Found", which it should not.

$lead = "418176000000069007";
$diff = array("418176000000069003","418176000000057001");

if (in_array($lead,$diff))
    echo "Found";
else
    echo "Not found";

I think it is because of the limitations of the number storage. The values exceed PHP_INT_MAX.

Try without using the quotes, and try to echo the values of the variables. It will result in something like

$lead ---> 418176000000070000  
$diff ---> Array ( [0] => 418176000000070000 [1] => 418176000000060000 )

so in this case the in_array result is true!

<?php
     $lead = "418176000000069007";
     $diff = array("418176000000069003","418176000000057001");

     if(in_array($lead,$diff,true)) //use type too
       echo "Found";
     else
       echo "Not found";
?>

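Try this; it will work. Passing true as the third argument makes in_array() compare strictly, by type as well as value, so the two strings are compared as strings rather than as numbers.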
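
The precision loss itself is easy to check outside PHP. The following Python sketch assumes that the loose comparison ends up comparing the numeric strings as double-precision floats; the helper names loose_contains and strict_contains are made up for illustration.

```python
lead = "418176000000069007"
diff = ["418176000000069003", "418176000000057001"]


def loose_contains(needle, haystack):
    """Compare as doubles (assumed to mirror the loose numeric-string comparison)."""
    return any(float(needle) == float(item) for item in haystack)


def strict_contains(needle, haystack):
    """Compare the strings themselves, as a strict check would."""
    return needle in haystack


# Both strings map to the same double, so the printed integers are identical.
print(int(float(lead)), int(float(diff[0])))
print(loose_contains(lead, diff))   # True
print(strict_contains(lead, diff))  # False
```

At this magnitude adjacent doubles are 64 apart, so two values that differ by 4 collapse into the same floating-point number.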

What are the incompatible differences between C(99) and C++(11)?

36 votes

This question was triggered by replies to a post by Herb Sutter where he explained MS's decision not to make or support a C99 compiler, but just to go with the C(99) features that are in the C++(11) standard anyway.

One commenter replied:

(...) C is important and deserves at least a little bit of attention.

There is a LOT of existing code out there that is valid C but is not valid C++. That code is not likely to be rewritten (...)

Since I only program in MS C++, I really don't know "pure" C that well, i.e. I have no ready picture of which details of the C++ language I'm using are not in C(99), and I have little idea where some C99 code would not work as-is in a C++ compiler.

Note that I know about the C99-only restrict keyword, which to me seems to have very narrow application, and about variable-length arrays (of which I'm not sure how widespread or important they are).

Also, I'm very interested in whether there are any important semantic differences or gotchas, that is, C(99) code that will compile under C++(11) but behave differently with the C++ compiler than with the C compiler.



If you start from the common subset of C and C++, sometimes called clean C (which is not quite C90), you have to consider 3 types of incompatibilities:

  1. Additional C++ features which make legal C illegal C++

    Examples for this are C++ keywords which can be used as identifiers in C or conversions which are implicit in C but require an explicit cast in C++.

    This is probably the main reason why Microsoft still ships a C frontend at all: otherwise, legacy code that doesn't compile as C++ would have to be rewritten.

  2. Additional C features which aren't part of C++

    The C language did not stop evolving after C++ was forked. Some examples are variable-length arrays, designated initializers and restrict. These features can be quite handy, but aren't part of any C++ standard, and some of them will probably never make it in.

  3. Features which are available in both C and C++, but have different semantics

    An example for this would be the linkage of const objects or inline functions.
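To make types 1 and 3 concrete, here is a small C++ sketch. The names kDefaultValue and sum_filled are hypothetical, and the comments describe what the equivalent C would look like. Type 2 cannot be shown in a C++ file, since such code would not compile at all.

```cpp
#include <cstddef>
#include <cstdlib>

// Type 3: a namespace-scope const object has internal linkage in C++,
// whereas the same declaration in C has external linkage.
const int kDefaultValue = 7;

// Type 1: C accepts `int *values = malloc(...)` because void* converts
// implicitly to any object pointer type; C++ requires an explicit cast.
int sum_filled(int count) {
    if (count <= 0) {
        return 0;
    }
    int* values = static_cast<int*>(
        std::malloc(sizeof(int) * static_cast<std::size_t>(count)));
    if (values == nullptr) {
        return -1;  // allocation failed
    }
    int total = 0;
    for (int i = 0; i < count; ++i) {
        values[i] = kDefaultValue;
        total += values[i];
    }
    std::free(values);
    return total;
}
```

Removing the static_cast gives a line that is valid C but is rejected by a C++ compiler, which is the kind of legacy-code problem described under type 1.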

A list of incompatibilities between C99 and C++98 can be found here (which has already been mentioned by Mat).

While C++11 and C11 got closer on some fronts (variadic macros are now available in C++, variable-length arrays are now an optional C language feature), the list of incompatibilities has grown as well (e.g. generic selections in C and the auto type-specifier in C++).

As an aside, while Microsoft has taken some heat for the decision to abandon C (which is not a recent one), as far as I know no one in the open source community has actually taken steps to do something about it: It would be quite possible to provide many features of modern C via a C-to-C++ compiler, especially if you consider that some of them are trivial to implement. This is actually possible right now using Comeau C/C++, which does support C99.

However, it's not really a pressing issue: Personally, I'm quite comfortable with using GCC and Clang on Windows, and there are proprietary alternatives to MSVC as well, eg Pelles C or Intel's compiler.

Does new char actually guarantee aligned memory for a class type?

36 votes

Is allocating a buffer via new char[sizeof(T)] guaranteed to allocate memory which is properly aligned for the type T, where all members of T have their natural, implementation-defined alignment (that is, you have not used the alignas keyword to modify their alignment)?

I have seen this guarantee made in a few answers around here but I'm not entirely clear how the standard arrives at this guarantee. 5.3.4-10 of the standard gives the basic requirement: essentially new char[] must be aligned to max_align_t.

What I'm missing is the bit which says alignof(T) will always be a valid alignment with a maximum value of max_align_t. I mean, it seems obvious, but must the resulting alignment of a structure be at most max_align_t? Even point 3.11-3 says extended alignments may be supported, so may the compiler decide on its own a class is an over-aligned type?


As noted by Mankarse, the best quote I could get is from [basic.align]/3:

A type having an extended alignment requirement is an over-aligned type. [ Note: every over-aligned type is or contains a class type to which extended alignment applies (possibly through a non-static data member). —end note ]

which seems to imply that extended alignment must be explicitly required (and then propagates) but cannot

I would have preferred a clearer statement; the intent is obvious for a compiler writer, and any other behavior would be insane, but still...
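Until the wording is clearer, the assumption can at least be made explicit in code. The sketch below uses a hypothetical class Sample and a made-up function name. It refuses to compile if the implementation treats the class as over-aligned, and it checks the buffer address before constructing into it.

```cpp
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical class whose members all have their natural alignment.
struct Sample {
    double ratio;
    int count;
};

// Compilation fails if the implementation makes Sample over-aligned.
static_assert(alignof(Sample) <= alignof(std::max_align_t),
              "Sample must not be an over-aligned type");

// Constructs a Sample inside a new char[] buffer. Returns the stored
// count, or -1 if the buffer address did not satisfy alignof(Sample).
int construct_in_char_buffer() {
    char* buffer = new char[sizeof(Sample)];
    const bool aligned =
        reinterpret_cast<std::uintptr_t>(buffer) % alignof(Sample) == 0;
    int result = -1;
    if (aligned) {
        Sample* sample = new (buffer) Sample{1.5, 42};
        result = sample->count;
        sample->~Sample();  // placement new needs an explicit destructor call
    }
    delete[] buffer;
    return result;
}
```

The static_assert covers the question being asked here: if a compiler did decide on its own that Sample is over-aligned, the build would fail instead of the placement new silently using an under-aligned buffer.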