Best c questions in August 2010

Why does C++ support memberwise assignment of arrays within structs, but not generally?

36 votes

I understand that memberwise assignment of arrays is not supported, such that the following will not work:

int num1[3] = {1,2,3};
int num2[3];
num2 = num1; // "error: invalid array assignment"

I just accepted this as fact, figuring that the aim of the language is to provide an open-ended framework, and let the user decide how to implement something such as the copying of an array.

However, the following does work:

struct myStruct {int num[3];};
myStruct struct1={{1,2,3}};
myStruct struct2;
struct2 = struct1;

The array num[3] is member-wise assigned from its instance in struct1, into its instance in struct2.

Why is member-wise assignment of arrays supported for structs, but not in general?

edit: Roger Pate's comment in the thread std::string in struct - Copy/assignment issues? seems to point in the general direction of the answer, but I don't know enough to confirm it myself.

edit 2: Many excellent responses. I choose Luther Blissett's because I was mostly wondering about the philosophical or historical rationale behind the behavior, but James McNellis's reference to the related spec documentation was useful as well.

Here's my take on it:

The Development of the C Language offers some insight in the evolution of the array type in C:

I'll try to outline the array thing:

C's forerunners B and BCPL had no distinct array type, a declaration like:

auto V[10] (B)
or 
let V = vec 10 (BCPL)

would declare V to be a (untyped) pointer which is initialized to point to an unused region of 10 "words" of memory. B already used * for pointer dereferencing and had the [] short hand notation, *(V+i) meant V[i], just as in C/C++ today. However, V is not an array, it is still a pointer which has to point to some memory. This caused trouble when Dennis Ritchie tried to extend B with struct types. He wanted arrays to be part of the structs, like in C today:

struct {
    int inumber;
    char name[14];
};

But with the B,BCPL concept of arrays as pointers, this would have required the name field to contain a pointer which had to be initialized at runtime to a memory region of 14 bytes within the struct. The initialization/layout problem was eventually solved by giving arrays a special treatment: The compiler would track the location of arrays in structures, on the stack etc. without actually requiring the pointer to the data to materialize, except in expressions which involve the arrays. This treatment allowed almost all B code to still run and is the source of the "arrays convert to pointer if you look at them" rule. It is a compatiblity hack, which turned out to be very handy, because it allowed arrays of open size etc.

And here's my guess why array can't be assigned: Since arrays were pointers in B, you could simply write:

auto V[10];
V=V+5;

to rebase an "array". This was now meaningless, because the base of an array variable was not a lvalue anymore. So this assigment was disallowed, which helped to catch the few programs that did this rebasing on declared arrays. And then this notion stuck: As arrays were never designed to be first class citized of the C type system, they were mostly treated as special beasts which become pointer if you use them. And from a certain point of view (which ignores that C-arrays are a botched hack), disallowing array assignment still makes some sense: An open array or an array function parameter is treated as a pointer without size information. The compiler doesn't have the information to generate an array assignment for them and the pointer assignment was required for compatibility reasons. Introducing array assignment for the declared arrays would have introduced bugs though spurious assigments (is a=b a pointer assignment or an elementwise copy?) and other trouble (how do you pass an array by value?) without actually solving a problem - just make everything explicit with memcpy!

/* Example how array assignment void make things even weirder in C/C++, 
   if we don't want to break existing code.
   It's actually better to leave things as they are...
*/
typedef int vec[3];

void f(vec a, vec b) 
{
    vec x,y; 
    a=b; // pointer assignment
    x=y; // NEW! element-wise assignment
    a=x; // pointer assignment
    x=a; // NEW! element-wise assignment
}

This didn't change when a revision of C in 1978 added struct assignment ( http://cm.bell-labs.com/cm/cs/who/dmr/cchanges.pdf ). Even though records were distinct types in C, it was not possible to assign them in early K&R C. You had to copy them member-wise with memcpy and you could pass only pointers to them as function parameters. Assigment (and parameter passing) was now simply defined as the memcpy of the struct's raw memory and since this couldn't break exsisting code it was readily adpoted. As a unintended side effect, this implicitly introduced some kind of array assignment, but this happended somewhere inside a structure, so this couldn't really introduce problems with the way arrays were used.

What's the graceful way of handling out of memory situations in C/C++?

33 votes

I'm writing an caching app that consumes large amounts of memory.

Hopefully, I'll manage my memory well enough, but I'm just thinking about what to do if I do run out of memory.

If a call to allocate even a simple object fails, is it likely that even a syslog call will also fail?

EDIT: Ok perhaps I should clarify the question. If malloc or new returns a NULL or 0L value then it essentially means the call failed and it can't give you the memory for some reason. So, what would be the sensible thing to do in that case?

EDIT2: I've just realised that NULL will throw an exception. This could be caught at a higher level so I can perhaps gracefully exit further up. At that point, it may even be possible to recover depending on how much memory is freed. In the least I should by that point hopefully be able to log something. So while I have seen code that checks the value of a pointer after new, it is unnecessary. While in C, you should check the return value for malloc.

Well, if you are in a case where there is a failure to allocate memory, you're going to get a std::bad_alloc exception. The exception causes the stack of your program to be unwound. In all likelihood, the inner loops of your application logic are not going to be handling out of memory conditions, only higher levels of your application should be doing that. Because the stack is getting unwound, a significant chunk of memory is going to be free'd -- which in fact should be almost all the memory used by your program.

The one exception to this is when you ask for a very large (several hundred MB, for example) chunk of memory which cannot be satisfied. When this happens though, there's usually enough smaller chunks of memory remaining which will allow you to gracefully handle the failure.

Stack unwinding is your friend ;)

EDIT: Just realized that the question was also tagged with C -- if that is the case, then you should be having your functions free their internal structures manually when out of memory conditions are found; not to do so is a memory leak.

EDIT2: Example:

#include <iostream>
#include <vector>

void DoStuff()
{
    std::vector<int> data;
    //insert a whole crapload of stuff into data here.
    //Assume std::vector::push_back does the actual throwing
    //i.e. data.resize(SOME_LARGE_VALUE_HERE);
}

int main()
{
    try
    {
        DoStuff();
        return 0;
    }
    catch (const std::bad_alloc& ex)
    {   //Observe that the local variable `data` no longer exists here.
        std::cerr << "Oops. Looks like you need to use a 64 bit system (or "
                     "get a bigger hard disk) for that calculation!";
        return -1;
    }
}

EDIT3: Okay, according to commenters there are systems out there which do not follow the standard in this regard. On the other hand, on such systems, you're going to be SOL in any case, so I don't see why they merit discussion. But if you are on such a platform, it is something to keep in mind.

Optimize me! (C, performance) -- followup to bit-twiddling question

25 votes

Thanks to some very helpful stackOverflow users at Bit twiddling: which bit is set?, I have constructed my function (posted at the end of the question).

Any suggestions -- even small suggestions -- would be appreciated. Hopefully it will make my code better, but at the least it should teach me something. :)

Overview

This function will be called at least 1013 times, and possibly as often as 1015. That is, this code will run for months in all likelihood, so any performance tips would be helpful.

This function accounts for 72-77% of the program's time, based on profiling and about a dozen runs in different configurations (optimizing certain parameters not relevant here).

At the moment the function runs in an average of 50 clocks. I'm not sure how much this can be improved, but I'd be thrilled to see it run in 30.

Key Observation

If at some point in the calculation you can tell that the value that will be returned will be small (exact value negotiable -- say, below a million) you can abort early. I'm only interested in large values.

This is how I hope to save the most time, rather than by further micro-optimizations (though these are of course welcome as well!).

Performance Information

  • smallprimes is a bit array (64 bits); on average about 8 bits will be set, but it could be as few as 0 or as many as 12.
  • q will usually be nonzero. (Notice that the function exits early if q and smallprimes are zero.)
  • r and s will often be 0. If q is zero, r and s will be too; if r is zero, s will be too.
  • As the comment at the end says, nu is usually 1 by the end, so I have an efficient special case for it.
  • The calculations below the special case may appear to risk overflow, but through appropriate modeling I have proved that, for my input, this will not occur -- so don't worry about that case.
  • Functions not defined here (ugcd, minuu, star, etc.) have already been optimized; none take long to run. pr is a small array (all in L1). Also, all functions called here are pure functions.
  • But if you really care... ugcd is the gcd, minuu is the minimum, vals is the number of trailing binary 0s, __builtin_ffs is the location of the leftmost binary 1, star is (n-1) >> vals(n-1), pr is an array of the primes from 2 to 313.
  • The calculations are currently being done on a Phenom II 920 x4, though optimizations for i7 or Woodcrest are still of interest (if I get compute time on other nodes).
  • I would be happy to answer any questions you have about the function or its constituents.

What it actually does

Added in response to a request. You don't need to read this part.

The input is an odd number n with 1 < n < 4282250400097. The other inputs provide the factorization of the number in this particular sense:

smallprimes&1 is set if the number is divisible by 3, smallprimes&2 is set if the number is divisible by 5, smallprimes&4 is set if the number is divisible by 7, smallprimes&8 is set if the number is divisible by 11, etc. up to the most significant bit which represents 313. A number divisible by the square of a prime is not represented differently from a number divisible by just that number. (In fact, multiples of squares can be discarded; in the preprocessing stage in another function multiples of squares of primes <= lim have smallprimes and q set to 0 so they will be dropped, where the optimal value of lim is determined by experimentation.)

q, r, and s represent larger factors of the number. Any remaining factor (which may be greater than the square root of the number, or if s is nonzero may even be less) can be found by dividing factors out from n.

Once all the factors are recovered in this way, the number of bases, 1 <= b < n, to which n is a strong pseudoprime are counted using a mathematical formula best explained by the code.

Improvements so far

  • Pushed the early exit test up. This clearly saves work so I made the change.
  • The appropriate functions are already inline, so __attribute__ ((inline)) does nothing. Oddly, marking the main function bases and some of the helpers with __attribute ((hot)) hurt performance by almost 2% and I can't figure out why (but it's reproducible with over 20 tests). So I didn't make that change. Likewise, __attribute__ ((const)), at best, did not help. I was more than slightly surprised by this.

Code

ulong bases(ulong smallprimes, ulong n, ulong q, ulong r, ulong s)
{
    if (!smallprimes & !q)
        return 0;

    ulong f = __builtin_popcountll(smallprimes) + (q > 1) + (r > 1) + (s > 1);
    ulong nu = 0xFFFF;  // "Infinity" for the purpose of minimum
    ulong nn = star(n);
    ulong prod = 1;

    while (smallprimes) {
        ulong bit = smallprimes & (-smallprimes);
        ulong p = pr[__builtin_ffsll(bit)];
        nu = minuu(nu, vals(p - 1));
        prod *= ugcd(nn, star(p));
        n /= p;
        while (n % p == 0)
            n /= p;
        smallprimes ^= bit;
    }
    if (q) {
        nu = minuu(nu, vals(q - 1));
        prod *= ugcd(nn, star(q));
        n /= q;
        while (n % q == 0)
            n /= q;
    } else {
        goto BASES_END;
    }
    if (r) {
        nu = minuu(nu, vals(r - 1));
        prod *= ugcd(nn, star(r));
        n /= r;
        while (n % r == 0)
            n /= r;
    } else {
        goto BASES_END;
    }
    if (s) {
        nu = minuu(nu, vals(s - 1));
        prod *= ugcd(nn, star(s));
        n /= s;
        while (n % s == 0)
            n /= s;
    }

    BASES_END:
    if (n > 1) {
        nu = minuu(nu, vals(n - 1));
        prod *= ugcd(nn, star(n));
        f++;
    }

    // This happens ~88% of the time in my tests, so special-case it.
    if (nu == 1)
        return prod << 1;

    ulong tmp = f * nu;
    long fac = 1 << tmp;
    fac = (fac - 1) / ((1 << f) - 1) + 1;
    return fac * prod;
}

You seem to be wasting much time doing divisions by the factors. It is much faster to replace a division with a multiplication by the reciprocal of divisor (division: ~15-80(!) cycles, depending on the divisor, multiplication: ~4 cycles), IF of course you can precompute the reciprocals.

While this seems unlikely to be possible with q, r, s - due to the range of those vars, it is very easy to do with p, which always comes from the small, static pr[] array. Precompute the reciprocals of those primes and store them in another array. Then, instead of dividing by p, multiply by the reciprocal taken from the second array. (Or make a single array of structs.)

Now, obtaining exact division result by this method requires some trickery to compensate for rounding errors. You will find the gory details of this technique in this document, on page 138.

EDIT:

After consulting Hacker's Delight (an excellent book, BTW) on the subject, it seems that you can make it even faster by exploiting the fact that all divisions in your code are exact (i.e. remainder is zero).

It seems that for every divisor d which is odd and base B = 2word_size, there exists a unique multiplicative inverse d⃰ which satisfies the conditions: d⃰ < B and d·d⃰ ≡ 1 (mod B). For every x which is an exact multiple of d, this implies x/d ≡ x·d⃰ (mod B). Which means you can simply replace a division with a multiplication, no added corrections, checks, rounding problems, whatever. (The proofs of these theorems can be found in the book.) Note that this multiplicative inverse need not be equal to the reciprocal as defined by the previous method!

How to check whether a given x is an exact multiple of d - i.e. x mod d = 0 ? Easy! x mod d = 0 iff x·d⃰ mod B ≤ ⌊(B-1)/d⌋. Note that this upper limit can be precomputed.

So, in code:

unsigned x, d;
unsigned inv_d = mulinv(d);          //precompute this!
unsigned limit = (unsigned)-1 / d;   //precompute this!

unsigned q = x*inv_d;
if(q <= limit)
{
   //x % d == 0
   //q == x/d
} else {
   //x % d != 0
   //q is garbage
}

Assuming the pr[] array becomes an array of struct prime:

struct prime {
   ulong p;
   ulong inv_p;  //equal to mulinv(p)
   ulong limit;  //equal to (ulong)-1 / p
}

the while(smallprimes) loop in your code becomes:

while (smallprimes) {
    ulong bit = smallprimes & (-smallprimes);
    int bit_ix = __builtin_ffsll(bit);
    ulong p = pr[bit_ix].p;
    ulong inv_p = pr[bit_ix].inv_p;
    ulong limit = pr[bit_ix].limit;
    nu = minuu(nu, vals(p - 1));
    prod *= ugcd(nn, star(p));
    n *= inv_p;
    for(;;) {
        ulong q = n * inv_p;
        if (q > limit)
            break;
        n = q;
    }
    smallprimes ^= bit;
}

And for the mulinv() function:

ulong mulinv(ulong d) //d needs to be odd
{
   ulong x = d;
   for(;;)
   {
      ulong tmp = d * x;
      if(tmp == 1)
         return x;
      x *= 2 - tmp;
   }
}

Note you can replace ulong with any other unsigned type - just use the same type consistently.

The proofs, whys and hows are all available in the book. A heartily recommended read :-).

Why do digits 1, 2 and 3 appear so frequently using C rand() function?

23 votes

What I am trying to do is to generate some random numbers (not necessarily single digit) like

29106
7438
5646
4487
9374
28671
92
13941
25226
10076

and then count the number of digits I get:

count[0] =       3  Percentage =  6.82
count[1] =       5  Percentage = 11.36
count[2] =       6  Percentage = 13.64
count[3] =       3  Percentage =  6.82
count[4] =       6  Percentage = 13.64
count[5] =       2  Percentage =  4.55
count[6] =       7  Percentage = 15.91
count[7] =       5  Percentage = 11.36
count[8] =       3  Percentage =  6.82
count[9] =       4  Percentage =  9.09

This is the code I am using:

#include <stdio.h>
#include <time.h>
#include <stdlib.h>

int main() {

    int i;
    srand(time(NULL));
    FILE* fp = fopen("random.txt", "w");    
    // for(i = 0; i < 10; i++)
    for(i = 0; i < 1000000; i++)
        fprintf(fp, "%d\n", rand());
    fclose(fp);

    int dummy;
    long count[10] = {0,0,0,0,0,0,0,0,0,0};
    fp = fopen("random.txt", "r");
    while(!feof(fp)) {
        fscanf(fp, "%1d", &dummy);
        count[dummy]++;                 
    }
    fclose(fp);

    long sum = 0;
    for(i = 0; i < 10; i++)
        sum += count[i];

    for(i = 0; i < 10; i++)
        printf("count[%d] = %7ld  Percentage = %5.2f\n",
            i, count[i], ((float)(100 * count[i])/sum));

}

If I generate a large number of random numbers (1000000), this is the result I get:

count[0] =  387432  Percentage =  8.31
count[1] =  728339  Percentage = 15.63
count[2] =  720880  Percentage = 15.47
count[3] =  475982  Percentage = 10.21
count[4] =  392678  Percentage =  8.43
count[5] =  392683  Percentage =  8.43
count[6] =  392456  Percentage =  8.42
count[7] =  391599  Percentage =  8.40
count[8] =  388795  Percentage =  8.34
count[9] =  389501  Percentage =  8.36

Notice that 1, 2 and 3 have too many hits. I have tried running this several times and each time I get very similar results.

I am trying to understand what could cause 1, 2 and 3 to appear much more frequently than any other digit.


Taking hint from what Matt Joiner and Pascal Cuoq pointed out,

I changed the code to use

for(i = 0; i < 1000000; i++)
    fprintf(fp, "%04d\n", rand() % 10000);
// pretty prints 0
// generates numbers in range 0000 to 9999

and this is what I get (similar results on multiple runs):

count[0] =  422947  Percentage = 10.57
count[1] =  423222  Percentage = 10.58
count[2] =  414699  Percentage = 10.37
count[3] =  391604  Percentage =  9.79
count[4] =  392640  Percentage =  9.82
count[5] =  392928  Percentage =  9.82
count[6] =  392737  Percentage =  9.82
count[7] =  392634  Percentage =  9.82
count[8] =  388238  Percentage =  9.71
count[9] =  388352  Percentage =  9.71

What can be the reason that 0, 1 and 2 are favored?


Thanks everyone. Using

int rand2(){
    int num = rand();
    return (num > 30000? rand2():num);     
}

    fprintf(fp, "%04d\n", rand2() % 10000);

I get

count[0] =  399629  Percentage =  9.99
count[1] =  399897  Percentage = 10.00
count[2] =  400162  Percentage = 10.00
count[3] =  400412  Percentage = 10.01
count[4] =  399863  Percentage = 10.00
count[5] =  400756  Percentage = 10.02
count[6] =  399980  Percentage = 10.00
count[7] =  400055  Percentage = 10.00
count[8] =  399143  Percentage =  9.98
count[9] =  400104  Percentage = 10.00

rand() generates a value from 0 to RAND_MAX. RAND_MAX is set to INT_MAX on most platforms, which may be 32767 or 2147483647.

For your example given above, it appears that RAND_MAX is 32767. This will place an unusually high frequency of 1, 2 and 3 for the most significant digit for the values from 10000 to 32767. You can observe that to a lesser degree, values up to 6 and 7 will also be slightly favored.

Is there a way to make this hash lookup any faster?

19 votes

I have a requirement to (very) quickly process strings of a limited range, tallying their values. The input file is of the form:

January    7
March     22
September 87
March     36

and so forth. Because the line widths are identical, I can simply read in a line with fread reasonably fast, and I've developed a perfect hashing function which works, but I wanted to see if anyone could offer any advice on how to make it even faster. I'll profile each suggestion to see how it goes.

The hashing function is based on the month name to allow fast allocation of the value to a bucket. Bear with me here. I first figured out the minimal number of characters for a perfect hash:

January
February
March
April
May
June
July
August
September
October
November
December

Keep in mind that the months are all nine characters due to the fact I have the entire input line.

Unfortunately, there is no single column to mark a month unique. Column 1 duplicates J, column 2 duplicates a, column 3 duplicates r, column 4 duplicates u and columns 5 onwards duplicate <space> (there are other duplicates but one is enough to prevent a single-column hash key).

However, by using the first and fourth column, I get the values Ju, Fr, Mc, Ai, M<space>, Je, Jy, Au, St, Oo, Ne and De, which are unique. There will be no invalid values in this file so I don't have to worry about incorrect buckets for the input data.

By viewing the hex codes for the characters, I found I could get low unique values by just ANDing with strategic values:

FirstChar  Hex  Binary     &0x0f
---------  ---  ---------  -----
   A       x41  0100 0001      1
   D       x44  0100 0100      4
   F       x46  0100 0110      6
   J       x4a  0100 1010     10
   M       x4d  0100 1101     13
   N       x4e  0100 1110     14
   O       x4f  0100 1111     15
   S       x53  0101 0011      3

SecondChar  Hex  Binary     &0x1f
----------  ---  ---------  -----
 <space>    x20  0010 0000      0
    c       x63  0110 0011      3
    e       x65  0110 0101      5
    i       x69  0110 1001      9
    o       x6f  0110 1111     15
    r       x72  0111 0010     18
    t       x74  0111 0100     20
    u       x75  0111 0101     21
    y       x79  0111 1001     25

and this allowed me to set up a static array to create a (hopefully) blindingly-fast hash function:

#define __ -1
static unsigned int hash (const char *str) {
    static unsigned char bucket[] = {
        //   A       S   D       F               J           M   N   O
        __, __, __, __, __, __, __, __, __, __, __, __, __,  4, __, __, // space
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __,  2, __, __, // c
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, 11, __, __, __, __, __,  5, __, __, __, 10, __, // e
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __,  3, __, __, __, __, __, __, __, __, __, __, __, __, __, __, // i
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __,  9, // o
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __,  1, __, __, __, __, __, __, __, __, __, // r
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __,  8, __, __, __, __, __, __, __, __, __, __, __, __, // t
        __,  7, __, __, __, __, __, __, __, __,  0, __, __, __, __, __, // u
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, __, //
        __, __, __, __, __, __, __, __, __, __,  6, __, __, __, __, __  // y
    };
    return bucket[((unsigned int)(str[3]&0x1f)<<4)|(str[0]&0xf)];
}

Testing that with the code:

#include <stdio.h>
#include <string.h>

// Hash function here.

static char *months[] = {
    "January  ", "February ", "March    ", "April    ", "May      ", "June     ",
    "July     ", "August   ", "September", "October  ", "November ", "December "
};

int main (void) {
    int i;
    for (i = 0; i < sizeof(months)/sizeof(*months); i++)
        printf ("%-10s -> %2d\n", months[i], hash(months[i]));
    return 0;
}

shows that it's functionally correct:

January    ->  0
February   ->  1
March      ->  2
April      ->  3
May        ->  4
June       ->  5
July       ->  6
August     ->  7
September  ->  8
October    ->  9
November   -> 10
December   -> 11

but I want to know if it can be made faster.

Any suggestions out there? I'm open to any simple optimisations or even a total rewrite if there's something inherently bad with my hashing function.


I don't think this is that important but the final version will be using EBCDIC. The theory will still stand but the AND operation may change slightly since the characters have different code points. I'll be happy with any assistance only on the ASCII front since I'm confident whatever advice is offered will translate okay to EBCDIC.

Here's the smallest sequence I could find for EBCDIC-US:

It has 24 elements in the bucket and uses only 2 operations to compute the index:

static unsigned int hash (const char *str)
{
 static unsigned char tab[] = {
    11, 4,__, 7,__,__, 9, 1,
    __,__,__,__,__,__,__,__,
     3, 5, 2,10, 8,__, 0, 6
 };
 return tab[0x17 & (str[ 1 ] + str[ 2 ])];
}

Second best, 25 items with xor:

static unsigned int hash(const char *str)
{
 static unsigned char tab[] = {
  9,__,__, 7,__,__,11, 1,
 __, 4,__,__,__,__, 3,__,
 __, 5, 8,10, 0,__,__, 6, 2
 };
 return tab[0x1f & (str[ 1 ] ^ str[ 2 ])];
}

(Actually, tab[] should be 32 entries long here, because 0x1f can generate an overflow for incorrect inputs).


Update from Pax: For what it's worth, the first option worked perfectly for EBCDIC code page 500:

## Month     str[1] str[2] Lookup
-- --------- ------ ------ ------
 0 January   a (81) n (95)      0
 1 February  e (85) b (82)      1
 2 March     a (81) r (99)      2
 3 April     p (97) r (99)      3
 4 May       a (81) y (a8)      4
 5 June      u (a4) n (95)      5
 6 July      u (a4) l (93)      6
 7 August    u (a4) g (87)      7
 8 September e (85) p (97)      8
 9 October   c (83) t (a3)      9
10 November  o (96) v (a5)     10
11 December  e (85) c (83)     11

Has the use of C to implement other languages constrained their designs in any way?

18 votes

It seems that most new programming languages that have appeared in the last 20 years have been written in C. This makes complete sense as C can be seen as a sort of portable assembly language. But what I'm curious about is whether this has constrained the design of the languages in any way. What prompted my question was thinking about how the C stack is used directly in Python for calling functions. Obviously the programming language designer can do whatever they want in whatever language they want, but it seems to me that the language you choose to write your new language in puts you in a certain mindset and gives you certain shortcuts that are difficult to ignore. Are there other characteristics of these languages that come from being written in that language (good or bad)?

Even with a C implementation, you're surprisingly free in terms of implementation. For example, chicken scheme uses C as an intermediate, but still manages to use the stack as a nursery generation in its garbage collector.

That said, there are some cases where there are constraints. Case in point: The GHC haskell compiler has a perl script called the Evil Mangler to alter the GCC-outputted assembly code to implement some important optimizations. They've been moving to internally-generated assembly and LLVM partially for that reason. That said, this hasn't constrained the language design - only the compiler's choice of available optimizations.

Weird behavior of right shift operator

Asked on Tue, 03 Aug 2010 by ereOn c++ c
17 votes

I recently faced a strange behavior using the right-shift operator.

The following program:

#include <cstdio>
#include <cstdlib>
#include <iostream>
#include <stdint.h>

int foo(int a, int b)
{
   return a >> b;
}

int bar(uint64_t a, int b)
{
   return a >> b;
}

int main(int argc, char** argv)
{
    std::cout << "foo(1, 32): " << foo(1, 32) << std::endl;
    std::cout << "bar(1, 32): " << bar(1, 32) << std::endl;
    std::cout << "1 >> 32: " << (1 >> 32) << std::endl; //warning here
    std::cout << "(int)1 >> (int)32: " << ((int)1 >> (int)32) << std::endl; //warning here

    return EXIT_SUCCESS;
}

Outputs:

foo(1, 32): 1 // Should be 0 (but I guess I'm missing something)
bar(1, 32): 0
1 >> 32: 0
(int)1 >> (int)32: 0

What happens with the foo() function ? I understand that the only difference between what it does and the last 2 lines, is that the last two lines are evaluated at compile them. And why does it "work" if I use a 64 bits integer ?

Any lights regarding this will be greatly appreciated !


Surely related, here is what g++ gives:

> g++ -o test test.cpp
test.cpp: In function 'int main(int, char**)':
test.cpp:20:36: warning: right shift count >= width of type
test.cpp:21:56: warning: right shift count >= width of type

It's likely the CPU is actually computing

a >> (b % 32)

in foo; meanwhile, the 1 >> 32 is a constant expression, so the compiler will fold the constant at compile-time, which somehow gives 0.

Since the standard (C++98 §5.8/1) states that

The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand.

there is no contradiction having foo(1,32) and 1>>32 giving different results.

 

On the other hand, in bar you provided a 64-bit unsigned value, as 64 > 32 it is guaranteed the result must be 1 / 232 = 0. Nevertheless, if you write

bar(1, 64);

you may still get 1.


Edit: The logical right shift (SHR) behaves like a >> (b % 32/64) on x86/x86-64 (Intel #253667, Page 4-404):

The destination operand can be a register or a memory location. The count operand can be an immediate value or the CL register. The count is masked to 5 bits (or 6 bits if in 64-bit mode and REX.W is used). The count range is limited to 0 to 31 (or 63 if 64-bit mode and REX.W is used). A special opcode encoding is provided for a count of 1.

However, on ARM (armv6&7, at least), the logical right-shift (LSR) is implemented as (ARMISA Page A2-6)

(bits(N), bit) LSR_C(bits(N) x, integer shift)
    assert shift > 0;
    extended_x = ZeroExtend(x, shift+N);
    result = extended_x<shift+N-1:shift>;
    carry_out = extended_x<shift-1>;
    return (result, carry_out);

where (ARMISA Page AppxB-13)

ZeroExtend(x,i) = Replicate('0', i-Len(x)) : x

This guarantees a right shift of ≥32 will produce zero. For example, when this code is run on the iPhone, foo(1,32) will give 0.

These shows shifting a 32-bit integer by ≥32 is non-portable.

Vim different textwidth for multiline C comments?

17 votes

In our C++ code base we keep 99 column lines but 79-some-odd column multiline comments. Is there a good strategy to do this automagically? I assume the modes are already known because of smart comment line-joining and leading * insertion.

Apparently both code and comments use the same textwidth option. As far as I can see, the only trick is to set this option dynamically:

 :autocmd CursorMoved,CursorMovedI * :if match(getline(.), '^\s*\*') == 0 | :setlocal textwidth=79 | :else | :setlocal textwidth=99 | :endif

Here the critical part is detecting when we are in a comment. If you only format comments this way:

/*
 * my comment
 */

my regex should work... unless you have lines in the code starting with * (which I guess can happen in C, less frequently in C++). If you use comments like this:

// comment line 1
// comment line 2

the regex is even simpler to write. If you want to cover all possible situations, including corner cases, well... I guess the best thing would be to define a separate detection function and call that from the :autocmd instead of match().

I can use more memory than how much I've allocated with malloc(), why?

17 votes
char *cp = (char *) malloc(1);
strcpy(cp, "123456789");
puts(cp);

output is "123456789" on both gcc (Linux) and Visual C++ Express, does that mean when there is free memory, I can actually use more than what I've allocated with malloc()?

and why malloc(0) doesn't cause runtime error?

Thanks.

You've asked a very good question and maybe this will whet your appetite about operating systems. Already you know you've managed to achieve something with this code that you wouldn't ordinarily expect to do. So you would never do this in code you want to make portable.

To be more specific, and this depends entirely on your operating system and CPU architecture, the operating system allocates "pages" of memory to your program - typically this can be in the order of 4 kilobytes. The operating system is the guardian of pages and will immediately terminate any program that attempts to access a page it has not been assigned.

malloc, on the other hand, is not an operating system function but a C library call. It can be implemented in many ways. It is likely that your call to malloc resulted in a page request from the operating system. Then malloc would have decided to give you a pointer to a single byte inside that page. When you wrote to the memory from the location you were given you were just writing in a "page" that the operating system had granted your program, and thus the operating system will not see any wrong doing.

The real problems, of course, will begin when you continue to call malloc to assign more memory. It will eventually return pointers to the locations you just wrote over. This is called a "buffer overflow" when you write to memory locations that are legal (from an operating system perspective) but could potentially be overwriting memory another part of the program will also be using.

If you continue to learn about this subject you'll begin to understand how programs can be exploited using such "buffer overflow" techniques - even to the point where you begin to write assembly language instructions directly into areas of memory that will be executed by another part of your program.

When you get to this stage you'll have gained much wisdom. But please be ethical and do not use it to wreak havoc in the universe!

PS when I say "operating system" above I really mean "operating system in conjunction with privileged CPU access". The CPU and MMU (memory management unit) triggers particular interrupts or callbacks into the operating system if a process attempts to use a page that has not been allocated to that process. The operating system then cleanly shuts down your application and allows the system to continue functioning. In the old days, before memory management units and privileged CPU instructions, you could practically write anywhere in memory at any time - and then your system would be totally at the mercy of the consequences of that memory write!

Where can I start with programmable Hardware?

17 votes

I've had a desire to learn at least a tiny bit about programming hardware for quite some time now and thought I'd ask here to get some starting points. I am a reasonably accomplished programmer with Delphi and Objective-c experience but have never even listened to a device port / interupt (I dont even know the terminology) let alone programmed a piece of hardware.

To start with what I would like to be able to do is,

  • Buy a simple bit of kit with 2,3 or 10 buttons
  • Plug the device into my pc via USB
  • Listen to the device and write some code to do something once the button is pressed.

I reckon this is a good place to start, anyone got pointers on hardware to buy or how I could start this?

I like the Arduino, easy to use, open source and a great community!

Good to get started with, and uses a subset of C/C++.

Also, has alot of addon hardware available, like GPS, Bluetooth, Wifi etc

My experiences with Arduino have been nothing but good, from the point you get it out of it's box (and install the free compiler on either Windows / Mac / Linux), to building your first 'sketch' (a project or application for the Arduino).

Making an application is easy, you have a Setup Method, which is called on startup, and then a loop method which is looped while the Arduino is running.

Then all you have to do is hook either inputs or outputs up to the pins on the Arduino board, tell the code what they are and hopefully you'll get the desired output.

One other really good thing about the Arduino (and others I'm sure) is that you now have a use for those old broken printers, or 2x CD-Rom's that no one wants, and every other little bit of out dated technology. It's amazing what you can find in a server room!

Now, I have only worked on small projects, like plugging in an LCD, and reading the room temp and various projects like that. But based on what I have done, I am happy with the Ardunio, it gives a good base to embedded programming and if it's not enough, you can always go bigger!

My 2 cents!

What does "12345" + 2 do in C?

Asked on Wed, 25 Aug 2010 by Joe c
17 votes

I've seen this done in C before:

#define MY_STRING "12345"
...
#define SOMETHING (MY_STRING + 2)

What does SOMETHING get expanded to, here? Is this even legal? Or do they mean this?:

#define SOMETHING (MY_STRING[2])

String literals exist in the fixed data segment of the program, so they appear to the compiler as a type of pointer.

+-+-+-+-+-+--+
|1|2|3|4|5|\0|
+-+-+-+-+-+--+
 ^ MY_STRING
     ^ MY_STRING + 2

Howto check if a char* points to a string literal in C

16 votes

I have a struct

struct request {
  int code;
  char *message;
};

that I'd like to free properly.

I have the following function to do that:

void free_request(struct request *req) {
  if (req->message != NULL) {
      free(req->message);
  }
  free(req);
  req = NULL;
}

The problem is that I get an "free(): invalid pointer"/segfault error from the compiler when I try to free a request that has been created using a string literal:

struct request *req;
req = malloc(sizeof(struct request));
req->message = "TEST";
free_request(req);

Since I want to create request structs in different places, once using literals (on the client side) and once using *chars that I read from a socket (on the server side) I was wondering if there is a function to make sure that I don't try to free the literals while still allowing me to free the message I have created using a malloc.

There is no standard function that lets you know if a pointer was dynamically allocated or not. You should include a flag in your struct to inform yourself of it, or only use dynamically allocated strings (strdup is your friend in this case). Depending on your networking setup, it might be simpler to use strdup (well, to tell the truth, it is simpler to use the strdup at all).

With strdup:

struct message* req;
req = malloc(sizeof *req);
req->message = strdup("TEST");
free_request(req);

With a flag:

struct message
{
    int code;
    char* message;
    bool isStatical; // replace to 'char' if bool doesn't exist
};

void free_request(struct message* req)
{
    if (!req->isStatical) free(req->message);
    free(req);
}

struct message* req;
req = malloc(sizeof *req);
req->message = "TEST";
req->isStatical = 1;
free_request(req);

Also, don't forget to zero your allocated memory when you create an object. That could save you a lot of trouble.

req = malloc(sizeof *req);
memset(req, 0, sizeof *req);

That, and setting req to NULL from free_request won't have any effect. You either need to take a struct message** or do it yourself after the function calls.

where to document functions in C

16 votes

I have a C program with multiple files, so I have, for example, stuff.c which implements a few functions, and stuff.h with the function prototypes. How should I go about documenting the functions in comments? Should I have all the docs in the header file, all the docs in the .c file, or duplicate the docs for both? I like the latter approach, but then I run into problems where I'll update the docs on one of them and not the other (usually the one where I make the first modification, i.e. if I modify the header file first, then its comments will reflect that, but if I update the implementation, only those comments will change).

  • Put the information that people using the functions need to know in the header.

  • Put the information that maintainers of the functions need to know in the source code.

Would it break the language or existing code if we'd add safe signed/unsigned compares to C/C++?

14 votes

After reading this question on signed/unsigned compares (they come up every couple of days I'd say):

I wondered why we don't have proper signed unsigned compares and instead this horrible mess? Take the output from this small program:

#include <stdio.h>
#define C(T1,T2)\
 {signed   T1 a=-1;\
 unsigned T2 b=1;\
  printf("(signed %5s)%d < (unsigned %5s)%d = %d\n",#T1,(int)a,#T2,(int)b,(a<b));}\

 #define C1(T) printf("%s:%d\n",#T,(int)sizeof(T)); C(T,char);C(T,short);C(T,int);C(T,long);
int main()
{
 C1(char); C1(short); C1(int); C1(long); 
}

Compiled with my standard compiler (gcc, 64bit), I get this:

char:1
(signed  char)-1 < (unsigned  char)1 = 1
(signed  char)-1 < (unsigned short)1 = 1
(signed  char)-1 < (unsigned   int)1 = 0
(signed  char)-1 < (unsigned  long)1 = 0
short:2
(signed short)-1 < (unsigned  char)1 = 1
(signed short)-1 < (unsigned short)1 = 1
(signed short)-1 < (unsigned   int)1 = 0
(signed short)-1 < (unsigned  long)1 = 0
int:4
(signed   int)-1 < (unsigned  char)1 = 1
(signed   int)-1 < (unsigned short)1 = 1
(signed   int)-1 < (unsigned   int)1 = 0
(signed   int)-1 < (unsigned  long)1 = 0
long:8
(signed  long)-1 < (unsigned  char)1 = 1
(signed  long)-1 < (unsigned short)1 = 1
(signed  long)-1 < (unsigned   int)1 = 1
(signed  long)-1 < (unsigned  long)1 = 0

If I compile for 32 bit, the result is the same except that:

long:4
(signed  long)-1 < (unsigned   int)1 = 0

The "How?" of all this is easy to find: Just goto section 6.3 of the C99 standard or chapter 4 of C++ and dig up the clauses which describe how the operands are converted to a common type and this can break if the common type reinterprets negative values.

But what about the "Why?". As we can see, the '<' fails in 50% of all cases, also it depends on the concrete sizes of the types, so it is platform dependent. Here are some points to consider:

  • The convert & compare process is not really a prime example for the Rule of Least Surprise

  • I don't believe that there is code out there, which relies on the proposition that (short)-1 > (unsigned)1 and is not written by terrorists.

  • This is all terrible when you're in C++ with template code, because you need type trait magic to knit a correct "<".


After all, comparing signed and unsigned value of different types is easy to implement:

signed X < unsigned Y -> (a<(X)0) || ((Z)a<(Z)b) where Z=X|Y 

The pre-check is cheap and can also be optimized away by the compiler if a>=0 can statically be proven.

So here's my question:

Would it break the language or existing code if we'd add safe signed/unsigned compares to C/C++?

("Would it break the language" means would we need to make massive changes to different parts of the language to accommodate this change)


UPDATE: I've ran this on my good old Turbo-C++ 3.0 and got this output:

char:1
(signed  char)-1 < (unsigned  char)1 = 0

Why is (signed char)-1 < (unsigned char) == 0 here?

Yes it would break the language/existing code. The language, as you have noted, carefully specifies the behavior when signed and unsigned operands are used together. This behavior with comparison operators is essential for some important idioms, like:

if (x-'0' < 10U)

Not to mention things like (equality comparison):

size_t l = mbrtowc(&wc, s, n, &state);
if (l==-1) ... /* Note that mbrtowc returns (size_t)-1 on failure */

As an aside, specifying "natural" behavior for mixed signed/unsigned comparisons would also incur a significant performance penalty, even in programs which are presently using such comparisons in safe ways where they already have their "natural" behavior due to constraints on the input which the compiler would have a hard time determining (or might not be able to determine at all). In writing your own code to handle these tests, I'm sure you've already seen what the performance penalty would look like, and it's not pretty.

How does a variable in C/C++ work?

14 votes

How does a variable in C/C++ work?

I mean, a pointer stores an address from a variable and then you have to dereference it to access the object to which it refers, so I think that a variable is a pointer that is dereferenced automatically when used... does that make any sense?

BTW, I didn't find an answer to this on google...

Thank you

A variable is an abstraction (a convenient name) for a memory position on the computer. In C/C++ if the variable is of type int it will be a convenient name for a memory address holding an integer value.

And a variable is not a pointer automatically dereferenced. A variable just holds the value it is supposed to hold. If it is a pointer, it will hold a memory address, if it is an integer it will hold an integer value, if it is a float, it will hold a floating point number... And so on and so forth...

Is there a more efficient way to get the length of a 32bit integer in bytes?

14 votes

I'd like a shortcut for the following little function, where performance is very important (the function is called more than 10.000.000 times):

inline int len(uint32 val)
{
    if(val <= 0x000000ff) return 1;
    if(val <= 0x0000ffff) return 2;
    if(val <= 0x00ffffff) return 3;
    return 4;
} 

Does anyone have any idea... a cool bitoperation trick? Thanks for your help in advance!

How about this one?

inline int len(uint32 val)
{
    return 4
        - ((val & 0xff000000) == 0)
        - ((val & 0xffff0000) == 0)
        - ((val & 0xffffff00) == 0)
    ;
}

Removing the inline keyword, g++ -O2 compiles this to the following branchless code:

movl    8(%ebp), %edx
movl    %edx, %eax
andl    $-16777216, %eax
cmpl    $1, %eax
sbbl    %eax, %eax
addl    $4, %eax
xorl    %ecx, %ecx
testl   $-65536, %edx
sete    %cl
subl    %ecx, %eax
andl    $-256, %edx
sete    %dl
movzbl  %dl, %edx
subl    %edx, %eax

If you don't mind machine-specific solutions, you can use the bsr instruction which searches for the first 1 bit. Then you simply divide by 8 to convert bits to bytes and add 1 to shift the range 0..3 to 1..4:

int len(uint32 val)
{
    asm("mov 8(%ebp), %eax");
    asm("or  $255, %eax");
    asm("bsr %eax, %eax");
    asm("shr $3, %eax");
    asm("inc %eax");
    asm("mov %eax, 8(%ebp)");
    return val;
}

Note that I am not an inline assembly god, so maybe there's a better to solution to access val instead of addressing the stack explicitly. But you should get the basic idea.

The GNU compiler also has an interesting built-in function called __builtin_clz:

inline int len(uint32 val)
{
    return ((__builtin_clz(val | 255) ^ 31) >> 3) + 1;
}

This looks much better than the inline assembly version to me :)

Performance differences between Python and C

11 votes

Working on different projects I have the choice of selecting different programming languages, as long as the task is done.

I was wondering what the real difference is, in terms of performance, between writing a program in Python, versus doing it in C.

The tasks to be done are pretty varied, e.g. sorting textfiles, disk access, network access, textfile parsing.

Is there really a noticeable difference between sorting a textfile using the same algorithm in C versus Python, for example?

And in your experience, given the power of current CPU's (i7), is it really a noticeable difference (Consider that its a program that doesnt bring the system to its knees).

Thanks! :)

Use python until you have a performance problem. If you ever have one figure out what the problem is (often it isn't what you would have guessed up front). Then solve that specific performance problem which will likely be an algorithm or data structure change. In the rare case that your problem really needs C then you can write just that portion in C and use it from your python code.

What should a developer know before building cellphone apps?

10 votes

I want to start making cellphone apps with Android as first choice but not the only one. I have 10 years of experience with Java, C#, C++ in commercial applications and I know that many things and practices for this applications are not valid for cellphones. Where do I start reading? How do I adapt my way of thinking to this new environment as quickly as posible? I plan to make some money with it sometime in the future as an extra income or a career change maybe, who know. Any resource or advice you could recommend will be very welcome. Thanks in advance.

Just start with Android Developer site http://developer.android.com/index.html. It contains all you need for the beginning. Also take a look onto Commonsware android books, those are really great both for beginners and experienced programmers - http://commonsware.com/books.html.

What's so special about file descriptor 3 on linux?

10 votes

I'm working on a server application that's going to work on Linux and Mac OS X. It goes like this:

  • start main application
  • fork of the controller process
  • call lock_down() in the controller process
  • terminate main application
  • the controller process then forks again, creating a worker process
  • eventually the controller keeps forking more worker processes

I can log using several of methods (e.g. syslog or a file) but right now I'm pondering about syslog. The "funny" thing is that no syslog output is ever seen in the controller process unless I include the #ifdef section below.

The worker processes logs flawlessly in Mac OS X and linux with or without the ifdef'ed section below. The controller also logs flawlessly in Mac OS X without the #ifdef'ed section, but on linux the ifdef is needed if I want to see any output into syslog (or the log file for that matter) from the controller process.

So, why is that?

static int
lock_down(void)
{
    struct rlimit rl;
    unsigned int n;
    int fd0;
    int fd1;
    int fd2;

    // Reset file mode mask
    umask(0);

    // change the working directory
    if ((chdir("/")) < 0)
        return EXIT_FAILURE;

    // close any and all open file descriptors
    if (getrlimit(RLIMIT_NOFILE, &rl))
        return EXIT_FAILURE;
    if (RLIM_INFINITY == rl.rlim_max)
        rl.rlim_max = 1024;

    for (n = 0; n < rl.rlim_max; n++) {
#ifdef __linux__        
        if (3 == n) // deep magic...
            continue;
#endif
        if (close(n) && (EBADF != errno))
            return EXIT_FAILURE;
    }

    // attach file descriptors 0, 1 and 2 to /dev/null
    fd0 = open("/dev/null", O_RDWR);
    fd1 = dup2(fd0, 1);
    fd2 = dup2(fd0, 2);
    if (0 != fd0)
        return EXIT_FAILURE;

    return EXIT_SUCCESS;
}

camh was close, but using closelog() was the idea that did the trick so the honor goes to jilles. Something else, aside from closing a file descriptor from under syslogs feet must go on though. To make the code work I added a call to closelog() just before the loop:

closelog();
for (n = 0; n < rl.rlim_max; n++) {
    if (close(n) && (EBADF != errno))
        return EXIT_FAILURE;
}

I was relying on a verbatim understanding of the manual page, saying:

The use of openlog() is optional; it will automatically be called by syslog() if necessary...

I interpreted this as saying that syslog would detect if the file descriptor was closed under it. Apparently it did not. An explicit closelog() on linux was needed to tell syslog that the descriptor was closed.

One more thing that still perplexes me is that not using closelog() prevented the first forked process (the controller) from even opening and using a log file. The following forked processes could use syslog or a log file with no problems. Maybe there are some caching effect in the filesystem that make the first forked process having an unreliable "idea" of which file descriptors are available, while the next set of forked process are sufficiently delayed to not be affected by this?

syslog(3) may keep a file descriptor to syslogd's socket open; closing this under its feet is likely to cause problems. A closelog(3) call may help.

Finding security problems in a given code

8 votes

Can some one please tell me an approach for finding security flaws in a given code. For ex: in a given socket program. Any good examples or good book recommendations are welcome.

Thanks & Regards,

Mousey

The lowest hanging fruit in this category would be to simply search the source for functions which are commonly misused or are difficult use safely such as:

  • strcpy
  • strcat
  • sprintf
  • gets

then start looking at ones that are not inherintly too bad, but could be misused. Particularly anything that writes to a buffer can potentially be hazardous if misused.

  • memcpy
  • memmove
  • recv/read
  • send/write
  • the entire printf family should always have a constant for the format string

NOTE: all of these (except gets) can be used correctly, so don't think it's a flaw just because the function is used, instead take a look at how it is used. Also note that gets is always a flaw.

NOTE2: this list is not exhaustive, do a little research about commonly misused functions and how they can be avoided.

As far as tools, I recommend things like valgrind and splint