Best c questions in January 2011

Which one will execute faster, if (flag==0) or if (0==flag)?

70 votes

Interview question: Which one will execute faster, if (flag==0) or if (0==flag)? Why?

I haven't seen any correct answer yet (and there are already some). Caveat: Nawaz did point out the user-defined trap. And I regret my hastily cast upvote on "stupidest question", because it seems that many did not get it right, and it gives room for a nice discussion on compiler optimization :)

The answer is:

What is flag's type?

If flag is actually a user-defined type, then it depends on which overload of operator== is selected. Of course it can seem stupid that they would not be symmetric, but it's certainly allowed, and I have seen other abuses already.

If flag is a built-in, then both should run at the same speed.

From the Wikipedia article on x86, I'd bet on a Jxx instruction for the if statement: perhaps a JNZ (Jump Near if not zero) or some equivalent.

I doubt the compiler would miss such an obvious optimization, even with optimizations turned off. This is the kind of thing that Peephole Optimization is designed for.

EDIT: Sprang up again, so let's add some assembly (LLVM 2.7 IR)

int regular(int c) {
  if (c == 0) { return 0; }
  return 1;
}

int yoda(int c) {
  if (0 == c) { return 0; }
  return 1;
}

define i32 @regular(i32 %c) nounwind readnone {
entry:
  %not. = icmp ne i32 %c, 0                       ; <i1> [#uses=1]
  %.0 = zext i1 %not. to i32                      ; <i32> [#uses=1]
  ret i32 %.0
}

define i32 @yoda(i32 %c) nounwind readnone {
entry:
  %not. = icmp ne i32 %c, 0                       ; <i1> [#uses=1]
  %.0 = zext i1 %not. to i32                      ; <i32> [#uses=1]
  ret i32 %.0
}

Even if one does not know how to read the IR, I think it is self-explanatory: the two functions compile to identical code.

Why is "a" != "a" in C?

47 votes
#include <stdio.h>

int main(void) {
    if ("a" == "a")
      printf("Yes, equal");
    else
      printf("No, not equal");
    return 0;
}

Why is the output No, not equal?

You are comparing the memory addresses of two string literals, which may be stored in different locations. Use the following code to compare two string values:

#include <string.h>

...

if(strcmp("a", "a") == 0)
{
    // Equal
}

EDIT:

In response to the comments, "a" == "a" may indeed evaluate to true, depending on your compiler, which may combine equal strings at compile time into one to save space.

When you're comparing two character values (not pointers), the values themselves are compared. For example:

'a' == 'A' // not true
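
As a minimal sketch of the difference between comparing addresses and comparing contents (the helper names same_text and same_address are made up for illustration and are not part of any library):

```c
#include <string.h>

/* Hypothetical helper: compares the characters of two C strings,
   not the addresses at which they happen to be stored. */
int same_text(const char *lhs, const char *rhs)
{
    return strcmp(lhs, rhs) == 0;
}

/* Hypothetical helper: returns 1 only when both pointers refer to the
   same storage. Two arrays with identical contents are still distinct
   objects, so this is 0 for them even though same_text() matches. */
int same_address(const char *lhs, const char *rhs)
{
    return lhs == rhs;
}
```

With two separate arrays such as char x[] = "a" and char y[] = "a", same_text(x, y) is 1 while same_address(x, y) is 0. For two identical string literals the second result is up to the compiler, which is the point of the EDIT above.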

Which functions in the C standard library commonly encourage bad practice?

34 votes

Hello all,

This is inspired by this question and the comments on one particular answer, from which I learnt that strncpy is not a very safe string-handling function in C and that it pads the destination with zeros until it reaches n, something I was unaware of.

Specifically, to quote R..

strncpy does not null-terminate, and does null-pad the whole remainder of the destination buffer, which is a huge waste of time. You can work around the former by adding your own null padding, but not the latter. It was never intended for use as a "safe string handling" function, but for working with fixed-size fields in Unix directory tables and database files. snprintf(dest, n, "%s", src) is the only correct "safe strcpy" in standard C, but it's likely to be a lot slower. By the way, truncation in itself can be a major bug and in some cases might lead to privilege elevation or DoS, so throwing "safe" string functions that truncate their output at a problem is not a way to make it "safe" or "secure". Instead, you should ensure that the destination buffer is the right size and simply use strcpy (or better yet, memcpy if you already know the source string length).

And from Jonathan Leffler

Note that strncat() is even more confusing in its interface than strncpy() - what exactly is that length argument, again? It isn't what you'd expect based on what you supply strncpy() etc - so it is more error prone even than strncpy(). For copying strings around, I'm increasingly of the opinion that there is a strong argument that you only need memmove() because you always know all the sizes ahead of time and make sure there's enough space ahead of time. Use memmove() in preference to any of strcpy(), strcat(), strncpy(), strncat(), memcpy().
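
The strncpy behaviour described in the quote from R.. can be checked with a short sketch. The function names nul_bytes_after_strncpy and copy_checked and the 16-byte scratch buffer are assumptions made for this example only:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical probe: copies src with strncpy() into a scratch buffer
   pre-filled with 'x' and counts the '\0' bytes among the first n.
   0 means the copy was left unterminated; a count above 1 shows the
   zero padding of the remainder. n is clamped to the buffer size. */
size_t nul_bytes_after_strncpy(const char *src, size_t n)
{
    char dst[16];
    size_t count = 0;

    if (n > sizeof dst)
        n = sizeof dst;
    memset(dst, 'x', sizeof dst);
    strncpy(dst, src, n);
    for (size_t i = 0; i < n; i++)
        if (dst[i] == '\0')
            count++;
    return count;
}

/* Hypothetical wrapper around the snprintf() idiom from the quote.
   Always terminates dst when n > 0 and returns 1 if src was truncated,
   so the caller can treat truncation as an error. */
int copy_checked(char *dst, size_t n, const char *src)
{
    int needed = snprintf(dst, n, "%s", src);

    return needed < 0 || (size_t)needed >= n;
}
```

Copying "hello" with n = 3 leaves no terminator at all, while copying "hi" with n = 8 writes six '\0' bytes. Returning the truncation flag from copy_checked follows the quote's warning that silent truncation can itself be a bug.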

So, I'm clearly a little rusty on the C standard library. Therefore, I'd like to pose the question:

What C standard library functions are used inappropriately/in ways that may cause/lead to security problems/code defects/inefficiencies?

In the interests of objectivity, I have a number of criteria for an answer:

  • Please, if you can, cite design reasons behind the function in question i.e. its intended purpose.
  • Please highlight the misuse to which the code is currently put.
  • Please state why that misuse may lead towards a problem. I know that should be obvious but it prevents soft answers.

Please avoid:

  • Debates over naming conventions of functions (except where this unequivocally causes confusion).
  • "I prefer x over y" - preference is ok, we all have them but I'm interested in actual unexpected side effects and how to guard against them.

As this is likely to be considered subjective and has no definite answer I'm flagging for community wiki straight away.

I am also working as per C99.

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.

Also, strtok() is used by making successive calls to it until the entire string is tokenized. Some library implementations store strtok()'s internal state in a global variable, which may cause some nasty surprises if strtok() is called from multiple threads at the same time.
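
A small sketch of the first pitfall, and of one way to guard against it by tokenizing a private copy. The names strtok_overwrites_separator and count_tokens are hypothetical, chosen for this example:

```c
#include <stdlib.h>
#include <string.h>

/* Shows that strtok() writes into its argument: after the first call,
   the ',' in the buffer has been overwritten with '\0'. */
int strtok_overwrites_separator(void)
{
    char buf[] = "a,b";

    strtok(buf, ",");
    return buf[1] == '\0';
}

/* Hypothetical helper: counts tokens on a private copy so that the
   caller's string is left untouched. Returns -1 if allocation fails.
   It still relies on strtok()'s hidden state, so it is not safe to
   call from several threads at once. */
long count_tokens(const char *text, const char *delims)
{
    size_t size = strlen(text) + 1;
    char *copy = malloc(size);
    long count = 0;

    if (copy == NULL)
        return -1;
    memcpy(copy, text, size);
    for (char *tok = strtok(copy, delims); tok != NULL; tok = strtok(NULL, delims))
        count++;
    free(copy);
    return count;
}
```

Working on a copy also makes it possible to pass string literals, which must never be handed to strtok() directly because they may not be modified.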

The CERT C Secure Coding Standard lists many of these pitfalls you asked about.

Why does the pre-increment operator give an rvalue in C?

19 votes

In C++, the pre-increment operator gives an lvalue because the incremented object itself is returned, not a copy. But in C, it gives an rvalue. Why?

C doesn't have references. In C++, ++i returns a reference to i (an lvalue), whereas in C it returns a copy (the incremented value).

C99 6.5.3.1/2

The value of the operand of the prefix ++ operator is incremented. The result is the new value of the operand after incrementation. The expression ++E is equivalent to (E+=1).

‘‘value of an expression’’ <=> rvalue

As for the historical reason, I think the fact that references are not part of C is a plausible explanation.
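
The equivalence stated in the quoted paragraph can be checked with a tiny sketch; the function name preincrement_matches_compound is made up for this example:

```c
/* Returns 1 when ++E and (E += 1) yield the same value and leave the
   operand in the same state, as C99 6.5.3.1/2 says they must. */
int preincrement_matches_compound(int start)
{
    int a = start;
    int b = start;
    int from_increment = ++a;
    int from_compound = (b += 1);

    /* In C, neither "++a = 0;" nor "&++a" compiles: the result is a
       value, not an lvalue. A C++ compiler accepts both. */
    return from_increment == from_compound && a == b;
}
```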

Why does the "static" keyword have so many meanings in C and C++?

18 votes

As we know, the keyword static has multiple meanings in C. C99 added the possibility of legally writing

void foo (int arr[static 50])
{
    // ...
}

which adds to the confusion, and C++ has static member variables and functions.

This would not be so troublesome if all the uses could be connected in some way, but I find it hard to find that link for some of the cases: in particular, why the static keyword should be used to modify visibility (linkage), or what on earth it has to do with an array's minimum number of elements.

So is there a historical reason for the abuse of the static keyword, or is there a secret link under the hood that connects all of its uses?

Adding new keywords to a language breaks backwards compatibility. So static gets used where its use might possibly mean something ( int arr[static 50] vs int arr[auto 50] or int arr[extern 50] ) and cannot syntactically appear in that location based on its use in previous versions.

Though in that case adding a not_less_than context sensitive keyword in that position would not break previous code, it would add another keyword (so simple text editors which are keyword aware but not syntax aware would not know whether or not it is a keyword), and break the 'keywords are not context sensitive' simplification made in C.
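
For reference, the three C meanings under discussion can be put side by side in one short sketch. All identifiers here (call_count, next_id, calls_so_far, sum3) are invented for the example:

```c
/* 1. File scope: internal linkage. The name is invisible to other
      translation units. */
static int call_count;

/* 2. Block scope: static storage duration. The value survives between
      calls, but the name is visible only inside the function. */
int next_id(void)
{
    static int id;

    call_count++;
    return ++id;
}

int calls_so_far(void)
{
    return call_count;
}

/* 3. C99 array parameter: the caller promises that the pointer refers
      to at least 3 elements. */
int sum3(const int values[static 3])
{
    return values[0] + values[1] + values[2];
}
```

The third form only documents a promise from the caller; passing a shorter array is undefined behaviour rather than something the language checks.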

How to stop time from running backwards on Linux?

18 votes

Here's a little test I've written to verify that time does indeed only run forwards in Linux.

#include <time.h>
#include <sys/time.h>  

bool timeGoesForwardTest2()
{
   timeval tv1, tv2;   
   double startTime = getTimeSeconds();  // my function

   while ( getTimeSeconds() - startTime < 5 )
   {
      gettimeofday( &tv1, NULL );  
      gettimeofday( &tv2, NULL );  

      if ( tv2.tv_usec == tv1.tv_usec &&
           tv2.tv_sec == tv1.tv_sec )
      {
         continue;  // Equal times are allowed.
      }

      // tv2 should be greater than tv1
      if ( !( tv2.tv_usec>tv1.tv_usec ||
              tv2.tv_sec-1 == tv1.tv_sec ) )
      {
         printf( "tv1: %d %d\n", int( tv1.tv_sec ), int( tv1.tv_usec ) );
         printf( "tv2: %d %d\n", int( tv2.tv_sec ), int( tv2.tv_usec ) );
         return false;
      }         
   }
   return true;
}

Test fails with the result.

 tv1: 1296011067 632550
 tv2: 1296011067 632549

ummm....

Why does this happen?

Here's my setup:

Linux version 2.6.35-22-generic (buildd@rothera) (gcc version 4.4.5 (Ubuntu/Linaro 4.4.4-14ubuntu4) ) #33-Ubuntu SMP Sun Sep 19 20:34:50 UTC 2010 (Ubuntu 2.6.35-22.33-generic 2.6.35.4)
... running inside VirtualBox 3.2.12, in Windows 7.

There is an open issue at the VirtualBox Bug Tracker. They link to a blog post stating why you shouldn't use gettimeofday() to measure the passage of time:

The most portable way to measure time correctly seems to be clock_gettime(CLOCK_MONOTONIC, ...)
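
A minimal sketch of that suggestion, assuming a POSIX system that provides CLOCK_MONOTONIC. The function names monotonic_seconds and monotonic_never_decreases are hypothetical:

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* ask for clock_gettime() on strict compilers */
#endif
#include <time.h>

/* Hypothetical replacement for a gettimeofday()-based timer.
   Returns seconds from an arbitrary starting point, or -1.0 on failure. */
double monotonic_seconds(void)
{
    struct timespec ts;

    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return -1.0;
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Repeats the idea of the test above against the monotonic clock:
   returns 1 if no reading was earlier than the one before it. */
int monotonic_never_decreases(long samples)
{
    struct timespec prev, now;

    if (clock_gettime(CLOCK_MONOTONIC, &prev) != 0)
        return 0;
    for (long i = 0; i < samples; i++) {
        if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
            return 0;
        if (now.tv_sec < prev.tv_sec ||
            (now.tv_sec == prev.tv_sec && now.tv_nsec < prev.tv_nsec))
            return 0;
        prev = now;
    }
    return 1;
}
```

The values are only meaningful as differences between two readings; they do not represent wall-clock time. On older C libraries the program may need to be linked with -lrt.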

Python (and Python C API): __new__ versus __init__

16 votes

The question I'm about to ask seems to be a duplicate of Python's use of __new__ and __init__ ?, but regardless, it's still unclear to me exactly what the practical difference between __new__ and __init__ is.

Before you rush to tell me that __new__ is for creating objects and __init__ is for initializing objects, let me be clear: I get that. In fact, that distinction is quite natural to me, since I have experience in C++ where we have placement new, which similarly separates object allocation from initialization.

The Python C API tutorial explains it like this:

The new member is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the __new__() method. ... One reason to implement a new method is to assure the initial values of instance variables.

So, yeah - I get what __new__ does, but despite this, I still don't understand why it's useful in Python. The example given says that __new__ might be useful if you want to "assure the initial values of instance variables". Well, isn't that exactly what __init__ will do?

In the C API tutorial, an example is shown where a new Type (called a "Noddy") is created, and the Type's __new__ function is defined. The Noddy type contains a string member called first, and this string member is initialized to an empty string like so:

static PyObject * Noddy_new(PyTypeObject *type, PyObject *args, PyObject *kwds)
{
    .....

    self->first = PyString_FromString("");
    if (self->first == NULL)
    {
       Py_DECREF(self);
       return NULL;
    }

    .....
}

Note that without the __new__ method defined here, we'd have to use PyType_GenericNew, which simply initializes all of the instance variable members to NULL. So the only benefit of the __new__ method is that the instance variable will start out as an empty string, as opposed to NULL. But why is this ever useful, since if we cared about making sure our instance variables are initialized to some default value, we could have just done that in the __init__ method?

The difference mainly arises with mutable vs immutable types.

__new__ accepts a type as the first argument, and (usually) returns a new instance of that type. Thus it is suitable for use with both mutable and immutable types.

__init__ accepts an instance as the first argument and modifies the attributes of that instance. This is inappropriate for an immutable type, as it would allow them to be modified after creation by calling obj.__init__(*args).

Compare the behaviour of tuple and list:

>>> x = (1, 2)
>>> x
(1, 2)
>>> x.__init__([3, 4])
>>> x # tuple.__init__ does nothing
(1, 2)
>>> y = [1, 2]
>>> y
[1, 2]
>>> y.__init__([3, 4])
>>> y # list.__init__ reinitialises the object
[3, 4]

As to why they're separate (aside from simple historical reasons): __new__ methods require a bunch of boilerplate to get right (the initial object creation, and then remembering to return the object at the end). __init__ methods, by contrast, are dead simple, since you just set whatever attributes you need to set.

Odd C interview question

15 votes

Possible Duplicate:
How to write program during compiling?

Hi guys. I found this problem on a site full of interview questions, and was stumped by it. Is there some preprocessor directive that allows one to read from standard input during compilation?

Write a small C program, which while compiling takes another program from input terminal, and on running gives the result for the second program. (NOTE: The key is, think UNIX). Suppose, the program is 1.c Then, while compiling

$ cc -o 1 1.c 
int main() { printf("Hello World\n"); } ^D 
$ ./1
Hello World

EDIT It turns out this question is an exact duplicate. How to write program during compiling?

#include "/dev/stdin" is the trick.

A silly interview question at best.

What's the syntactically proper way to declare a C struct?

15 votes

I've seen C structs declared several different ways before. Why is that and what, if anything, does each do differently?

For example:

struct foo {
  short a;
  int b;
  float c;
};

typedef struct {
  short d;
  int e;
  float f;
} bar;

typedef struct _baz {
  short a;
  int b;
  float c;
} baz;

int main (int argc, char const *argv[])
{
  struct foo a;
  bar b;
  baz c;

  return 0;
}

Well, the obvious difference is demonstrated in your main:

struct foo a;
bar b;
baz c;

The first declaration is of an un-typedefed struct and needs the struct keyword to use. The second is of a typedefed anonymous struct, and so we use the typedef name. The third combines both the first and the second: your example uses baz (which is conveniently short) but could just as easily use struct _baz to the same effect.

Update: larsmans' answer mentions a more common case where you have to use at least struct x { } to make a linked list. The second case wouldn't be possible here (unless you abandon sanity and use a void * instead) because the struct is anonymous, and the typedef doesn't happen until the struct is defined, giving you no way to make a (type-safe) pointer to the struct type itself. The first version works fine for this use, but the third is generally preferred in my experience. Give him some rep for that.
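
A minimal sketch of that linked-list case using the third form. The names node and list_length are made up for this example:

```c
#include <stddef.h>

/* The tag "node" is already usable inside the braces, before the
   typedef name exists, so the struct can point to its own type. */
typedef struct node {
    int value;
    struct node *next;
} node;

/* Hypothetical helper: counts the elements of a NULL-terminated list. */
size_t list_length(const node *head)
{
    size_t length = 0;

    for (; head != NULL; head = head->next)
        length++;
    return length;
}
```

Outside the definition, either node or struct node names the same type.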

A more subtle difference is in namespace placement. In C, struct tags are placed in a separate namespace from other names, but typedef names aren't. So the following is legal:

struct test {
  // contents
};

struct test *test() {
  // contents
}

But the following is not, because it would be ambiguous what the name test is:

typedef struct {
  // contents
} test;

test *test() {
  // contents
}

typedef makes the name shorter (always a plus), but it puts it in the same namespace as your variables and functions. Usually this isn't an issue, but it is a subtle difference beyond the simple shortening.

How to find (all) integer overflows in a C program?

14 votes

I am working on a large project that generally works just fine, but shows serious issues once the input data size exceeds some limitations.

These issues are (suspected) only due to signed integer overflows like these:

int a, o;
// Initialize a and o
int x = (a + o) >> 1;

Obviously, once the sum of a and o overflows (gets larger than 2^31-1), x is no longer the mean of a and o.

Is there a generic way to find all of these integer overflows in a running program?

I am thinking of a tool like Valgrind or a GDB extension that breaks at every integer arithmetic instruction, takes the parameters and compares the correct result (calculated with a larger-sized datatype or arbitrary-precision arithmetic) with the actual result. If the results differ, it should output a warning, trigger a debug break or something like this.

I know how to check a single arithmetic instruction for overflows (e.g. checking the sign for additions); however, due to the vast amount of code, it is not a viable solution for me to go through the whole project and insert checking code everywhere by hand.

For a large code base, Coverity is a good tool. I am not sure whether it will detect all integer overflows, but it's worth a try.
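
For reference, the per-operation check the question mentions can be sketched as follows. This is one common way to write it, not what Coverity does, and the names add_would_overflow and checked_add are invented for the example:

```c
#include <limits.h>

/* Returns 1 if a + b does not fit in an int. The test itself only
   performs subtractions that cannot overflow. */
int add_would_overflow(int a, int b)
{
    if (b > 0)
        return a > INT_MAX - b;
    return a < INT_MIN - b;
}

/* Hypothetical helper: stores a + b in *sum and returns 1, or returns 0
   and leaves *sum alone when the addition would overflow. */
int checked_add(int a, int b, int *sum)
{
    if (add_would_overflow(a, b))
        return 0;
    *sum = a + b;
    return 1;
}
```

The check has to happen before the addition, because a signed overflow that has already occurred is undefined behaviour in C and cannot be detected reliably afterwards.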

Why does OSX document atoi/atof as not being threadsafe?

14 votes

I understand that strtol and strtof are preferred to atoi/atof, since the former detect errors, and also strtol is much more flexible than atoi when it comes to non-base-10.

But I'm still curious about something: 'man atoi' (or atof) on OS X (though not on Linux!) mentions that atoi/atof are not threadsafe. I frankly have a hard time imagining a possible implementation of atoi or atof that would not be threadsafe. Does anybody know why the man page says this? Are these functions actually unsafe on OS X or any other platform? And if they are, why on earth wouldn't the library just define atoi in terms of strtol, and therefore be safe?

Taking a look at the manual page on MacOS X 10.6.6, it documents two functions, atof() and atof_l(), and I suspect that gives a hint as to why the function is deemed not thread-safe:

SYNOPSIS

#include <stdlib.h>
double atof(const char *str);

#include <xlocale.h>
double atof_l(const char *str, locale_t loc);

DESCRIPTION

The atof() function converts the initial portion of the string pointed to by str to double representation.

It is equivalent to:

      strtod(str, (char **)NULL);

The decimal point character is defined in the program's locale (category LC_NUMERIC).

While the atof() function uses the current locale, the atof_l() function may be passed a locale directly. See xlocale(3) for more information.

IMPLEMENTATION NOTES

The atof() function is not thread-safe and also not async-cancel-safe.

The atof() function has been deprecated by strtod() and should not be used in new code.

ERRORS

The function atof() need not affect the value of errno on an error.

My suspicion is that if the current locale is changed by another thread while the atof() function is executing, the result is not guaranteed. Otherwise, there seems to be no reason for the warning.


I've poked around for a definitive location of the Darwin C library source code, but have not found one. If you go to the FreeBSD source code for atoi(), it is clear that the function implementation is trivial:

int
atoi(str)
    const char *str;
{
    return (int)strtol(str, (char **)NULL, 10);
}

(Yes, not even using a prototyped definition!)

The man page for strtol() does not have the weasel wording about thread safety or async-cancel safety. However, a quick look at the source code for strtol() shows that it uses isspace(), which is affected by locale:

ISO/IEC 9899:1999, Section 7.11.1.1 The setlocale function

187) The only functions in 7.4 whose behavior is not affected by the current locale are isdigit and isxdigit.

(Where §7.4 is for <ctype.h>.)

Now, while I'm not sure that this code is identical to what's in Darwin (MacOS X), it is likely to be similar. I think that there could be room for errata in the man pages - it is not so clear whether the page that needs correction is the one for atoi() or the one for strtol().
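
As an aside on the question's first point, that strtol can detect errors where atoi cannot, here is a minimal sketch. The wrapper name parse_int is hypothetical, and like strtol itself it is still subject to the locale concerns discussed above:

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: parses a base-10 int. Returns 1 and stores the
   value in *out on success; returns 0 for empty input, trailing junk,
   or a value outside the range of int. */
int parse_int(const char *text, int *out)
{
    char *end;
    long value;

    errno = 0;
    value = strtol(text, &end, 10);
    if (end == text || *end != '\0')
        return 0;                      /* no digits, or leftover characters */
    if (errno == ERANGE || value < INT_MIN || value > INT_MAX)
        return 0;                      /* does not fit */
    *out = (int)value;
    return 1;
}
```

atoi() gives the caller no way to tell these failure cases apart from a successful parse.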

What is a VM and why do dynamic languages need one?

14 votes

So, for example, Python and Java have a VM, C and Haskell do not. (Correct me if I'm wrong)

Thinking about what languages on both sides of the line have, I can't find the reason. Java is static in a lot of ways, while Haskell provides a lot of dynamic features.

Let's forget about VMs for a sec (we'll get back to those below, I promise), and start with this important fact:

C doesn't have garbage collection.

For a language to provide garbage collection, there has to be some sort of "runtime"/runtime-environment/thing that will perform it.

That's why Python, Java, and Haskell require a "runtime", and C, which does not, can just straightforwardly compile to native code.

Note that psyco was a Python optimizer that compiled Python code to machine code; however, a lot of that machine code consisted of calls to the CPython runtime's functions, such as PyImport_AddModule, PyImport_GetModuleDict, etc.

Haskell/GHC is in a similar boat to psyco-compiled Python. Ints are added as simple machine instructions, but more complicated operations, such as those that allocate objects, invoke the runtime.

What else?

C doesn't have "exceptions"

If we were to add exceptions to C, our generated machine code would need to do some stuff for every function and for every function call.

If we then add "closures" as well, there would be more stuff added.

Now, instead of having this boilerplate machine code repeated in every function, we could make it instead call a subprocedure to do the necessary stuff, something like PyErr_Occurred.

So now, basically every original source line maps to some calls to some functions and a smaller unique part.

But as long as we're doing so much stuff per original source code line, why even bother with machine code?

Here's an idea (btw let's call this idea a "Virtual Machine").

Let's represent your Python code, which is for example:

def has_no_letters(text):
  return text.upper() == text.lower()

As an in-memory data-structure, for example:

{ 'func_name': 'has_no_letters',
  'num_args': 1,
  'kwargs': [],
  'codez': [
    ('get_attr', 'tmp_a', 'arg_0', 'upper'),  # tmp_a = arg_0.upper
    ('func_call', 'tmp_b', 'tmp_a', []),  # tmp_b = tmp_a() # tmp_b = arg_0.upper()
    ('get_attr', 'tmp_c', 'arg_0', 'lower'),
    ('func_call', 'tmp_d', 'tmp_c', []),
    ('get_global', 'tmp_e', '=='),
    ('func_call', 'tmp_f', 'tmp_e', ['tmp_b', 'tmp_d']),
    ('return', 'tmp_f'),
  ]
}

Now, let's write an interpreter that executes this in-memory data structure.
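
To make the idea concrete, here is a deliberately tiny sketch of such an interpreter in C. It does not execute the Python-like structure above; it assumes a made-up stack-based instruction set (OP_PUSH, OP_ADD, OP_MUL, OP_RETURN) and a fixed stack size, all of which are inventions for this example:

```c
#define STACK_MAX 16

/* Hypothetical instruction set for a toy stack machine. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_RETURN };

struct instr {
    enum opcode op;
    int arg;            /* only used by OP_PUSH */
};

/* Executes code until OP_RETURN. Returns 1 and stores the top of the
   stack in *result, or returns 0 on stack overflow, stack underflow,
   or an unknown opcode. */
int run(const struct instr *code, int *result)
{
    int stack[STACK_MAX];
    int depth = 0;

    for (const struct instr *pc = code; ; pc++) {
        switch (pc->op) {
        case OP_PUSH:
            if (depth == STACK_MAX)
                return 0;
            stack[depth++] = pc->arg;
            break;
        case OP_ADD:
        case OP_MUL:
            if (depth < 2)
                return 0;
            depth--;
            if (pc->op == OP_ADD)
                stack[depth - 1] += stack[depth];
            else
                stack[depth - 1] *= stack[depth];
            break;
        case OP_RETURN:
            if (depth < 1)
                return 0;
            *result = stack[depth - 1];
            return 1;
        default:
            return 0;
        }
    }
}
```

The program is just an array of struct instr values ending in OP_RETURN; the loop with a switch is the whole "machine".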

Let's discuss the benefits of this over direct-from-text-interpreters, and then the benefits over compiling to machine code.

The benefits of VMs over direct-from-text-interpreters

  • The VM system gives you all the syntax errors before executing the code.
  • When evaluating a loop, a VM system doesn't parse the source code each time it runs.
    • Making the VM faster than the direct-from-text-interpreter.
    • So the direct interpreter runs slower with long variable names, and faster with short variable names. This encourages people to write crappy mathematician-style code such as wt(f, d(o, e), s) <= th(i, s) + cr(a, p * d + o)

The benefits of VMs over compiling to machine code

  • The in-memory data structure describing the program, or the "VM code", will probably be much more compact than boilerplate-full machine code which does the same stuff again and again for every original line of code. This will make the VM system run faster because fewer "instructions" will need to be fetched from memory.
  • Creating a VM is much simpler than creating a compiler to machine code. You can probably do this now without even knowing any assembly/machine-code.

calling c function from assembly

14 votes

I'm trying to use an assembly function in a C project. The function is supposed to call a libc function, let's say printf(), but I keep getting a segmentation fault.

In the .c file I have the declaration of the function, let's say

int do_shit_in_asm()

In the .asm file I have

.extern printf
.section .data
         printtext:
              .ascii "test"
.section .text
.global do_shit_in_asm
.type do_shit_in_asm, @function

do_shit_in_asm:
    pushl %ebp
    movl %esp, %ebp
    push printtext
    call printf
    movl %ebp, %esp
    pop %ebp
ret

Any pointers or comments would be appreciated.

as func.asm -o func.o

gcc prog.c func.o -o prog

Change push printtext to push $printtext.

As it is, you're loading a value from the address printtext and pushing that, rather than pushing the address. Thus, you're passing 'test' as a 32-bit number, rather than a pointer, and printf is trying to interpret that as an address and crashing.

How will _exit behave in a C++ program?

12 votes

C99 offers the _Exit function, which exits "immediately", although it may close file descriptors. Unix/POSIX extends this behavior by mandating the closing of all fd's without flushing (and offers the synonym _exit).

Will these functions call destructors for static objects when called from a C++ program? Does the C++ standard make any guarantees about _Exit?

(Inspired by this question; I suddenly wondered what happens in the typical fork-exec-_exit idiom in C++.)

It simply doesn't exist in standard C++, so there are no guarantees.

It is planned for inclusion in C++0x. That specifies (§18.5):

The function _Exit(int status) has additional behavior in this International Standard:

— The program is terminated without executing destructors for objects of automatic, thread, or static storage duration and without calling functions passed to atexit() (3.6.3).

Memory-efficient line stitching in very large images

12 votes

Background

I work with very large datasets from Synthetic Aperture Radar satellites. These can be thought of as high dynamic range greyscale images of the order of 10k pixels on a side.

Recently, I've been developing applications of a single-scale variant of Lindeberg's scale-space ridge detection method for detecting linear features in a SAR image. This is an improvement on using directional filters or the Hough Transform, both of which have been used previously, because it is less computationally expensive than either. (I will be presenting some recent results at JURSE 2011 in April, and I can upload a preprint if that would be helpful).

The code I currently use generates an array of records, one per pixel, each of which describes a ridge segment in the rectangle to bottom right of the pixel and bounded by adjacent pixels.

struct ridge_t { unsigned char top, left, bottom, right; };
int rows, cols;
struct ridge_t *ridges;  /* An array of rows*cols ridge entries */

An entry in ridges contains a ridge segment if exactly two of top, left, right and bottom have values in the range 0 to 128. Suppose I have:

struct ridge_t entry;
entry.top = 25; entry.left = 255; entry.bottom = 255; entry.right = 76;

Then I can find the ridge segment's start (x1,y1) and end (x2,y2):

float x1, y1, x2, y2;
x1 = (float) col + (float) entry.top / 128.0;
y1 = (float) row;
x2 = (float) col + 1;
y2 = (float) row + (float) entry.right / 128.0;

When these individual ridge segments are rendered, I get an image something like this (a very small corner of a far larger image):

Rendered ridge segments

Each of those long curves is rendered from a series of tiny ridge segments.

It's trivial to determine whether two adjacent locations which contain ridge segments are connected. If I have ridge1 at (x, y) and ridge2 at (x+1, y), then they are parts of the same line if 0 <= ridge1.right <= 128 and ridge2.left == ridge1.right.
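
Written out as code, the two rules above might look like this. The function names edge_in_use, has_segment and connected_right are hypothetical; only struct ridge_t and the 0 to 128 convention come from the description:

```c
struct ridge_t { unsigned char top, left, bottom, right; };

/* A side carries a ridge crossing when its value is in 0..128
   (unsigned char cannot be negative, so only the upper bound is tested). */
static int edge_in_use(unsigned char value)
{
    return value <= 128;
}

/* An entry holds a ridge segment if exactly two sides are in use. */
int has_segment(const struct ridge_t *r)
{
    return edge_in_use(r->top) + edge_in_use(r->left) +
           edge_in_use(r->bottom) + edge_in_use(r->right) == 2;
}

/* ridge1 is at (x, y) and ridge2 at (x+1, y): they belong to the same
   line if the crossing on ridge1's right side matches ridge2's left. */
int connected_right(const struct ridge_t *ridge1, const struct ridge_t *ridge2)
{
    return edge_in_use(ridge1->right) && ridge2->left == ridge1->right;
}
```

The vertical case (ridge1.bottom against the top of the entry below) would presumably follow the same pattern.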

Problem

Ideally, I would like to stitch together all of the ridge segments into lines, so that I can then iterate over each line found in the image to apply further computations. Unfortunately, I'm finding it hard to find an algorithm for doing this which is low in complexity, memory-efficient and suitable for multiprocessing (all important considerations when dealing with really huge images!)

One approach that I have considered is scanning through the image until I find a ridge which only has one linked ridge segment, and then walking the resulting line, flagging any ridges in the line as visited. However, this is unsuitable for multiprocessing, because there's no way to tell if there isn't another thread walking the same line from the other direction (say) without expensive locking.

What do readers suggest as a possible approach? It seems like the sort of thing that someone would have figured out an efficient way to do in the past...

I'm not entirely sure this is correct, but I thought I'd throw it out for comment. First, let me introduce a lockless disjoint set algorithm, which will form an important part of my proposed algorithm.

Lockless disjoint set algorithm

I assume the presence of a two-pointer-sized compare-and-swap operation on your choice of CPU architecture. This is available on x86 and x64 architectures at the least.

The algorithm is largely the same as described on the Wikipedia page for the single threaded case, with some modifications for safe lockless operation. First, we require the rank and parent elements to both be pointer-sized, and aligned to 2*sizeof(pointer) in memory, for atomic CAS later on.

Find() need not change; the worst case is that the path compression optimization will fail to have full effect in the presence of simultaneous writers.

Union() however, must change:

function Union(x, y)
  redo:
    x = Find(x)
    y = Find(y)
    if x == y
        return
    xSnap = AtomicRead(x) -- read both rank and pointer atomically
    ySnap = AtomicRead(y) -- this operation may be done using a CAS
    if (xSnap.parent != x || ySnap.parent != y)
        goto redo
    -- Ensure x has lower rank (meaning y will be the new root)
    if (xSnap.rank > ySnap.rank)
        swap(xSnap, ySnap)
        swap(x, y)
    -- if same rank, use pointer value as a fallback sort
    else if (xSnap.rank == ySnap.rank && x > y)
        swap(xSnap, ySnap)
        swap(x, y)
    yNew = ySnap
    yNew.rank = max(yNew.rank, xSnap.rank + 1)
    xNew = xSnap
    xNew.parent = y
    if (!CAS(y, ySnap, yNew))
      goto redo
    if (!CAS(x, xSnap, xNew))
      goto redo
    return

This should be safe in that it will never form loops, and will always result in a proper union. We can confirm this by observing that:

  • First, prior to termination, one of the two roots will always end up with a parent pointing to the other. Therefore, as long as there is no loop, the merge succeeds.
  • Second, rank always increases. After comparing the order of x and y, we know x has lower rank than y at the time of the snapshot. In order for a loop to form, another thread would need to have increased x's rank first, then merged x and y. However in the CAS that writes x's parent pointer, we check that rank has not changed; therefore, y's rank must remain greater than x.

In the event of simultaneous mutation, it is possible that y's rank may be increased, then return to redo due to a conflict. However, this implies that either y is no longer a root (in which case rank is irrelevant) or that y's rank has been increased by another process (in which case the second go around will have no effect and y will have correct rank).

Therefore, there should be no chance of loops forming, and this lockless disjoint-set algorithm should be safe.

And now on to the application to your problem...

Assumptions

I make the assumption that ridge segments can only intersect at their endpoints. If this is not the case, you will need to alter phase 1 in some manner.

I also make the assumption that co-habitation of a single integer pixel location is sufficient for ridge segments to be connected. If not, you will need to change the array in phase 1 to hold multiple candidate ridge segments+disjoint-set pairs, and filter through to find ones that are actually connected.

The disjoint set structures used in this algorithm shall carry a reference to a line segment in their structures. In the event of a merge, we choose one of the two recorded segments arbitrarily to represent the set.

Phase 1: Local line identification

We start by dividing the map into sectors, each of which will be processed as a separate job. Multiple jobs may be processed in different threads, but each job will be processed by only one thread. If a ridge segment crosses a sector, it is split into two segments, one for each sector.

For each sector, an array mapping pixel position to a disjoint-set structure is established. Most of this array will be discarded later, so its memory requirements should not be too much of a burden.

We then proceed over each line segment in the sector. For each one, we choose a disjoint set representing the entire line the segment forms a part of. We look up each endpoint in the pixel-position array to see if a disjoint-set structure has already been assigned. If exactly one of the endpoints is already in this array, we use the assigned disjoint set. If both are in the array, we perform a merge on the disjoint sets, and use the new root as our set. Otherwise, we create a new disjoint set, and associate with it a reference to the current line segment. We then write back into the pixel-position array our disjoint set's root for each of our endpoints.

This process is repeated for each line segment in the sector; by the end, we will have identified all lines completely within the sector by a disjoint set.

Note that since the disjoint sets are not yet shared between threads, there's no need to use compare-and-swap operations yet; simply use the normal single-threaded union-merge algorithm. Since we do not free any of the disjoint set structures until the algorithm completes, allocation can also be made from a per-thread bump allocator, making memory allocation (virtually) lockless and O(1).
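
As a rough illustration of the per-sector bookkeeping described above, here is a minimal single-threaded sketch in C. The names Segment, DSet, ds_find, ds_union and assign_segment, the flat pixels array indexed by y * width + x, and the caller-supplied pool standing in for the per-thread bump allocator are all assumptions made for this example; they are not part of the original description.

```c
#include <stddef.h>

/* Hypothetical ridge segment: two integer pixel endpoints. */
typedef struct Segment { int x0, y0, x1, y1; } Segment;

/* Disjoint-set node carrying a representative segment for its line. */
typedef struct DSet {
    struct DSet *parent;
    unsigned rank;
    const Segment *rep;
} DSet;

/* Find the root, shortening the path as we go (path halving). */
DSet *ds_find(DSet *s)
{
    while (s->parent != s) {
        s->parent = s->parent->parent;
        s = s->parent;
    }
    return s;
}

/* Ordinary single-threaded union by rank; returns the new root. */
DSet *ds_union(DSet *a, DSet *b)
{
    a = ds_find(a);
    b = ds_find(b);
    if (a == b) return a;
    if (a->rank < b->rank) { DSet *t = a; a = b; b = t; }
    b->parent = a;
    if (a->rank == b->rank) a->rank++;
    return a;
}

/* Assign one segment to a set. `pixels` holds width*height pointers,
 * all NULL at the start of the sector; `pool` and `*used` act as a
 * bump allocator that is never freed while the sector is processed. */
DSet *assign_segment(DSet **pixels, int width, const Segment *seg,
                     DSet *pool, size_t *used)
{
    DSet **p0 = &pixels[seg->y0 * width + seg->x0];
    DSet **p1 = &pixels[seg->y1 * width + seg->x1];
    DSet *root;

    if (*p0 && *p1) {
        root = ds_union(*p0, *p1);      /* both endpoints known: merge */
    } else if (*p0) {
        root = ds_find(*p0);            /* reuse the existing set */
    } else if (*p1) {
        root = ds_find(*p1);
    } else {
        root = &pool[(*used)++];        /* brand-new line */
        root->parent = root;
        root->rank = 0;
        root->rep = seg;
    }
    *p0 = *p1 = root;                   /* write the root back */
    return root;
}
```

Calling assign_segment once per segment in the sector leaves every line that lies wholly inside the sector identified by a single root, which is the state phase 2 starts from.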

Once a sector is completely processed, all data in the pixel-position array is discarded; however data corresponding to pixels on the edge of the sector is copied to a new array and kept for the next phase.

Since iterating over the entire image is O(x*y), and disjoint-merge is effectively O(1), this operation is O(x*y) and requires working memory O(m+2*x*y/k+k^2) = O(x*y/k+k^2), where t is the number of sectors, k is the width of a sector, and m is the number of partial line segments in the sector (depending on how often lines cross borders, m may vary significantly, but it will never exceed the number of line segments). The memory carried over to the next operation is O(m + 2*x*y/k) = O(x*y/k)

Phase 2: Cross-sector merges

Once all sectors have been processed, we then move to merging lines that cross sectors. For each border between sectors, we perform lockless merge operations on lines that cross the border (i.e., where adjacent pixels on each side of the border have been assigned to line sets).

This operation has running time O(x+y) and consumes O(1) memory (we must retain the memory from phase 1 however). Upon completion, the edge arrays may be discarded.

Phase 3: Collecting lines

We now perform a multi-threaded map operation over all allocated disjoint-set structure objects. We first skip any object which is not a root (i.e., where obj.parent != obj). Then, starting from the representative line segment, we move out from there and collect and record any information desired about the line in question. We are assured that only one thread is looking at any given line at a time, as intersecting lines would have ended up in the same disjoint-set structure.

This has O(m) running time, and memory usage dependent on what information you need to collect about these line segments.

Summary

Overall, this algorithm should have O(x*y) running time, and O(x*y/k + k^2) memory usage. Adjusting k gives a tradeoff between transient memory usage on the phase 1 processes, and the longer-term memory usage for the adjacency arrays and disjoint-set structures carried over into phase 2.

Note that I have not actually tested this algorithm's performance in the real world; it is also possible that I have overlooked concurrency issues in the lockless disjoint-set union-merge algorithm above. Comments welcome :)

Write the prototype for a C function that takes an array of exactly 16 integers

11 votes

One of the interview questions asked me to "write the prototype for a C function that takes an array of exactly 16 integers" and I was wondering what it could be? Maybe a function declaration like this:

void foo(int a[], int len);

Or something else?

And what about if the language was C++ instead?

In C, this requires a pointer to an array of 16 integers:

void special_case(int (*array)[16]);

It would be called with:

int array[16];
special_case(&array);

In C++, you can use a reference to an array, too, as shown in Nawaz's answer. (The question asks for C in the title, and originally only mentioned C++ in the tags.)


Any version that uses some variant of:

void alternative(int array[16]);

ends up being equivalent to:

void alternative(int *array);

which will accept any size of array, in practice.
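
By contrast, inside a function that takes a pointer to an array of 16 ints, the full array type is still visible, so sizeof can recover the element count. A small sketch (sum16 and element_count are names made up for this illustration):

```c
#include <stddef.h>

/* Number of elements in the pointed-to array, computed from its type. */
size_t element_count(int (*array)[16])
{
    return sizeof *array / sizeof (*array)[0];
}

/* Sum all elements through the pointer-to-array parameter. */
int sum16(int (*array)[16])
{
    int total = 0;
    for (size_t i = 0; i < element_count(array); i++)
        total += (*array)[i];
    return total;
}
```

A caller would write sum16(&values) for an int values[16]; passing the address of an array of any other length is a pointer type mismatch that the compiler can diagnose.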


The question was asked: does special_case() really prevent a different size of array from being passed? The answer is 'Yes'.

void special_case(int (*array)[16]);

void anon(void)
{

    int array16[16];
    int array18[18];
    special_case(&array16);
    special_case(&array18);
}

The compiler (GCC 4.5.2 on MacOS X 10.6.6, as it happens) complains (warns):

$ gcc -c xx.c
xx.c: In function ‘anon’:
xx.c:9:5: warning: passing argument 1 of ‘special_case’ from incompatible pointer type
xx.c:1:6: note: expected ‘int (*)[16]’ but argument is of type ‘int (*)[18]’
$

Change to GCC 4.2.1 - as provided by Apple - and the warning is:

$ /usr/bin/gcc -c xx.c
xx.c: In function ‘anon’:
xx.c:9: warning: passing argument 1 of ‘special_case’ from incompatible pointer type
$

The warning in 4.5.2 is better, but the substance is the same.

How to make a process aware of other processes of the same program

10 votes

I must write a program that must be aware of another instance of itself running on that machine, and communicate with it, then die. I want to know if there is a canonical way of doing that in Linux.

My first thought was to write a file containing the PID of the process somewhere, and look for that file every time the program executes, but where is the "right" place and name for that file? Is there a better, or more "correct" way?

Then I must communicate, saying the user tried to run it, but since there is another instance it will hand over the job and exit. I thought of just sending a signal, like SIGUSR1, but that would not allow me to send more information, like the X11 display from where the user executed the second process. How to send this info?

The program is linked against Gtk, so a solution that uses the glib is OK.

Putting the pid in a file is a common way of achieving this. For daemons ("system programs"), the common place to put such a file is /var/run/PROGRAM.pid. For user programs, put the pid file hidden in the user's homedir (if the program also has configuration files, then put both config files and the pid file in a subdir of the home dir).
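
A minimal sketch of the pid-file approach, assuming the caller supplies the path. The function names write_pid_file and running_instance are invented for this example, and the liveness check via kill(pid, 0) is an addition that the answer does not mention. Note that there is still a window between checking and writing in which two instances could both decide they are first, and that a stale file can name a pid that has since been reused by an unrelated process.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Record our own pid in `path`. Returns 0 on success, -1 on failure. */
int write_pid_file(const char *path)
{
    FILE *f = fopen(path, "w");
    if (!f) return -1;
    int ok = fprintf(f, "%ld\n", (long)getpid()) > 0;
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

/* Return the pid stored in `path` if that process still exists,
 * or 0 if there is no file, no valid pid, or no such process. */
pid_t running_instance(const char *path)
{
    FILE *f = fopen(path, "r");
    if (!f) return 0;
    long pid = 0;
    int n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0) return 0;
    /* Signal 0 performs only the existence/permission check. */
    if (kill((pid_t)pid, 0) == 0 || errno == EPERM)
        return (pid_t)pid;
    return 0;
}
```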

Sending information to the "master" instance is most commonly achieved using Unix domain sockets, also known as local sockets. With a socket, you won't need a pid file (if no-one listens on the socket, the process knows it's master).
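
The "connect, and become master if nobody answers" idea could look roughly like the following. This is a sketch under assumptions: claim_or_connect is a made-up name, the socket path is chosen by the caller, and two instances starting at the same moment can still race between the failed connect() and the bind().

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Returns a socket fd, or -1 on error. On success *is_master is 1 if
 * this process is now listening on `path`, or 0 if it connected to an
 * instance that was already listening there. */
int claim_or_connect(const char *path, int *is_master)
{
    struct sockaddr_un addr;
    if (strlen(path) >= sizeof addr.sun_path) {
        errno = ENAMETOOLONG;
        return -1;
    }
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    /* First try to reach an existing master. */
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0) {
        *is_master = 0;
        return fd;
    }
    close(fd);

    /* Nobody is listening: drop any stale socket file and take over. */
    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) return -1;
    unlink(path);
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) != 0 ||
        listen(fd, 5) != 0) {
        close(fd);
        return -1;
    }
    *is_master = 1;
    return fd;
}
```

A second instance that gets is_master == 0 can write whatever it needs to hand over (for example the X11 display name) on the returned fd and then exit; the master accept()s and reads it.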

Rounding differences on Windows vs Unix based system in sprintf

9 votes

I have a problem: on UNIX-based systems, sprintf does not round values the way I expect.

For example

double tmp = 88888888888885.875;
char out[512];

That's 88,888,888,888,885.875, just to be easier on the eyes. I am giving such a specific and big example because it seems to work fine on smaller numbers.

I am trying to use it in the following way:

sprintf(out, "%021.2f", tmp);
printf("out = %s\n", out);

On Windows this results in:

out = 000088888888888885.88

On AIX, for example, and on Linux as well, it results in:

out = 000088888888888885.87

Why is this happening? Any ideas on how to make it behave the same way on Windows and Unix?

Thanks

There is a bug report for glibc with a problem very similar to yours. The main conclusion there (in comment 46) is that double is not a 15-decimal-digit number and you should not expect it to work like that.

As a workaround you can add something small to your numbers to make them round better. But this solution is not general because it depends on number ranges you deal with.

Another workaround is to scale the numbers before rounding: multiply, round to an integer, then treat the result as the scaled value (e.g. 2597.625*100 = 259762.5 -> 259763 = 2597.63*100).

However I think there must be smarter workarounds.
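
The multiply-then-round workaround above might be sketched like this. The function name format_2dp_half_up is invented for the example, and it assumes the scaled value fits in a long long. It rounds halves away from zero on the scaled value, but the multiplication by 100 is itself a floating-point operation, so inputs that are not exact binary fractions can still land on the "wrong" side.

```c
#include <stdio.h>

/* Format v with two decimals into out (size n), rounding halves away
 * from zero after scaling by 100. Returns what snprintf returns. */
int format_2dp_half_up(char *out, size_t n, double v)
{
    int negative = v < 0;
    if (negative) v = -v;

    /* Scale, then add 0.5 and truncate to round the scaled value. */
    long long scaled = (long long)(v * 100.0 + 0.5);

    return snprintf(out, n, "%s%lld.%02lld",
                    negative ? "-" : "", scaled / 100, scaled % 100);
}
```

For the example from the text, format_2dp_half_up(buf, sizeof buf, 2597.625) stores "2597.63" in buf. Zero padding to a fixed width, as in the question's "%021.2f", would have to be added separately.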

Change read/write permissions on a file descriptor

7 votes

I'm working on a linux C project and I'm having trouble working with file descriptors.

I have an orphan file descriptor (the file was open()'d then unlink()'d but the fd is still good) that has write-only permission. The original backing file had full permissions (created with S_IRUSR | S_IWUSR | S_IRGRP | S_IROTH), but alas the file was opened with O_WRONLY. Is it possible to duplicate the file descriptor and change the copy to O_RDWR?

pseudo-code:


//open orphan file
int fd = open(fname, O_WRONLY, ...)
unlink(fname)
//fd is still good, but I can't read from it

//...

//I want to be able to read from orphan file
int fd2 = dup(fd)
//----change fd2 to read/write???----

Thanks in advance! -Andrew

No, there is no POSIX function to change the open mode. You will need to open the file in read/write mode in the first place. Since you are creating a temporary file, though, I strongly recommend that you use mkstemp. That function creates the file and opens it in read/write mode; it does not unlink the file for you, so call unlink() on the generated name afterwards. Most importantly, it avoids a race condition in naming and creating the file, thereby avoiding a vulnerability in the creation of temporary files.
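
A minimal sketch of that suggestion, assuming /tmp is an acceptable location; the helper name open_anonymous_rw and the "orphan-" prefix are made up for this example:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/* Returns a read/write descriptor for a fresh temporary file whose
 * name has already been removed, or -1 on failure. */
int open_anonymous_rw(void)
{
    char name[] = "/tmp/orphan-XXXXXX";   /* hypothetical location/prefix */

    int fd = mkstemp(name);               /* creates and opens read/write */
    if (fd < 0) return -1;

    if (unlink(name) != 0) {              /* drop the name, keep the fd */
        close(fd);
        return -1;
    }
    return fd;
}
```

The returned descriptor can be written, rewound with lseek(), and read back, and dup() of it yields another read/write descriptor for the same file.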

Why is the output of my forking program different when I pipe its output?

7 votes

I was looking at some simple code on fork, and decided to try it out for myself. I compiled and then ran it from inside Emacs, and got different output from that produced by running it in Bash.

#include <unistd.h>
#include <stdio.h>

int main() {
  if (fork() != 0) {
    printf("%d: X\n", getpid());
  }

  if (fork() != 0) {
    printf("%d: Y\n", getpid());
  }

  printf("%d: Z\n", getpid());
}

I compiled it with gcc, and then ran a.out from inside Emacs, as well as piping it to cat, and grep ., and got this.

2055: X
2055: Y
2055: Z
2055: X
2058: Z
2057: Y
2057: Z
2059: Z

This isn't right. Running it just from Bash I get (which I expected)

2084: X
2084: Y
2084: Z
2085: Y
2085: Z
2087: Z
2086: Z

edit - missed some newlines

What's going on?

The order in which different processes write their output is entirely unpredictable. So the only surprise is that the "X" print statement sometimes happens twice.

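I believe this is because, at the second fork(), an output line including "X" is sometimes still sitting in the stdio output buffer, waiting to be flushed: when stdout is a pipe rather than a terminal, it is typically fully buffered rather than line buffered. The buffer is copied into the child, so both processes eventually print it. Since getpid() was already called and converted into the string, both copies show the same pid.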

I was able to reproduce multiple "X" lines, but if I add fflush(stdout); just before the second fork(), I always only see one "X" line and always a total of 7 lines.
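
One way to check this without eyeballing terminal output is to run the same fork sequence with stdout redirected into a pipe and count the lines that come back. The sketch below does that; fork_demo and count_output_lines are names invented here, and the setvbuf() call explicitly forces the full buffering that stdio typically picks for a pipe, so that the result does not depend on how the calling program had been using stdout.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* The program from the question, optionally flushing before each fork(). */
static void fork_demo(int flush_first)
{
    if (flush_first) fflush(stdout);
    if (fork() != 0) printf("%d: X\n", (int)getpid());

    if (flush_first) fflush(stdout);
    if (fork() != 0) printf("%d: Y\n", (int)getpid());

    printf("%d: Z\n", (int)getpid());
}

/* Run fork_demo() in a child whose stdout is a pipe and return the
 * number of lines all resulting processes wrote, or -1 on error. */
int count_output_lines(int flush_first)
{
    int fds[2];
    if (pipe(fds) != 0) return -1;

    fflush(stdout);                 /* keep our own pending output out of it */
    pid_t pid = fork();
    if (pid < 0) return -1;

    if (pid == 0) {
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        setvbuf(stdout, NULL, _IOFBF, BUFSIZ);  /* full buffering, as for a pipe */
        fork_demo(flush_first);
        exit(0);                    /* each of the four processes flushes here */
    }

    close(fds[1]);
    int lines = 0;
    char c;
    while (read(fds[0], &c, 1) == 1)   /* ends when every writer has exited */
        if (c == '\n') lines++;
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return lines;
}
```

With the flushes in place the count should be 7; without them the buffered "X" line is carried across the second fork() and the count becomes 8, matching the piped output in the question.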