Best questions in November 2010

Why doesn't this code simply print letters A to Z?

160 votes
<?php
for ($i = 'a'; $i <= 'z'; $i++)
    echo "$i\n";

This snippet gives the following output (newlines are replaced by spaces):

a b c d e f g h i j k l m n o p q r s t u v w x y z aa ab ac ad ae af ag ah ai aj ak al am an ao ap aq ar as at au av aw ax ay az ba bb bc bd be bf bg bh bi bj bk bl bm bn bo bp bq br bs bt bu bv bw bx by bz ca cb cc cd ce cf cg ch ci cj ck cl cm cn co cp cq cr cs ct cu cv cw cx cy cz da db dc dd de df dg dh di dj dk dl dm dn do dp dq dr ds dt du dv dw dx dy dz ea eb ec ed ee ef eg eh ei ej ek el em en eo ep eq er es et eu ev ew ex... on to yz

From the docs:

PHP follows Perl's convention when dealing with arithmetic operations on character variables and not C's.

For example, in Perl 'Z'+1 turns into 'AA', while in C 'Z'+1 turns into '[' ( ord('Z') == 90, ord('[') == 91 ).

Note that character variables can be incremented but not decremented and even so only plain ASCII characters (a-z and A-Z) are supported.

Why does (0 < 5 <3) return true?

149 votes

This may be a stupid question but I was playing around in jsfiddle.net and I'm curious as to why this returns true?

if(0 < 5 < 3) {
    alert("True");
}

So does this -

if(0 < 5 < 2) {
    alert("True");
}

But this doesn't -

if(0 < 5 < 1) {
    alert("True");
}

Edit

I suppose the next question is is this quirk ever useful?

Order of operations causes (0 < 5 < 3) to be interpreted in javascript as ((0 < 5) < 3) which produces (true < 3) and true is counted as 1, causing it to return true.

This is also why (0 < 5 < 1) returns false, (0 < 5) returns true, which is interpreted as 1, resulting in (1 < 1).

How can I improve my paw detection?

85 votes

After my previous question on finding toes within each paw, I started loading up other measurements to see how it would hold up. Unfortunately, I quickly ran into a problem with one of the preceding steps: recognizing the paws.

You see, my proof of concept basically took the maximal pressure of each sensor over time and would start looking for the sum of each row, until it finds on that != 0.0. Then it does the same for the columns and as soon as it finds more than 2 rows with that are zero again. It stores the minimal and maximal row and column values to some index.

alt text

As you can see in the figure, this works quite well in most cases. However, there are a lot of downsides to this approach (other than being very primitive):

  • Humans can have 'hollow feet' which means there are several empty rows within the footprint itself. Since I feared this could happen with (large) dogs too, I waited for at least 2 or 3 empty rows before cutting off the paw.

    This creates a problem if another contact made in a different column before it reaches several empty rows, thus expanding the area. I figure I could compare the columns and see if they exceed a certain value, they must be separate paws.

  • The problem gets worse when the dog is very small or walks at a higher pace. What happens is that the front paw's toes are still making contact, while the hind paw's toes just start to make contact within the same area as the front paw!

    With my simple script, it won't be able to split these two, because it would have to determine which frames of that area belong to which paw, while currently I would only have to look at the maximal values over all frames.

Examples of where it starts going wrong:

alt text alt text

So now I'm looking for a better way of recognizing and separating the paws (after which I'll get to the problem of deciding which paw it is!).

Update:

I've been tinkering to get Joe's (awesome!) answer implemented, but I'm having difficulties extracting the actual paw data from my files.

alt text

The coded_paws shows me all the different paws, when applied to the maximal pressure image (see above). However, the solution goes over each frame (to separate overlapping paws) and sets the four Rectangle attributes, such as coordinates or height/width.

I can't figure out how to take these attributes and store them in some variable that I can apply to the measurement data. Since I need to know for each paw, what its location is during which frames and couple this to which paw it is (front/hind, left/right).

So how can I use the Rectangles attributes to extract these values for each paw?

I have the measurements I used in the question setup in my public Dropbox folder (example 1, example 2, example 3). For anyone interested I also set up a blog to keep you up to date :-)

If you're just wanting (semi) contiguous regions, there's already an easy implementation in Python: SciPy's ndimage.morphology module. This is a fairly common image morphology operation.


Basically, you have 5 steps:

def find_paws(data, smooth_radius=5, threshold=0.0001):
    data = sp.ndimage.uniform_filter(data, smooth_radius)
    thresh = data > threshold
    filled = sp.ndimage.morphology.binary_fill_holes(thresh)
    coded_paws, num_paws = sp.ndimage.label(filled)
    data_slices = sp.ndimage.find_objects(coded_paws)
    return object_slices
  1. Blur the input data a bit to make sure the paws have a continuous footprint. (It would be more efficient to just use a larger kernel (the structure kwarg to the various scipy.ndimage.morphology functions) but this isn't quite working properly for some reason...)

  2. Threshold the array so that you have a boolean array of places where the pressure is over some threshold value (i.e. thresh = data > value)

  3. Fill any internal holes, so that you have cleaner regions (filled = sp.ndimage.morphology.binary_fill_holes(thresh))

  4. Find the separate contiguous regions (coded_paws, num_paws = sp.ndimage.label(filled)). This returns an array with the regions coded by number (each region is a contiguous area of a unique integer (1 up to the number of paws) with zeros everywhere else)).

  5. Isolate the contiguous regions using data_slices = sp.ndimage.find_objects(coded_paws). This returns a list of tuples of slice objects, so you could get the region of the data for each paw with [data[x] for x in data_slices]. Instead, we'll draw a rectangle based on these slices, which takes slightly more work.


The two animations below show your "Overlapping Paws" and "Grouped Paws" example data. This method seems to be working perfectly. (And for whatever it's worth, this runs much more smoothly than the GIF images below on my machine, so the paw detection algorithm is fairly fast...)

Overlapping Paws Grouped Paws


Here's a full example (now with much more detailed explanations). The vast majority of this is reading the input and making an animation. The actual paw detection is only 5 lines of code.

import numpy as np
import scipy as sp
import scipy.ndimage

import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

def animate(input_filename):
    """Detects paws and animates the position and raw data of each frame
    in the input file"""
    # With matplotlib, it's much, much faster to just update the properties
    # of a display object than it is to create a new one, so we'll just update
    # the data and position of the same objects throughout this animation...

    infile = paw_file(input_filename)

    # Since we're making an animation with matplotlib, we need 
    # ion() instead of show()...
    plt.ion()
    fig = plt.figure()
    ax = fig.add_subplot(111)
    fig.suptitle(input_filename)

    # Make an image based on the first frame that we'll update later
    # (The first frame is never actually displayed)
    im = ax.imshow(infile.next()[1])

    # Make 4 rectangles that we can later move to the position of each paw
    rects = [Rectangle((0,0), 1,1, fc='none', ec='red') for i in range(4)]
    [ax.add_patch(rect) for rect in rects]

    title = ax.set_title('Time 0.0 ms')

    # Process and display each frame
    for time, frame in infile:
        paw_slices = find_paws(frame)

        # Hide any rectangles that might be visible
        [rect.set_visible(False) for rect in rects]

        # Set the position and size of a rectangle for each paw and display it
        for slice, rect in zip(paw_slices, rects):
            dy, dx = slice
            rect.set_xy((dx.start, dy.start))
            rect.set_width(dx.stop - dx.start + 1)
            rect.set_height(dy.stop - dy.start + 1)
            rect.set_visible(True)

        # Update the image data and title of the plot
        title.set_text('Time %0.2f ms' % time)
        im.set_data(frame)
        im.set_clim([frame.min(), frame.max()])
        fig.canvas.draw()

def find_paws(data, smooth_radius=5, threshold=0.0001):
    """Detects and isolates contiguous regions in the input array"""
    # Blur the input data a bit so the paws have a continous footprint 
    data = sp.ndimage.uniform_filter(data, smooth_radius)
    # Threshold the blurred data (this needs to be a bit > 0 due to the blur)
    thresh = data > threshold
    # Fill any interior holes in the paws to get cleaner regions...
    filled = sp.ndimage.morphology.binary_fill_holes(thresh)
    # Label each contiguous paw
    coded_paws, num_paws = sp.ndimage.label(filled)
    # Isolate the extent of each paw
    data_slices = sp.ndimage.find_objects(coded_paws)
    return data_slices

def paw_file(filename):
    """Returns a iterator that yields the time and data in each frame
    The infile is an ascii file of timesteps formatted similar to this:

    Frame 0 (0.00 ms)
    0.0 0.0 0.0
    0.0 0.0 0.0

    Frame 1 (0.53 ms)
    0.0 0.0 0.0
    0.0 0.0 0.0
    ...
    """
    with open(filename) as infile:
        while True:
            try:
                time, data = read_frame(infile)
                yield time, data
            except StopIteration:
                break

def read_frame(infile):
    """Reads a frame from the infile."""
    frame_header = infile.next().strip().split()
    time = float(frame_header[-2][1:])
    data = []
    while True:
        line = infile.next().strip().split()
        if line == []:
            break
        data.append(line)
    return time, np.array(data, dtype=np.float)

if __name__ == '__main__':
    animate('Overlapping paws.bin')
    animate('Grouped up paws.bin')
    animate('Normal measurement.bin')

Update: As far as identifying which paw is in contact with the sensor at what times, the simplest solution is to just do the same analysis, but use all of the data at once. (i.e. stack the input into a 3D array, and work with it, instead of the individual time frames.) Because SciPy's ndimage functions are meant to work with n-dimensional arrays, we don't have to modify the original paw-finding function at all.

# This uses functions (and imports) in the previous code example!!
def paw_regions(infile):
    # Read in and stack all data together into a 3D array
    data, time = [], []
    for t, frame in paw_file(infile):
        time.append(t)
        data.append(frame)
    data = np.dstack(data)
    time = np.asarray(time)

    # Find and label the paw impacts
    data_slices, coded_paws = find_paws(data, smooth_radius=4)

    # Sort by time of initial paw impact... This way we can determine which
    # paws are which relative to the first paw with a simple modulo 4.
    # (Assuming a 4-legged dog, where all 4 paws contacted the sensor)
    data_slices.sort(key=lambda dat_slice: dat_slice[2].start)

    # Plot up a simple analysis
    fig = plt.figure()
    ax1 = fig.add_subplot(2,1,1)
    annotate_paw_prints(time, data, data_slices, ax=ax1)
    ax2 = fig.add_subplot(2,1,2)
    plot_paw_impacts(time, data_slices, ax=ax2)
    fig.suptitle(infile)

def plot_paw_impacts(time, data_slices, ax=None):
    if ax is None:
        ax = plt.gca()

    # Group impacts by paw...
    for i, dat_slice in enumerate(data_slices):
        dx, dy, dt = dat_slice
        paw = i%4 + 1
        # Draw a bar over the time interval where each paw is in contact
        ax.barh(bottom=paw, width=time[dt].ptp(), height=0.2, 
                left=time[dt].min(), align='center', color='red')
    ax.set_yticks(range(1, 5))
    ax.set_yticklabels(['Paw 1', 'Paw 2', 'Paw 3', 'Paw 4'])
    ax.set_xlabel('Time (ms) Since Beginning of Experiment')
    ax.yaxis.grid(True)
    ax.set_title('Periods of Paw Contact')

def annotate_paw_prints(time, data, data_slices, ax=None):
    if ax is None:
        ax = plt.gca()

    # Display all paw impacts (sum over time)
    ax.imshow(data.sum(axis=2).T)

    # Annotate each impact with which paw it is
    # (Relative to the first paw to hit the sensor)
    x, y = [], []
    for i, region in enumerate(data_slices):
        dx, dy, dz = region
        # Get x,y center of slice...
        x0 = 0.5 * (dx.start + dx.stop)
        y0 = 0.5 * (dy.start + dy.stop)
        x.append(x0); y.append(y0)

        # Annotate the paw impacts         
        ax.annotate('Paw %i' % (i%4 +1), (x0, y0),  
            color='red', ha='center', va='bottom')

    # Plot line connecting paw impacts
    ax.plot(x,y, '-wo')
    ax.axis('image')
    ax.set_title('Order of Steps')

alt text


alt text


alt text

77 votes

Many times, when generating messages to show to the user, the message will contain a number of something that I want to inform the customer about.

I'll give an example: The customer has selected a number of items from 1 and up, and has clicked delete. Now I want to give a confirmation message to the customer, and I want to mention the number of items he has selected to minimize the chance of him making a mistake by selecting a bunch of items and clicking delete when he only wants to delete one of them.

One way is to make the generic message like this:

int noofitemsselected = SomeFunction();
string message = "You have selected " + noofitemsselected + " item(s). Are you sure you want to delete it/them?";

The "problem" here is the case where noofitemselected is 1, and we have to write item and it instead of items and them.

My normal solution will be something like this

int noofitemsselected = SomeFunction();
string message = "You have selected " + noofitemsselected + " " + (noofitemsselected==1?"item" : "items") + ". Are you sure you want to delete " + (noofitemsselected==1?"it" : "them") + "?";

This gets quite long and quite nasty really fast if there are many references to the numbers plurality inside the code, and the actual message gets hard to read.

So my questions is simply. Are there any better ways of generating messages like this?

EDIT

I see a lot of persons has got very hung up in the case that I mentioned that the message should be displayed inside a message box, and has simply given an answer of how to avoid using the message box at all, and that is all good.

But remember that the problem of pluralization also apply to texts other places in the program in addition to message boxes. For example, a label alongside a grid displaying the number of lines selected in the grid will have the same problem regarding pluralization.

So this basically apply to most text that is outputted in some way from programs, and then the solution is not as simple as to just change the program to not output text anymore :)

If there is ever any chance, no matter how small, that this app will need to be translated to other languages then both are wrong. The correct way of doing this is:

string message = ( noofitemsselected==1 ?
  "You have selected " + noofitemsselected + " item. Are you sure you want to delete it?":
  "You have selected " + noofitemsselected + " items. Are you sure you want to delete them?"
);

This is because different languages handle plurality differently. Some like Malay don't even have syntactic plurals so the strings would generally be identical. Separating the two strings makes it easier to support other languages later on.

Otherwise if this app is meant to be consumed by the general public and is supposed to be user friendly then the second method is preferable. Sorry but I don't really know a shorter way of doing this.

If this app is meant to be consumed only internally by your company then do the shortcut "item(s)" thing. You don't really have to impress anybody when writing enterprisy code. But I'd advise against doing this for publicly consumed app because this gives the impression that the programmer is lazy and thus lower their opinion of the quality of the app. Trust me, small things like this matter.

What Android 3rd-party libraries are there?

66 votes

I'd like to gather a list of 3rd-party libraries for Android. Since there is no collection of libraries for Android I thought it might a good idea to create one. I'm only interested in libraries you have experience with and know that are working well especially [and|or] exclusively on Android.

I'm going to start right away so you can see what I mean.

It'd be nice if you all could up-vote the actual list (accepted answer) so it gets "pinned" to top.


Droid-fu

Description

A great utility extension created for easing Android developers daily work.

Type

utiliy classes, ready-to-use components

URLs

Droid-fu on GitHub


Updated the list

I'm going to cumulate the answers here in a sorted list so no information gets lost.

Current List

Utility

Persistance

Network

Barcode/QR-Code/Image Processing

Contacts/Social Network

Payment

UI stuff

Mixed/Allround

Maps

Game Engines/3D stuff

Image Processing and Graphics

Translation

Testing

TTS / STT

Help! I've learned jQuery... now I want to learn JavaScript

61 votes

I am a self-taught web developer/programmer. I started out about two years ago by learning how to make simple dynamic websites with HTML/CSS/PHP. Then I started dabbling with animation...

Enter jQuery

I've become quite proficient with jQuery over the last year and I've even started making my own plugins. I've spent most of my effort learning how to beautify websites with fancy effects and what not.

Upon tackling my first full-blown application, I realized how under-developed my knowledge of JavaScript actually is. jQuery has allowed me to rely on its framework so heavily that I rarely use any interesting functions, techniques, or whatever that are 'native' to the JavaScript language.

For example:

I have a basic understanding of what a closure is... but I am unsure where this technique can actually benefit me. Although as I understand it, that's what my jQuery plugins do with (function ($){//plugin code here})(jQuery). I've seen many posts/blogs/whatever about memory leaks and circular references which is concerning.

I'm frustrated because I can wrap my head around the basic concepts of what these are just by reading the articles, but I'm finding that the deeper I go the more I don't understand. The vocabulary alone is burdensome. Let alone how to actually use these techniques/functions/language features.


I am trying to figure out what I don't know

I'm looking to gather any advice, techniques, articles, books, videos, snippets, examples, potential pitfalls... really anything you have regarding application development with JavaScript.

There are several books about just javascript and not jquery that should be able to help.

javascript the definitive guide

javascript: the good parts

secrets of a javascript ninja (thanks to James Kovacs for this recommendation)

and many more...

Here's some specific articles on 'advanced javscript'.

memoization

memory leaks, circular references, and closures

and just about everything else

What does this CSS do?

55 votes

I'm seeing CSS that is font: 12px/18px.... what does it do exactly?

12px is the font size, 18px is the line height. This only works when you're setting the font shorthand property. In other words, it simply expands to the following:

font-size: 12px;
line-height: 18px;

If you set the line height to a relative value (e.g. percentage or ems), it's relative to the font size.

W3C CSS font property reference

Which is faster: x<<1 or x<<10 ?

50 votes

I don't want to optimize anything, I swear, I just want to ask this question out of curiosity. I know that on most hardware there's an assembly command of bit-shift (e.g. shl, shr), which is a single command. But does it matter (nanosecond-wise, or CPU-tact-wise) how many bits you shift. In other words, is either of the following faster on any CPU?

x << 1;

And:

x << 10;

And please don't hate me for this question. :)

Potentially depends on the CPU.

However, all modern CPUs (x86, ARM) use a "barrel shifter" -- a hardware module specifically designed to perform arbitrary shifts in constant time.

So the bottom line is... no. No difference.

What does copying an object mean? What are the copy constructor and the copy assignment operator? When do I need to declare them myself? How can I prevent my objects from being copied?

Introduction

C++ treats variables of user-defined types with value semantics. This means that objects are implicitly copied in various contexts, and we should understand what "copying an object" actually means.

Let us consider a simple example:

class person
{
    std::string name;
    int age;

public:

    person(const std::string& name, int age) : name(name), age(age)
    {
    }
};

int main()
{
    person a("Bjarne Stroustrup", 60);
    person b(a);   // What happens here?
    b = a;         // And here?
}

(If you are puzzled by the name(name), age(age) part, this is called a member initializer list.)

Special member functions

What does it mean to copy a person object? The main function shows two distinct copying scenarios. The initialization person b(a); is performed by the copy constructor. Its job is to construct a fresh object based on the state of an existing object. The assignment b = a is performed by the copy assignment operator. Its job is generally a little more complicated, because the target object is already in some valid state that needs to be dealt with.

Since we declared neither the copy constructor nor the assignment operator (nor the destructor) ourselves, these are implicitly defined for us. Quote from the standard:

The [...] copy constructor and copy assignment operator, [...] and destructor are special member functions. [ Note: The implementation will implicitly declare these member functions for some class types when the program does not explicitly declare them. The implementation will implicitly define them if they are used. [...] end note ] [n3126.pdf section 12 §1]

By default, copying an object means copying its members:

The implicitly-defined copy constructor for a non-union class X performs a memberwise copy of its subobjects. [n3126.pdf section 12.8 §16]

The implicitly-defined copy assignment operator for a non-union class X performs memberwise copy assignment of its subobjects. [n3126.pdf section 12.8 §30]

Implicit definitions

The implicitly-defined special member functions for person look like this:

    // 1. copy constructor
    person(const person& that) : name(that.name), age(that.age)
    {
    }

    // 2. copy assignment operator
    person& operator=(const person& that)
    {
        name = that.name;
        age = that.age;
        return *this;
    }

    // 3. destructor
    ~person()
    {
    }

Memberwise copying is exactly what we want in this case: name and age are copied, so we get a self-contained, independent person object. The implicitly-defined destructor is always empty. This is also fine in this case since we did not acquire any resources in the constructor. The members' destructors are implicitly called after the person destructor is finished:

After executing the body of the destructor and destroying any automatic objects allocated within the body, a destructor for class X calls the destructors for X's direct [...] members [n3126.pdf 12.4 §6]

Managing resources

So when should we declare those special member functions explicitly? When our class manages a resource, that is, when an object of the class is responsible for that resource. That usually means the resource is acquired in the constructor (or passed into the constructor) and released in the destructor.

Let us go back in time to pre-standard C++. There was no such thing as std::string, and programmers were in love with pointers. The person class might have looked like this:

class person
{
    char* name;
    int age;

public:

    // the constructor acquires a resource:
    // in this case, dynamic memory obtained via new[]
    person(const char* the_name, int the_age)
    {
        name = new char[strlen(the_name) + 1];
        strcpy(name, the_name);
        age = the_age;
    }

    // the destructor must release this resource via delete[]
    ~person()
    {
        delete[] name;
    }
};

Even today, people still write classes in this style and get into trouble: "I pushed a person into a vector and now I get crazy memory errors!" Remember that by default, copying an object means copying its members, but copying the name member merely copies a pointer, not the character array it points to! This has several unpleasant effects:

  1. Changes via a can be observed via b.
  2. Once b is destroyed, a.name is a dangling pointer.
  3. If a is destroyed, deleting the dangling pointer yields undefined behavior.
  4. Since the assignment does not take into account what name pointed to before the assignment, sooner or later you will get memory leaks all over the place.

Explicit definitions

Since memberwise copying does not have the desired effect, we must define the copy constructor and the copy assignment operator explicitly to make deep copies of the character array:

    // 1. copy constructor
    person(const person& that)
    {
        name = new char[strlen(that.name) + 1];
        strcpy(name, that.name);
        age = that.age;
    }

    // 2. copy assignment operator
    person& operator=(const person& that)
    {
        if (this != &that)
        {
            delete[] name;
            // This is a dangerous point in the flow of execution!
            // We have temporarily invalidated the class invariants,
            // and the next statement might throw an exception,
            // leaving the object in an invalid state :(
            name = new char[strlen(that.name) + 1];
            strcpy(name, that.name);
            age = that.age;
        }
        return *this;
    }

Note the difference between initialization and assignment: we must tear down the old state before assigning to name to prevent memory leaks. Also, we have to protect against self-assignment of the form x = x. Without that check, delete[] name would delete the array containing the source string, because when you write x = x, both this->name and that.name contain the same pointer.

Exception safety

Unfortunately, this solution will fail if new char[...] throws an exception due to memory exhaustion. One possible solution is to introduce a local variable and reorder the statements:

    // 2. copy assignment operator
    person& operator=(const person& that)
    {
        char* local_name = new char[strlen(that.name) + 1];
        // If the above statement throws,
        // the object is still in the same state as before.
        // None of the following statements will throw an exception :)
        strcpy(local_name, that.name);
        delete[] name;
        name = local_name;
        age = that.age;
        return *this;
    }

This also takes care of self-assignment without an explicit check. An even more robust solution to this problem is the copy-and-swap idiom, but I will not go into the details of exception safety here. I only mentioned exceptions to make the following point: Writing classes that manage resources is hard.

Noncopyable resources

Some resources cannot or should not be copied, such as file handles or a mutexes. In that case, simply declare the copy constructor and copy assignment operator as private without giving a definition:

private:

    person(const person& that);
    person& operator=(const person& that);

Alternatively, you can inherit from boost::noncopyable or declare them as deleted (C++0x):

    person(const person& that) = delete;
    person& operator=(const person& that) = delete;

The rule of three

Sometimes you need to implement a class that manages a resource. (Never manage multiple resources in a single class, this will only lead to pain.) In that case, remember the rule of three:

If you need to explicitly declare either the destructor, copy constructor or copy assignment operator yourself, you probably need to explicitly declare all three of them.

(Unfortunately, this "rule" is not enforced by the C++ standard or any compiler I am aware of.)

Advice

Most of the time, you do not need to manage a resource yourself, because an existing class such as std::string already does it for you. Just compare the simple code using a std::string member to the convoluted and error-prone alternative using a char* and you should be convinced. As long as you stay away from raw pointer members, the rule of three is unlikely to concern your own code.

Why should casting be avoided?

47 votes

I generally avoid casting types as much as possible since I am under the impression that it's poor coding practice and may incur a performance penalty.

But if someone asked me to explain why exactly that is, i would probably look at them like a deer in headlights.

So why/when is casting bad?

Is it general for java, c#, c++ or does every different runtime environment deal with it on it's own terms?

Specifics for a any language are welcome, example why is it bad in c++?

You've tagged this with three languages, and the answers are really quite different between the three. Discussion of C++ more or less implies discussion of C casts as well, and that gives (more or less) a fourth answer.

Since it's the one you didn't mention explicitly, I'll start with C. C casts have a number of problems. One is that they can do any of a number of different things. In some cases, the cast does nothing more than tell the compiler (in essence): "shut up, I know what I'm doing" -- i.e., it ensures that even when you do a conversion that could cause problems, the compiler won't warn you about those potential problems. Just for example, char a=(char)123456;. The exact result of this implementation defined (depends on the size and signedness of char), and except in rather strange situations, probably isn't useful. C casts also vary in whether they're something that happens only at compile time (i.e., you're just telling the compiler how to interpret/treat some data) or something that happens at run time (e.g., an actual conversion from double to long).

C++ attempts to deal with that to at least some extent by adding a number of "new" cast operators, each of which is restricted to only a subset of the capabilities of a C cast. This makes it more difficult to (for example) accidentally do a conversion you really didn't intend -- if you only intend to cast away constness on an object, you can use const_cast, and be sure that the only thing it can affect is whether an object is const, volatile, or not. Conversely, a static_cast is not allowed to affect whether an object is const or volatile. In short, you have most of the same types of capabilities, but they're categorized so one cast can generally only do one kind of conversion, where a single C-style cast can do two or three conversions in one operation. The primary exception is that you can use a dynamic_cast in place of a static_cast in at least some cases and despite being written as a dynamic_cast, it'll really end up as a static_cast. For example, you can use dynamic_cast to traverse up or down a class hierarchy -- but a cast "up" the hierarchy is always safe, so it can be done statically, while a cast "down" the hierarchy isn't necessarily safe so it's done dynamically.

Java and C# are much more similar to each other. In particular, with both of them casting is (virtually?) always a run-time operation. In terms of the C++ cast operators, it's usually closest to a dynamic_cast in terms of what's really done -- i.e., when you attempt to cast an object to some target type, the compiler inserts a run-time check to see whether that conversion is allowed, and throw an exception if it's not. The exact details (e.g., the name used for the "bad cast" exception) varies, but the basic principle remains mostly similar (though, if memory serves, Java does make casts applied to the few non-object types like int much closer to C casts -- but these types are used rarely enough that 1) I don't remember that for sure, and 2) even if it's true, it doesn't matter much anyway).

Looking at things more generally, the situation's pretty simple (at least IMO): a cast (obviously enough) means you've converting something from one type to another. When/if you do that, it raises the question "Why?" If you really want something to be a particular type, why didn't you define it to be that type to start with? That's not to say there's never a reason to do such a conversion, but anytime it happens, it should prompt the question of whether you could re-design the code so the correct type was used throughout. Even seemingly innocuous conversions (e.g., between integer and floating point) should be examined much more closely than is common. Despite their seeming similarity, integers should really be used for "counted" types of things and floating point for "measured" kinds of things. Ignoring the distinction is what leads to some of the crazy statements like "the average American family has 1.8 children." Even though we can all see how that happens, the fact is that no family has 1.8 children. They might have 1 or they might 2 or they might have more than that -- but never 1.8.

functional programming - Is Immutability Expensive ?

39 votes

Request:

  1. The question is two part. The first is conceptual, comparing functional and imperative programming from the perspective of cost of immutability. Second, about specifics of java/scala.
  2. Quicksort is meant to be in-memory sort. No arguments there. But, how does one implement such a thing in a PURE functional way. specifically in Scala.

Question:

Coming from a heavy imperative background (C++, java). I have been exploring functional programming, more specifically Scala. Though, I might be a funct nube, I thought it might be a good idea to ask this question here now.

Some of the concepts of functional programming ( I am sure I am missing out a lot )

  1. Functions as first class citizens.
  2. Functions do not have side effects and hence Immutable objects.
  3. Concurrency becomes easy as a result of 2)

Even though modern JVMs are extremely efficient with object creation and gc is very inexpensive for short lived objects, its probably still better to minimize object creation right?, at least in a single threaded application where concurrency and locking is not an issue. Since Scala is a hybrid, I know I can write imperative code with mutable objects if necessary. But, as someone who has spent a lot of years trying to reuse objects and minimize allocation. I would like a good understanding of the counter school of thought.

As a specific case, I was a little surprised by this code snippet in this tutorial by Ted Neward( seems like a prominent name in the Scala community ). It has a java version of Quicksort and then gives a neat looking Scala implementation of the same.

Here is my attempt to benchmark the implementations. I haven't done detailed profiling. But, my guess is that the Scala version is slower because the no of objects allocated is linear( one per recursion call). Is there any way chance, tail call optimizations come into play. If I am right, Scala supports tail call optimizations for self recursive calls. So, it should only be helping it. I am using Scala 2.8.

JAVA VERSION

public class QuickSortJ {
    void sort(int[] xs) {
      sort(xs, 0, xs.length -1 );
    }
    void sort(int[] xs, int l, int r) {
      int pivot = xs[(l+r)/2];
      int a = l; int b = r;
      while (a <= b)
        while (xs[a] < pivot) { a = a + 1; }
        while (xs[b] > pivot) { b = b – 1; }
        if (a <= b) {
          swap(xs, a, b);
          a = a + 1;
          b = b – 1;
        }
      }
      if (l < b) sort(xs, l, b);
      if (b < r) sort(xs, a, r);
    }
    void swap(int[] arr, int i, int j) {
      int t = arr[i]; arr[i] = arr[j]; arr[j] = t;
    }
}

SCALA VERSION

object QuickSortS {

  def sort(xs: Array[Int]): Array[Int] =
    if (xs.length <= 1) xs
    else {
      val pivot = xs(xs.length / 2)
      Array.concat(
        sort(xs filter (pivot >)),
        xs filter (pivot ==),
        sort(xs filter (pivot <)))
    }
}

Scala Code to compare implementations

  def main(args: Array[String]): Unit = {
    val ran = new java.util.Random(5);

    val ints = new Array[Int](100000);
    for (i <- 0 to ints.length - 1)
      ints(i) = ran.nextInt();

    var cur = new Date().getTime();
    QuickSortS.sort(ints) ;
    println("Time Scala " + (new Date().getTime() - cur));

    //Note: the scala version does not mutate the array.

    cur = new Date().getTime();
    QuickSortJ.sort(ints);
    println("Time Java " + (new Date().getTime() - cur));

  }

RESULTS

Time Scala 665
Time Java  67

Since there are a few misconceptions flying around here, I’d like to clarify some points.

  • The “in-place” quicksort isn’t really in-place (and quicksort is not by definition in-place). It requires additional storage in the form of stack space for the recursive step, which is in the order of O(log n) in the best case, but O(n) in the worst case.

  • Implementing a functional variant of quicksort that operates on arrays defeats the purpose. Arrays are never immutable.

  • The “proper” functional implementation of quicksort uses immutable lists. It is of course not in-place but it’s got the same worst-case asymptotic runtime (O(n^2)) and space complexity (O(n)) as the procedural in-place version.

    On average, its running time is still on par with that of the in-place variant (O(n log n)). Its space complexity, however, is still O(n).


There are two obvious disadvantages of a functional quicksort implementation. In the following, let’s consider this reference implementation in Haskell (I don’t know Scala …) from the Haskell introduction:

qsort []     = []
qsort (x:xs) = qsort lesser ++ [x] ++ qsort greater
    where lesser  = (filter (< x) xs)
          greater = (filter (>= x) xs)
  1. The first disadvantage is the choice of the pivot element, which is very inflexible. The strength of modern quicksort implementations relies heavily on a smart choice of the pivot (compare “Engineering a sort function” by Bentley et al.). The above algorithm is poor in that regard, which degrades average performance considerably.

  2. Secondly, this algorithm uses list concatenation (instead of list construction) which is an O(n) operation. This doesn’t impact on the asymptotic complexity but it’s a measurable factor.

A third disadvantage is somewhat hidden: unlike the “in-place” variant, this implementation continually requests memory from the heap for the cons cells of the list and potentially scatters memory all over the place. As a result, this algorithm has a very poor cache locality. I don’t know whether smart allocators in modern functional programming languages can mitigate this – but on modern machines, cache misses have become a major performance killer.


What’s the conclusion? Unlike others, I wouldn’t say that quicksort is inherently imperative and that’s why it performs badly in an FP environment. Quite on the contrary, I would argue that quicksort is a perfect example of a functional algorithm: it translates seamlessly into an immutable environment, its asymptotic running time and space complexity are on par with the procedural implementation, and even its procedural implementation employs recursion.

But this algorithm still performs worse when constrained to an immutable domain. The reason for this is that the algorithm has the peculiar property of benefitting from a lot of (sometimes low-level) fine-tuning that can only be efficiently performed on arrays. A naive description of the quicksort misses all these intricacies (both in the functional and in the procedural variant).

After reading “Engineering a sort function” I can no longer consider quicksort an elegant algorithm. Implemented efficiently, it is a clunky mess, an engineer’s piece of work, not an artist’s (not to devalue engineering! this has its own aesthetic).


But I would also like to point out that this point is particular to quicksort. Not every algorithm is amenable to the same sort of low-level tweaking. A lot of algorithms and data structures really can be expressed without performance loss in an immutable environment.

And immutability can even decrease performance costs by removing the need of costly copies or cross-thread synchronizations.

So, to answer the original question, “is immutability expensive?” – In the particular case of quicksort, there is a cost that is indeed a result of immutability. But in general, no.

5 years later, is there something better than the "Fastest Possible C++ Delegates" ?

35 votes

I know that the topic of "C++ delegates" has been done to death, and both http://www.codeproject.com and http://stackoverflow.com deeply cover the question.

Generally, it seems that Don Clugston's fastest possible delegate is the first choice for many people. There are a few other popular ones.

However, I noticed that most of those articles are old (around 2005) and many design choices seem to have been made taking in account old compilers like VC7

I'm in need of a very fast delegate implementation for an audio application.

I still need it to be portable (Windows, Mac, Linux) but I only use modern compilers (VC9, the one in VS2008 SP1 and gcc 4.5.x) .

My main criteria is :

  • it must be fast, fast, fast !
  • it must be forward compatible with newer versions of the compilers. I have some doubts about that with Don's implementation because he explicitely states it's not standard-compliant
  • optionaly, a KISS-syntax and ease of use is nice to have
  • multicast would be nice, although I'm convinced it's really easy to build it around any delegate library

Furthermore, I don't really need exotic features : I just need the good old pointer-to-method thing. No need to support static methods, free functions or things like that.

As of today, what is the recomended approach ? Still use Don's version ? Or is there a kind of a "comunity consensus" about another option ?

I really don't wan't to use boost.signal/signal2 because it's not acceptable in terms of performance. A dependency on QT is not acceptable as well.

Furthermore, I've seen some newer libraries while googling, like for example cpp-events but I couldn't find any feedback from users, including on SO.

Any hints/pointers apreciated.

Best regards D.

Update: An article with the complete source code and a more detailed discussion has been posted on The Code Project.

Well, the problem with pointers to methods is that they're not all the same size. So instead of storing pointers to methods directly, we need to "standardize" them so that they are of a constant size. This is what Don Clugston attempts to achieve in his Code Project article. He does so using intimate knowledge of the most popular compilers. I assert that it's possible to do it in "normal" C++ without requiring such knowledge.

Consider the following code:

void DoSomething(int)
{
}

void InvokeCallback(void (*callback)(int))
{
    callback(42);
}

int main()
{
    InvokeCallback(&DoSomething);
    return 0;
}

This is one way to implement a callback using a plain old function pointer. However, this doesn't work for methods in objects. Let's fix this:

class Foo
{
public:
    void DoSomething(int) {}

    static void DoSomethingWrapper(void* obj, int param)
    {
        static_cast<Foo*>(obj)->DoSomething(param);
    }
};

void InvokeCallback(void* instance, void (*callback)(void*, int))
{
    callback(instance, 42);
}

int main()
{
    Foo f;
    InvokeCallback(static_cast<void*>(&f), &Foo::DoSomethingWrapper);
    return 0;
}

Now, we have a system of callbacks that can work for both free and member functions. This, however, is clumsy and error-prone. However, there is a pattern - the use of a wrapper function to "forward" the static function call to a method call on the proper instance. We can use templates to help with this - let's try generalizing the wrapper function:

template<typename R, class T, typename A1, R (T::*Func)(A1)>
R Wrapper(void* o, A1 a1)
{
    return (static_cast<T*>(o)->*Func)(a1);

}

class Foo
{
public:
    void DoSomething(int) {}
};

void InvokeCallback(void* instance, void (*callback)(void*, int))
{
    callback(instance, 42);
}

int main()
{
    Foo f;
    InvokeCallback(static_cast<void*>(&f),
        &Wrapper<void, Foo, int, &Foo::DoSomething> );
    return 0;
}

This is still extremely clumsy, but at least now we don't have to write out a wrapper function every single time (at least for the 1 argument case). Another thing we can generalize is the fact that we're always passing a pointer to void*. Instead of passing it as two different values, why not put them together? And while we're doing that, why not generalize it as well? Hey, let's throw in an operator()() so we can call it like a function!

template<typename R, typename A1>
class Callback
{
public:
    typedef R (*FuncType)(void*, A1);

    Callback(void* o, FuncType f) : obj(o), func(f) {}
    R operator()(A1 a1) const
    {
        return (*func)(obj, a1);
    }

private:
    void* obj;
    FuncType func;
};

template<typename R, class T, typename A1, R (T::*Func)(A1)>
R Wrapper(void* o, A1 a1)
{
    return (static_cast<T*>(o)->*Func)(a1);

}

class Foo
{
public:
    void DoSomething(int) {}
};

void InvokeCallback(Callback<void, int> callback)
{
    callback(42);
}

int main()
{
    Foo f;
    Callback<void, int> cb(static_cast<void*>(&f),
        &Wrapper<void, Foo, int, &Foo::DoSomething>);
    InvokeCallback(cb);
    return 0;
}

We're making progress! But now our problem is the fact that the syntax is absolutely horrible. The syntax appears redundant; can't the compiler figure out the types from the pointer to method itself? Unfortunately no, but we can help it along. Remember that a compiler can deduce types via template argument deduction in a function call. So why don't we start with that?

template<typename R, class T, typename A1>
void DeduceMemCallback(R (T::*)(A1)) {}

Inside the function, we know what R, T and A1 is. So what if we can construct a struct that can "hold" these types and return them from the function?

template<typename R, class T, typename A1>
struct DeduceMemCallbackTag
{
};

template<typename R, class T, typename A1>
DeduceMemCallbackTag2<R, T, A1> DeduceMemCallback(R (T::*)(A1))
{
    return DeduceMemCallbackTag<R, T, A1>();
}

And since DeduceMemCallbackTag knows about the types, why not put our wrapper function as a static function in it? And since the wrapper function is in it, why not put the code to construct our Callback object in it?

template<typename R, typename A1>
class Callback
{
public:
    typedef R (*FuncType)(void*, A1);

    Callback(void* o, FuncType f) : obj(o), func(f) {}
    R operator()(A1 a1) const
    {
        return (*func)(obj, a1);
    }

private:
    void* obj;
    FuncType func;
};

template<typename R, class T, typename A1>
struct DeduceMemCallbackTag
{
    template<R (T::*Func)(A1)>
    static R Wrapper(void* o, A1 a1)
    {
        return (static_cast<T*>(o)->*Func)(a1);
    }

    template<R (T::*Func)(A1)>
    inline static Callback<R, A1> Bind(T* o)
    {
        return Callback<R, A1>(o, &DeduceMemCallbackTag::Wrapper<Func>);
    }
};

template<typename R, class T, typename A1>
DeduceMemCallbackTag<R, T, A1> DeduceMemCallback(R (T::*)(A1))
{
    return DeduceMemCallbackTag<R, T, A1>();
}

The C++ standard allows us to call static functions on instances (!):

class Foo
{
public:
    void DoSomething(int) {}
};

void InvokeCallback(Callback<void, int> callback)
{
    callback(42);
}

int main()
{
    Foo f;
    InvokeCallback(
        DeduceMemCallback(&Foo::DoSomething)
        .Bind<&Foo::DoSomething>(&f)
    );
    return 0;
}

Still, it's a lengthy expression, but we can put that into a simple macro (!):

template<typename R, typename A1>
class Callback
{
public:
    typedef R (*FuncType)(void*, A1);

    Callback(void* o, FuncType f) : obj(o), func(f) {}
    R operator()(A1 a1) const
    {
        return (*func)(obj, a1);
    }

private:
    void* obj;
    FuncType func;
};

template<typename R, class T, typename A1>
struct DeduceMemCallbackTag
{
    template<R (T::*Func)(A1)>
    static R Wrapper(void* o, A1 a1)
    {
        return (static_cast<T*>(o)->*Func)(a1);
    }

    template<R (T::*Func)(A1)>
    inline static Callback<R, A1> Bind(T* o)
    {
        return Callback<R, A1>(o, &DeduceMemCallbackTag::Wrapper<Func>);
    }
};

template<typename R, class T, typename A1>
DeduceMemCallbackTag<R, T, A1> DeduceMemCallback(R (T::*)(A1))
{
    return DeduceMemCallbackTag<R, T, A1>();
}

#define BIND_MEM_CB(memFuncPtr, instancePtr) \
    (DeduceMemCallback(memFuncPtr).Bind<(memFuncPtr)>(instancePtr))

class Foo
{
public:
    void DoSomething(int) {}
};

void InvokeCallback(Callback<void, int> callback)
{
    callback(42);
}

int main()
{
    Foo f;
    InvokeCallback(BIND_MEM_CB(&Foo::DoSomething, &f));
    return 0;
}

We can enhance the Callback object by adding a safe bool. It's also a good idea to disable the equality operators since it's not possible to compare two Callback objects. Even better, is to use partial specialization to allow for a "preferred syntax". This gives us:

template<typename FuncSignature>
class Callback;

template<typename R, typename A1>
class Callback<R (A1)>
{
public:
    typedef R (*FuncType)(void*, A1);

    Callback() : obj(0), func(0) {}
    Callback(void* o, FuncType f) : obj(o), func(f) {}

    R operator()(A1 a1) const
    {
        return (*func)(obj, a1);
    }

    typedef void* Callback::*SafeBoolType;
    operator SafeBoolType() const
    {
        return func != 0? &Callback::obj : 0;
    }

    bool operator!() const
    {
        return func == 0;
    }

private:
    void* obj;
    FuncType func;
};

template<typename R, typename A1> // Undefined on purpose
void operator==(const Callback<R (A1)>&, const Callback<R (A1)>&);
template<typename R, typename A1>
void operator!=(const Callback<R (A1)>&, const Callback<R (A1)>&);

template<typename R, class T, typename A1>
struct DeduceMemCallbackTag
{
    template<R (T::*Func)(A1)>
    static R Wrapper(void* o, A1 a1)
    {
        return (static_cast<T*>(o)->*Func)(a1);
    }

    template<R (T::*Func)(A1)>
    inline static Callback<R (A1)> Bind(T* o)
    {
        return Callback<R (A1)>(o, &DeduceMemCallbackTag::Wrapper<Func>);
    }
};

template<typename R, class T, typename A1>
DeduceMemCallbackTag<R, T, A1> DeduceMemCallback(R (T::*)(A1))
{
    return DeduceMemCallbackTag<R, T, A1>();
}

#define BIND_MEM_CB(memFuncPtr, instancePtr) \
    (DeduceMemCallback(memFuncPtr).Bind<(memFuncPtr)>(instancePtr))

Usage example:

class Foo
{
public:
    float DoSomething(int n) { return n / 100.0f; }
};

float InvokeCallback(int n, Callback<float (int)> callback)
{
    if(callback) { return callback(n); }
    return 0.0f;
}

int main()
{
    Foo f;
    float result = InvokeCallback(97, BIND_MEM_CB(&Foo::DoSomething, &f));
    // result == 0.97
    return 0;
}

I have tested this on the Visual C++ compiler (version 15.00.30729.01, the one that comes with VS 2008), and you do need a rather recent compiler to use the code. By inspection of the disassembly, the compiler was able to optimize away the wrapper function and the DeduceMemCallback call, reducing down to simple pointer assignments.

It's simple to use for both sides of the callback, and uses only (what I believe to be) standard C++. The code I've shown above works for member functions with 1 argument, but can be generalized to more arguments. It can also be further generalized by allowing support for static functions.

Note that the Callback object requires no heap allocation - they are of a constant size thanks to this "standardization" procedure. Because of this, it's possible to have a Callback object be a member of larger class, since it has a default constructor. It is also assignable (the compiler generated copy assignment functions are sufficient). It is also typesafe, thanks to the templates.

Why C-forkbombs don't work like bash ones?

34 votes

If I run the classical bash forkbomb:

:(){ :&:&};:

my system hangs after a few seconds.

I tried to write a forkbomb in C, here is the code:

#include <unistd.h>

int main( )
{
    while(1) {
        fork();
    }
    return 0;
}

When I run it the system gets less responsive, but I can kill that process (even after minutes) just pressing ^C.


The above code is different from the original bash forkbomb I posted: it's something more like:

:( )
{
    while true
    do
        :
    done
}

(I didn't test it; don't know if it'd hang the system).

So I also tried to implement the original version; here the code:

#include <unistd.h>

inline void colon( const char *path )
{
    pid_t pid = fork( );
    if( pid == 0 ) {
        execl( path, path, 0 );
    }
}

int main( int argc, char **argv )
{
    colon( argv[0] );
    colon( argv[0] );
    return 0;
}

But still nothing: I can run it and then easily kill it. It's not hanging my system.


Why?

What's so special about bash forkbombs? Is it because bash uses a lot more memory/CPU? Because bash processes call a lot more system calls (eg, to access filesystem) than mine?

That C program is tiny, seriously tiny. In addition, fork()'ing a program like that is very, very efficient. An interpreter, such as Bash, however, is much more expensive in terms of RAM usage, and needs to access the disk all the time.

Try running it for much longer. :)

Javascript error when using Messenger Connect client-side library in ASP.NET

33 votes

I'm having a problem implementing the new Messenger Connect functionality inside an existing Sitefinity site to enable client logins using Live IDs.

Namely, when I'm using the following code to implement the client-side functionality:

<head runat="server">
  <script type="text/javascript" src="http://js.live.net/4.1/loader.js"></script>
</head>
<body>
  <form runat="server" id="form1">
    <asp:ScriptManager ID="ScriptManager1" runat="server"/>
    <wl:app
        client-id="<%= ConfigurationManager.AppSettings["wl_wrap_client_id"] %>"
        scope="WL_Profiles.View"
        callback-url="<%= ConfigurationManager.AppSettings["wl_wrap_client_callback"] %>?wl_session_id=<%=SessionId %>"
        channel-url="/channel.htm">
    </wl:app>

... I get three errors in Firebug that I can't quite identify correctly:

Sys.ArgumentTypeException: Object of type 'Sys._Application' cannot be converted to type 'Sys.IDisposable'. Parameter name: object

(in ScriptResource.axd?d=.... line 4993)

Sys.Application._doInitialize is not a function

(in MicrosoftAjaxBase.js line 1)

Sys.InvalidOperationException: The script 'MicrosoftAjaxGlobalization.js' has been referenced multiple times. If referencing Microsoft AJAX scripts explicitly, set the MicrosoftAjaxMode property of the ScriptManager to Explicit.

(in ScriptResource.axd?d=.... line 984)

The errors are only triggered when I include the loader.js script from js.live.net.

EDIT: Seems the errors aren't necessarily triggered in that order. Refreshing the page seems to shuffle those errors and/or introduce other ones, such as a Sys.ParameterCountException in ScriptResource.axd?... on line 1842, for example.

Hey, I tried some combinations here, and the one that worked was:

1) Set the ScriptMode property of the ScriptManager to Release;

2) Load the MSN library in the CodeBehind Page_Load event, using the ClientScript class:

protected void Page_Load(object sender, EventArgs e)
{
    ClientScript.RegisterClientScriptInclude(this.GetType(), "live", "http://js.live.net/4.0/loader.js");
}

Firebug isn't showing any error anymore, and in my case, the authentication window is opening as desired.

Hope it helps!

EDIT

As told before, here follows the whole code I use to avoid this issue:

Default.aspx

<%@ Page Language="C#" AutoEventWireup="true" CodeFile="Default.aspx.cs" Inherits="_Default" %>

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xmlns:wl="http://apis.live.net/js/2010">
<head>
    <title>SignIn Example</title>
    <script type="text/javascript">
        function appLoaded(appLoadedEventArgs) {
        }
        function signInCallback(signInCompletedEventArgs) {
            if (signInCompletedEventArgs.get_resultCode() === Microsoft.Live.AsyncResultCode.success)
            {
                alert('Sign-in successful.');
            }
            else
            {
                alert('Sign-in failed.');
            }
        }
    </script>
</head>
<body>
    <form runat="server" id="form1">

    <asp:ScriptManager ID="ScriptManager1" runat="server" ScriptMode="Release"></asp:ScriptManager>

    <wl:app channel-url="http://labs.asteria.com.br/wlm/Channel.html" 
        callback-url="http://labs.asteria.com.br/wlm/Callback.aspx?wl_session_id=<%= Session.SessionID %>"
        client-id="0000000044052209" 
        scope="WL_Profiles.View" 
        onload="{{appLoaded}}">
    </wl:app>
    <wl:signin 
        id="signInControl" 
        signedintext="Signed in. Click to sign out." 
        signedouttext="Click to sign in."
        onsignin="{{signInCallback}}" />
    </form>
</body>
</html>

Default.aspx.cs

using System;
using System.Collections.Generic;
using System.Linq;
using System.Web;
using System.Web.UI;
using System.Web.UI.WebControls;

public partial class _Default : System.Web.UI.Page
{
    protected void Page_Load(object sender, EventArgs e)
    {
        ClientScript.RegisterClientScriptInclude(this.GetType(), "live", "http://js.live.net/4.0/loader.js");
    }
}

Web.config

<?xml version="1.0"?>
<configuration>
<appSettings>
    <add key="wl_wrap_client_secret" value="[YOUR SECRET KEY]"/>
    <add key="wl_wrap_client_id" value="0000000044052209"/>
    <add key="wl_wrap_client_callback" value="http://labs.asteria.com.br/wlm/Callback.aspx"/>
</appSettings>

<connectionStrings/>
<system.web>
    <customErrors mode="Off"/>
    <compilation debug="true" targetFramework="4.0"></compilation>
    <pages controlRenderingCompatibilityVersion="3.5" clientIDMode="AutoID"/>
</system.web>
</configuration>

To see it running, you can access http://labs.asteria.com.br/wlm. It seems that the Consent URL (https://consent.live.com/AccessToken.aspx) is not responding at this time.

Understanding boost::disjoint_sets

32 votes

I need to use boost::disjoint_sets, but the documentation is unclear to me. Can someone please explain what each template parameter means, and perhaps give a small example code for creating a disjoint_sets?

As per the request, I am using disjoint_sets to implement Tarjan's off-line least common ancestors algorithm, i.e - the value type should be vertex_descriptor.

What I can understand from the documentation :

Disjoint need to associate a rank and a parent (in the forest tree) to each element. Since you might want to work with any kind of data you may,for example, not always want to use a map for the parent: with integer an array is sufficient. You also need a rank foe each element (the rank needed for the union-find).

You'll need two "properties" :

  • one to associate an integer to each element (first template argument), the rank
  • one to associate an element to an other one (second template argument), the fathers

On an example :

std::vector<int>  rank (100);
std::vector<int>  parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);

Arrays are used &rank[0], &parent[0] to the type in the template is int*

For a more complex example (using maps) you can look at Ugo's answer.

You are just giving to the algorithm two structures to store the data (rank/parent) he needs.

Does being a competent scala programmer require you to be a competent java programmer?

32 votes

I am a big fan of Scala aesthetically, and of a lot of the conceptual work put into things like its typing system and libraries.

However, as I have begun tinkering with Scala (and seen some of my coworkers tinker with it) i find myself having to dig for more and more Java knowledge (especially in the way of libraries).

This presents me with a few problems:

  • Having never been a Java programmer, i'm not familiar or comfortable with the Java standard library, or additional popular libraries (like Apache Commons).
  • My google-fu in the Java-sphere is weak. It's hard to know what to search for – a problem exacerbated by the ponderously large number of irrelevant or rudimentary java tutorials for programming newbies.

At this point though, i'm not sure whether i should bite the bullet and try and find the quickest and most comprehensive tour through Java to catch myself up on 20 years of Java developments, or whether its reasonable to continue trying to incrementally patch my knowledge as i wander around scala.

Any wisdom that scala heads amongst us could offer would be greatly appreciated.

P.S. I have no doubt in my ability to familiarize myself with Scala syntax, and i'm perfectly comfortable and happy with functional programming and the paradigms in the scala community. But a programmer's competence is not just based on one's ability to teach oneself, but also one's ability to learn from, and adopt tools and skills from other people.

You should take a lazy approach to learning Java. Learn it when you need it.

In my opinion, much of the old Java knowledge is out of date, much of the new tutorials are redundant. You certainly don't want to bother yourself with Java's antiquated Collections, for example. Many Java-based frameworks can be safely ignored. And the heavyweight JavaEE stack can be safely bypassed until you were forced to use a part of it.

Many common patterns in Java are much simpler in Scala, with the former being burdened with much boilerplate code. Core logic should always be implemented in Scala. I believe you can do most of your work directly in Scala and only need to dip down into Java when building things like Swing or integrating with Spring, etc.

In regard to choosing and using Java libraries, my person guidelines are:

  1. If Spring can do it, use Spring
  2. If Spring is too heavyweight, use what Spring uses.
  3. If Spring can't do it, check github projects
  4. If there's nothing on github, check Apache projects
  5. If there's nothing from Apache, check sourceforge(t).
  6. Finally, Google randomly or just build it yourself.

That's a bit tounge-in-cheek, but is the impression I get about the maturity and stability of third party libraries after having done Java for the last 12 years.

Pure virtual functions may not have an inline definition. Why?

30 votes

Pure virtual functions are those member functions that are virtual and have the pure-specifier ( = 0; )

Clause 10.4 paragraph 2 of C++03 tells us what an abstract class is and, as a side note, the following:

[Note: a function declaration cannot provide both a pure-specifier and a definition —end note] [Example:

struct C {
virtual void f() = 0 { }; // ill-formed
};

—end example]

For those who are not very familiar with the issue, please note that pure virtual functions can have definitions but the above-mentioned clause forbids such definitions to appear inline (lexically in-class). (For uses of defining pure virtual functions you may see, for example, this GotW)

Now for all other kinds and types of functions it is allowed to provide an in-class definition, and this restriction seems at first glance absolutely artificial and inexplicable. Come to think of it, it seems such on second and subsequent glances :) But I believe the restriction wouldn't be there if there weren't a specific reason for that. My question is: does anybody know that specific reasons? Good guesses are also welcome. Thanks in advance

Notes:

  • MSVC does allow PVF's to have inline definitions. So don't get surprised :)
  • the word inline in this question does not refer to the inline keyword. It is supposed to mean lexically in-class

In the SO thread “Why pure virtual function is initialized by 0?” Jerry Coffin provided this quote from Bjarne Stroustrup’s The Design & Evolution of C++, section §13.2.3, where I've added some emphasis of the part I think is relevant:

The curious =0 syntax was chosen over the obvious alternative of introducing a new keyword pure or abstract because at the time I saw no chance of getting a new keyword accepted. Had I suggested pure, Release 2.0 would have shipped without abstract classes. Given a choice between a nicer syntax and abstract classes, I chose abstract classes. Rather than risking delay and incurring the certain fights over pure, I used the tradition C and C++ convention of using 0 to represent "not there." The =0 syntax fits with my view that a function body is the initializer for a function and also with the (simplistic, but usually adequate) view of the set of virtual functions being implemented as a vector of function pointers. [ … ]

So, when choosing the syntax Bjarne was thinking of a function body as a kind of initializer part of the declarator, and =0 as an alternate form of initializer, one that indicated “no body” (or in his words, “not there”).

It stands to reason that one cannot both indicate “not there” and have a body – in that conceptual picture.

Or, still in that conceptual picture, having two initializers.

Now, that's as far as my telepathic powers, google-foo and soft-reasoning goes. I surmise that nobody's been Interested Enough™ to formulate a proposal to the committee about having this purely syntactical restriction lifted, and following up with all the work that that entails. Thus it's still that way.

Cheers & hth.,

How to reliably guess the encoding between MacRoman, CP1252, Latin1, UTF-8, and ASCII

30 votes

At work it seems like no week ever passes without some encoding-related conniption, calamity, or catastrophe. The problem usually derives from programmers who think they can reliably process a “text” file without specifying the encoding. But you can't.

So it's been decided to henceforth forbid files from ever having names that end in *.txt or *.text. The thinking is that those extensions mislead the casual programmer into a dull complacency regarding encodings, and this leads to improper handling. It would almost be better to have no extension at all, because at least then you know that you don’t know what you’ve got.

However, we aren’t goint to go that far. Instead you will be expected to use a filename that ends in the encoding. So for text files, for example, these would be something like README.ascii, README.latin1, README.utf8, etc.

For files that demand a particular extension, if one can specify the encoding inside the file itself, such as in Perl or Python, then you shall do that. For files like Java source where no such facility exists internal to the file, you will put the encoding before the extension, such as SomeClass-utf8.java.

For output, UTF-8 is to be strongly preferred.

But for input, we need to figure out how to deal with the thousands of files in our codebase named *.txt. We want to rename all of them to fit into our new standard. But we can’t possibly eyeball them all. So we need a library or program that actually works.

These are variously in ASCII, ISO-8859-1, UTF-8, Microsoft CP1252, or Apple MacRoman. Although we're know we can tell if something is ASCII, and we stand a good change of knowing if something is probably UTF-8, we’re stumped about the 8-bit encodings. Because we’re running in a mixed Unix environment (Solaris, Linux, Darwin) with most desktops being Macs, we have quite a few annoying MacRoman files. And these especially are a problem.

For some time now I’ve been looking for a way to programmatically determine which of

  1. ASCII
  2. ISO-8859-1
  3. CP1252
  4. MacRoman
  5. UTF-8

a file is in, and I haven’t found a program or library that can reliably distinguish between those the three different 8-bit encodings. We probably have over a thousand MacRoman files alone, so whatever charset detector we use has to be able to sniff those out. Nothing I’ve looked at can manage the trick. I had big hopes for the ICU charset detector library, but it cannot handle MacRoman. I’ve also looked at modules to do the same sort of thing in both Perl and Python, but again and again it’s always the same story: no support for detecting MacRoman.

What I am therefore looking for is an existing library or program that reliably determines which of those five encodings a file is in—and preferably more than that. In particular it has to distinguish between the three 3-bit encoding I’ve cited, especially MacRoman. The files are more than 99% English language text; there are a few in other languages, but not many.

If it’s library code, our language preference is for it to be in Perl, C, Java, or Python, and in that order. If it’s just a program, then we don’t really care what language it’s in so long as it comes in full source, runs on Unix, and is fully unencumbered.

Has anyone else had this problem of a zillion legacy text files randomly encoded? If so, how did you attempt to solve it, and how successful were you? This is the most important aspect of my question, but I’m also interested in whether you think encouraging programmers to name (or rename) their files with the actual encoding those files are in will help us avoid the problem in the future. Has anyone ever tried to enforce this on an institutional basis, and if so, was that successful or not, and why?

And yes, I fully understand why one cannot guarantee a definite answer given the nature of the problem. This is especially the case with small files, where you don’t have enough data to go on. Fortunately, our files are seldom small. Apart from the random README file, most are in the size range of 50k to 250k, and many are larger. Anything more than a few K in size is guaranteed to be in English.

The problem domain is biomedical text mining, so we sometimes deal with extensive and extremely large corpora, like all of PubMedCentral’s Open Access respository. A rather huge file is the BioThesaurus 6.0, at 5.7 gigabytes. This file is especially annoying because it is almost all UTF-8. However, some numbskull went and stuck a few lines in it that are in some 8-bit encoding—Microsoft CP1252, I believe. It takes quite a while before you trip on that one. :(

First, the easy cases:

ASCII

If your data contains no bytes above 0x7F, then it's ASCII. (Or a 7-bit ISO646 encoding, but those are very obsolete.)

UTF-8

If your data validates as UTF-8, then you can safely assume it is UTF-8. Due to UTF-8's strict validation rules, false positives are extremely rare.

ISO-8859-1 vs. windows-1252

The only difference between these two encodings is that ISO-8859-1 has the C1 control characters where windows-1252 has the printable characters €‚ƒ„…†‡ˆ‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ. I've seen plenty of files that use curly quotes or dashes, but none that use C1 control characters. So don't even bother with them, or ISO-8859-1, just detect windows-1252 instead.

That now leaves you with only one question.

How do you distinguish MacRoman from cp1252?

This is a lot trickier.

Undefined characters

The bytes 0x81, 0x8D, 0x8F, 0x90, 0x9D are not used in windows-1252. If they occur, then assume the data is MacRoman.

Identical characters

The bytes 0xA2 (¢), 0xA3 (£), 0xA9 (©), 0xB1 (±), 0xB5 (µ) happen to be the same in both encodings. If these are the only non-ASCII bytes, then it doesn't matter whether you choose MacRoman or cp1252.

Statistical approach

Count character (NOT byte!) frequencies in the data you know to be UTF-8. Determine the most frequent characters. Then use this data to determine whether the cp1252 or MacRoman characters are more common.

For example, in a search I just performed on 100 random English Wikipedia articles, the most common non-ASCII characters are ·•–é°®’èö—. Based on this fact,

  • The bytes 0x92, 0x95, 0x96, 0x97, 0xAE, 0xB0, 0xB7, 0xE8, 0xE9, or 0xF6 suggest windows-1252.
  • The bytes 0x8E, 0x8F, 0x9A, 0xA1, 0xA5, 0xA8, 0xD0, 0xD1, 0xD5, or 0xE1 suggest MacRoman.

Count up the cp1252-suggesting bytes and the MacRoman-suggesting bytes, and go with whichever is greatest.

How do I check if a number is positive or negative in c#?

29 votes

This must be a common question; however, I cannot seem to find a neat way of doing it.

How do I check if a number is positive or negative?

Thanks.

Edit: APOLOGIES FOR MY QUESTION. I was very very tired + I had a "liquid lunch" if you see what I mean. Oh well I will put it down as one of those embarrassing moments of mine. Thanks anyway.

bool positive = number > 0;

Is a main() required for a C program.

28 votes

Well the title says it all.Is a main() function absolutely essential for a c program. I am asking this because,I was looking at the Linux kernel code,and I didn't see a main() function.

No, the ISO C standard states that a main function is only required for a hosted environment (one with an underlying OS).

For a freestanding environment like an embedded system (or an operating system itself), it's implementation defined. From C99 5.1.2:

Two execution environments are defined: freestanding and hosted. In both cases, program startup occurs when a designated C function is called by the execution environment.

In a freestanding environment (in which C program execution may take place without any benefit of an operating system), the name and type of the function called at program startup are implementation-defined.

As to how Linux itself starts, the start point for the Linux kernel is start_kernel though, for a more complete picture of the entire boot process, you should start here.