Best c++ questions in October 2011

Throwing the fattest people off of an overloaded airplane.

91 votes

Let's say you've got an airplane, and it is low on fuel. Unless the plane drops 3000 pounds of passenger weight, it will not be able to reach the next airport. To save the maximum number of lives, we would like to throw the heaviest people off of the plane first.

And oh yeah, there are millions of people on the airplane, and we would like an optimal algorithm to find the heaviest passengers, without necessarily sorting the entire list.

This is a proxy problem for something I'm trying to code in C++. I would like to do a "partial_sort" on the passenger manifest by weight, but I don't know how many elements I'm going to need. I could implement my own "partial_sort" algorithm ("partial_sort_accumulate_until"), but I'm wondering if there's any easier way to do this using standard STL.

One way would be to use a min heap. Here's how you'd do it, assuming you had a MinHeap class. (Yes, my example is in C#. I think you get the idea.)

int targetTotal = 3000;
int totalWeight = 0;
// this creates an empty heap!
var myHeap = new MinHeap<Passenger>(/* need comparer here to order by weight */);
foreach (var pass in passengers)
{
    if (totalWeight < targetTotal)
    {
        // unconditionally add this passenger
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
    else if (pass.Weight > myHeap.Peek().Weight)
    {
        // If this passenger is heavier than the lightest
        // passenger already on the heap,
        // then remove the lightest passenger and add this one
        var oldPass = myHeap.RemoveFirst();
        totalWeight -= oldPass.Weight;
        myHeap.Add(pass);
        totalWeight += pass.Weight;
    }
}

// At this point, the heaviest people are on the heap,
// but there might be too many of them.
// Remove the lighter people until we have the minimum necessary
while ((totalWeight - myHeap.Peek().Weight) >= targetTotal)
{
    var oldPass = myHeap.RemoveFirst();
    totalWeight -= oldPass.Weight; 
}
// The heap now contains the passengers who will be thrown overboard.

According to the standard references, running time should be proportional to n log k, where n is the number of passengers and k is the maximum number of items on the heap. If we assume that passengers' weights will typically be 100 lbs or more, then it's unlikely that the heap will contain more than 30 items at any time.

The worst case would be if the passengers are presented in order from lowest weight to highest. That would require that every passenger be added to the heap, and every passenger be removed from the heap. Still, with a million passengers and assuming that the lightest weighs 100 lbs, the n log k works out to a reasonably small number.

If you get the passengers' weights randomly, performance is much better. I use something quite like this for a recommendation engine (I select the top 200 items from a list of several million). I typically end up with only 50,000 or 70,000 items actually added to the heap.

I suspect that you'll see something quite similar: the majority of your candidates will be rejected because they're lighter than the lightest person already on the heap. And Peek is an O(1) operation.

For more information about the performance of heap select and quick select, see When theory meets practice. Short version: if you're selecting fewer than 1% of the total number of items, then heap select is a clear winner over quick select. For more than 1%, use quick select or a variant like Introselect.
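Since the question asks about C++, here is a minimal sketch of the same heap-select idea using std::priority_queue. The Passenger struct, the HeavierThan comparator and the select_heaviest function are hypothetical names chosen for illustration, not part of the original answer. The final loop stops as soon as removing the lightest selected passenger would drop the total below the target.

```cpp
#include <queue>
#include <vector>

struct Passenger {   // hypothetical record type for this sketch
    int id;
    int weight;      // pounds
};

// Comparator that turns std::priority_queue into a min-heap on weight,
// so top() is always the lightest passenger currently selected.
struct HeavierThan {
    bool operator()(const Passenger& a, const Passenger& b) const {
        return a.weight > b.weight;
    }
};

// Returns the heaviest passengers whose combined weight reaches `target`,
// in ascending order of weight.
std::vector<Passenger> select_heaviest(const std::vector<Passenger>& passengers,
                                       long target) {
    std::priority_queue<Passenger, std::vector<Passenger>, HeavierThan> heap;
    long total = 0;
    for (const Passenger& p : passengers) {
        if (total < target) {
            // Not enough weight selected yet: take this passenger unconditionally.
            heap.push(p);
            total += p.weight;
        } else if (!heap.empty() && p.weight > heap.top().weight) {
            // Heavier than the lightest selected passenger: swap them.
            total -= heap.top().weight;
            heap.pop();
            heap.push(p);
            total += p.weight;
        }
    }
    // Drop the lightest passengers while the rest still reach the target.
    while (!heap.empty() && total - heap.top().weight >= target) {
        total -= heap.top().weight;
        heap.pop();
    }
    std::vector<Passenger> selected;
    while (!heap.empty()) {
        selected.push_back(heap.top());
        heap.pop();
    }
    return selected;
}
```

For weights 100, 200, 150, 300, 250 and a target of 500, this keeps the 250 and 300 pound passengers.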

What's the point of const pointers?

74 votes

I'm not talking about pointers to const values, but const pointers themselves.

I'm learning C and C++ beyond the very basic stuff and only today realized that pointers are passed by value to functions, which makes sense. This means that inside a function I can make the copied pointer point to some other value without affecting the original pointer from the caller.

So what's the point of having a function header that says:

void foo(int* const ptr);

Inside such a function you cannot make ptr point to something else because it's const and you don't want it to be modified, but a function like this:

void foo(int* ptr);

does the work just as well, because the pointer is copied anyway and the pointer in the caller is not affected even if you modify the copy. So what's the advantage of const?

Update: There are illegal sites like programmersgoodies.com linking to this question / answer. Sites like these are violating the StackOverflow attribution requirements.


Answer:

const is a tool which you should use in pursuit of a very important C++ concept:

Find bugs at compile-time, rather than run-time, by getting the compiler to enforce what you mean.

Even though it doesn't change the functionality, adding const generates a compiler error when you're doing things you didn't mean to do. Imagine the following typo:

void foo(int* ptr)
{
    ptr = 0;// oops, I meant *ptr = 0
}

If you use int* const, this would generate a compiler error because you're changing the value of ptr. Adding restrictions via syntax is a good thing in general. Just don't take it too far -- the example you gave is a case where most people don't bother using const.
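As a small sketch of the point above (clear_value is a hypothetical name): with a const pointer parameter the typo is rejected at compile time, while writing through the pointer stays legal. Because a top-level const on a parameter is not part of the function's type, the declaration callers see can omit it.

```cpp
// Declaration as callers see it: the top-level const is not part of the signature.
void clear_value(int* ptr);

// Definition of the same function, with the pointer itself made const.
void clear_value(int* const ptr) {
    // ptr = 0;   // would not compile: ptr is a const pointer
    *ptr = 0;     // fine: only the pointer is const, not the int it points to
}
```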

Efficiency of premature return in a function

61 votes

This is a situation I encounter frequently as an inexperienced programmer, and I am wondering about it particularly for an ambitious, speed-intensive project of mine that I'm trying to optimize. For the major C-like languages (C, objC, C++, Java, C#, etc) and their usual compilers, will these two functions run just as efficiently? Is there any difference in the compiled code?

void foo1(bool flag)
{
    if (flag)
    {
        //Do stuff
        return;
    }

    //Do different stuff
}

void foo2(bool flag)
{
    if (flag)
    {
        //Do stuff
    }
    else
    {
        //Do different stuff
    }
}

Basically, is there ever a direct efficiency bonus/penalty when breaking or returning early? How is the stackframe involved? Are there optimized special cases? Are there any factors (like inlining or the size of "Do stuff") that could affect this significantly?

I'm always a proponent of improved legibility over minor optimizations (I see foo1 a lot with parameter validation), but this comes up so frequently that I'd like to set aside all worry once and for all.

And I'm aware of the pitfalls of premature optimization... ugh, those are some painful memories.

EDIT: I accepted an answer, but EJP's answer explains pretty succinctly why the use of a return is practically negligible (in assembly, the return creates a 'branch' to the end of the function, which is extremely fast. The branch alters the PC register and may also affect the cache and pipeline, which is pretty minuscule.) For this case in particular, it literally makes no difference because both the if/else and the return create the same branch to the end of the function.

There is no difference at all:

=====> cat test_return.cpp
extern void something();
extern void something2();

void test(bool b)
{
    if(b)
    {
        something();
    }
    else
        something2();
}
=====> cat test_return2.cpp
extern void something();
extern void something2();

void test(bool b)
{
    if(b)
    {
        something();
        return;
    }
    something2();
}
=====> rm -f test_return.s test_return2.s
=====> g++ -S test_return.cpp 
=====> g++ -S test_return2.cpp 
=====> diff test_return.s test_return2.s
=====> rm -f test_return.s test_return2.s
=====> clang++ -S test_return.cpp 
=====> clang++ -S test_return2.cpp 
=====> diff test_return.s test_return2.s
=====> 

That means there is no difference in the generated code whatsoever, even without optimization, in either compiler.

What is happening here in this C++ code?

54 votes

Can anyone please explain what is going on in this C++ code? It compiles and executes fine on Linux.

#include <iostream>
using namespace std;
int main = ( cout << "Hello world!\n", 195 );

The number "195" is the code of RET instruction on x86.

The C++ compiler (gcc in my case) is unable to recognize that "main" wasn't declared as a function. The compiler only sees that there is the "main" symbol, and presumes that it refers to a function.

The C++ code

int main = ( cout << "Hello world!\n", 195 );

is initializing a variable at file-scope. This initialization code is executed before the C/C++ environment calls main(), but after it initializes the "cout" variable. The initialization prints "Hello world!\n", and sets the value of variable "main" to 195. After all initialization is done, the C/C++ environment makes a call to "main". The program returns immediately from this call because we put a RET instruction (code 195) at the address of "main".

Sample GDB output:

$ gdb ./a
(gdb) break _fini
Breakpoint 1 at 0x8048704
(gdb) print main
$1 = 0
(gdb) disass &main
Dump of assembler code for function main:
   0x0804a0b4 <+0>:     add    %al,(%eax)
   0x0804a0b6 <+2>:     add    %al,(%eax)
End of assembler dump.
(gdb) run
Starting program: /home/atom/a 
Hello world!

Breakpoint 1, 0x08048704 in _fini ()
(gdb) print main
$2 = 195
(gdb) disass &main
Dump of assembler code for function main:
   0x0804a0b4 <+0>:     ret    
   0x0804a0b5 <+1>:     add    %al,(%eax)
   0x0804a0b7 <+3>:     add    %al,(%eax)
End of assembler dump.

Are unused includes harmful in C/C++?

Asked on Thu, 27 Oct 2011 by anio c++
27 votes

What are the negative consequences of unused includes?

I'm aware they result in increased binary size (or do they?), anything else?

  • Increases compilation time (potentially serious issue)
  • Pollutes global namespace.
  • Potential clash of preprocessor names.
  • If unused headers are included from third-party libraries, those libraries may have to be maintained as dependencies unnecessarily.

C++, variable declaration in 'if' expression

24 votes

What's going on here?

if(int a = Func1())
{
    // Works.
}

if((int a = Func1()))
{
    // Fails to compile.
}

if((int a = Func1())
    && (int b = Func2()))
{
    // Do stuff with a and b.
    // This is what I'd really like to be able to do.
}

Section 6.4.3 in the 2003 standard explains how variables declared in a selection statement condition have scope that extends to the end of the substatements controlled by the condition. But I don't see where it says anything about not being able to put parentheses around the declaration, nor does it say anything about only one declaration per condition.

This limitation is annoying even in cases where only one declaration in the condition is required. Consider this.

bool a = false, b = true;

if(bool x = a || b)
{

}

If I want to enter the 'if'-body scope with x set to false then the declaration needs parentheses (since the assignment operator has lower precedence than the logical OR), but since parentheses can't be used it requires declaration of x outside the body, leaking that declaration to a greater scope than is desired. Obviously this example is trivial, but a more realistic case would be one where a and b are functions returning values that need to be tested.

So is what I want to do non-conformant to the standard, or is my compiler just busting my balls (VS2008)?

The condition in an if or while statement can be either an expression, or a single variable declaration (with initialisation).

Your second and third examples are neither valid statements, nor valid declarations, since a declaration can't form part of an expression. While it would be useful to be able to write code like your third example, it would require a significant change to the language syntax.

I don't see where it says anything about not being able to put parentheses around the declaration, nor does it say anything about only one declaration per condition.

The syntax specification in 6.4/1 gives the following for the condition:

condition:
    expression
    type-specifier-seq declarator = assignment-expression

specifying a single declaration, with no parentheses or other adornments.
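Given that grammar, one way to get close to the third example is to nest two conditions, each with its own single declaration. This is only a sketch: Func1 and Func2 here are hypothetical stand-ins returning fixed values, and sum_if_both_nonzero is an illustrative name. Like &&, the inner initializer only runs when the outer condition is true, and neither variable leaks out of its if.

```cpp
// Hypothetical stand-ins for the functions in the question.
inline int Func1() { return 1; }
inline int Func2() { return 2; }

int sum_if_both_nonzero() {
    if (int a = Func1()) {        // a is scoped to this if statement
        if (int b = Func2()) {    // evaluated only when a is non-zero
            return a + b;         // both a and b are usable here
        }
    }
    return 0;                     // neither a nor b is visible here
}
```

If C++17 is available, the if statement with an initializer, if (init; condition), is another way to keep such declarations scoped to the statement.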

What is the type of lambda when deduced with "auto" in C++11?

20 votes

I was under the impression that the type of a lambda is a function pointer. When I performed the following test, I found that to be wrong (demo).

#include <cassert>   // assert
#include <typeinfo>  // typeid

#define LAMBDA [] (int i) -> long { return 0; }
int main ()
{
  long (*pFptr)(int) = LAMBDA;  // ok
  auto pAuto = LAMBDA;  // ok
  assert(typeid(pFptr) == typeid(pAuto));  // assertion fails !
}

Is the above code missing something? If not, what is the type of a lambda expression when deduced with the auto keyword?

The type of a lambda expression is unspecified.

But lambdas are generally mere syntactic sugar for functors. A lambda is translated directly into a functor. Anything inside the [] is turned into constructor parameters and members of the functor object, and the parameters inside () are turned into parameters for the functor's operator().

A lambda which captures no variables (nothing inside the []'s) can be converted into a function pointer (MSVC2010 doesn't support this, if that's your compiler, but this conversion is part of the standard).

But the actual type of the lambda isn't a function pointer. It's some unspecified functor type.
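To make the functor picture concrete, here is a rough sketch. AddOffset is a hypothetical, hand-written equivalent of a lambda such as [offset](int i) -> long { return i + offset; }; the real closure type is unnamed and generated by the compiler. The static_assert checks that the deduced type of a capture-less lambda is not a function pointer, even though it converts to one.

```cpp
#include <type_traits>

// Hand-written functor, roughly what a capturing lambda is translated into.
struct AddOffset {                 // hypothetical name; real closure types are unnamed
    long offset;                   // the captured variable becomes a member
    long operator()(int i) const { return i + offset; }
};

// A lambda that captures nothing.
auto doubler = [](int x) -> long { return x * 2L; };

// Its type is a distinct closure type, not long (*)(int).
static_assert(!std::is_same<decltype(doubler), long (*)(int)>::value,
              "closure type is not a function pointer");

inline long call_through_pointer(int i) {
    long (*fp)(int) = doubler;     // implicit conversion allowed for capture-less lambdas
    return fp(i);
}
```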

How to measure elapsed time in C# and C++

18 votes

I have simple C# and C++ code that computes a sum of dot products.

The C# code is:

using System;

namespace DotPerfTestCS
{
    class Program
    {
        struct Point3D
        {
            public double X, Y, Z;

            public Point3D(double x, double y, double z)
            {
                X = x;
                Y = y;
                Z = z;
            }
        }

        static void RunTest()
        {
            unchecked
            {
                const int numPoints = 100000;
                const int numIters = 100000000;

                Point3D[] pts = new Point3D[numPoints];
                for (int i = 0; i < numPoints; i++) pts[i] = new Point3D(i, i + 1, i + 2);

                var begin = DateTime.Now;
                double sum = 0.0;
                var u = new Point3D(1, 2, 3);
                for (int i = 0; i < numIters; i++)
                {
                    var v = pts[i % numPoints];
                    sum += u.X * v.X + u.Y * v.Y + u.Z * v.Z;
                }
                var end = DateTime.Now;
                Console.WriteLine("Sum: {0} Time elapsed: {1} ms", sum, (end - begin).TotalMilliseconds);
            }
        }

        static void Main(string[] args)
        {
            for (int i = 0; i < 5; i++) RunTest();
        }
    }
}

and the C++ is

#include <iostream>
#include <vector>
#include <time.h>

using namespace std;

typedef struct point3d
{
    double x, y, z;

    point3d(double x, double y, double z)
    {
        this->x = x;
        this->y = y;
        this->z = z;
    }
} point3d_t;

double diffclock(clock_t clock1,clock_t clock2)
{
    double diffticks=clock1-clock2;
    double diffms=(diffticks*10)/CLOCKS_PER_SEC;
    return diffms;
}

void runTest()
{
    const int numPoints = 100000;
    const int numIters = 100000000;

    vector<point3d_t> pts;
    for (int i = 0; i < numPoints; i++) pts.push_back(point3d_t(i, i + 1, i + 2));

    auto begin = clock();
    double sum = 0.0, dum = 0.0;
    point3d_t u(1, 2, 3);
    for (int i = 0; i < numIters; i++) 
    {
        point3d_t v = pts[i % numPoints];
        sum += u.x * v.x + u.y * v.y + u.z * v.z;
    }
    auto end = clock();
    cout << "Sum: " << sum << " Time elapsed: " << double(diffclock(end,begin)) << " ms" << endl;

}

int main()
{
    for (int i = 0; i < 5; i++) runTest();
    return 0;
}

The C# version (Release x86 with optimization on, x64 is even slower) output is

Sum: 30000500000000 Time elapsed: 551.0299 ms 
Sum: 30000500000000 Time elapsed: 551.0315 ms 
Sum: 30000500000000 Time elapsed: 552.0294 ms
Sum: 30000500000000 Time elapsed: 551.0316 ms 
Sum: 30000500000000 Time elapsed: 550.0315 ms

while C++ (default VS2010 Release build settings) yields

Sum: 3.00005e+013 Time elapsed: 4.27 ms
Sum: 3.00005e+013 Time elapsed: 4.27 ms
Sum: 3.00005e+013 Time elapsed: 4.25 ms
Sum: 3.00005e+013 Time elapsed: 4.25 ms
Sum: 3.00005e+013 Time elapsed: 4.25 ms

Now I would expect the C# code would be a little slower. But 130 times slower seems way too much to me. Can someone please explain to me what is going on here?

EDIT

I am not a C++ programmer and I just took the diffclock code somewhere from the internet without really checking if it's correct.

Using std::difftime the C++ results are

Sum: 3.00005e+013 Time elapsed: 457 ms
Sum: 3.00005e+013 Time elapsed: 452 ms
Sum: 3.00005e+013 Time elapsed: 451 ms
Sum: 3.00005e+013 Time elapsed: 451 ms
Sum: 3.00005e+013 Time elapsed: 451 ms

which seems about right.

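Your diffclock code is wrong: it multiplies the tick difference by 10 instead of 1000 when converting to milliseconds, so the reported times are 100 times too small.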

If you change your C++ code to use the std::clock and std::difftime it appears to show the actual runtime:

#include <iostream>
#include <vector>
#include <ctime>

using namespace std;

typedef struct point3d
{
    double x, y, z;

    point3d(double x, double y, double z)
    {
        this->x = x;
        this->y = y;
        this->z = z;
    }
} point3d_t;

void runTest()
{
    const int numPoints = 100000;
    const int numIters = 100000000;

    vector<point3d_t> pts;
    for (int i = 0; i < numPoints; i++) pts.push_back(point3d_t(i, i + 1, i + 2));

    auto begin = clock();
    double sum = 0.0, dum = 0.0;
    point3d_t u(1, 2, 3);
    for (int i = 0; i < numIters; i++) 
    {
        point3d_t v = pts[i % numPoints];
        sum += u.x * v.x + u.y * v.y + u.z * v.z;
    }
    auto end = clock();
    cout << "Sum: " << sum << " Time elapsed: " << double(std::difftime(end,begin)) << " ms" << endl;

}

int main()
{
    for (int i = 0; i < 5; i++) runTest();
    return 0;
}

Results:

Sum: 3.00005e+013 Time elapsed: 346 ms
Sum: 3.00005e+013 Time elapsed: 344 ms
Sum: 3.00005e+013 Time elapsed: 346 ms
Sum: 3.00005e+013 Time elapsed: 347 ms
Sum: 3.00005e+013 Time elapsed: 347 ms

That is from running the application with default release-mode optimizations, outside of VS2010.

EDIT

As others have pointed out, in C++ using clock() is not the most accurate way to time a function (as in C#, Stopwatch is better than DateTime).

If you're using Windows, you can always use QueryPerformanceCounter for high-resolution timing.
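One caveat about the std::difftime version: difftime is specified for time_t values and returns seconds, so feeding it clock() ticks only yields milliseconds on platforms where CLOCKS_PER_SEC happens to be 1000. A portable alternative, assuming a compiler that provides the C++11 <chrono> header, might look like the sketch below. The elapsed_ms helper is a hypothetical name, not something from the original answer.

```cpp
#include <chrono>

// Runs `work` once and returns the elapsed wall-clock time in milliseconds.
// steady_clock is monotonic, so the result is not affected by clock adjustments.
template <typename Work>
double elapsed_ms(Work&& work) {
    const auto begin = std::chrono::steady_clock::now();
    work();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(end - begin).count();
}
```

The loop from runTest could then be passed in as a lambda, for example elapsed_ms([&] { /* dot-product loop */ }).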

No O(1) operation to join elements from two forward_lists?

17 votes

When reading about forward_list in the FCD of C++11 and N2543 I stumbled over one specific overload of splice_after (slightly simplified and let cit be const_iterator):

void splice_after(cit pos, forward_list<T>& x, cit first, cit last);

The behavior is that after pos everything between (first,last) is moved to this. Thus:

  this: 1 2 3 4 5 6           x: 11 12 13 14 15 16
          ^pos                      ^first   ^last
will become:
  this: 1 2 13 14 3 4 5 6     x: 11 12       15 16
          ^pos                      ^first   ^last

The description includes the complexity:

Complexity: O(distance(first, last))

I can see that this is because one needs to adjust PREDECESSOR(last).next = pos.next, and the forward_list does not allow this to happen in O(1).

Ok, but isn't joining two singly linked lists in O(1) one of the strengths of this simple data structure? Therefore I wonder -- is there no operation on forward_list that splices/merges/joins an arbitrary number of elements in O(1)?

The algorithm would be quite simple, of course. One would just need a name for the operation (pseudocode; updated by integrating Kerrek's answer):

temp_this  =   pos.next;
temp_that  =  last.next;
  pos.next = first.next;
 last.next =  temp_this;
first.next =  temp_that;

The result is a bit different, because not (first,last) is moved, but (first,last].

  this: 1 2 3 4 5 6 7               x: 11 12 13 14 15 16 17
          ^pos                            ^first      ^last
will become:
  this: 1 2 13 14 15 16 3 4 5 6 7   x: 11 12             17
          ^pos       ^last                ^first   

I would think this is as reasonable an operation as the former one, and one that people might like to do, especially if it has the benefit of being O(1).

  • Am I overlooking an operation that is O(1) on many elements?
  • Or is my assumption wrong that (first,last] might be useful as the moved range?
  • Or is there an error in the O(1) algorithm?

Your algorithm fails when you pass in end() as last because it will try to use the one-past-end node and relink it into the other list. It would be a strange exception to allow end() to be used in every algorithm except this one.

Also I think first.next = &last; needs to be first.next = last.next; because otherwise last will be in both lists.
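For reference, the five-line pseudocode from the question can be sketched on a hand-rolled singly linked list. Node and splice_after_closed are hypothetical names; this is not part of std::forward_list. As the answer points out, it only works when last is a real node, so it cannot accept an end() position.

```cpp
struct Node {        // hypothetical minimal singly linked list node
    int value;
    Node* next;
};

// Moves the half-open-on-the-left range (first, last] out of its list and
// inserts it directly after pos, in O(1).
// Precondition: last is a real node reachable from first, and first != last.
void splice_after_closed(Node* pos, Node* first, Node* last) {
    Node* temp_this = pos->next;    // remainder of the destination list
    Node* temp_that = last->next;   // remainder of the source list
    pos->next = first->next;        // destination now continues with the moved range
    last->next = temp_this;         // moved range is followed by the old remainder
    first->next = temp_that;        // source list skips over the moved range
}
```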

C++11 lambdas: member variable capture gotcha

17 votes

Consider this code:

#include <functional>
#include <memory>
#include <iostream>

class A
{
public:
    A(int data) : data_(data)
    { std::cout << "A(" << data_ << ")" << std::endl; }
    ~A() { std::cout << "~A()" << std::endl; }
    void a() { std::cout << data_ << std::endl; }
private:
    int data_;
};

class B
{
public:
    B(): a_(new A(13)) { std::cout << "B()" << std::endl; }
    ~B() { std::cout << "~B()" << std::endl; }
    std::function<void()> getf()
    {
        return [=]() { a_->a(); };
    }
private:
    std::shared_ptr<A> a_;
};

int main()
{
    std::function<void()> f;
    {
        B b;
        f = b.getf();
    }
    f();
    return 0;
}

Here it looks like I'm capturing a_ shared pointer by value, but when I run it on Linux (GCC 4.6.1), this is printed:

A(13)
B()
~B()
~A()
0

Obviously, 0 is wrong, because A is already destroyed. It looks like this is actually captured and is used to look up this->a_. My suspicion is confirmed when I change the capture list from [=] to [=,a_]. Then the correct output is printed and the lifetime of the objects is as expected:

A(13)
B()
~B()
13
~A()

The question:

Is this behaviour specified by the standard, implementation-defined, or undefined? Or am I crazy and it's something entirely different?

Is this behaviour specified by the standard

Yes. Capturing member variables is always done via capturing this; it is the only way to access a member variable. In the scope of a member function a_ is equivalent to (*this).a_. This is true in lambdas as well.

Therefore, if you use this (implicitly or explicitly), then you must ensure that the object remains alive while the lambda instance is around.

If you want to capture it by value, you must explicitly do so:

std::function<void()> getf()
{
    auto varA = a_;
    return [=]() { varA->a(); };
}

If you need a spec quote:

The lambda-expression’s compound-statement yields the function-body ( 8.4 ) of the function call operator, but for purposes of name lookup (3.4), determining the type and value of this (9.3.2) and transforming id-expressions referring to non-static class members into class member access expressions using (*this) ( 9.3.1 ), the compound-statement is considered in the context of the lambda-expression.
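A compact, self-contained sketch of the local-copy fix follows. Holder is a hypothetical stand-in for class B, holding a shared_ptr<int> instead of a shared_ptr<A> to keep the example short. The lambda captures the local copy rather than this, so the pointee outlives the owning object.

```cpp
#include <functional>
#include <memory>

class Holder {   // hypothetical stand-in for class B from the question
public:
    explicit Holder(int v) : value_(std::make_shared<int>(v)) {}

    std::function<int()> getf() const {
        auto copy = value_;                  // local copy shares ownership
        return [copy]() { return *copy; };   // captures the copy, not this
    }

private:
    std::shared_ptr<int> value_;
};

inline int call_after_owner_destroyed() {
    std::function<int()> f;
    {
        Holder h(13);
        f = h.getf();
    }              // h is destroyed here; the int lives on via the captured copy
    return f();
}
```

With C++14 or later, an init-capture such as [a = a_] expresses the same thing without the extra local variable.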

Is it legal to modify the result of std::string::op[]?

17 votes

Consider the following from C++11:

[n3290: 21.4.5]: basic_string element access                           [string.access]

const_reference operator[](size_type pos) const;
reference       operator[](size_type pos);

1     Requires: pos <= size().

2     Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

3     Throws: Nothing.

4     Complexity: constant time.

This means either:

  • The referenced value in the pos == size() case shall not be modified, or
  • In any case, the referenced value returned by op[] shall not be modified, even for the non-const overload.

The second scenario seems completely ridiculous, but I think it's what the wording most strongly implies.

Can we modify what we get from std::string::op[], or not? And is this not rather ambiguous wording?

The quote means that you cannot modify the return of operator[]( size() ), even if the value is well defined. That is, you must not modify the NUL terminator in the string even through the non-const overload.

This is basically your first option: i.e. pos >= size(), but because of the requirement pos <= size() the only possible value for that condition is pos == size().

The actual English description of the clause can be ambiguous (at least to me), but Appendix C, and in particular C.2.11, deals with changes in semantics in the string library, and there is no mention of this change, which would break user code. In C++03 the "referenced value shall not be modified" bit is not present and there is no ambiguity. The lack of mention in C.2.11 is not normative, but it can be used as a hint that when they wrote the standard there was no intention of changing this particular behavior.
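Under that reading, a short sketch of what is and is not allowed in C++11 (modify_first and read_terminator are hypothetical helper names):

```cpp
#include <string>

// Modifying an element with pos < size() through the non-const overload is fine.
inline std::string modify_first(std::string s) {
    if (!s.empty()) s[0] = 'x';
    return s;
}

// Reading at pos == size() is allowed in C++11 and yields charT().
inline char read_terminator(std::string s) {
    // s[s.size()] = '!';   // must not be done: the referenced value shall not be modified
    return s[s.size()];
}
```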

C++0x: callback vs lambda

16 votes

Suppose I have the following code that I wish to refactor:

int toFuture()
{
  precalc();
  int calc = 5 * foobar_x() + 3;
  postcalc();
  return calc;
}

int toPast()
{
  precalc();
  int calc = 5 * foobar_y() - 9;
  postcalc();
  return calc;
}

In classic-C, I would refactor this code into a worker() which accepts a function pointer that does the calculation: common code in worker(), specific code provided by function pointer.

With C++0x, should I be using a lambda instead? If so, how would I implement it, in this case?

Edit: it just crossed my mind that a template may also work. How would a template implementation compare against the other two?

One approach:

template<typename CalcFuncT>
int perform_calc(CalcFuncT&& calcfunc)
{
    precalc();
    int const calc = calcfunc();
    postcalc();
    return calc;
}

int main()
{
    perform_calc([]{ return 5 * foobar_x() + 3; }); // toFuture
    perform_calc([]{ return 5 * foobar_y() - 9; }); // toPast
}
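For comparison with the template above, here is a sketch of the other two options from the question: a plain function pointer and std::function. The precalc, postcalc, foobar_x and foobar_y bodies are hypothetical stubs with made-up return values so that the example is self-contained; worker_fp and worker_fn are illustrative names.

```cpp
#include <functional>

// Hypothetical stubs standing in for the functions in the question.
inline void precalc() {}
inline void postcalc() {}
inline int foobar_x() { return 2; }
inline int foobar_y() { return 4; }

// Classic C style: accepts only functions and capture-less lambdas.
inline int worker_fp(int (*calc)()) {
    precalc();
    const int result = calc();
    postcalc();
    return result;
}

// std::function: also accepts capturing lambdas and other callables.
inline int worker_fn(const std::function<int()>& calc) {
    precalc();
    const int result = calc();
    postcalc();
    return result;
}

inline int toFuture() {
    return worker_fp([]() -> int { return 5 * foobar_x() + 3; });
}

inline int toPast() {
    return worker_fn([] { return 5 * foobar_y() - 9; });
}
```

Roughly, the template version gives the compiler the best chance to inline the calculation, the function pointer version cannot take lambdas that capture state, and std::function accepts any callable at the cost of type erasure.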

Removing namespace from type name C++

15 votes

In C++, when we use typeid to get the type name of an object or class, it returns a decorated (mangled) string. I use cxxabi to demangle it:

#include <cstdlib>
#include <iostream>
#include <string>
#include <cxxabi.h>
#include <typeinfo>

using namespace std;

namespace MyNamespace
{

class MyBaseClass
{
public:

    const string name()
    {
        string n;

        int status;
        char *realname = abi::__cxa_demangle(typeid (*this).name(), 0, 0, &status);
        n = realname;
        free(realname);

        return n;
    }

};

}

int main()
{

    MyNamespace::MyBaseClass h;

    cout << h.name() << endl;

    return 0;
}

The output in gcc is: MyNamespace::MyBaseClass

I need to remove MyNamespace:: from the above string. I can remove it by string manipulation.

But is there a standard way with cxxabi or other libraries to do this, or a clear solution? (At least portable between gcc and Visual C++.)

I investigated cxxabi and other C++ libraries for this issue and found no predefined method to remove namespaces from the name (at least in my googling).

I wonder why you don't want to manipulate the string?!

The best solution, and a fully portable one (tested in gcc and VC++), is simply to replace

return n;

with

return n.substr(n.find_last_of(':')+1);

When you search n for the last colon (call its position lastColonPos) and take the substring from just after lastColonPos to the end, it is definitely the class name.
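The same idea as a standalone helper, as a minimal sketch (strip_namespace is a hypothetical name). When there is no colon, find_last_of returns npos, and npos + 1 wraps around to 0, so the whole string is returned unchanged. One caveat worth keeping in mind: for template instantiations whose arguments are themselves qualified names, the last colon can sit inside the argument list, so this simple approach is only reliable for non-template class names.

```cpp
#include <string>

// Returns the part of a demangled name after the last "::".
// If there is no colon, find_last_of returns npos and npos + 1 == 0,
// so the full string comes back unchanged.
inline std::string strip_namespace(const std::string& qualified) {
    return qualified.substr(qualified.find_last_of(':') + 1);
}
```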

Dynamic Programming algorithm to find Heavy integers

15 votes

Recently I was asked the following question:

A non-negative integer is called heavy if the average value of its digits in decimal representation exceeds 7. For example, the number 8698 is heavy, because the average value of its digits is equal to (8+6+9+8)/4 = 7.75.

Given two non-negative integers A and B find the number of heavy integers in the interval [A..B] (A and B included)

A and B are integers within the range [0..200,000,000].

...which is the same problem explained in this post:

Interview Question: Optimal Solution to the problem of finding Heavy integers

Now, during the interview I used the naive solution, and later that day improved it following the idea of "skipping the numbers that are surely not-heavy".

The problem is, in my interview the requested time complexity was O(log(A)+log(B)), which is not met by either of these solutions, if I am not mistaken.

So, out of curiosity, I wanted to ask you: can you think of any way to meet that complexity? Also, following this other post about the problem: http://math.stackexchange.com/questions/47329/calculating-heavy-numbers-in-a-given-range, it looks like this problem can be modeled as dynamic programming, only I can't see how. Any suggestions?

I have a solution for a sub-problem with additional restrictions:

1) I only use digits 1 to 9 (0 is complicated because you don't include it in the average when it starts a number).

2) I compute the numbers of combinations for "n" digits, so A and B are fixed (for n=3, A=111 and B=999).

I haven't checked how difficult it might be to generalize my solution with 0's and arbitrary A and B. Also, as I explain in the end, it doesn't look like my solution reduces the complexity down to O(log(B)).

I define a function c(n, s) which is the number (count) of n-digit "integers" with a digit sum above or equal to s. For example, c(1, 9) = 1 since only the single digit integer 9 satisfies i >= 9. More generally c(1, i) = 10 - i.

For c(2, s), we want all combinations {ij: 1 <= i <= 9, 1 <= j <= 9, i + j >= s}. We can use dynamic programming since we know the values of c(1, s). We will consider the 9 cases i = 1, .., 9 and figure out how many possible values of j there are from c(1, s). Let's try for example s = 18: there is only one possible combination, with i = 9, and the number of possible j is given by c(1, 18 - 9) = 1. So c(2, 18) = 1, which is obvious since 99 is the only solution. For a general s, the equation that gives the number of combinations with a digit sum above or equal to s is:

c(2, s) = \sum_{i=1}^9 c(1, s - i) 

And you can convince yourself more generally that for any number of digits n:

c(n, s) =  \sum_{i=1}^9 c(n - 1, s - i) 

Back to your original problem: the number of n-digit heavy "integers" is given by c(n, 7n + 1), because the average exceeds 7 exactly when the digit sum is at least 7n + 1 (for n = 5, that is c(5, 36)). Of course in order to get that value you'll first have to compute c(1, s), ..., c(n - 1, s) for all s's (but only a small number of s's are non-zero: for c(n, s) only s = n, ..., 10^n - 1 are non-zero).

For computational complexity, at each digit d, you compute c(d, s) for s = d, ..., 10^d - 1. Each term just requires a sum of 9 terms and you have 10^d - 1 - d such terms. So since d goes from 1 to n, you get O(n * 10^n). But the number of digits n is just log B (where log is in base 10), so O(log(B) * B), which is not what you wanted.

Improved solution:
There is actually no need to compute all values of s at each digit. If we go back to the basic equation c(n, s) = \sum_{i=1}^9 c(n - 1, s - i) we can see that to get c(n, s) we do not need all values of c(n - 1, t), but only 9 such values (t = s - 1, ... , s - 9). Things get a bit complicated to measure the computational complexity since if you go to c(n - 2, k), you then need roughly 2 * 9 values and roughly d * 9 values for c(n - d, m). When you sum all those operations you get a computational complexity of O(n^2) = O(log(B)^2), which is still not quite what you wanted.

There's a "pruning" of terms going on too such that you'll actually get fewer than d * 9 values for c(n - d, k) (for example c(1, k) only has 9 possible values of k, much fewer than d * 9), but I doubt it's enough to bring the complexity down to O(log(B)).

Update:
I found a solution nearly identical to mine here. It's even better since there is apparently an exact formula for the function c(n, s), which they call f(n, s). It's not exactly the same function however since I simplified the problem. I haven't checked that it is exact either.
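The recurrence c(n, s) = \sum_{i=1}^9 c(n - 1, s - i) can be sketched as a small table-filling routine. This is only an illustration under the same restriction as above (digits 1 to 9 only); count_at_least is a hypothetical name. The table is indexed by s from 0 to 9n, because n digits between 1 and 9 always sum to between n and 9n, and any s at or below 0 is clamped to 0 (every combination qualifies).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// c(n, s): number of n-digit strings over the digits 1..9 whose digit sum
// is greater than or equal to s. Assumes n is small enough that 9^n fits
// in 64 bits.
inline std::uint64_t count_at_least(int n, int s) {
    if (n < 1) return 0;
    const int max_sum = 9 * n;
    if (s > max_sum) return 0;      // no n digits can sum that high
    s = std::max(s, 0);

    std::vector<std::uint64_t> prev(max_sum + 1), cur(max_sum + 1);
    // Base case: c(1, t) = 9 for t <= 1, 10 - t for 2 <= t <= 9, else 0.
    for (int t = 0; t <= max_sum; ++t)
        prev[t] = t <= 1 ? 9 : (t <= 9 ? 10 - t : 0);

    for (int d = 2; d <= n; ++d) {
        for (int t = 0; t <= max_sum; ++t) {
            std::uint64_t total = 0;
            // c(d, t) = sum over the leading digit i of c(d - 1, t - i).
            for (int i = 1; i <= 9; ++i) total += prev[std::max(t - i, 0)];
            cur[t] = total;
        }
        prev.swap(cur);
    }
    return prev[s];
}
```

With this helper, the count of heavy n-digit combinations without zeros would be count_at_least(n, 7 * n + 1), since "average exceeds 7" means a digit sum of at least 7n + 1.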

Merging interfaces, without merging

14 votes

I was wondering, does C++ or Java have a way to do something like this:

Interface IF1{
    ....
};

Interface IF2{
    ....
};


function f(Object o : Implements IF1, IF2){
    ...
}

That is, a type system that lets you require that an argument implements several interfaces.

You can do this in Java:

public <I extends IF1 & IF2> void methodName(I i){

....

}

This way you force I to implement your two interfaces, otherwise it won't even compile.

Why do C# and VB.NET implicitly marshal char* differently?

8 votes

So I have a function, written in C++, that looks like this...

extern "C" __declspec(dllexport) int __stdcall SomeFunction(char *theData)
{
    // stuff
}

... and I'm using it in my current project (written in C#). There are other projects that use this function written in VB, looking like this:

Public Declare Function SomeFunction Lib "MyDLL.dll" _
    Alias "_SomeFunction@4" (ByVal theData As String) As Integer

So I tried writing an equivalent in C#, but found that using the string type didn't actually work for me - the string would come back with the same data I passed it in with. I tried using "ref string" instead to pass the string by reference and I got a memory access violation.

After doing some digging, I found that this was the correct implementation in C#:

[DllImport("MyDLL.dll", EntryPoint = "_SomeFunction@4")]
public static extern int SomeFunction(StringBuilder theData);

Now I know that VB.NET and C# are quite different, but I suppose I always assumed that strings were strings. If one language can marshal char* to String implicitly, why does the other require a different class altogether?

"Now I know that VB.NET and C# are quite different, but I suppose I always assumed that strings were strings"

Strings are immutable in .NET. Ask yourself why it is that ByVal passing of an immutable data type can result in the value changing. That doesn't happen for normal functions, just for Declare.

I'd guess it all has to do with maintaining some backwards compatibility with Declare statements from classic VB6, which were done this way. To my mind, the black sheep here is the VB.NET code rather than the C# code.

How to get total memory in bytes used by OpenGL in C++?

7 votes

I'm building an OpenGL application, and its total memory use seems to keep rising. I can account for the memory used by the variables and objects I create myself, but I can't tell how much memory OpenGL is using for its own variables, objects, textures, etc. So is it possible to get the total memory in bytes used by OpenGL in C++?

In general, you don't. OpenGL is ultimately a hardware abstraction. And OpenGL simply doesn't provide a way to get that sort of information.

There are vendor-specific extensions that will give you ways to ask, though what you get back depends on the architecture. AMD hardware provides the ATI_meminfo extension. It breaks memory down into types of objects: buffer objects, textures, and renderbuffers.

NVIDIA provides the experimental extension NVX_gpu_memory_info. There's no information in the registry about how to use it, so I can't link you to anything.

In any case, the most effective way to know what the GPU is using is to just keep track of it yourself. Always use internal image formats with sizes; this means you can compute a pretty good estimate of how much memory a texture takes up. The same goes for buffer objects and so forth.

You won't get exact numbers, as padding, alignment, and the like can confound you. But you'll get something pretty decent.

Port ruby solution to C++

7 votes

Is there any way to do this in C++, especially the range part?

answer = (0..999).select { |a| a%3 ==0 || a%5==0 }
puts answer.inject { |sum, n| sum+n }

I have created my own C++ solution using a more standard for loop, and wondered if there was a cooler way to do it.

Template metaprogramming solution:

The following assumes the lower bound of the range is 0.

template <int N>
struct sum
{
  static const int value = sum<N-1>::value + (N % 3 == 0 || N % 5 == 0 ? N : 0);
};

template <>
struct sum<0>
{
  static const int value = 0;
};

int main(int argc, char** argv)
{
  int n = sum<999>::value;
  return 0;
}

The following will allow you to specify a range of numbers (e.g. 0-999, 20-400). I'm not a master of template metaprogramming so I couldn't think of a cleaner solution (and I did this for my own benefit and practice).

template <int N, int Upper, bool IsLast>
struct sum_range_helper
{
  static const int value = (N % 3 == 0 || N % 5 == 0 ? N : 0) + sum_range_helper<N + 1, Upper, N + 1 == Upper>::value;
};

template <int N, int Upper>
struct sum_range_helper<N, Upper, true>
{
  static const int value = (N % 3 == 0 || N % 5 == 0 ? N : 0);
};

template <int Lower, int Upper>
struct sum_range
{
  static const int value = sum_range_helper<Lower, Upper, Lower == Upper>::value;
};

int main(int argc, char** argv)
{
  int n = sum_range<0, 999>::value;
  return 0;
}

Can I use $1 in regex_replace?

7 votes

From reading the FCD for regex_replace (28.11.4) I can only guess that the function can also use parts of the original string for replacing? I can not test it with my gcc, is this correct?

using namespace std;
regex rx{ R"((\d+)-(\d+))" }; // regex: (\d+)-(\d+)
cout << regex_replace("123-456", rx, "b: $2, a:$1");
// "b: 456, a:123"

As you can see, I assume $1 and $2 refer to the "()" capturing groups (and not \1 and \2 like elsewhere).

Update. So, I guess this is a two-part question

  • Is this use of capturing groups in the replacement text supported at all?
  • Is the default ECMAScript syntax using $n? Or \n?

Table 139 in the C++ 2011 FDIS lists two constants that can be used to affect the rules used for the format string in regex_replace, format_default and format_sed. format_default is described as using "the rules used by the ECMAScript replace function in ECMA-262, part 15.5.4.11 String.prototype.replace." This standard does indicate the use of $ for backreferences. See: ECMA-262

Using the format_sed flag instead applies the rules of the sed utility in POSIX. sed doesn't support $ backreferences; it refers to capture groups as \1 through \9 and to the whole match as &.

Why does std::sub_match<T> publicly inherit from std::pair<T, T>?

7 votes

I was reading the documentation of std::sub_match<BidirectionalIterator> and saw that it publicly inherits from std::pair<BidirectionalIterator, BidirectionalIterator>. Since a sub_match is simply a pair of iterators into a sequence of characters, with some additional functions, I can understand that it is implemented with a pair, but why use public inheritance?

The problem with inheriting publicly from std::pair<T,U> is the same as inheriting publicly from most other standard classes: they are not meant to be manipulated polymorphically (notably they do not define a virtual destructor). Other members will also fail to work properly, namely the assignment operator and the swap member function (they will not copy the matched member of sub_match).

Why did the Boost developers, and then the committee, decide to implement sub_match by inheriting publicly from pair instead of using composition (or private inheritance with using declarations if they wanted to keep member access through first and second)?

It's an interesting question. Presumably, they considered it safe because no one would ever dynamically allocate one anyway. About the only way you're going to get sub_match objects is as a return value from some of the functions of basic_regex, or as copies of other sub_match, and all of these will be either temporaries or local variables.

Note that it's not safe to keep sub_match objects around anyway, since they contain iterators whose lifetime... doesn't seem to be specified in the standard. Until the match_results object is reused? Until the string operand to the function which filled in the match_results object is destructed? Or?

I'd still have avoided the public inheritance. But in this case, it's not as dangerous as it looks, because there's really no reason you'd ever want to dynamically allocate a sub_match.