Best arrays questions in October 2011

Why presize a JavaScript Array?

18 votes

Firebug represents (new Array(N)) as an array with N undefineds in it. I recently ran across a scenario that demonstrated that a sized array with all undefined values in it is different from a newly constructed, sized array. I'd like to understand the difference.

Suppose you want to generate a list of random integers between 0 and 1000.

function kilorange() {
    return Math.floor(Math.random() * (1001));
}

no_random_numbers = (new Array(6)).map(kilorange);
my_random_numbers = [undefined, undefined, undefined,
                     undefined, undefined, undefined].map(kilorange);

I would have expected no_random_numbers and my_random_numbers to be equivalent, but they're not. no_random_numbers is another array of undefineds, whereas my_random_numbers is an array with six random integers in it. Furthermore, after throwing a console.count statement into kilorange, I learned that my function never gets called for the array created with the Array constructor.

What is the difference, and why does map (and presumably other iterable methods) not treat the above arrays the same?

The ES standard (15.4.4.19) defines the algorithm for map, and it's quite clear from step 8b that since your array doesn't actually have any of those elements, it will return an "empty" array with length 6.

As others have mentioned, it has to do with array objects in js being (unlike their rigid C counterparts) very dynamic, and potentially sparse (see 15.4 for the sparsity test algorithm).

How Are C Arrays Represented In Memory?

15 votes

I believe I understand how normal variables and pointers are represented in memory if you are using C.

For example, it's easy to understand that a pointer Ptr will have an address, and its value will be a different address, which is the space in memory it's pointing to. The following code:

int main(){
    int x = 10;
    int *Ptr;
    Ptr = &x;
return 0;
}

Would have the following representation in memory:

+---------------------+-------------+---------+
| Variable Name       | Address     | Value   | 
+---------------------+-------------+---------+
| x                   | 3342        | 10      |
+---------------------+-------------+---------+
| Ptr                 | 5466        | 3342    |
+---------------------+-------------+---------+

However I find it difficult to understand how arrays are represented in memory. For example the code:

int main(){
    int x[5];
        x[0]=12;
        x[1]=13;
        x[2]=14;

    printf("%p\n",(void*)x);
    printf("%p\n",(void*)&x);

return 0;
}

outputs the same address twice (for the sake of simplicity 10568). Meaning that x==&x. Yet *x (or x[0] in array notation) is equal to 12, *(x+1) (or x[1] in array notation) is equal to 13 and so on. How can this be represented? One way could be this:

+---------------------+-------------+----------+----------------------+
| Variable Name       | Address     | Value    | Value IF array       |
+---------------------+-------------+----------+----------------------+
| x                   | 10568       | 10568    | 12                   |
+---------------------+-------------+----------+----------------------+
|                     | 10572       |          | 13                   | 
+---------------------+-------------+----------+----------------------+
|                     | 10576       |          | 14                   | 
+---------------------+-------------+----------+----------------------+
|                     | 10580       |          | trash                | 
+---------------------+-------------+----------+----------------------+
|                     | 10584       |          | trash                | 
+---------------------+-------------+----------+----------------------+

Is this close to what happens, or completely off?

An array is a block of contiguous objects with no spaces in between. This means that x in your second example is represented in memory as:

+---------------------+-------------+---------+
| Variable Name       | Address     | Value   | 
+---------------------+-------------+---------+
| x                   | 10568       | 12      |
|                     |             +---------+
|                     |             | 13      |
|                     |             +---------+
|                     |             | 14      |
|                     |             +---------+
|                     |             | ??      |
|                     |             +---------+
|                     |             | ??      |
+---------------------+-------------+---------+

That is, x is five ints big, and has a single address.

The weird part about arrays isn't in how they're stored - it's how they're evaluated in expressions. If you use an array name somewhere that it isn't the subject of the unary & or sizeof operators, it evaluates to the address of its first member.

That is, if you just write x, you will get a value 10568 with type int *.

If, on the other hand you write &x, then the special rule doesn't apply - so the & operator works like it normally does, which means that it fetches the address of the array. In the example, this will be a value 10568 with type int (*)[5].

The reason that x == &x is that the address of the first member of an array is necessarily equal to the address of the array itself, since an array starts with its first member.

Combining arrays of strings together

12 votes

I'm looking to combine the contents of two string arrays, into a new list that has the contents of both joined together.

string[] days = { "Mon", "Tue", "Wed" };
string[] months = { "Jan", "Feb", "Mar" };

// I want the output to be a list with the contents
// "Mon Jan", "Mon Feb", "Mon Mar", "Tue Jan", "Tue Feb" etc...

How can I do it ? For when it's only two arrays, the following works and is easy enough:

List<string> CombineWords(string[] wordsOne, string[] wordsTwo)
{
    var combinedWords = new List<string>();
    foreach (var wordOne in wordsOne)
    {
        foreach (string wordTwo in wordsTwo)
        {
            combinedWords.Add(wordOne + " " + wordTwo);
        }
    }
    return combinedWords;
}

But I'd like to be able to pass varying numbers of arrays in (i.e. to have a method with the signature below) and have it still work.

List<string> CombineWords(params string[][] arraysOfWords)
{
    // what needs to go here ?
}

Or some other solution would be great. If it's possible to do this simply with Linq, even better!

Code below works for any number of arrays (and uses linq to some degree):

List<string> CombineWords(params string[][] wordsToCombine)
{
     if (wordsToCombine.Length == 0)
         return new List<string>();

     IEnumerable<string> combinedWords = wordsToCombine[0].ToList();
     for (int i = 1; i < wordsToCombine.Length; ++i)
     {
         var temp = i;
         combinedWords = (from x in combinedWords from y in wordsToCombine[temp]
                       select x + " " + y);
     }
     return combinedWords.ToList();
 }

php foreach, why using pass by reference of a array is fast?

10 votes

Below is a test of php foreach loop of a big array, I thought that if the $v don't change, the real copy will not happen because of copy on write, but why it is fast when pass by reference?

Code 1:

function test1($a){
  $c = 0;
  foreach($a as $v){ if($v=='xxxxx') ++$c; }
}

function test2(&$a){
  $c = 0;
  foreach($a as $v){ if($v=='xxxxx') ++$c; }
}

$x = array_fill(0, 100000, 'xxxxx');

$begin = microtime(true);
test1($x);
$end1 = microtime(true);
test2($x);
$end2 = microtime(true);

echo $end1 - $begin . "\n";   //0.03320002555847
echo $end2 - $end1;           //0.02147388458252

But this time, using pass by reference is slow.

Code 2:

function test1($a){
  $cnt = count($a); $c = 0;
  for($i=0; $i<$cnt; ++$i)
    if($a[$i]=='xxxxx') ++$c;
}
function test2(&$a){
  $cnt = count($a); $c = 0;
  for($i=0; $i<$cnt; ++$i)
    if($a[$i]=='xxxxx') ++$c;
}
$x = array_fill(0, 100000, 'xxxxx');

$begin = microtime(true);
test1($x);
$end1 = microtime(true);
test2($x);
$end2 = microtime(true);

echo $end1 - $begin . "\n";   //0.024326801300049
echo $end2 - $end1;           //0.037616014480591

Can someone explain why passing by reference is fast in code1 but slow in code2?

Edit: With Code 2, the count($a) makes the main difference, so the time of the loop took is almost the same.

I thought that if the $v don't change [foreach($a as $v)], the real copy will not happen because of copy on write, but why it is fast when pass by reference?

The impact is not on $v but on $a, the huge array. You either pass it as value or as reference to the function. Inside the function it's then value (test1) or reference (test2).

You have two codes (code 1 and code 2).

Code 1: Is using foreach. With foreach you've got two options: iterate over a value or a reference (Example). When you iterate over a value, the iteration is done on a copy of the value. If you iterate over a reference, no copy is done.

As you use the reference in test2, it's faster. The values do not need to be copied. But in test1, you pass the array as value, the array gets copied.

Code 2: Is using for. For does nothing actually here. In both cases. You access the variable and read value from the array. That's pretty much the same regardless if it's a reference or a copy (thanks to the copy on write optimization in PHP).

You might now wonder, why there is a difference in code 2. The difference is not because of for but because of count. If you pass a reference to count PHP internally creates a copy of it because it count needs a copy, not a reference.

Read as well: Do not use PHP references by Johannes Schlüter


I've compiled a set of tests as well. But I more specifically put code into the test functions.

  • Blank - What's the difference in calling the function?
  • Count - Does count make a difference?
  • For - What happens with foronly (not count)?
  • Foreach - Just foreach - even breaking on first element.

Every test is in two versions, one called _copy (passing the array as copy into the function) and one called _ref (passing the array as reference).

It's not always that these micro-benchmarks tell you the truth, but if you're able to isolate specific points, you can quite well do an educated guess, for example that not for but count had the impact:

function blank_copy($a){
}
function blank_ref(&$a){
}
function foreach_copy($a){
    foreach($a as $v) break;
}
function foreach_ref(&$a){
    foreach($a as $v) break;
}
function count_copy($a){
  $cnt = count($a);
}
function count_ref(&$a){
  $cnt = count($a);
}
function for_copy($a){
    for($i=0;$i<100000;$i++)
        $a[$i];
}
function for_ref(&$a){
    for($i=0;$i<100000;$i++)
        $a[$i];
}

$tests = array('blank_copy', 'blank_ref', 'foreach_copy', 'foreach_ref', 'count_copy', 'count_ref', 'for_copy', 'for_ref');


$x = array_fill(0, 100000, 'xxxxx');
$count = count($x);
$runs = 10;

ob_start();

for($i=0;$i<10;$i++)
{
    shuffle($tests);
    foreach($tests as $test)
    {
        $begin = microtime(true);
        for($r=0;$r<$runs;$r++)
            $test($x);
        $end = microtime(true);
        $result = $end - $begin;
        printf("* %'.-16s: %f\n", $test, $result);
    }
}

$buffer = explode("\n", ob_get_clean());
sort($buffer);
echo implode("\n", $buffer);

Output:

* blank_copy......: 0.000011
* blank_copy......: 0.000011
* blank_copy......: 0.000012
* blank_copy......: 0.000012
* blank_copy......: 0.000012
* blank_copy......: 0.000015
* blank_copy......: 0.000015
* blank_copy......: 0.000015
* blank_copy......: 0.000015
* blank_copy......: 0.000020
* blank_ref.......: 0.000012
* blank_ref.......: 0.000012
* blank_ref.......: 0.000014
* blank_ref.......: 0.000014
* blank_ref.......: 0.000014
* blank_ref.......: 0.000014
* blank_ref.......: 0.000015
* blank_ref.......: 0.000015
* blank_ref.......: 0.000015
* blank_ref.......: 0.000015
* count_copy......: 0.000020
* count_copy......: 0.000022
* count_copy......: 0.000022
* count_copy......: 0.000023
* count_copy......: 0.000024
* count_copy......: 0.000025
* count_copy......: 0.000025
* count_copy......: 0.000025
* count_copy......: 0.000026
* count_copy......: 0.000031
* count_ref.......: 0.113634
* count_ref.......: 0.114165
* count_ref.......: 0.114390
* count_ref.......: 0.114878
* count_ref.......: 0.114923
* count_ref.......: 0.115106
* count_ref.......: 0.116698
* count_ref.......: 0.118077
* count_ref.......: 0.118197
* count_ref.......: 0.123201
* for_copy........: 0.190837
* for_copy........: 0.191883
* for_copy........: 0.193080
* for_copy........: 0.194947
* for_copy........: 0.195045
* for_copy........: 0.195944
* for_copy........: 0.198314
* for_copy........: 0.198878
* for_copy........: 0.200016
* for_copy........: 0.227953
* for_ref.........: 0.191918
* for_ref.........: 0.194227
* for_ref.........: 0.195952
* for_ref.........: 0.196045
* for_ref.........: 0.197392
* for_ref.........: 0.197730
* for_ref.........: 0.201936
* for_ref.........: 0.207102
* for_ref.........: 0.208017
* for_ref.........: 0.217156
* foreach_copy....: 0.111968
* foreach_copy....: 0.113224
* foreach_copy....: 0.113574
* foreach_copy....: 0.113575
* foreach_copy....: 0.113879
* foreach_copy....: 0.113959
* foreach_copy....: 0.114194
* foreach_copy....: 0.114450
* foreach_copy....: 0.114610
* foreach_copy....: 0.118020
* foreach_ref.....: 0.000015
* foreach_ref.....: 0.000016
* foreach_ref.....: 0.000016
* foreach_ref.....: 0.000016
* foreach_ref.....: 0.000018
* foreach_ref.....: 0.000019
* foreach_ref.....: 0.000019
* foreach_ref.....: 0.000019
* foreach_ref.....: 0.000019
* foreach_ref.....: 0.000020

How to flatten array in jQuery?

10 votes

How to simply flatten array in jQuery? I have: [1, 2, [3, 4], [5, 6], 7] And want: [1, 2, 3, 4, 5, 6, 7]

You can use jQuery.map, which is the way to go if you have the jQuery Library already loaded.

$.map( [1, 2, [3, 4], [5, 6], 7], function(n){
   return n;
});

Returns

[1, 2, 3, 4, 5, 6, 7]

How do I quickly reorder a Ruby Array given an order?

9 votes

I have an array of values, and an array which determines the order.

How can I quickly re-arrange the array in the given order?

data = ['0','1','2','3','4','5']

order = [3,1,2,0,4,5]

I want:

data = ['3','1','2','0','4','5']

You can use the specifically for this kind of task designed method values_at:

data = ['0','1','2','3','4','5']
order = [3,1,2,0,4,5]

data.values_at *order
# => ["3", "1", "2", "0", "4", "5"] 

What are the advantages of using Ruby NArray over Array?

5 votes

I just came across the NArray library for Ruby -- please excuse my ignorance when asking this question :)

What are the advantages of using the NArray library over the standard Ruby Array implementation?

I've seen that NArray is geared towards numerical computing, but looking at the API, it looks like there are only a few extensions over Array geared towards numerical values -- nothing that you couldn't do with Array..

  1. Why not just use Array?
  2. Is there a huge speed advantage?
  3. Is there a huge memory advantage?
  4. Any other advantages over using the regular Ruby Array class?

Google didn't really come up with a useful explanation of this question.

References I found:

http://rubydoc.info/gems/narray-ruby19/0.5.9.7/NArray

http://narray.rubyforge.org/index.html.en

http://raa.ruby-lang.org/project/narray/

See also the slide about NArray: http://www.slideshare.net/masa16tanaka/narray-and-scientific-computing-with-ruby

it looks like there are only a few extensions over Array

No, it's completely different from Array. NArray has many numerical functions and multi-dimensional features. On the other hand, NArray is static; it does not have push/pop methods, etc. NArray's method list is http://narray.rubyforge.org/SPEC.en

_1. Why not just use Array?

Array holds Ruby Objects. It is inefficient to hold numerical values.

_2. Is there a huge speed advantage?

Yes. p.36 of the above slide shows NArray is up to 50 times faster.

Note that Array is faster than NArray if the loop is written in Ruby.

_3. Is there a huge memory advantage?

Yes. As for Float values, Array consumes about 4 times more memory than NArray on my 64bit Linux machine.

_4. Any other advantages over using the regular Ruby Array class?

  • Support of multi-dimensional array
  • Support of Numerical functions
  • No need for garbage collection on Array items. GC takes large time for large Arrays.
  • etc.