Best ruby questions in May 2012

What are recursive arrays good for?

10 votes

Ruby supports recursive arrays (that is, self-containing arrays):

ruby-1.9.2-p180 :156 > a = []
 => [] 
ruby-1.9.2-p180 :157 > a << a
 => [[...]] 
ruby-1.9.2-p180 :158 > a.first == a
 => true 

This is intrinsically cool, but what work can you do with it?

A directed graph with undifferentiated edges could have each vertex represented simply as an array of the vertices reachable from that vertex. If the graph had cycles, you would have a 'recursive array', especially if an edge could lead back to the same vertex.

For example, this graph:

[Figure: directed cyclic graph]

...could be represented in code as:

nodes = { a:[], b:[], c:[], d:[] }
nodes[:a] << nodes[:a]
nodes[:a] << nodes[:b]
nodes[:b] << nodes[:a]
nodes[:b] << nodes[:c]
p nodes
#=> {:a=>[[[...], []], [...]], :b=>[[[...], [...]], []], :c=>[], :d=>[]}

Usually the representation of each vertex would be more 'robust' (e.g. a class instance with properties for the name and array of outgoing edges), but it's not impossible to imagine a case where you wanted a very lightweight representation of your data (for very large graphs) and so needed to use a minimal representation like this.
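
As a rough sketch of how such a structure might be walked, the snippet below collects every vertex reachable from a starting vertex. The helper name `reachable` and the `names` lookup table are made up for illustration. It tracks visited arrays by object identity (Hash#compare_by_identity), because `:c` and `:d` are both empty arrays and would otherwise look like the same vertex.

```ruby
nodes = { a: [], b: [], c: [], d: [] }
nodes[:a] << nodes[:a]
nodes[:a] << nodes[:b]
nodes[:b] << nodes[:a]
nodes[:b] << nodes[:c]

# Map each edge array back to its vertex name, keyed by identity, not content.
names = {}.compare_by_identity
nodes.each { |name, edges| names[edges] = name }

# Hypothetical helper: depth-first walk that never revisits an array.
def reachable(start, names)
  seen = {}.compare_by_identity
  stack = [start]
  until stack.empty?
    node = stack.pop
    next if seen.key?(node)
    seen[node] = true
    node.each { |neighbour| stack.push(neighbour) }
  end
  seen.keys.map { |node| names[node] }
end

p reachable(nodes[:a], names) # => [:a, :b, :c]
p reachable(nodes[:d], names) # => [:d]
```

The identity-based `seen` table is what keeps the walk from looping forever on the cycles.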

capistrano - NameError: uninitialized constant Net::SSH::KnownHosts::SUPPORTED_TYPE

9 votes

I'm trying to deploy my Rails (3.1.3) application to the preprod env. I use capistrano (2.12.0) and rvm-capistrano (1.2.2).

When I call bundle exec cap ssh it works fine. But when I call bundle exec cap deploy I get the following trace:

$ cap deploy
    triggering start callbacks for `deploy'
  * 18:42:19 == Currently executing `multistage:ensure'
*** Defaulting to `preprod'
  * 18:42:19 == Currently executing `preprod'
  * 18:42:19 == Currently executing `deploy'
  * 18:42:19 == Currently executing `deploy:update'
 ** transaction: start
  * 18:42:19 == Currently executing `deploy:update_code'
  * 18:42:19 == Currently executing `deploy:set_previous_revision'
  * executing "cd /rails_apps/com.example.preprod/current; git rev-parse --short HEAD"
    servers: ["preprod.example.com"]
connection failed for: preprod.example.com (NameError: uninitialized constant Net::SSH::KnownHosts::SUPPORTED_TYPE)

Of course, example.com is a placeholder; it isn't a mistake in my Capistrano config.

Any idea what could cause that?

I'm using RVM with Ruby 1.9.3-p194.

Thanks!

Reverting back from net-ssh 2.5.1 to 2.4.0 seems to solve the problem for now.

Why does Ruby tend to assign object IDs in descending order?

8 votes

I've noticed that objects have their IDs assigned in a counterintuitive fashion. The earlier an object is created, the greater its object ID. I would have thought they would have been assigned in ascending order, rather than the other way around.

For example:

obj1 = Object.new
obj2 = Object.new
obj3 = Object.new

p obj1.object_id # => 4806560
p obj2.object_id # => 4806540
p obj3.object_id # => 4806520

Why are they assigned this way? Also, why is there a step of 20 rather than 1 between IDs in code run by the Ruby interpreter, but a vastly greater difference between object IDs in code run in irb?

Handwaving over many details, ruby allocates a chunk of the heap to put objects in:

1 | 2 | 3 | 4 | 5

It then traverses them in order and adds each one to the head of a linked list of free objects. This leaves them in reverse order on the list:

freelist → NULL
freelist → 1 → NULL
freelist → 2 → 1 → NULL
freelist → 3 → 2 → 1 → NULL
freelist → 4 → 3 → 2 → 1 → NULL
freelist → 5 → 4 → 3 → 2 → 1 → NULL

When allocating an object ruby uses the first item on the linked list:

object = freelist
freelist = object.next_free

So the freelist now looks like:

freelist → 4 → 3 → 2 → 1 → NULL

and further allocated objects will appear in reverse order across small allocations.

When ruby needs to allocate a new chunk of heap to store more objects you'll see the object_id jump up then run down again.
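
The same idea can be played out in a few lines of Ruby. This is only a toy model: `Slot` and `FreeList` are made-up names, and the real interpreter does this in C with more bookkeeping. It just shows why pushing slots onto the head of a list and then allocating from the head hands them back in reverse order.

```ruby
# Toy model only: Slot and FreeList are hypothetical, not interpreter internals.
Slot = Struct.new(:address, :next_free)

class FreeList
  def initialize
    @head = nil
  end

  # Link a slot in at the head of the list.
  def add(slot)
    slot.next_free = @head
    @head = slot
  end

  # Hand out the slot at the head, as in "object = freelist".
  def allocate
    slot = @head
    @head = slot.next_free if slot
    slot
  end
end

freelist = FreeList.new
(1..5).each { |address| freelist.add(Slot.new(address)) }

allocated = Array.new(3) { freelist.allocate.address }
p allocated # => [5, 4, 3]
```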

Why return an enumerable?

8 votes

I'm curious about why Ruby returns an Enumerator instead of an Array for something where an Array seems like the obvious choice. For example:

'foo'.class
# => String

Most people think of a String as an array of chars.

'foo'.chars.class
# => Enumerator

So why does String#chars return an Enumerator instead of an Array? I'm assuming somebody put a lot of thought into this and decided that an Enumerator is more appropriate, but I don't understand why.

This is completely in accordance with the spirit of 1.9: to return enumerators whenever possible. String#bytes, String#lines, String#codepoints, but also methods like Array#permutation all return an enumerator.

In ruby 1.8 String#to_a resulted in an array of lines, but the method is gone in 1.9.
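
A small sketch of what the enumerator buys you. It uses String#each_char without a block, which returns an Enumerator on any Ruby from 1.9 onward; String#chars itself was changed to return an Array in Ruby 2.0 and later, so the snippet avoids depending on it.

```ruby
enum = 'foo'.each_char

p enum.class                                   # => Enumerator
p enum.to_a                                    # => ["f", "o", "o"]
p enum.first(2)                                # => ["f", "o"]
p enum.with_index.map { |ch, i| "#{i}:#{ch}" } # => ["0:f", "1:o", "2:o"]
```

You still get an Array whenever you ask for one with to_a, but you can also chain or stop early without building the full array first.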

I'm using rbenv, so why are there two Gem paths on my system? (OS X Lion)

7 votes

To clarify, I'm using rbenv to manage my ruby versions. I was under the impression that binaries are managed as shims in their respective ruby version directory.

Here is what my system shows when I run gem environment (I'm excluding the irrelevant parts):

 - GEM PATHS:
     - /Volumes/Data/nathan/.rbenv/versions/1.9.3-p194/lib/ruby/gems/1.9.1
     - /Volumes/Data/nathan/.gem/ruby/1.9.1

Any reason for having two locations? Curious minds want to know.

I think I figured out the answer to this question, so I'll post it.

Rbenv allows for a global and/or local version of ruby. So once a ruby is installed and managed via rbenv, you can declare it as a global ruby version used by your entire system.

Likewise, you can declare a local ruby version within a given directory (ex: a specific rails project).

The .gem directory in your home path is used by the global ruby version, whereas the one tucked away in the rbenv directory is used by the local ruby version.

Note, you can (for whatever reason) declare a local version that is the same as the global version. In that case, the local version will rely on the gem files that are in the deeper rbenv directory, instead of the ~/.gem directory.
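
To see where each entry comes from on your own machine, you can ask RubyGems directly. In this sketch, Gem.default_dir is the gem directory that belongs to the running Ruby installation and Gem.user_dir is the per-user gem directory (typically under ~/.gem). Both normally show up in Gem.path unless GEM_HOME or GEM_PATH overrides them. The output differs per system, so none is shown.

```ruby
require 'rubygems'

default_dir = Gem.default_dir # gems installed alongside this Ruby
user_dir    = Gem.user_dir    # per-user gem directory
paths       = Gem.path        # the search path RubyGems is using

puts "default: #{default_dir}"
puts "user:    #{user_dir}"
paths.each { |path| puts "path:    #{path}" }
```

Running it once under each ruby version that rbenv manages shows which of the two directories changes with the version.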

How can I speed up the creation of 5,000 records for my rspec tests?

7 votes

I am using Ruby on Rails 3.2.2, FactoryGirl 3.1.0, FactoryGirlRails 3.1.0, Rspec 2.9.0 and RspecRails 2.9.0. In order to test my application I have to create a lot of records (about 5000) in the database, but that operation is very slow (it takes more than 10 minutes to create records). I proceed like this:

before(:each) do
  5000.times do
    FactoryGirl.create(:article)
  end
end

How can I improve my spec code so that it runs faster?

Note: the slowness may come from the (5) article callbacks that run before and after each article is created. I can skip those if they are what slows record creation down, since I only need to test articles and not the associated models. Is that possible, and is it the right way to proceed?

Doing it your way, Rails creates a transaction for every insert. There are a number of ways to improve your bulk inserts.

Described here:

http://www.coffeepowered.net/2009/01/23/mass-inserting-data-in-rails-without-killing-your-performance/
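
One common bulk-insert approach is a single multi-row INSERT statement. The sketch below only builds the SQL string in plain Ruby. The helper name `multi_row_insert`, the `articles` table and the `title` column are assumptions for illustration, and the quoting is deliberately naive.

```ruby
# Hypothetical helper: build one INSERT covering many rows.
# Quoting here is naive (doubles single quotes only); use your
# database adapter's quoting for real data.
def multi_row_insert(table, column, values)
  rows = values.map { |value| "('#{value.gsub("'", "''")}')" }
  "INSERT INTO #{table} (#{column}) VALUES #{rows.join(', ')}"
end

titles = (1..3).map { |i| "Article #{i}" }
sql = multi_row_insert('articles', 'title', titles)
puts sql
# => INSERT INTO articles (title) VALUES ('Article 1'), ('Article 2'), ('Article 3')
```

In a Rails app the string could be passed to ActiveRecord::Base.connection.execute(sql). This bypasses validations and callbacks entirely, so it only fits when the spec does not depend on them.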

How to find "essential" methods to provide an interface of Ruby mixins?

7 votes

The horribleness of the title of the question is what I'm trying to solve. Example:

In Ruby, Enumerable is an interface in the sense that I can implement something and document it as:

def myfancymethod(please_pass_me_an_Enumerable_here)

but on the other hand, Enumerable is a kind of amplification of the interface that has #each as one of its methods. If I have a class

class Foo
  def each
    :bar
  end
end

For those unfamiliar with Ruby, if you mixin Enumerable module in a class, you get dozens of methods that only rely on #each method to provide things like #map, #select, etc.

I could say my Foo class is Enumerable-able or Enumerable-compatible or what? What terms describe an answer to "What does it take to be an Enumerable?", "Well you have to have #each"

Similarly, in Ruby

(Array.new.methods - Object.new.methods).size # 111

Does that mean that to fake an Array interface, I have to implement 111 methods? No way, but how do I find out which methods are the "essence" of Array? Is it just #[], #[]= and #size? How do I make sense of it?

You might be interested in this feature request, which suggests some improvements to the architecture of the widely used Hash class.

The sad truth is: forget about it. At this point Ruby has nothing like this. Enumerable and Comparable are about as close as it gets, and their "contract" is merely a matter of documentation.

By the way, I believe #size is the other method that Enumerable can make use of, though it is optional.
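
To make the documented contract concrete, here is a minimal sketch of a class that provides only #each and mixes in Enumerable. `Countdown` is a made-up class for illustration. Everything else (map, select, sort, include?) comes from the module, and sort additionally relies on the yielded elements responding to <=>.

```ruby
# Hypothetical class: the only method written by hand is #each.
class Countdown
  include Enumerable

  def initialize(from)
    @from = from
  end

  # Yield each element in turn; return an Enumerator when no block is given.
  def each
    return enum_for(:each) unless block_given?
    @from.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(3)
p countdown.map { |n| n * 2 } # => [6, 4, 2]
p countdown.select(&:odd?)    # => [3, 1]
p countdown.sort              # => [1, 2, 3]
p countdown.include?(2)       # => true
```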

Ruby return statement does not work with super keyword?

6 votes

class Parent
  def test
    return
  end
end

class Child < Parent
  def test
    super
    p "HOW IS THIS POSSIBLE?!"
  end
end

c = Child.new
c.test

I thought that, since the test method from the Parent class immediately uses the return statement, it should not be possible to print the line of the Child class. But it is indeed printed. Why is that?

Ruby 1.8.7, Mac OSX.

super acts like a method call that calls the superclass's method implementation. In your example, the return keyword returns from Parent#test and execution then continues in Child#test, just as it would after any other method call.
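
A small variation on the example makes this visible. Here the parent returns a value (the symbols are arbitrary markers added for illustration) and the child captures it from super before carrying on.

```ruby
class Parent
  def test
    return :from_parent
  end
end

class Child < Parent
  def test
    result = super # Parent#test returns here, like any other call
    [result, :from_child]
  end
end

p Child.new.test # => [:from_parent, :from_child]
```

If the child should stop as well, it has to return explicitly, for example with `return super`.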

Is it possible to refer to a parameter passed to a method within the passed block in ruby?

6 votes

I hope I am not repeating anyone here, but I have been searching google and here and not coming up with anything. This question is really more a matter of "sexifying" my code.

What I am specifically trying to do is this:

Dir.new('some_directory').each do |file|
  # is there a way to refer to the string 'some_directory' via a method or variable?
end

Thanks!

Not in general; it's totally up to the method itself what arguments the block gets called with, and by the time each has been called (which calls your block), the fact that the string 'some_directory' was passed to Dir.new has been long forgotten, i.e. they're quite separate things.

You can do something like this, though:

Dir.new(my_dir = 'some_directory').each do |file|
    puts "#{my_dir} contains #{file}"
end
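
Another option for this particular case is to keep a reference to the Dir object itself, since a Dir remembers the path it was opened with and exposes it through Dir#path. A sketch, using the current directory so that it runs anywhere:

```ruby
dir = Dir.new('.')
base = dir.path # the string that was passed to Dir.new

# The block is a closure, so it can still see dir and base.
listing = dir.map { |file| File.join(base, file) }
dir.close

puts "#{base} contains #{listing.size} entries"
```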

State Machine, Model Validations and RSpec

6 votes

Here's my current class definition and spec:

class Event < ActiveRecord::Base

  # ...

  state_machine :initial => :not_started do

    event :game_started do
      transition :not_started => :in_progress
    end

    event :game_ended do
      transition :in_progress => :final
    end

    event :game_postponed do
      transition [:not_started, :in_progress] => :postponed
    end

    state :not_started, :in_progress, :postponed do
      validate :end_time_before_final
    end
  end

  def end_time_before_final
    return if end_time.blank?
    errors.add :end_time, "must be nil until event is final" if end_time.present?
  end

end

describe Event do
  context 'not started, in progress or postponed' do
    describe '.end_time_before_final' do
      ['not_started', 'in_progress', 'postponed'].each do |state|
        it 'should not allow end_time to be present' do
          event = Event.new(state: state, end_time: Time.now.utc)
          event.valid?
          event.errors[:end_time].size.should == 1
          event.errors[:end_time].should == ['must be nil until event is final']
        end
      end
    end
  end
end

When I run the spec, I get two failures and one success. I have no idea why. For two of the states, the return if end_time.blank? statement in the end_time_before_final method evaluates to true when it should be false each time. 'postponed' is the only state that seems to pass. Any idea as to what might be happening here?

It looks like you're running into a caveat noted in the documentation:

One important caveat here is that, due to a constraint in ActiveModel's validation framework, custom validators will not work as expected when defined to run in multiple states. For example:

 class Vehicle
   include ActiveModel::Validations

   state_machine do
     ...
     state :first_gear, :second_gear do
       validate :speed_is_legal
     end
   end
 end

In this case, the :speed_is_legal validation will only get run for the :second_gear state. To avoid this, you can define your custom validation like so:

 class Vehicle
   include ActiveModel::Validations

   state_machine do
     ...
     state :first_gear, :second_gear do
       validate {|vehicle| vehicle.speed_is_legal}
     end
   end
 end

Run system command in ruby and interact with it

6 votes

I need to run a command on the command-line that asks for a user response. In case it helps the command is:

gpg --recipient "Some Name" --encrypt ~/some_file.txt

when you run this, it warns about something then asks:

Use this key anyway? (y/N)

Responding 'y' lets it finish correctly. I have been trying to use the open4 gem but I have not been able to get it to specify the 'y' correctly. Here is what I tried:

Open4::popen4(cmd) do |pid, stdin, stdout, stderr|
  stdin.puts "y"
  stdin.close

  puts "pid        : #{ pid }"
  puts "stdout     : #{ stdout.read.strip }"
  puts "stderr     : #{ stderr.read.strip }"
end

What am I doing wrong? Is what I am doing even possible?

The Unix way to handle these situations is with expect, which Ruby comes with built-in support for:

require 'pty'
require 'expect'

PTY.spawn("your command here") do |reader, writer|
  reader.expect(/Use this key anyway/, 5) # cont. in 5s if input doesn't match
  writer.puts('y')
  puts "cmd response: #{reader.gets}"
end
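
To try the pattern without gpg, the sketch below swaps in a small shell command that prints the same prompt, reads one line and reports it. The shell one-liner is a stand-in invented for illustration; it assumes a Unix-like system with sh available.

```ruby
require 'pty'
require 'expect'

# Stand-in for gpg: print a prompt, read one line, report what was read.
command = %q{sh -c 'printf "Use this key anyway? (y/N) "; read answer; echo "got: $answer"'}

answer = nil
PTY.spawn(command) do |reader, writer|
  reader.expect(/Use this key anyway\?/, 5) # wait up to 5s for the prompt
  writer.puts('y')
  result = reader.expect(/got: (\w+)/, 5)   # [text read so far, first capture]
  answer = result && result[1]
end

puts "command saw: #{answer.inspect}"
```

expect returns nil when the timeout expires, which is why the result is checked before indexing into it.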

Elegantly implementing 'map (+1) list' in ruby

6 votes

The short snippet in the title is Haskell; it does the same thing as

list.map {|x| x + 1}

in ruby.

I know that way of writing it, but what I want to know is whether there is a more elegant way to do the same thing in Ruby, as there is in Haskell.

I really love the to_proc shortcut in ruby, like this form:

[1,2,3,4].map(&:to_s)
[1,2,3,4].inject(&:+)

But this only works when the number of arguments the block receives exactly matches what the method expects.

I'm looking for a way to pass one or more extra arguments into the Proc, without using a useless temporary block/variable like the first example does.

I want to do like this:

[1,2,3,4].map(&:+(1))

Does Ruby have a similar way to do this?

Use the ampex gem, which lets you use methods of X to build up any proc of one variable. Here’s an example from its spec:

["a", "b", "c"].map(&X * 2).should == ["aa", "bb", "cc"]
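
If adding a gem is not an option, two approaches using only core Ruby get reasonably close. The first turns the bound method 1.+ into a block, which works here because addition is commutative. The second curries a lambda and partially applies it. Both are sketches of alternatives, not what ampex does internally.

```ruby
list = [1, 2, 3, 4]

# Bound method as a block: each element x is passed to 1.+(x).
with_method = list.map(&1.method(:+))

# Curried lambda: add[1] is a proc still waiting for its second argument.
add = ->(a, b) { a + b }.curry
with_curry = list.map(&add[1])

p with_method # => [2, 3, 4, 5]
p with_curry  # => [2, 3, 4, 5]
```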

Ruby date equation not returning expected truth value

5 votes

Why do the following differ?

Time.now.end_of_day      == Time.now.end_of_day - 0.days          # false
Time.now.end_of_day.to_s == (Time.now.end_of_day - 0.days).to_s   # true

Because the number of nanoseconds is different:

ruby-1.9.2-p180 :014 > (Time.now.end_of_day - 0.days).nsec
 => 999999000 
ruby-1.9.2-p180 :015 > Time.now.end_of_day.nsec
 => 999999998 
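
The effect can be reproduced in plain Ruby without ActiveSupport. The two times below are constructed by hand with the same nanosecond values as in the session above. Time#== compares the sub-second part, while Time#to_s does not print it.

```ruby
a = Time.at(Rational(999_999_998, 1_000_000_000))
b = Time.at(Rational(999_999_000, 1_000_000_000))

p a.nsec           # => 999999998
p b.nsec           # => 999999000
p a == b           # => false
p a.to_s == b.to_s # => true
```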

Memory bloat when creating many new objects

5 votes

When I run this and then watch the memory consumption of my ruby process in OSX Activity Monitor, the memory increases at about 3 MB/s.

If I remove the transaction it about halves the memory consumption but still, the memory footprint keeps going up. I have an issue on my production app where Heroku kills the process because of its memory consumption.

Is there a way of doing the below, in a way that won't increase memory? If I comment out the .save line then it's okay but of course this isn't a solution.

ActiveRecord::Base.transaction do
  10000000.times do |time|
    puts "---- #{time} ----"
    a = Activity.new(:name => "#{time} Activity")
    a.save!(:validate => false)
    a = nil
  end
end

I am running this using delayed_job.

The a = nil line is unnecessary and you can remove that.

You're creating a lot of objects every time you loop - two strings, two hashes, and an Activity object so I'm not surprised you're experiencing high memory usage, especially as you're looping 10 million times! There doesn't appear to be a more memory efficient way to write this code.

The only way I can think of to reduce memory usage is to manually start the garbage collector every x number of iterations. Chances are Ruby's GC isn't being aggressive enough. You don't, however, want to invoke it every iteration as this will radically slow your code. Maybe you could use every 100 iterations as a starting point and go from there. You'll have to profile and test what is most effective.
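
A minimal sketch of that idea in plain Ruby, with a hash standing in for the Activity record so that it runs without Rails. The interval of 100 is just the starting point suggested above, and the loop is kept short.

```ruby
gc_interval = 100 # assumed starting point; tune by profiling
forced_collections = 0

1_000.times do |time|
  record = { :name => "#{time} Activity" } # stand-in for Activity.new(...).save!

  # Force a collection after every gc_interval iterations.
  if (time + 1) % gc_interval == 0
    GC.start
    forced_collections += 1
  end
end

puts "forced #{forced_collections} collections"
```

Whether this actually lowers peak memory depends on the Ruby version and on what the objects reference, so measure before and after.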

The documentation for the GC is here.