Best ruby questions in September 2011

ruby array internals

8 votes

How are ruby arrays internally implemented (mainly in CRuby, but any other info is welcomed)?

Are they growable arrays like a c++ vector or are they list based? What's the complexity of shift/unshift and accessing an element by index?

They're growable arrays which "grow at the end".

shift is O(1), unshift is O(n), and accessing by index is O(1). To the best of my knowledge this holds for all Ruby implementations; it definitely holds for MRI.
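A rough way to check these claims on your own interpreter is to time the three operations. This is only a sketch: the helper name time_array_ops is made up for illustration, and the absolute numbers (and possibly even the growth pattern of unshift) depend on the Ruby implementation and version.

```ruby
require 'benchmark'

# Hypothetical helper: times n index reads, n shifts and n unshifts.
def time_array_ops(n)
  {
    :index   => Benchmark.realtime { a = Array.new(n, 0); n.times { |i| a[i] } },
    :shift   => Benchmark.realtime { a = Array.new(n, 0); n.times { a.shift } },
    :unshift => Benchmark.realtime { a = []; n.times { |i| a.unshift(i) } }
  }
end

[5_000, 10_000, 20_000].each do |n|
  timings = time_array_ops(n)
  puts "n=#{n} " + timings.map { |op, secs| format('%s=%.4fs', op, secs) }.join(' ')
end
```

If unshift costs O(n) per call, doubling n should roughly quadruple its total time, while the totals for the other two should roughly double.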

What runs faster in Ruby: defining the alias method or using alias_method?

7 votes

What is faster on later invocation:

def first_method?() second_method?() end

or

alias_method :first_method?, :second_method?

and if possible why?

(NOTE: I don't ask what is nicer / better etc. -> only raw speed and why it is faster is interesting here)

At least in Ruby 1.8.6, aliasing seems to be faster:

#!/usr/local/bin/ruby

require 'benchmark'

$global_bool = true

class Object 
  def first_method?
    $global_bool
  end

  def second_method?
    first_method?
  end 

  alias_method :third_method?, :first_method?
end

Benchmark.bm(7) do |x|
  x.report("first:")  { 1000000.times { first_method?  }}
  x.report("second:") { 1000000.times { second_method? }}
  x.report("third:")  { 1000000.times { third_method?  }}
end

results in :

$ ./test.rb
             user     system      total        real
first:   0.281000   0.000000   0.281000 (  0.282000)
second:  0.469000   0.000000   0.469000 (  0.468000)
third:   0.281000   0.000000   0.281000 (  0.282000)

Obviously, the alias saves one method call (receiver look-up ...), so it seems natural for it to be faster.
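One way to see the extra call, rather than just time it, is to count stack frames. The sketch below uses a made-up FrameDemo class: the wrapper method adds a frame, while the alias points at the same method body and adds none.

```ruby
class FrameDemo   # hypothetical class, for illustration only
  def direct
    caller.length          # number of frames above this method
  end

  def wrapped
    direct                 # a second method call, like second_method? above
  end

  alias_method :aliased, :direct   # same method body under another name
end

demo = FrameDemo.new
puts demo.wrapped - demo.direct    # => 1 (the wrapper's own frame)
puts demo.aliased - demo.direct    # => 0 (no extra frame)
```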

rails omniauth and UTF-8 errors

6 votes

I recently hit an error with omniauth while trying to populate some fields from Google's login:

Encoding::CompatibilityError: incompatible character encodings: ASCII-8BIT and UTF-8

"omniauth"=>
  {"user_info"=>
    {"name"=>"Joe McÙisnean",
     "last_name"=>"McÙisnean",
     "first_name"=>"Joe",
     "email"=>"someemail@gmail.com"},
   "uid"=>
    "https://www.google.com/accounts/o8/id?id=AItOawnQmfdfsdfsdfdsfsdhGWmuLTiX2Id40k",
   "provider"=>"google_apps"}

In my user model

  def apply_omniauth(omniauth)
    #add some info about the user
    self.email = omniauth['user_info']['email'] if email.blank?
    self.name = omniauth['user_info']['name'] if name.blank?
    self.name = omniauth['user_info'][:name] if name.blank?
    self.nickname = omniauth['user_info']['nickname'] if nickname.blank?
    self.nickname = name.gsub(' ','').downcase if nickname.blank?

    unless omniauth['credentials'].blank?
      user_tokens.build(:provider => omniauth['provider'], 
                        :uid => omniauth['uid'],
                        :token => omniauth['credentials']['token'], 
                        :secret => omniauth['credentials']['secret'])
    else
      user_tokens.build(:provider => omniauth['provider'], :uid => omniauth['uid'])
    end
  end

I'm not hugely knowledgeable about UTF encoding, so I'm not sure where I should be specifying the encoding. I'm guessing it's here, before it gets put into the user model and created, but I'm unsure what to do about it.

UPDATE:

Rails 3.0.10 Omniauth 0.2.6 Ruby 1.9.2 PG 0.11.0

Default encoding is UTF-8

That didn't seem to be it, so I dug further and found this in the view:

Showing /Users/holden/Code/someapp/app/views/users/registrations/_signup.html.erb where line #5 raised:

incompatible character encodings: ASCII-8BIT and UTF-8
Extracted source (around line #5):

2:   <%= f.error_messages %>
3: 
4:   <%= f.input :name, :hint => 'your real name' %>
5:   <%= f.input :nickname, :hint => 'Username of your choosing' %>
6: 
7:   <% unless @user.errors[:email].present? or @user.email %>
8:     <%= f.input :email, :as => :hidden %>

UPDATE UPDATE:

It seems to be the omniauth gem which returns the ASCII-8BIT chars, so my next question is: how can I parse the hash and convert it back into UTF-8 so my app doesn't explode?

session[:omniauth] = omniauth.to_utf8

Another part to this crazy ride is when I type this into the console

d={"user_info"=>{"email"=>"someemail@gmail.com", "first_name"=>"Joe", "last_name"=>"Mc\xC3\x99isnean", "name"=>"Joe Mc\xC3\x99isnean"}}

It automatically converts it to UTF-8, but it explodes when shoved into a session

 => {"user_info"=>{"email"=>"someemail@gmail.com", "first_name"=>"Joe", "last_name"=>"McÙisnean", "name"=>"Joe McÙisnean"}} 

This is a painful nightmare if there ever was one.

Omniauth proved to be the problem, producing the ASCII-8BIT strings.

I ended up forcing the Omniauth hash into submission using:

omniauth_controller.rb

session[:omniauth] = omniauth.to_utf8

I added a recursive method to force-convert the rogue ASCII-8BIT strings to UTF-8:

some_initializer.rb

class Hash
  def to_utf8
    Hash[
      self.collect do |k, v|
        if (v.respond_to?(:to_utf8))
          [ k, v.to_utf8 ]
        elsif (v.respond_to?(:encoding))
          [ k, v.dup.force_encoding('UTF-8') ]
        else
          [ k, v ]
        end
      end
    ]
  end
end

Special thanks to tadman

recursively convert hash containing non-UTF chars to UTF
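To see why relabelling the strings helps, here is a small sketch outside Rails. The helper name relabel_as_utf8 is invented for this example; it does for one string what the Hash patch above does for each value. It assumes the bytes really are UTF-8 and only carry the wrong encoding tag, which is what the console output above suggests.

```ruby
# Hypothetical helper: changes the encoding tag, not the bytes.
def relabel_as_utf8(str)
  str.dup.force_encoding('UTF-8')
end

utf8   = "Joe \u00D9"                                          # UTF-8 string containing a non-ASCII char
binary = "Mc\xC3\x99isnean".dup.force_encoding('ASCII-8BIT')   # UTF-8 bytes with a binary tag

begin
  utf8 + binary                      # both sides hold non-ASCII bytes, tags differ
rescue Encoding::CompatibilityError => e
  puts "raised #{e.class}"
end

fixed  = relabel_as_utf8(binary)
joined = utf8 + ' ' + fixed          # now both operands are tagged UTF-8
puts fixed.valid_encoding?
puts joined.encoding
```

Note that force_encoding does no transcoding. If the bytes were in some other encoding (say ISO-8859-1), String#encode would be the tool to reach for instead.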

How does foo(&nil) behave differently than foo(&"not a proc")?

6 votes

I found out from heckle that

[1, 2, 3].each(&nil)

doesn't cause any errors - it just returns an enumerator.

By contrast,

[1, 2, 3].each(&"")

raises

TypeError: wrong argument type String (expected Proc)

Also, &nil causes block_given? to return false

def block_given_tester
  if block_given?
    puts "Block given"
  else
    puts "Block not given"
  end
end

block_given_tester(&nil) # => Block not given

It's not because NilClass implements to_proc - I checked the RDoc.

I can understand why it'd be nice to have &nil, but I'm not sure how it's done. Is this just one of the ways nil has special behavior not shared by other objects?

The answer can be found by looking at Ruby's source code.

Ruby 1.8:

Look at the function block_pass in the file eval.c. Note that it treats nil differently from Proc objects (the macro NIL_P). If the function is passed a nil value, it evaluates an empty block (I think) and returns. The code right after it checks whether the object is a Proc object (the function rb_obj_is_proc) and raises the exception "wrong argument type (expected Proc)" if it isn't.

Ruby 1.9.2:

Look at the method caller_setup_args in the file vm_insnhelper.c. It converts the proc with to_proc only if it is not nil; otherwise, the type conversion and type check are bypassed.
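The behaviour described above can be observed from Ruby itself. In this sketch, passed_block and Doubler are invented names: nil skips the conversion entirely, objects that respond to to_proc are converted, and anything else raises TypeError.

```ruby
# Hypothetical helper: returns whatever block object the caller passed.
def passed_block(&blk)
  blk
end

class Doubler                 # hypothetical class implementing to_proc
  def to_proc
    proc { |x| x * 2 }
  end
end

p passed_block(&nil)                    # nil: no block, no conversion attempted
p passed_block(&:upcase).class          # Proc, via Symbol#to_proc
p [1, 2, 3].map(&Doubler.new)           # [2, 4, 6], via Doubler#to_proc

begin
  passed_block(&"not a proc")           # String has no to_proc
rescue TypeError => e
  puts e.message
end
```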

Multiple models in the same form in Rails 3.1?

5 votes

I am using Rails 3.1 and am working on a discussion forum. I have a model called Topic, each of which has many Posts. When the user makes a new topic, they should also make the first Post as well. However, I am not sure how I can do this in the same form. Here is my code:

<%= form_for @topic do |f| %>
<p>
    <%= f.label :title, "Title" %><br />
    <%= f.text_field :title %>
</p>

<%= f.fields_for :post do |ff| %>
    <p>
        <%= ff.label :body, "Body" %><br />
        <%= ff.text_area :body %>
    </p>
<% end %>

<p>
    <%= f.submit "Create Topic" %>
</p>
<% end %>

class Topic < ActiveRecord::Base
  has_many :posts, :dependent => :destroy
  accepts_nested_attributes_for :posts
  validates_presence_of :title
end


class Post < ActiveRecord::Base
  belongs_to :topic
  validates_presence_of :body
end

... but this doesn't seem to be working. Any ideas?

Thanks!

@Pablo's answer seems to have everything you need. But to be more specific...

First change this line in your view from

<%= f.fields_for :post do |ff| %>

to this

<%= f.fields_for :posts do |ff| %>  # :posts instead of :post

Then in your Topic controller add this

def new
  @topic = Topic.new
  @topic.posts.build
end

That should get you going.

Algorithm for nearest point

5 votes

I've got a list of ~5000 points (specified as longitude/latitude pairs), and I want to find the nearest 5 of these to another point, specified by the user.

Can anyone suggest an efficient algorithm for working this out? I'm implementing this in Ruby, so if there's a suitable library then that would be good to know, but I'm still interested in the algorithm!

UPDATE: A couple of people have asked for more specific details on the problem. So here goes:

  • The 5000 points are mostly within the same city. There might be a few outside it, but it's safe to assume that 99% of them lie within a 75km radius, and that all of them lie within a 200km radius.
  • The list of points changes rarely. For the sake of argument, let's say it gets updated once per day, and we have to deal with a few thousand requests in that time.

You can get a very fast upper-bound estimator on distance using Manhattan distance (scaled for latitude); this should be good enough for rejecting 99.9% of candidates if they're not close. (EDIT: you have since told us they are close. In that case your metric should be distance-squared, as per Lars H's comment.) Consider this equivalent to rejecting anything outside a spherical-rectangle bounding box (as an approximation to a circular bounding box). I don't do Ruby, so here is the algorithm with pseudocode:

Let the reference point P have latitude and longitude (pa, po), and let any other point X have (xa, xo). Precompute ka, the scaling factor for longitudinal distances at P's latitude: ka = cos(pa), with pa in degrees. (Strictly, treating ka as constant is a linearized approximation in the vicinity of P.)

Then the distance estimator is: D(X,P) = |xa-pa| + ka*|xo-po| = da + ka*do

where |z| means abs(z), da = |xa-pa| and do = |xo-po|. At worst this overestimates the true distance by a factor of √2 (when da == ka*do), hence we allow for that as follows:

Do a running search and keep Dmin, the fifth-smallest scaled-Manhattan-distance estimate. You can then reject upfront all points for which D(X,P) > √2 * Dmin: their true distance √(da² + (ka*do)²) must exceed Dmin, and at least five points are already known to lie within Dmin. That should eliminate 99.9% of points. Keep a list of all remaining candidate points with D(X,P) <= √2 * Dmin. Update Dmin whenever you find a new fifth-smallest D. A priority queue, or else a list of (coord, D) pairs, is a good data structure. Note that this pass never computes Euclidean distance; it uses only float multiplication and addition.

(Consider this similar to quadtree except filtering out everything except the region that interests us, hence no need to compute accurate distances upfront or build the data structure.)

It would help if you told us the expected spread in latitudes and longitudes (degrees, minutes, or what?). If all the points are close, the √2 factor in this estimator will be too conservative and mark every point as a candidate; a lookup-table-based distance estimator would be preferable.

Pseudocode:

initialize best5 with the D values of the first five points in list; Dmin = max(best5)
for point X in list:
    if D(X,P) <= √2 * Dmin:
        insert the tuple (X,D) in the priority-queue of candidates
        if (D < Dmin): replace the largest value in best5 with D; Dmin = max(best5)
# after first pass, reject candidates with D > √2 * Dmin (use the final value of Dmin)
# ...
# then a second pass on candidates to find lowest 5 exact distances
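Since the question asked for Ruby, here is one possible translation of the two-pass idea. It is a sketch under stated assumptions, not the answerer's code: the helper names nearest_points and scaled_sq_dist are invented, points are [latitude, longitude] pairs in degrees, and the second pass ranks by squared distance in scaled degrees (a flat approximation that is reasonable at city scale). For simplicity it computes all estimates first instead of keeping a running Dmin.

```ruby
# Squared distance in "scaled degrees"; good enough for ranking nearby points.
def scaled_sq_dist(point, ref, ka)
  d_lat = point[0] - ref[0]
  d_lon = ka * (point[1] - ref[1])
  d_lat * d_lat + d_lon * d_lon
end

# Hypothetical helper: returns the k points nearest to ref.
def nearest_points(points, ref, k = 5)
  return [] if points.empty?
  ka = Math.cos(ref[0] * Math::PI / 180.0)   # longitude scaling at ref's latitude

  # Pass 1: cheap scaled-Manhattan estimate D for every point.
  estimates = points.map { |lat, lon| (lat - ref[0]).abs + ka * (lon - ref[1]).abs }
  dmin  = estimates.min(k).last              # k-th smallest estimate
  limit = Math.sqrt(2) * dmin

  # Keep only the points whose estimate is within sqrt(2) * Dmin.
  candidates = points.zip(estimates).select { |_pt, d| d <= limit }.map(&:first)

  # Pass 2: rank the few survivors by the more accurate metric.
  candidates.min_by(k) { |pt| scaled_sq_dist(pt, ref, ka) }
end

points = [[51.50, -0.12], [51.51, -0.10], [51.60, -0.30], [51.49, -0.11],
          [51.52, -0.09], [52.20, 0.12], [51.48, -0.14]]
p nearest_points(points, [51.5, -0.1], 3)
```

With only ~5000 points, a plain points.min_by(5) over the squared distance may already be fast enough; the prefilter mainly pays off when the exact distance is expensive to compute.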

How to set command line value via Ruby to see status via PS?

5 votes

I'd like to provide feedback for my pinger program via the command line and view it using ps ax.

I found an SO question about this, but:

....
ARGV[0] = "Hello!" # does nothing

I'm starting the script via ruby ./pinger

Assign to $0 instead. For example, if I start irb and check ps in another terminal:

$ ps | egrep 'irb|pancakes'
 3119 ttys000    0:01.02 irb 
 3131 ttys001    0:00.00 egrep irb|pancakes

and then over in irb:

>> $0 = 'pancakes'

and back to the other terminal:

$ ps | egrep 'irb|pancakes'
 3119 ttys000    0:01.07 pancakes 
 3135 ttys001    0:00.00 egrep irb|pancakes

You can check with this tiny script as well:

#!/usr/bin/env ruby
$0 = 'pancakes'
sleep 10

Run that, jump to another terminal, do a ps | grep pancakes, and you should see a pancakes process.
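Applied to the pinger from the question, the same trick might look like the sketch below. The report_status helper, the host names and the sleep standing in for the ping are all placeholders invented for this example.

```ruby
# Hypothetical helper: publishes a status string as the process title.
def report_status(message)
  $0 = "pinger: #{message}"          # visible via `ps ax` on most Unix-like systems
end

hosts = %w[host-a host-b host-c]     # placeholder names, not real hosts
hosts.each_with_index do |host, i|
  report_status("checking #{host} (#{i + 1}/#{hosts.size})")
  sleep 0.1                          # stand-in for the actual ping
end
report_status('idle')
```

How much of the title ps displays can depend on the platform, so short status strings are the safer choice.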

What does a single splat/asterisk in a Ruby argument list mean?

5 votes

I was poking through the Rails 3 ActiveRecord source code today and found a method where the entire parameter list was a single asterisk.

def save(*)

I couldn't find a good description of what this does (though I have some ideas based on what I know about splat arguments).

What does it do, and why would you use it?

It means it can have any number of arguments (including zero) and it discards all those arguments.
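A common reason to write a method this way is to override something and pass the arguments straight through with a bare super, without caring what they are. The classes below are invented for illustration and are not taken from the ActiveRecord source.

```ruby
class Base
  def save(options = {})
    options                 # pretend to persist; just echo what was received
  end
end

module Auditing             # hypothetical mixin
  def save(*)               # accept any arguments without naming them...
    @audited = true
    super                   # ...while bare super forwards them unchanged
  end
end

class Record < Base
  include Auditing
  attr_reader :audited
end

record = Record.new
p record.save({ :validate => false })   # the hash still reaches Base#save
p record.audited
```

So the arguments are discarded only as far as the method's own body is concerned; they remain available to super.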

How can Class be of the Class class and not have Class instance methods?

5 votes

I was studying how the Ruby interpreter is implemented, and one question came up that I haven't found an answer to yet. It's the one in the title: since Class (rb_cClass) has super set to itself (ignoring metaclasses, since actually super is the metaclass of rb_cClass), if I send a method to the Class object, it will be looked up in the method table of Class's class. But Class's class is Class, so shouldn't I end up looking at the instance methods of Class? That doesn't seem to be the case, since in the documentation Class class methods and Class instance methods are listed separately. In search_method in Ruby's eval.c, I didn't find any special check for the Class class. Can anyone shed some light on this?

Your beliefs about the way it should work seem right, but I'm not sure why you think it doesn't work that way.

In Ruby 1.8.7:

irb> a = Class.new.methods - Object.new.methods
=> [... 36 element array ...]
irb> b = Class.methods - Object.new.methods
=> [... 37 element array ...]
irb> b - a
=> ["nesting"]

A normal class instance (Class.new) has 36 instance methods. If I look at Class itself, which is also a normal class instance, it has the same 36 instance methods, plus 1 additional class method (nesting), which exists only because it is inherited from its superclass Module.

Note that adding an instance method to Class automatically adds it as a class method as well, but adding a method to Class's metaclass will not.

irb> class Class ; def everywhere ; true ; end ; end
irb> class << Class ; def only_singleton ; true ; end ; end
irb> Class.everywhere
=> true
irb> Class.new.everywhere
=> true
irb> Class.only_singleton
=> true
irb> Class.new.only_singleton
NoMethodError: undefined method 'only_singleton' for #<Class:0x4800ac8>

Ruby 1.8: Can I dynamically define a method that takes a block?

5 votes

I know that I can dynamically define methods on a class using define_method, and that I specify the parameters this method takes using the arity of the block.

I want to dynamically define a method that accepts both optional parameters and a block. Unfortunately, Ruby 1.8 doesn't allow passing a block to a block, so this won't work:

class X
  define_method :foo do |bar, &baz|
    puts bar
    baz.call if block_given?
  end
end

x = X.new
x.foo("foo") { puts "called!"} #=> LocalJumpError: no block given

Replacing the explicit block.call with yield doesn't fix the problem either.

I know that this is allowed in Ruby 1.9, but upgrading is not an option for me. Is this an intractable problem, or is there a way around it?

What you could do is use class_eval with a string instead of define_method. The downside to this (apart from not being as elegant) is that you lose lexical scoping. But this is often not needed.

Edit: By the way, good question. +1
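A minimal sketch of the class_eval approach, assuming the method names are only known at runtime (the names foo and qux and the default value are made up). Because the method is defined with an ordinary def inside the evaluated string, it can take optional parameters and a block.

```ruby
class X
  [:foo, :qux].each do |name|            # method names chosen at runtime
    class_eval <<-RUBY, __FILE__, __LINE__ + 1
      def #{name}(bar = "default", &baz)
        baz ? baz.call(bar) : bar
      end
    RUBY
  end
end

x = X.new
puts x.foo("foo") { |s| s.upcase }       # => FOO
puts x.qux                               # => default
```

Passing __FILE__ and __LINE__ + 1 keeps backtraces pointing at this file. As noted above, local variables from the surrounding scope are not visible inside the string, so anything needed has to be interpolated in.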

How can I index duplicate items in an array?

Asked on Fri, 30 Sep 2011 by Luke ruby
5 votes

Starting with the following array (of hashes):

[
  {:name=>"site a", :url=>"http://example.org/site/1/"}, 
  {:name=>"site b", :url=>"http://example.org/site/2/"}, 
  {:name=>"site c", :url=>"http://example.org/site/3/"}, 
  {:name=>"site d", :url=>"http://example.org/site/1/"}, 
  {:name=>"site e", :url=>"http://example.org/site/2/"}, 
  {:name=>"site f", :url=>"http://example.org/site/6/"},
  {:name=>"site g", :url=>"http://example.org/site/1/"}
]

How can I add an index of the duplicate urls like so:

[
  {:name=>"site a", :url=>"http://example.org/site/1/", :index => 1}, 
  {:name=>"site b", :url=>"http://example.org/site/2/", :index => 1}, 
  {:name=>"site c", :url=>"http://example.org/site/3/", :index => 1}, 
  {:name=>"site d", :url=>"http://example.org/site/1/", :index => 2}, 
  {:name=>"site e", :url=>"http://example.org/site/2/", :index => 2}, 
  {:name=>"site f", :url=>"http://example.org/site/6/", :index => 1},
  {:name=>"site g", :url=>"http://example.org/site/1/", :index => 3}
]

I would use a hash for keeping track of the indices. Scanning the previous entries again and again seems inefficient.

counts = Hash.new(0)
array.each { |hash|
  hash[:index] = counts[hash[:url]] += 1
}

or a bit cleaner

array.each_with_object(Hash.new(0)) { |hash, counts|
  hash[:index] = counts[hash[:url]] += 1
}
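Both versions above modify the hashes in place. If you would rather leave the input untouched, a variant along these lines should work; index_duplicates is a made-up helper name, and the sample data is a shortened version of the array from the question.

```ruby
# Hypothetical helper: returns new hashes with :index added, input unchanged.
def index_duplicates(array)
  counts = Hash.new(0)
  array.map { |hash| hash.merge(:index => (counts[hash[:url]] += 1)) }
end

sites = [
  { :name => "site a", :url => "http://example.org/site/1/" },
  { :name => "site b", :url => "http://example.org/site/2/" },
  { :name => "site d", :url => "http://example.org/site/1/" },
  { :name => "site g", :url => "http://example.org/site/1/" }
]

index_duplicates(sites).each { |site| puts "#{site[:name]}: #{site[:index]}" }
```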