Best ruby questions in June 2011

Why are Ruby method calls particularly slow (in comparison to other languages)?

19 votes

I'm trying to read about Ruby performance, and came across this SO thread, where one of the answers mentions that "method calls, one of the most common operations in Ruby, are particularly slow."

Another thread mentions that "It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember names per context to allow for eval, so its frames and method calls are slower."

Can someone explain in more detail why Ruby method calls are particularly slow, and elaborate on the second thread? I'm not totally sure what late lookup is or why it's slow, and I don't know what names per context means or how it relates to frames and method calls.

My (possibly incorrect) understanding is that since methods can be added or modified at runtime, the Ruby interpreter can never "remember" how to run a particular method, so it has to look up the method every time while the program is running, and this is what is meant by method calls being slow. But corrections and more technical explanations would be great.

Compiled languages often have fast method dispatch because the calling code knows an index into the class's vtable, which is an array of method pointers. After just a few pointer dereferences, the calling code can jump right into the method. The compiler creates the vtable and replaces every method name in the source code with the numerical index of the method in the vtable.

Dynamic languages such as Ruby often have slow method dispatch because the calling code has a name for the method, not a pointer (nor an index into an array containing the pointers). The calling code has to ask the object for its class, then has to ask the class if it has a method by that name, and if not, go on up the chain of ancestors asking each ancestor if it has a method by that name (this is what the compiler does in a compiled language, which is why the compiling is slow and the method dispatch is fast). Rather than a few pointer dereferences costing just a few machine instructions to invoke a method, a dynamic language must execute dozens to hundreds of machine instructions to search the object's class and all the object's ancestor classes for the method. Each class has a HashTable of names -> methods, but HashTables with string keys are an order of magnitude slower than arrays with integer indexes.
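The lookup described above can be sketched in Ruby itself. This is only an illustration of the idea, not how any Ruby implementation performs dispatch internally (real implementations cache lookups). The names find_owner, Animal, and Dog are hypothetical, and singleton classes are ignored for simplicity.

```ruby
# Hypothetical classes used only for illustration.
class Animal
  def speak
    "..."
  end
end

class Dog < Animal
end

# Walk the ancestor chain the way a naive dispatcher would, returning the
# first class or module that itself defines a public or protected method
# called +name+ (or nil if none does).
def find_owner(obj, name)
  obj.class.ancestors.find do |mod|
    mod.instance_methods(false).include?(name)
  end
end

dog = Dog.new
owner_before = find_owner(dog, :speak)  # Animal: found one step up the chain

# Methods can be added at runtime, so the answer can change between calls.
Dog.class_eval do
  def speak
    "Woof"
  end
end

owner_after = find_owner(dog, :speak)   # Dog: the same lookup now ends earlier
```

Because the result of the search can change at any moment, a dispatcher that wants to skip the walk has to cache the result and invalidate that cache whenever a method is defined.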

There are ways to optimize method dispatch in dynamic languages, of course. In Ruby, that's what JRuby, Rubinius, and IronRuby are working on. But that's a subject for another question.

Bundle / Rake error

13 votes

This is kinda weird - I do not think I changed any code in this particular application (but I was working on another app and performing some gem updates there).

When I started running the rake cron on this application, I got this error message. Production is fine.

You have already activated rake 0.9.2, but your Gemfile requires rake 0.8.7. Consider using bundle exec.

It works when I do this, but I need to fix this properly.

bundle exec rake cron

I saw a discussion here, and I figure it has to do with bundle using the wrong version. What is the best way to fix this?

http://community.engineyard.com/discussions/problems/1391-you-have-already-activated-rake-083-but-your-gemfile-requires-rake-087-consider-using-bundle-exec

Using bundle exec rake cron is the right way to do this.

Basically what's happening is that you've updated rake to 0.9.2 which now conflicts with the version specified in your Gemfile. Previously the latest version of rake you had matched the version in your Gemfile, so you didn't get any warning when simply using rake cron.

Yehuda Katz (one of the original Bundler developers) explains it all in this blog post: http://yehudakatz.com/2011/05/30/gem-versioning-and-bundler-doing-it-right/.

uninitialized constant Rake::DSL in Ruby Gem

10 votes

I have been working on updating my gem (whm_xml at https://github.com/ivanoats/whm_xml_api_ruby ) to make it work with ruby 1.9.2, latest rubygems, latest bundler, latest rdoc, latest rake. It works fine in 1.8.7 but has the "uninitialized constant Rake::DSL" error only in 1.9.2 . I thought that rake 0.9.2 fixed that but maybe not? I have read a lot on StackOverflow but am still stuck. Any ideas on where to look?

ivan:~/Development/ruby/whm_xml_api_ruby [git:master+]  → bundle exec rake -T
(in /Users/ivan/Development/ruby/whm_xml_api_ruby)
rake aborted!
uninitialized constant Rake::DSL
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:2482:in `const_missing'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/gems/rake-0.9.2/lib/rake/tasklib.rb:8:in `<class:TaskLib>'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/gems/rake-0.9.2/lib/rake/tasklib.rb:6:in `<module:Rake>'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/gems/rake-0.9.2/lib/rake/tasklib.rb:3:in `<top (required)>'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/gems/rdoc-3.6.1/lib/rdoc/task.rb:37:in `require'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/gems/rdoc-3.6.1/lib/rdoc/task.rb:37:in `<top (required)>'
/Users/ivan/Development/ruby/whm_xml_api_ruby/Rakefile:3:in `require'
/Users/ivan/Development/ruby/whm_xml_api_ruby/Rakefile:3:in `<top (required)>'
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:2373:in `load'
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:2373:in `raw_load_rakefile'
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:2007:in `block in load_rakefile'
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:2058:in    `standard_exception_handling'
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:2006:in `load_rakefile'
/Users/ivan/.rvm/rubies/ruby-1.9.2-p180/lib/ruby/1.9.1/rake.rb:1991:in `run'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/gems/rake-0.9.2/bin/rake:32:in `<top (required)>'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/bin/rake:19:in `load'
/Users/ivan/.rvm/gems/ruby-1.9.2-p180/bin/rake:19:in `<main>'

This SO Question might help you out. The suggestion there is to add require 'rake/dsl_definition' above require 'rake' in your Rakefile.

Is there something similar to Nokogiri for parsing Ruby code?

8 votes

Nokogiri is awesome. I can do things like #css('.bla') which will return the first matching element.

Right now we need to do some parsing of Ruby source code - finding all methods within a class etc. We're using the ruby_parser gem, but all it does is comb your source code and spit out S-expressions. Is there anything like Nokogiri for these S-expressions which can do things like "return S-expression for first method found named 'foo'"?

The only thing I can think of, is Adam Sanderson's SExpPath library.

Ruby Exceptions -- Why "else"?

7 votes

I'm trying to understand exceptions in Ruby but I'm a little confused. The tutorial I'm using says that if an exception occurs that does not match any of the exceptions identified by the rescue statements, you can use an "else" to catch it:

begin  
# -  
rescue OneTypeOfException  
# -  
rescue AnotherTypeOfException  
# -  
else  
# Other exceptions
ensure
# Always will be executed
end

However, I also saw later in the tutorial "rescue" being used without an exception specified:

begin
    file = open("/unexistant_file")
    if file
         puts "File opened successfully"
    end
rescue
    file = STDIN
end
print file, "==", STDIN, "\n"

If you can do this, then do I ever need to use else? Or can I just use a generic rescue at the end like this?

begin  
# -  
rescue OneTypeOfException  
# -  
rescue AnotherTypeOfException  
# -  
rescue
# Other exceptions
ensure
# Always will be executed
end

The else is for when the block completes without an exception thrown. The ensure is run whether the block completes successfully or not. Example:

begin
  puts "Hello, world!"
rescue
  puts "rescue"
else
  puts "else"
ensure
  puts "ensure"
end

This will print Hello, world!, then else, then ensure.
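To see both paths side by side, here is a small sketch that records which clauses run with and without an exception. The method name trace and the choice of ArgumentError are only for illustration.

```ruby
# Returns the list of clauses that ran, in order.
def trace(should_raise)
  steps = []
  begin
    steps << :body
    raise ArgumentError, "boom" if should_raise
  rescue ArgumentError
    steps << :rescue   # runs only when the body raised a matching exception
  else
    steps << :else     # runs only when the body finished without raising
  ensure
    steps << :ensure   # runs in both cases
  end
  steps
end

trace(false)  # => [:body, :else, :ensure]
trace(true)   # => [:body, :rescue, :ensure]
```

So else is not a catch-all for unmatched exceptions; a bare rescue at the end is what handles "other" errors (it rescues StandardError and its subclasses).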

Ruby "return unless nil" idiom

7 votes

I've got a smelly method like:

def search_record(*args)    
  record = expensive_operation_1(foo)
  return record unless record.nil?

  record = expensive_operation_2(foo, bar)
  return record unless record.nil?

  record = expensive_operation_3(baz)
  return record unless record.nil?

  record = expensive_operation_4(foo, baz)
  return record unless record.nil?
end

Is there a good ruby idiom for "return result of call unless nil"?

Or should I just write a return_unless_nil(&blk) method?

(Note that args are different for each call, so I can't simply just iterate over them)

Do you care about the difference between nil and false here? If you only care whether the return value of each method is "falsy," then this is a pretty Rubyish way of doing it:

def search_record(*args)    
  expensive_operation_1(foo)      ||
  expensive_operation_2(foo, bar) ||
  expensive_operation_3(baz)      ||
  expensive_operation_4(foo, baz)
end

If you're unfamiliar with this idiom, it can be explained thusly: Ruby, like most languages, "short circuits" OR comparisons, meaning that if the first operand evaluates to "truthy" it won't bother to evaluate the second operand (i.e. if expensive_operation_1 returns something other than nil or false, it won't ever call expensive_operation_2), because it already knows that the result of the boolean operation is truthy.

Another useful thing that Ruby does is, instead of returning true or false from boolean operations it just returns the last operand it evaluates. So in this case if expensive_operation_1 returns nil, it will then call expensive_operation_2, and if that returns a value (that isn't falsy), the whole expression will just evaluate to that value.

Finally, we can chain these booleans so, in effect, it will return the result of the first operand that isn't falsy and never evaluate the subsequent operands. If all of the operands evaluate to falsy, it will return the final operand (which we know is falsy and, in your case, probably nil).
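A short sketch of the behaviour described above, using hypothetical stand-ins for the expensive operations so that it is visible which ones actually ran. The second helper, first_non_nil, is an assumed alternative for the case where false is a legitimate result and only nil should be skipped.

```ruby
# Each lambda records that it was called, then returns a canned result.
def lookup_with_or(calls)
  first  = lambda { calls << :first;  nil }
  second = lambda { calls << :second; "record" }
  third  = lambda { calls << :third;  "other" }
  first.call || second.call || third.call
end

# Variant that skips only nil, so a legitimate +false+ result is returned.
def first_non_nil(*operations)
  operations.each do |operation|
    value = operation.call
    return value unless value.nil?
  end
  nil
end

calls = []
result = lookup_with_or(calls)
# result is "record" and calls is [:first, :second]: the third never ran

false_result = first_non_nil(lambda { nil }, lambda { false }, lambda { :unreached })
# false_result is false, whereas (nil || false || :unreached) gives :unreached
```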

Is there a straightforward catchall way to log the methods that are being called on an object in Ruby?

7 votes

Is there a quick way to track the methods that are being called on an object? Often, when I'm working with a gem at a level just below their public interface, I run into errors that are hard to track down. Ultimately, I end up tracking the object through the source code and keeping everything in my head.

But it would be nice to be able to call something like a #log_method_calls on an object so that, say, all methods called on it get printed to stdout or something. Is there any way to accomplish this?

There are several methods to do it, depending on the situation.

If it's possible to create a new object in place of the observed one, you can easily write an observer class using method_missing.

class LogProxy  
  def initialize obj
    @obj = obj
  end

  def method_missing(name, *args)
    puts "#{name} => #{args.to_s}"  
    @obj.send(name, *args)
  end
end
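As a usage sketch, here is a hypothetical variant of the same idea that records calls in an array instead of printing them, forwards blocks, and answers respond_to? consistently. The class name RecordingProxy is made up for this example. Note that methods every object already has (to_s, inspect, == and so on) never reach method_missing, so a proxy like this does not see them.

```ruby
# Hypothetical variant of the proxy: records calls instead of printing them.
class RecordingProxy
  attr_reader :calls

  def initialize(target)
    @target = target
    @calls = []
  end

  # Forward anything the proxy itself does not define, including blocks.
  def method_missing(name, *args, &block)
    if @target.respond_to?(name)
      @calls << [name, args]
      @target.public_send(name, *args, &block)
    else
      super
    end
  end

  # Keep respond_to? in step with what method_missing will forward.
  def respond_to_missing?(name, include_private = false)
    @target.respond_to?(name, include_private) || super
  end
end

proxy = RecordingProxy.new([3, 1, 2])
proxy.push(4)
sorted = proxy.sort
# sorted is [1, 2, 3, 4] and proxy.calls is [[:push, [4]], [:sort, []]]
```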

If that's not possible, you can still use alias_method. It's a bit trickier, but using Module#instance_methods you can chain every method of anything.

Something like:

module Logger

  def self.included(mod)
    mod.instance_methods.each do |m|
      next if m =~ /with_logging/
      next if m =~ /without_logging/

      mod.class_eval do

        define_method "#{m}_with_logging" do |*args|
          puts "#{m} called #{args.to_s}"
          self.send_without_logging "#{m}_without_logging", *args
        end

        alias_method "#{m}_without_logging", m
        alias_method m, "#{m}_with_logging"
      end

    end
  end

end

TargetClass.send(:include, Logger)
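On Ruby 2.0 or later, a similar effect can be had without aliasing by prepending a module whose methods log and then call super. This is a sketch under that assumption, not part of the original answer. CallLogging and Account are hypothetical names, only methods defined directly on the class are wrapped, and keyword arguments are not forwarded specially.

```ruby
module CallLogging
  # Wraps every public/protected instance method defined directly on +klass+
  # and returns the array that calls will be recorded into.
  def self.wrap(klass, log = [])
    wrapper = Module.new do
      klass.instance_methods(false).each do |name|
        define_method(name) do |*args, &block|
          log << [name, args]
          super(*args, &block)   # explicit arguments are required here
        end
      end
    end
    klass.send(:prepend, wrapper)
    log
  end
end

# Hypothetical class to observe.
class Account
  def initialize
    @balance = 0
  end

  def deposit(amount)
    @balance += amount
  end

  def balance
    @balance
  end
end

log = CallLogging.wrap(Account)
account = Account.new
account.deposit(5)
account.balance
# log is [[:deposit, [5]], [:balance, []]]
```

Because the wrapper sits in front of the class in the ancestor chain, the original methods are left untouched and no *_without_logging names are added to the class.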

Ruby: is it acceptable to put more than one class in a file?

7 votes

This might be a bit of an esoteric question, but I just want to know what best practices are on this issue.

Yes, it is generally acceptable, because it doesn't violate any principles of the Ruby language itself, but it ultimately depends on the practices of your target audience or framework. (For example, Rails likes your classes to be one-per-file.)

However, if you are grouping classes with related functionality into a single file then you should also consider making them part of the same module for a namespace.
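For instance, a single file holding two small, related classes might look like the sketch below. The names Shapes, Circle, and Square are hypothetical.

```ruby
# shapes.rb: two small, related classes sharing one file and one namespace.
module Shapes
  class Circle
    def initialize(radius)
      @radius = radius
    end

    def area
      Math::PI * @radius**2
    end
  end

  class Square
    def initialize(side)
      @side = side
    end

    def area
      @side**2
    end
  end
end

Shapes::Square.new(3).area  # => 9
```

Callers refer to Shapes::Circle and Shapes::Square, so the generic names do not collide with classes defined elsewhere.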

Does Ruby's Enumerable#zip create arrays internally?

7 votes

In Ruby - Compare two Enumerators elegantly, it was said

The problem with zip is that it creates arrays internally, no matter what Enumerable you pass. There's another problem with length of input params

I had a look at the implementation of Enumerable#zip in YARV, and saw

static VALUE
enum_zip(int argc, VALUE *argv, VALUE obj)
{
    int i;
    ID conv;
    NODE *memo;
    VALUE result = Qnil;
    VALUE args = rb_ary_new4(argc, argv);
    int allary = TRUE;

    argv = RARRAY_PTR(args);
    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }
    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }
    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }
    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

    return result;
}

Am I understanding the following bits correctly?

Check whether all of the arguments are arrays, and if so, replace some indirect reference to the array with a direct reference

    for (i=0; i<argc; i++) {
        VALUE ary = rb_check_array_type(argv[i]);
        if (NIL_P(ary)) {
            allary = FALSE;
            break;
        }
        argv[i] = ary;
    }

If they aren't all arrays, create an enumerator instead

    if (!allary) {
        CONST_ID(conv, "to_enum");
        for (i=0; i<argc; i++) {
            argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));
        }
    }

Create an array of arrays only if a block isn't given

    if (!rb_block_given_p()) {
        result = rb_ary_new();
    }

If everything is an array, use zip_ary, otherwise use zip_i, and call a block on each set of values

    /* use NODE_DOT2 as memo(v, v, -) */
    memo = rb_node_newnode(NODE_DOT2, result, args, 0);
    rb_block_call(obj, id_each, 0, 0, allary ? zip_ary : zip_i, (VALUE)memo);

Return an array of arrays if no block is given, else return nil (Qnil)?

    return result;
}

I'll be using 1.9.2-p0 as that's what I have on hand.

The rb_check_array_type function looks like this:

VALUE
rb_check_array_type(VALUE ary)
{
    return rb_check_convert_type(ary, T_ARRAY, "Array", "to_ary");  
}

And rb_check_convert_type looks like this:

VALUE
rb_check_convert_type(VALUE val, int type, const char *tname, const char *method)
{
    VALUE v;

    /* always convert T_DATA */
    if (TYPE(val) == type && type != T_DATA) return val;
    v = convert_type(val, tname, method, FALSE);
    if (NIL_P(v)) return Qnil;
    if (TYPE(v) != type) {
        const char *cname = rb_obj_classname(val);
        rb_raise(rb_eTypeError, "can't convert %s to %s (%s#%s gives %s)",
                 cname, tname, cname, method, rb_obj_classname(v));
    }
    return v;
}

Note the convert_type call. This looks a lot like the C version of Array.try_convert, and try_convert just happens to look like this:

/*   
 *  call-seq:
 *     Array.try_convert(obj) -> array or nil
 *
 *  Try to convert <i>obj</i> into an array, using +to_ary+ method. 
 *  Returns converted array or +nil+ if <i>obj</i> cannot be converted
 *  for any reason. This method can be used to check if an argument is an
 *  array.
 *   
 *     Array.try_convert([1])   #=> [1]
 *     Array.try_convert("1")   #=> nil
 *
 *     if tmp = Array.try_convert(arg)
 *       # the argument is an array
 *     elsif tmp = String.try_convert(arg)
 *       # the argument is a string
 *     end
 *
 */
static VALUE
rb_ary_s_try_convert(VALUE dummy, VALUE ary)
{
    return rb_check_array_type(ary);
}

So, yes, the first loop is looking for anything in argv that is not an array and clearing the allary flag (setting it to FALSE) if it finds such a thing.

In enum.c, we see this:

id_each = rb_intern("each");

So id_each is an internal reference for the Ruby each iterator method. And in vm_eval.c, we have this:

/*!  
 * Calls a method 
 * \param recv   receiver of the method
 * \param mid    an ID that represents the name of the method
 * \param n      the number of arguments
 * \param ...    arbitrary number of method arguments  
 *
 * \pre each of arguments after \a n must be a VALUE.
 */
VALUE
rb_funcall(VALUE recv, ID mid, int n, ...)

So this:

argv[i] = rb_funcall(argv[i], conv, 1, ID2SYM(id_each));

Is calling to_enum (with, essentially, the default argument) on whatever is in argv[i].

So, the end result of the first for and if blocks is that argv is either full of arrays or full of enumerators rather than possibly being a mix of the two. But note how the logic works: if something is found that isn't an array, then everything becomes an enumerator. The first part of the enum_zip function will wrap arrays in enumerators (which is essentially free or at least cheap enough not to worry about) but won't expand enumerators into arrays (which could be quite expensive). Earlier versions might have gone the other way (preferring arrays over enumerators); I'll leave that as an exercise for the reader or historians.

The next part:

if (!rb_block_given_p()) {
    result = rb_ary_new();
}

Creates a new empty array and leaves it in result if zip is being called without a block. And here we should note what zip returns:

enum.zip(arg, ...) → an_array_of_array
enum.zip(arg, ...) {|arr| block } → nil

If there is a block, then there is nothing to return and result can stay as Qnil; if there isn't a block, then we need an array in result so that an array can be returned.

From parse.c, we see that NODE_DOT2 is a double-dot range but it looks like they're just using the new node as a simple three-element struct; rb_node_newnode just allocates an object, sets some bits, and assigns three values in a struct:

NODE*
rb_node_newnode(enum node_type type, VALUE a0, VALUE a1, VALUE a2)
{
    NODE *n = (NODE*)rb_newobj();

    n->flags |= T_NODE;
    nd_set_type(n, type);

    n->u1.value = a0;
    n->u2.value = a1;
    n->u3.value = a2;

    return n;
}

nd_set_type is just a bit-fiddling macro. Now we have memo as just a three-element struct. This use of NODE_DOT2 appears to be a convenient kludge.

The rb_block_call function appears to be the core internal iterator. And we see our friend id_each again so we'll be doing an each iteration. Then we see a choice between zip_i and zip_ary; this is where the inner arrays are created and pushed onto result. The only difference between zip_i and zip_ary appears to be the StopIteration exception handling in zip_i.

At this point we've done the zipping and we either have the array of arrays in result (if there was no block) or we have Qnil in result (if there was a block).


Executive Summary: The first loop explicitly avoids expanding enumerators into arrays. The zip_i and zip_ary calls will only work with non-temporary arrays if they have to build an array of arrays as a return value. So, if you call zip with at least one non-array enumerator and use the block form, then it is enumerators all the way down and the "problem with zip is that it creates arrays internally" does not happen. Reviewing 1.8 or other Ruby implementations is left as an exercise for the reader.
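The observable behaviour can be checked from Ruby. The sketch below uses a Range as the receiver so that Enumerable#zip (rather than Array's own zip) is the method being called; the values are arbitrary.

```ruby
numbers = (1..3)                  # a Range, so Enumerable#zip is used
letters = %w[a b c].each          # an Enumerator, not an Array

# Without a block, zip builds and returns an array of arrays.
pairs = numbers.zip(letters)      # [[1, "a"], [2, "b"], [3, "c"]]

# With a block, each pair is yielded and the call itself returns nil.
seen = []
block_result = numbers.zip(letters) { |pair| seen << pair }
# block_result is nil and seen holds the same three pairs

# Shorter arguments are padded with nil; the receiver decides the length.
padded = numbers.zip([10])        # [[1, 10], [2, nil], [3, nil]]
```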

Is there any practical difference between Ruby pre-1.9 and Ruby 1.9 threads?

6 votes

I'm trying to understand the difference between Ruby threads pre-1.9 and 1.9 (in the standard MRI implementation), but it seems that in terms of the benefits you can achieve with them, they're practically the same. Is this correct?

From my limited understanding:

  • Pre-1.9 threads are "green threads", which means that they're managed by the Ruby interpreter, not the OS. One consequence of this is that you never achieve true concurrency, since you never have multiple threads running at the same time (even if you're on a multicore/multiprocessor system). (However, you can get the appearance of concurrency, if execution switches between different threads, e.g., if some program runs while another is waiting on I/O.)
  • 1.9 threads are native threads, which means that they are indeed managed by the OS. If there were no global interpreter lock, this would allow Ruby to run multiple threads at the same time (on a multicore/multiprocessor system). But Ruby does have a global interpreter lock, which means that only one thread can ever be running, so again you don't get true concurrency. (But you can still get the appearance of concurrency if execution switches between different threads.)

Is this correct, or am I missing something? What are the benefits of 1.9 threads vs. pre-1.9 threads (in MRI)?

I feel kind of silly offering this as an answer, but your description matches my understanding of the situation perfectly.

If we are right, I should add that it does make sense to evolve the language this way.

Keep in mind that a main point of functional programming, the Actor Model, and other shared-memory-alternative parallel models is to fix the extreme difficulty of developing a parallel shared-memory application. ("Threads considered harmful.")

So it would have been expecting way too much for Ruby to go from nothing-parallel to everything-parallel.

The current approach seems to be to set up the mechanism but to keep the giant lock. I presume that in the future, individually debugged and tested functional areas will be allowed to execute in parallel as they receive fine-grain locks and concurrency testing.
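One practical consequence of the description in the question can be observed directly: threads that are blocked waiting (here simulated with sleep) overlap, even though Ruby code itself runs one thread at a time under the global lock. This is a rough sketch assuming MRI 1.9 or later (Ruby 2.1+ for the monotonic clock call); run_sleepers is a hypothetical helper and the timings are approximate.

```ruby
# Starts +count+ threads that each wait +seconds+, then returns their
# results together with the total wall-clock time taken.
def run_sleepers(count, seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  threads = count.times.map do |i|
    Thread.new do
      sleep seconds   # a blocking wait; other threads may run meanwhile
      i * 2
    end
  end
  results = threads.map(&:value)   # value joins the thread and returns its result
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  [results, elapsed]
end

results, elapsed = run_sleepers(4, 0.2)
# results is [0, 2, 4, 6]; elapsed is close to 0.2 seconds rather than 0.8,
# because the four waits overlap
```

CPU-bound work would not show the same speedup on MRI, since only one thread can execute Ruby code at a time.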

Is there a ruby equivalent to the Scala Option?

6 votes

How do I model an optional value in ruby? Scala has Option[], which is what I'm looking for in ruby.

There's no equivalent in the standard library. You have to define your own. See this article.
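As a rough idea of what "define your own" could look like, here is a minimal sketch. It is not a port of Scala's API; the names Option.from, Some, None, map, get_or_else, and empty? are choices made for this example.

```ruby
module Option
  # Wraps a value that is present.
  class Some
    attr_reader :value

    def initialize(value)
      @value = value
    end

    def map
      Some.new(yield(@value))
    end

    def get_or_else(_default)
      @value
    end

    def empty?
      false
    end
  end

  # Represents the absence of a value.
  class None
    def map
      self
    end

    def get_or_else(default)
      default
    end

    def empty?
      true
    end
  end

  NONE = None.new

  # nil becomes NONE; anything else (including false) is wrapped in Some.
  def self.from(value)
    value.nil? ? NONE : Some.new(value)
  end
end

user = { :name => "matz" }
Option.from(user[:name]).map(&:upcase).get_or_else("anonymous")   # => "MATZ"
Option.from(user[:email]).map(&:upcase).get_or_else("anonymous")  # => "anonymous"
```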

Performance differences between '.find' and '.where' methods

5 votes

I am using Ruby on Rails 3.0.7 and I would like to know what the performance differences are between the User.find(<id>) method and the User.where(:id => <id>) method.

Under the hood, find does more or less what you're describing with your where. You can find the details in this post. That being said, if you're looking to grab a single record by id, then you might want to use find_one. That's what find winds up doing when you call it with a single argument of an id, but you'll skip past all the other code it needs to run to figure out that's what you wanted.

I'm using Rails 3 with TinyMCE. How do I prevent a user from disabling browser JavaScript and then entering XSS?

5 votes

I have a site written in Rails 3. My Post model has a text column named "content". In the post panel, the HTML form renders the "content" column as a textarea with TinyMCE. On the front page, because TinyMCE produces HTML, the post.html.erb code has to output it with the raw method, like <%= raw @post.content %>.

Now, if I disable JavaScript in the browser, the textarea can be edited without TinyMCE, and a user could enter XSS such as <script>alert('xss');</script>. My front page will then show that alert box.

I tried sanitize(@post.content) in posts_controller, but the sanitize method also strips the TinyMCE styles. For example, <span style='color:red;'>foo</span> becomes <span>foo</span>.

My question is: how can I filter XSS input and preserve the TinyMCE styles at the same time?

The sanitizer can be set to allow the style attribute. In your config/application.rb add:

config.action_view.sanitized_allowed_attributes = ['style']

The sanitize method also has defaults for which css properties and keywords it allows. See sanitizer.rb allowed_css_properties and allowed_css_keywords to get a list of the defaults.

To add some that aren't currently allowed add this to your config/application.rb:

config.action_view.sanitized_allowed_css_keywords = ['puke']

--

If you're doing anything more complicated than this then you'll need to write some code. I don't recommend doing this from scratch; check out the Loofah gem for a good library for writing HTML scrubbers.

Differences between *, self.* and @* when referencing associations/attributes in Ruby/Rails Models/Controllers

5 votes

Assuming a Rails Model with persistent / non-persistent attributes, what is the best practice regarding referencing them? If you look at code publicly available, different patterns are used.

For instance, suppose you have an association from one model to another. What is the difference between using self.association_name and @association_name? What is the preferable way?

Same as with non-persistent attributes defined with attr_accessor :attr in Models. You can reference them with both approaches, self.attr and @attr. What is the preferable way?

self.x/self.x=y are always method calls.

(self.x is just sugar for self.__send__(:x) and self.x = y is really just sugar for self.__send__(:x=, y))

@x, on the other hand, only refers to an instance variable.

Using @x will not work with AR associations, as AR only defines x/x= (which are methods) for its magical operation. (AR essentially just intercepts access through these methods and routes it through its own internal data structures, which are unrelated to any similarly named instance variables.)

attr_accessor allows "accessing both ways" because, and only because, it uses the same-named instance variable as its backing store (it has to store the value somewhere). Consider that attr_accessor :x is equivalent to:

def x; @x; end
def x= (y); @x = y; end
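The difference becomes visible as soon as the writer does more than plain assignment. In this sketch (Thermostat and its methods are hypothetical), the writer rounds its argument, so going through self.degrees= and writing @degrees directly give different results.

```ruby
class Thermostat
  attr_reader :degrees

  # Custom writer: any assignment that goes through the method is rounded.
  def degrees=(value)
    @degrees = value.round
  end

  def set_through_method(value)
    self.degrees = value   # method call: runs the writer above
  end

  def set_instance_variable(value)
    @degrees = value       # direct write: bypasses the writer
  end

  def set_without_self(value)
    degrees = value        # creates a local variable; @degrees is untouched
    degrees
  end
end

t = Thermostat.new
t.set_through_method(20.6)     # t.degrees is now 21
t.set_instance_variable(20.6)  # t.degrees is now 20.6
t.set_without_self(5)          # t.degrees is still 20.6
```

The last method shows why self. is needed for writers inside a class: without it, Ruby parses degrees = value as a local variable assignment.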

Happy coding.