Best ruby questions in April 2012

ActiveAdmin Error: no superclass method `buttons'

15 votes

I'm starting with Rails (and I'm also new to Ruby, coming from Python) and I'm currently trying to set up ActiveAdmin for Rails 3.2.3 (Ruby 1.9.3). I'm following this guide but I was not able to run it properly. When I run the rails s command and visit localhost:3000/admin I get

NoMethodError in Active_admin/devise/sessions#new

Showing /home/lex/.rvm/gems/ruby-1.9.3-p125/gems/activeadmin-0.4.3/app/views/active_admin/devise/sessions/new.html.erb where line #11 raised:

super: no superclass method `buttons' for #<ActiveAdmin::FormBuilder:0xb429ae0>

I could not find anything useful on Google, what's wrong here?

If you need more info about this exception please tell me.

Extracted source (around line #11):

8:       f.input :password
9:       f.input :remember_me, :as => :boolean, :if =>  false  #devise_mapping.rememberable? }
10:     end
11:     f.buttons do
12:       f.commit_button "Login"
13:     end
14:   end

It seems that formtastic 2.2.0 (released today) breaks active_admin, and since active_admin only requires formtastic >= 2.0.0, the new version gets picked up. Put this in your Gemfile:

gem "formtastic", "~> 2.1.1"
gem "activeadmin", "~> 0.4.3"

then run

bundle update formtastic

Then restart your server (if you have it running), and it should work.

Why is it bad style to `rescue Exception => e` in Ruby?

14 votes

Ryan Davis's Ruby QuickRef says (without explanation):

don’t rescue Exception. EVER. or I will stab you.

Why not? What's the right thing to do?

Exception is the root of Ruby's exception hierarchy, so when you rescue Exception you rescue from everything, including subclasses such as SyntaxError, LoadError, and Interrupt.

Rescuing Interrupt prevents the user from using CTRL+C to exit the program.

Rescuing SignalException prevents the program from responding correctly to signals. It will be unkillable except by kill -9.

Rescuing SyntaxError means something like this will run just fine:

begin
  eval "djsakru3924r9eiuorwju3498 += 5u84fior8u8t4ruyf8ihiure"
rescue Exception
  puts "Move along, nothing to see here"
end

(Note that because of how IRb works, it does fail there, but in a standalone script it will "move along").

Rescuing from Exception isn't even the default. Doing

begin
  # iceberg!
rescue
  # lifeboats
end

does not rescue from Exception, it rescues from StandardError. You should generally specify something more specific than the default StandardError, but rescuing from Exception broadens the scope rather than narrowing it, and can have catastrophic results and make bug-hunting extremely difficult.

If you have a situation where you do want to rescue StandardError and you need a variable with the exception, you can use this form:

begin
  # iceberg!
rescue => e
  # lifeboats
end

This is equivalent to:

begin
  # iceberg!
rescue StandardError => e
  # lifeboats
end
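
To see which exceptions a bare rescue would actually catch, one option is to check whether a class descends from StandardError. The following is only a sketch; the helper names caught_by_bare_rescue? and safe_divide are made up for this illustration and are not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): a bare `rescue` only handles
# StandardError and its subclasses, so an ancestry check tells us whether
# a given exception class would be caught by it.
def caught_by_bare_rescue?(exception_class)
  exception_class.ancestors.include?(StandardError)
end

[ArgumentError, ZeroDivisionError, LoadError, SyntaxError, Interrupt, SignalException].each do |klass|
  verdict = caught_by_bare_rescue?(klass) ? "caught" : "NOT caught"
  puts "#{klass}: #{verdict} by a bare rescue"
end

# Rescuing one specific class lets everything else propagate as usual.
def safe_divide(a, b)
  a / b
rescue ZeroDivisionError => e
  warn "cannot divide: #{e.message}"
  nil
end
```

Naming the narrowest class you can actually handle (here ZeroDivisionError) keeps Interrupt, LoadError and friends out of the rescue clause entirely.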

Is Hash Rocket deprecated?

10 votes

The well-cited RIP Hash rocket post would seem to imply the Hash Rocket syntax (:foo => "bar") is deprecated in favor of the new-to-Ruby JSON-style hash (foo: "bar"), but I can't find any definitive reference stating the Hash Rocket form is actually deprecated/unadvised as of Ruby 1.9.

The author of that blog post is being overly dramatic and foolish; the => is still quite necessary. In particular:

  1. You must use the rocket for symbols that require quoting: :'where.is' => x is valid but :'where.is': x is not.
  2. You must use the rocket for symbols that are not valid labels: :$set => x is valid but $set: x is not.
  3. You must use the rocket if you use keys in your Hashes that aren't symbols: 's' => x is valid but 's': x is not.

You can kludge around the above in the obvious manner of course:

h = { }
h[:'where.is'] = 'pancakes house?'
# etc.

but that's just ugly and unnecessary.

The rocket isn't going anywhere without crippling Ruby's Hashes.
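
As a small illustration of the points above (the variable names are arbitrary), the two literal styles build the same Hash for plain symbol keys, while keys of other types can only be written with the rocket:

```ruby
# Both literal styles build the same Hash when the keys are plain symbols.
modern  = { foo: "bar", count: 2 }
classic = { :foo => "bar", :count => 2 }
puts modern == classic

# Keys that are not symbols, or symbols that are not valid labels,
# are written with the rocket.
mixed = {
  "s"    => "string key",
  1      => "integer key",
  [1, 2] => "array key",
  :$set  => "symbol that is not a valid label"
}
puts mixed.keys.map { |key| key.class }.inspect
```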

Fast(er) method for wildcard searching of 250K+ strings

9 votes

I have an English dictionary in a MySQL database with just over 250K entries, and I'm using a simple ruby front-end to search it using wildcards at the beginning of the strings. So far I've been doing it like this:

SELECT * FROM words WHERE word LIKE '_e__o'

or even

SELECT * FROM words WHERE word LIKE '____s'

I always know the exact length of the word, but all but a single character are potentially unknown.

This is slower than molasses, about fifteen times slower than a similar query without the leading wildcard because the index for the column cannot be used.

I've tried a few methods to narrow the scope of the search. For example, I've added 26 additional columns containing each word's individual letter counts and narrow the search using those first. I've also tried narrowing by word length. These methods made almost no difference, thanks to the inherent inefficiency of leading-wildcard searches. I've experimented with the REGEXP statement, which is even slower.

SQLite and PostgreSQL are just as limited as MySQL, and though I have limited experience with NoSQL systems, my research gives me the impression that they excel at scalability, not performance of the kind I need.

My question then, is where should I look for a solution? Should I continue trying to find a way to optimize my queries or add supplementary columns that can narrow my potential recordset? Are there systems designed specifically to accomplish fast wildcard searching in this vein?

With PostgreSQL 9.1 and the pg_trgm extension you can create indexes that are usable for a like condition you are describing.

For an example see here: http://www.depesz.com/2011/02/19/waiting-for-9-1-faster-likeilike/

I verified it on a table with 300k rows using LIKE '____1' and it does use such an index. It took about 120ms to count the number of rows in that table (on an old laptop). Interestingly enough, the expression LIKE 'd___1' is not faster; it's about the same speed.

It also depends on the number of characters in the search term: the longer it gets, the slower it will be, as far as I can tell.

You would need to check with your data if the performance is acceptable.

Install gem on demand

6 votes

I would like to install a gem (JSON) on the client side, but only if it hasn't been installed already (some 1.9 Ruby distros have JSON bundled).

I couldn't find a clue on how to do that from gem help install. And running gem install json on a Windows system with Ruby 1.9 installed (with JSON bundled) results in

    ERROR:  Error installing json:
    The 'json' native gem requires installed build tools.

-- it tries to install it ignoring the fact that the gem is already there.

And I can't do bash tricks like grepping gem list output because the client might be Windows.

So what's the multiplatform way of installing a gem only if it's not present in the system already?

This may work...

begin
  require "json"
rescue LoadError
  system("gem install json")
end

If you don't want to require "json", you can remove it from $LOAD_PATH.

Or, put as a one liner:

ruby -e 'begin; require "some_gem"; rescue LoadError; system "gem install some_gem"; end'
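
The same idea can be wrapped in a small helper. This is a sketch under assumptions: require_or_install is a made-up name, the installer is passed in as a lambda so the fallback can be swapped out, and Gem.clear_paths is called so RubyGems re-reads its paths after the install.

```ruby
# Hypothetical helper: try to load a library and only shell out to
# `gem install` when the require fails. The installer is injectable so the
# fallback can be replaced (for example in tests) without running gem.
def require_or_install(lib, gem_name = lib,
                       installer = lambda { |name| system("gem", "install", name) })
  require lib
  :already_available
rescue LoadError
  installer.call(gem_name) or raise "could not install #{gem_name}"
  Gem.clear_paths # let RubyGems notice the freshly installed gem
  require lib
  :installed
end

puts require_or_install("json")
```

Because the check is a plain require, it behaves the same on Windows and Unix and never tries to rebuild a gem that is already loadable.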

Ruby on Rails plural (controller) and singular (model) convention - explanation

6 votes

As per Ruby on Rails convention, controller names get pluralized while model names are singular. Example : a Users controller, but a User model.

rails generate controller Users
rails generate model User name:string email:string

Now open the migration file:

class CreateUsers < ActiveRecord::Migration
  def change
    create_table :users do |t|
      t.string :name
      t.string :email
      t.timestamps
    end
  end
end

Here table name is plural (users).

So my question is: why is the table name plural (users) even though the model name is singular (User)?

Ruby on Rails follows a linguistic convention: a model instance represents a single user, whereas a database table holds many users.

Accessing a constant

6 votes

Why can't I access 'B' in the following from 'A' but can from the main environment?

module A; end
A.instance_eval{B=1}

B #=> 1
A::B #=> uninitialized

The idiomatic way to do this would be

 A.const_set(:B, 1)
 A::B #=> 1

As to why it doesn't work, in Ruby 1.8 and 1.9.2+ (it was different in 1.9.1), constant lookup is lexically scoped. I found a good blog post with an explanation. To quote:

Note that these rules apply to constant definition as well as lookup. In 1.8 and 1.9.2, a constant defined in a class_evaluated block will be defined in the enclosing lexical scope, rather than the scope of the receiver.

The same is also true for instance_eval.

Training Naive Bayes Classifier on ngrams

6 votes

I've been using the Ruby Classifier library to classify privacy policies. I've come to the conclusion that the simple bag-of-words approach built into this library is not enough. To increase my classification accuracy, I want to train the classifier on n-grams in addition to individual words.

I was wondering whether there's a library out there for preprocessing documents to get relevant n-grams (and properly deal with punctuation). One thought was that I could preprocess the documents and feed pseudo-ngrams into the Ruby Classifier like:

wordone_wordtwo_wordthree

Or maybe there's a better way to be doing this, such as a library that has ngram based Naive Bayes Classification built into it from the getgo. I'm open to using languages other than Ruby here if they get the job done (Python seems like a good candidate if need be).

If you're ok with python, I'd say nltk would be perfect for you.

For example:

>>> import nltk
>>> s = "This is some sample data.  Nltk will use the words in this string to make ngrams.  I hope that this is useful.".split()
>>> model = nltk.NgramModel(2, s)
>>> model._ngrams
set([('to', 'make'), ('sample', 'data.'), ('the', 'words'), ('will', 'use'), ('some', 'sample'), ('', 'This'), ('use', 'the'), ('make', 'ngrams.'), ('ngrams.', 'I'), ('hope', 'that'), ('is', 'some'), ('is', 'useful.'), ('I', 'hope'), ('this', 'string'), ('Nltk', 'will'), ('words', 'in'), ('this', 'is'), ('data.', 'Nltk'), ('that', 'this'), ('string', 'to'), ('in', 'this'), ('This', 'is')])

NLTK also ships a Naive Bayes implementation, nltk.NaiveBayesClassifier.

Is it necessary to close StringIO in ruby?

6 votes

Do we need to close StringIO objects after usage in Ruby in order to free resources, like we do with the real IO objects?

obj = StringIO.new "some string"
#...
obj.close # <--- Do we need to close it?

Refining my question

Closing File object is necessary because it will close the file descriptor. The number of opened files is limited in the OS, and that's why it is necessary to close File. But, if I understand correctly, StringIO is an abstraction in memory. Do we need to close it?

  • StringIO#close does not free any resources or drop its reference to the accumulated string. Therefore calling it has no effect upon resource usage.

  • Only StringIO#finalize, called during garbage collection, frees the reference to the accumulated string so that it can be freed (provided the caller does not retain its own reference to it).

  • StringIO.open, which briefly creates a StringIO instance, does not keep a reference to that instance after it returns; therefore that StringIO's reference to the accumulated string can be freed (provided the caller does not retain its own reference to it).

  • In practical terms, there is seldom a need to worry about a memory leak when using StringIO. Just don't hang on to references to StringIO once you're done with them and all will be well.


Diving into the source

The only resource used by a StringIO instance is the string it is accumulating. You can see that in stringio.c (MRI 1.9.3); here we see the structure that holds a StringIO's state:

struct StringIO {
    VALUE string;
    long pos;
    long lineno;
    int flags;
    int count;
};

When a StringIO instance is finalized (that is, garbage collected), its reference to the string is dropped so that the string may be garbage collected if there are no other references to it. Here's the finalize method, which is also called by StringIO.open(&block) in order to close the instance.

static VALUE
strio_finalize(VALUE self)
{
    struct StringIO *ptr = StringIO(self);
    ptr->string = Qnil;
    ptr->flags &= ~FMODE_READWRITE;
    return self;
}

The finalize method is called only when the object is garbage collected. There is no other method of StringIO which frees the string reference.

StringIO#close just sets a flag. It does not free the reference to the accumulated string or in any other way affect resource usage:

static VALUE
strio_close(VALUE self)
{   
    struct StringIO *ptr = StringIO(self);
    if (CLOSED(ptr)) {
        rb_raise(rb_eIOError, "closed stream");
    }
    ptr->flags &= ~FMODE_READWRITE;
    return Qnil;
}

And lastly, when you call StringIO#string, you get a reference to the exact same string that the StringIO instance has been accumulating:

static VALUE
strio_get_string(VALUE self)
{   
    return StringIO(self)->string;
}
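
The behaviour described by these C functions can be observed from Ruby. The snippet below is a minimal sketch (the variable name sio is arbitrary): after close the instance reports itself as closed, yet the accumulated string is still reachable through it.

```ruby
require 'stringio'

sio = StringIO.new
sio.write "some accumulated text"
sio.close

puts sio.closed?          # the stream is flagged as closed...
puts sio.string.inspect   # ...but the accumulated string is still reachable

# StringIO#string hands back the object the instance holds, not a copy,
# so repeated calls return the identical String.
puts sio.string.equal?(sio.string)

# Writing after close is rejected; that flag is all close changes.
begin
  sio.write "more"
rescue IOError => e
  puts "write after close: #{e.message}"
end
```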

How to leak memory when using StringIO

All of this means that there is only one way for a StringIO instance to cause a resource leak: You must not close the StringIO object, and you must keep it around longer than you keep the string you got when you called StringIO#string. For example, imagine a class having a StringIO object as an instance variable:

class Leaker

  def initialize
    @sio = StringIO.new
    @sio.puts "Here's a large file:"
    @sio.puts
    @sio.write File.read('/path/to/a/very/big/file')
  end

  def result
    @sio.string
  end

end

Imagine that the user of this class gets the result, uses it briefly, and then discards it, and yet keeps a reference to the instance of Leaker. You can see that the Leaker instance retains a reference to the result via the un-closed StringIO instance. This could be a problem if the file is very large, or if there are many extant instances of Leaker. This simple (and deliberately pathological) example can be fixed by simply not keeping the StringIO as an instance variable. When you can (and you almost always can), it's better to simply throw away the StringIO object than to go through the bother of closing it explicitly:

class NotALeaker

  attr_reader :result

  def initialize
    sio = StringIO.new
    sio.puts "Here's a large file:"
    sio.puts
    sio.write File.read('/path/to/a/very/big/file')
    @result = sio.string
  end

end

Add to all of this that these leaks only matter when the strings are large or the StringIO instances numerous and the StringIO instance is long lived, and you can see that explicitly closing StringIO is seldom, if ever, needed.

ide sublime2 how to find method definition

6 votes

I'm using Sublime 2 for Ruby on Rails programming. I need the ability to click a method name and jump to the class where the method is defined. There are many IDEs with a similar capability...

Goto symbol is Ctrl-R (linux), this gives a pop-up-list of all symbol and class definitions in the file, in definition order, and you can jump to what you're after. You could do the same thing with Goto Anything, Ctrl-P and then typing @ and the method name.

Also, there is a Goto Symbol plugin, which lets you jump straight to the definition of the method name your cursor is at, with a key binding or click.

However, both those methods are limited to the current file. If you need to jump to definitions in other files, probably the best solution is the SublimeCodeIntel plugin. It seems to work pretty well: hitting Ctrl-f3 (linux) opens the file at the definition you want.

Retaining the pattern characters while splitting via Regex, Ruby

5 votes

I have the following string

str="HelloWorld How areYou I AmFine"

I want this string into the following array

["Hello","World How are","You I Am", "Fine"]

I have been using the following regex. It splits correctly, but it also omits the matching pattern, and I want to retain that pattern as well. What I get is

str.split(/[a-z][A-Z]/)
 => ["Hell", "orld How ar", "ou I A", "ine"] 

It omits the matching pattern.

Can anyone help me retain these characters in the resulting array?

Three answers so far, each with a limitation: one is rails-only and breaks with underscore in original string, another is ruby 1.9 only, the third always has a potential error with its special character. I really liked the split on zero-width assertion answer from @Alex Kliuchnikau, but the OP needs ruby 1.8 which doesn't support lookbehind. There's an answer that uses only zero-width lookahead and works fine in 1.8 and 1.9 using String#scan instead of #split.

str.scan /.*?[a-z](?=[A-Z]|$)/
=> ["Hello", "World How are", "You I Am", "Fine"]
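
For comparison, here is a sketch of both approaches mentioned above side by side: the zero-width split (which needs lookbehind, so Ruby 1.9 or later) and the lookahead-only scan. The variable names are arbitrary.

```ruby
str = "HelloWorld How areYou I AmFine"

# Ruby 1.9+: split on the empty position between a lowercase and an
# uppercase letter, so no characters are consumed by the split.
with_split = str.split(/(?<=[a-z])(?=[A-Z])/)

# Ruby 1.8-compatible: lookahead only, collecting the pieces with scan.
with_scan = str.scan(/.*?[a-z](?=[A-Z]|$)/)

p with_split
p with_scan
```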

How to manually construct an AST?

5 votes

I'm currently learning about parsing but I'm a bit confused as to how to generate an AST. I have written a parser that correctly verifies whether an expression conforms to a grammar (it is silent when the expression conforms and raises an exception when it does not). Where do I go from here to build an AST? I found plenty of information on building my LL(1) parser, but very little on then going on to build the AST.

My current code (written in very simple Ruby, and including a lexer and a parser) is found here on github: https://gist.github.com/e9d4081b7d3409e30a57

Can someone explain how I go from what I have currently to an AST?

Alternatively, if you are unfamiliar with Ruby, but know C, could you tell me how I build an AST for the C code in the recursive descent parsing wikipedia article?

Please note, I do not want to use a parser generator like yacc or antlr to do the work for me; I want to do everything from scratch.

Thanks!

You need to associate each symbol that you match with a callback that constructs that little part of the tree. For example, let's take a fairly common construct: nested function calls.

a(b())

Your terminal tokens here are something like:

  • L_PAREN = '('
  • R_PAREN = ')'
  • IDENTIFIER = [a-z]+

And your nonterminal symbols are something like:

  • FUNCTION_CALL = IDENTIFIER, L_PAREN, R_PAREN
  • or;
  • FUNCTION_CALL = IDENTIFIER, L_PAREN, FUNCTION_CALL, R_PAREN

Obviously the second alternative above for the rule FUNCTION_CALL is recursive.

You already have a parser that knows it has found a valid symbol. The bit you're missing is to attach a callback to the rule, which receives its components as inputs and returns a value (usually) representing that node in the AST.

Imagine if the first alternative from our FUNCTION_CALL rule above had a callback:

Proc.new do |id_tok, l_paren_tok, r_paren_tok|
  { item: :function_call, name: id_tok, args: [] }
end

That would mean that the AST resulting from matching:

a()

Would be:

{
  item: :function_call,
  name: "a",
  args: []
}

Now to extrapolate that to the more complex a(b()). Because the parser is recursive, it will recognize the b() first, the callback from which returns what we have above, but with "b" instead of "a".

Now let's define the callback attached to the rule that matches the second alternative. It's very similar, except it also deals with the argument it was passed:

Proc.new do |id_tok, l_paren_tok, func_call_item, r_paren_tok|
  { item: :function_call, name: id_tok, args: [ func_call_item ] }
end

Because the parser has already recognized b() and that part of the AST was returned from your callback, the resulting tree is now:

{
  item: :function_call,
  name: "a",
  args: [
    {
      item: :function_call,
      name: "b",
      args: []
    }
  ]
}

Hopefully this gives you some food for thought. Pass all the tokens you match into a routine that constructs very small parts of your AST.
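
Putting the pieces together, here is a minimal hand-written sketch for the FUNCTION_CALL grammar above. It is not the asker's parser: the method names (tokenize, expect_token, parse_function_call, parse) are invented for this example, and instead of separate Proc callbacks each parsing method simply returns the node it has just recognized, which amounts to the same thing in a recursive descent parser.

```ruby
# Split the source into identifier and parenthesis tokens.
# Simplification: any other character is silently skipped.
def tokenize(source)
  source.scan(/[a-z]+|[()]/)
end

# Consume the next token and fail loudly if it is not the expected one.
def expect_token(tokens, expected)
  actual = tokens.shift
  return if actual == expected
  raise ArgumentError, "expected #{expected.inspect}, got #{actual.inspect}"
end

# FUNCTION_CALL = IDENTIFIER, L_PAREN, [ FUNCTION_CALL ], R_PAREN
# Returns the AST node for the call it has just recognized.
def parse_function_call(tokens)
  name = tokens.shift
  unless name.to_s =~ /\A[a-z]+\z/
    raise ArgumentError, "expected identifier, got #{name.inspect}"
  end
  expect_token(tokens, "(")
  args = []
  # The nested call is parsed first, and its node becomes our argument.
  args << parse_function_call(tokens) unless tokens.first == ")"
  expect_token(tokens, ")")
  { item: :function_call, name: name, args: args }
end

def parse(source)
  tokens = tokenize(source)
  ast = parse_function_call(tokens)
  raise ArgumentError, "unexpected trailing input: #{tokens.join}" unless tokens.empty?
  ast
end

p parse("a(b())")
```

Each rule returns a small hash, and the caller slots that hash into its own node, so the tree is assembled on the way back up the recursion.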

Set with custom rule

5 votes

According to the Set doc, elements in a set are compared using eql?.

I have a class like:

class Foo
  attr_accessor :bar, :baz

  def initialize(bar = 1, baz = 2)
    @bar = bar
    @baz = baz
  end

  def eql?(foo)
    bar == foo.bar && baz == foo.baz
  end
end

In console:

f1 = Foo.new
f2 = Foo.new
f1.eql? f2 #=> true

But...

 s = Set.new
 s << f1
 s << f2
 s.size #=> 2

Because f1 equals f2, s should not include both of them.

How to make the set reject elements with a custom rule?

The docs that you link to say explicitly (emphasis mine):

The equality of each couple of elements is determined according to Object#eql?
and Object#hash, since Set uses Hash as storage.

If you add a hash method to your class that returns the same value for eql? objects, it works:

# With your current class

f1, f2 = Foo.new, Foo.new
p f1.eql?(f2)
#=> true
p f1.hash==f2.hash
#=> false
p Set[f1,f2].length
#=> 2

# Fix the problem
class Foo
  def hash
    [bar, baz].hash
  end
end

f1, f2 = Foo.new, Foo.new
p f1.eql?(f2)
#=> true
p f1.hash==f2.hash
#=> true
p Set[f1,f2].length
#=> 1

To be honest I've never had a great sense for how to write a good custom hash method when multiple values are involved.
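
One common pattern for the multi-value case, shown here as a sketch rather than the definitive answer, is to build an array of exactly the attributes that eql? compares and delegate to Array#hash. The Point class below is hypothetical and exists only for this illustration.

```ruby
require 'set'

# Hypothetical value class used only for this illustration.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Two points are interchangeable when every attribute matches.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias_method :==, :eql?

  # Delegate to Array#hash so the result depends on the same attributes
  # that eql? compares.
  def hash
    [x, y].hash
  end
end

p Set[Point.new(1, 2), Point.new(1, 2), Point.new(3, 4)].size
```

Keeping eql? and hash based on the same attribute list is the important part: objects that are eql? must return equal hash values, or Set and Hash will treat them as distinct.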

Elegant chained 'or's

5 votes

What's the sensible way of saying this.

if @thing == "01" or "02" or "03" or "04" or "05"

(The numbers are contained in a column of datatype string.)

Make an array and use .include?

if ["01","02","03","04","05"].include?(@thing)

If the values really are all consecutive, you can use a range like (1..5).include? For strings, you can use:

if ("01".."05").include?(@thing)
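
To see why the original condition misbehaves, and how the alternatives compare, here is a short sketch. The variable thing stands in for @thing, and group_for is a made-up helper.

```ruby
thing = "09"

# The literal "02" is always truthy, so the chained `or` can never be false.
chained = (thing == "01" or "02")
p chained

# Comparing against the whole list gives the intended answer.
allowed = %w[01 02 03 04 05]
p allowed.include?(thing)
p allowed.include?("03")

# A case statement reads well when there are several groups of values.
def group_for(code)
  case code
  when "01", "02", "03", "04", "05" then :low
  else :other
  end
end
p group_for("04")
```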

Print all method names of a class in Ruby?

5 votes

In Ruby, if we want to list all methods for a class we can call methods on one of its instances; for example, to list all methods of the Fixnum class:

> 5.methods
 => [:to_s, :-@, :+, :-, :*, :/, :div, :%, :modulo, :divmod, :fdiv, :**, :abs, :magnitude, :==, :===, :<=>, :>, :>=, :<, :<=, :~, :&, :|, :^, :[], :<<, :>>, :to_f, :size, :zero?, :odd?, :even?, :succ, :integer?, :upto, :downto, :times, :next, :pred, :chr, :ord, :to_i, :to_int, :floor, :ceil, :truncate, :round, :gcd, :lcm, :gcdlcm, :numerator, :denominator, :to_r, :rationalize, :singleton_method_added, :coerce, :i, :+@, :eql?, :quo, :remainder, :real?, :nonzero?, :step, :to_c, :real, :imaginary, :imag, :abs2, :arg, :angle, :phase, :rectangular, :rect, :polar, :conjugate, :conj, :between?, :nil?, :=~, :!~, :hash, :class, :singleton_class, :clone, :dup, :initialize_dup, :initialize_clone, :taint, :tainted?, :untaint, :untrust, :untrusted?, :trust, :freeze, :frozen?, :inspect, :methods, :singleton_methods, :protected_methods, :private_methods, :public_methods, :instance_variables, :instance_variable_get, :instance_variable_set, :instance_variable_defined?, :instance_of?, :kind_of?, :is_a?, :tap, :send, :public_send, :respond_to?, :respond_to_missing?, :extend, :display, :method, :public_method, :define_singleton_method, :object_id, :to_enum, :enum_for, :equal?, :!, :!=, :instance_eval, :instance_exec, :__send__, :__id__]

Now, as you can see, this list is really hard to read. I also tried 5.methods.sort, but that doesn't help much in making it more readable.

I frequently use the list of methods during my everyday programming, so I was wondering if there is a way to pretty print this so it becomes a bit easier to read?

Try this one-liner:

puts 5.methods.sort
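
If one name per line is still too long, a possible refinement is to drop the methods every Object already has and print the rest in columns. This is only a sketch; own_methods and print_in_columns are hypothetical helper names.

```ruby
# Hypothetical helper: the methods an object has beyond those every
# Object already provides, sorted alphabetically.
def own_methods(object)
  (object.methods - Object.instance_methods).sort
end

# Print the names a few per row in aligned columns.
def print_in_columns(names, per_row = 4)
  width = names.map { |name| name.to_s.length }.max.to_i + 2
  names.each_slice(per_row) do |row|
    puts row.map { |name| name.to_s.ljust(width) }.join.rstrip
  end
end

print_in_columns(own_methods(5))
```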

Recommended approach to monkey patching a class in ruby

5 votes

I've noticed that there are two common ways to monkey patch a class in ruby:

Define the new members on the class like so:

class Array
   def new_method
     #do stuff
   end
end

And calling class_eval on the class object:

Array.class_eval do
   def new_method
      #do stuff
   end
end

I'm wondering if there is any difference between the two and whether there are advantages to using one approach over the other?

Honestly, I used to use the 1st form (reopening the class), as it feels more natural, but your question forced me to do some research on the subject and here's the result.

The problem with reopening the class is that it'll silently define a new class if the original one, that you intended to reopen, for some reason wasn't defined at the moment. The result might be different:

  1. If you don't override any methods but only add new ones, and the original implementation is defined later (e.g., the file where the class is originally defined is loaded afterwards), everything will be ok.

  2. If you redefine some methods and the original is loaded later, your methods will be overridden with their original versions.

  3. The most interesting case is when you use standard autoloading or some fancy reloading mechanism (like the one used in Rails) to load/reload classes. Some of these solutions rely on const_missing, which is called when you reference an undefined constant. In that case the autoloading mechanism tries to find the undefined class's definition and load it. But if you define the class on your own (while you intended to reopen an already defined one), it won't be 'missing' any longer and the original might never be loaded at all, as the autoloading mechanism won't be triggered.

On the other hand, if you use class_eval you'll be instantly notified if the class is not defined at the moment. In addition, as you're referencing the class when you call its class_eval method, any autoloading mechanism will have a chance to locate class' definition and load it.
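
The difference is easy to reproduce in a plain script. In this sketch ReportFormatter is a hypothetical class that has not been defined or loaded anywhere:

```ruby
# ReportFormatter is a hypothetical class that has NOT been loaded yet.
class_eval_error = nil
begin
  ReportFormatter.class_eval do
    def shout(text)
      text.upcase
    end
  end
rescue NameError => e
  class_eval_error = e # class_eval fails loudly: the constant is unknown
end
puts "class_eval raised: #{class_eval_error.class}"

# Reopening with the class keyword raises nothing; it quietly creates a
# brand-new ReportFormatter containing only the patch.
class ReportFormatter
  def shout(text)
    text.upcase
  end
end
puts ReportFormatter.instance_methods(false).inspect
```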

Having that in mind class_eval seems to be a better approach. Though, I'd be happy to hear some other opinion.

Should class instance variables in Rails be set within a mutex?

4 votes

Let's say I've got a Ruby class in my Rails project that is setting an instance variable.

class Something
  def self.objects
    @objects ||= begin
      # some logic that builds an array, which is ultimately stored in @objects
    end
  end
end

Is it possible that @objects could be set multiple times? Is it possible that during one request, while executing code between the begin/end above, that this method could be called during a second request? This really comes down to a question of how Rails server instances are forked, I suppose.

Should I instead be using a Mutex or thread synchronization? e.g.:

class Something
  def self.objects
    return @objects if @objects

    Thread.exclusive do
      @objects ||= begin
        # some logic that builds an array, which is ultimately stored in @objects
      end
    end
  end
end

I'll take a stab.

Rails is single-threaded. Successive requests to a Rails application are either queued or handled by separate application instances (read: processes). The value of the class instance variable @objects defined in your Something class exists within scope of the process, not within the scope of any instance of your application.

Therefore a mutex would be unnecessary as you would never encounter the case where two processes are accessing the same resource because the memory spaces of the two processes are entirely separate.

I think this raises another question: is @objects intended to be a shared resource? If so, I think it needs to be implemented differently.

Disclaimer: I may be completely off the mark here, in fact I sort of hope I am so I can learn something today :)