Best ruby questions in June 2012

ActiveRecord objects in hashes aren't garbage collected -- a bug or a sort of caching feature?

16 votes

I have a simple ActiveRecord model called Student with 100 records in the table. I do the following in a rails console session:

ObjectSpace.each_object(ActiveRecord::Base).count
# => 0

x = Student.all

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100

x = nil
GC.start

ObjectSpace.each_object(ActiveRecord::Base).count
# => 0     # Good!

Now I do the following:

ObjectSpace.each_object(ActiveRecord::Base).count
# => 0

x = Student.all.group_by(&:last_name)

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100

x = nil
GC.start

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100     # Bad!

Can anyone explain why this happens and whether there is a smart way to solve this without knowing the underlying hash structure? I know I can do this:

x.keys.each{|k| x[k]=nil}
x = nil
GC.start

and it will remove all Student objects from memory correctly, but I'm wondering if there is a general solution (my real-life problem is widespread and has more intricate data structures than the hash shown above).

I'm using Ruby 1.9.3-p0 and Rails 3.1.0.

UPDATE (SOLVED)

Per Oscar Del Ben's explanation below, a few ActiveRecord::Relation objects are created in the problematic code snippet (they are actually created in both code snippets, but for some reason they "misbehave" only in the second one. Can someone shed light on why?). These maintain references to the ActiveRecord objects via an instance variable called @records. This instance variable can be set to nil through the "reset" method on ActiveRecord::Relation. You have to make sure to perform this on all the relation objects:

ObjectSpace.each_object(ActiveRecord::Base).count
# => 100

ObjectSpace.each_object(ActiveRecord::Relation).each(&:reset)

GC.start
ObjectSpace.each_object(ActiveRecord::Base).count
# => 0

Note: You can also use Mass.detach (using the ruby-mass gem Oscar Del Ben referenced), though it will be much slower than the code above. Note that the code above does not remove a few ActiveRecord::Relation objects from memory. These seem to be pretty insignificant though. You can try doing:

Mass.index(ActiveRecord::Relation)["ActiveRecord::Relation"].each{|x| Mass.detach Mass[x]}
GC.start

And this would remove some of the ActiveRecord::Relation objects, but not all of them (not sure why, and those that are left have no Mass.references. Weird).

I think I know what's going on. Ruby's GC won't free immutable objects (like symbols!). The keys returned by group_by are immutable strings, and so they won't be garbage collected.

UPDATE:

It seems like the problem is not with Rails itself. I tried using group_by alone, and sometimes the objects would not get garbage collected:

oscardelben~/% irb
irb(main):001:0> class Foo
irb(main):002:1> end
=> nil
irb(main):003:0> {"1" => Foo.new, "2" => Foo.new}
=> {"1"=>#<Foo:0x007f9efd8072a0>, "2"=>#<Foo:0x007f9efd807250>}
irb(main):004:0> ObjectSpace.each_object(Foo).count
=> 2
irb(main):005:0> GC.start
=> nil
irb(main):006:0> ObjectSpace.each_object(Foo).count
=> 0
irb(main):007:0> {"1" => Foo.new, "2" => Foo.new}.group_by
=> #<Enumerator: {"1"=>#<Foo:0x007f9efb83d0c8>, "2"=>#<Foo:0x007f9efb83d078>}:group_by>
irb(main):008:0> GC.start
=> nil
irb(main):009:0> ObjectSpace.each_object(Foo).count
=> 2 # Not garbage collected
irb(main):010:0> GC.start
=> nil
irb(main):011:0> ObjectSpace.each_object(Foo).count
=> 0 # Garbage collected

I've dug through the GC internals (which are surprisingly easy to understand), and this seems like a scope issue. Ruby starts from the roots (the objects referenced from the current scope) and marks every object it can reach from them, i.e. the ones it thinks are still being used. After that, it sweeps through all the objects in the heap and frees the ones that were not marked.

In this case I think the hash is still being marked even though it's out of scope. There are many reasons why this may be happening. I'll keep investigating.
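The mark-and-sweep process described above can be sketched as a toy model in plain Ruby. This is an illustration, not MRI's actual collector: the heap is a hypothetical Hash from object names to the names they reference, and the `:relation` entry stands in for a hidden holder like the `@records` reference discussed in this thread. Once the root stops referencing the hash, the hash is swept, but the records survive because they are still reachable through the relation.

```ruby
require "set"

# Mark phase: collect every object reachable from the roots.
def mark(heap, roots)
  marked = Set.new
  stack = roots.dup
  until stack.empty?
    obj = stack.pop
    next if marked.include?(obj)
    marked << obj
    stack.concat(heap.fetch(obj, []))
  end
  marked
end

# Sweep phase: keep the marked objects and "free" everything else.
def sweep(heap, marked)
  heap.select { |obj, _refs| marked.include?(obj) }
end

def collect_garbage(heap, roots)
  sweep(heap, mark(heap, roots))
end

# Hypothetical heap: object name => names of the objects it references.
heap = {
  console:   [:x_hash, :relation],     # the console scope, our only root
  x_hash:    [:student_a, :student_b], # x = Student.all.group_by(...)
  relation:  [:student_a, :student_b], # hypothetical hidden holder
  student_a: [],
  student_b: []
}

heap[:console] = [:relation] # x = nil: the root drops its reference to the hash
heap = collect_garbage(heap, [:console])
p heap.keys # => [:console, :relation, :student_a, :student_b]
```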

UPDATE 2:

I've found what's keeping references to the objects. To do that, I used the ruby-mass gem. It turns out that the ActiveRecord relation keeps track of the objects returned.

User.limit(1).group_by(&:name)
GC.start
ObjectSpace.each_object(ActiveRecord::Base).each do |obj|
  p Mass.references obj # {"ActiveRecord::Relation#70247565268860"=>["@records"]}
end

Unfortunately, calling reset on the relation didn't seem to help, but hopefully this is enough information for now.

Testing: how to focus on behavior instead of implementation without losing speed?

14 votes

It seems that there are two totally different approaches to testing, and I would like to cite both of them.

These opinions were stated five years ago (2007), and I am interested in what has changed since then and which way I should go.

Brandon Keepers:

The theory is that tests are supposed to be agnostic of the implementation. This leads to less brittle tests and actually tests the outcome (or behavior).

With RSpec, I feel like the common approach of completely mocking your models to test your controllers ends up forcing you to look too much into the implementation of your controller.

This by itself is not too bad, but the problem is that it peers too much into the controller to dictate how the model is used. Why does it matter if my controller calls Thing.new? What if my controller decides to take the Thing.create! and rescue route? What if my model has a special initializer method, like Thing.build_with_foo? My spec for behavior should not fail if I change the implementation.

This problem gets even worse when you have nested resources and are creating multiple models per controller. Some of my setup methods end up being 15 or more lines long and VERY fragile.

RSpec’s intention is to completely isolate your controller logic from your models, which sounds good in theory, but almost runs against the grain for an integrated stack like Rails. Especially if you practice the skinny controller/fat model discipline, the amount of logic in the controller becomes very small, and the setup becomes huge.

So what’s a BDD-wannabe to do? Taking a step back, the behavior that I really want to test is not that my controller calls Thing.new, but that given parameters X, it creates a new thing and redirects to it.

David Chelimsky:

It’s all about trade-offs.

The fact that AR chooses inheritance rather than delegation puts us in a testing bind – we have to be coupled to the database OR we have to be more intimate with the implementation. We accept this design choice because we reap benefits in expressiveness and DRY-ness.

In grappling with the dilemma, I chose faster tests at the cost of slightly more brittle. You’re choosing less brittle tests at the cost of them running slightly slower. It’s a trade-off either way.

In practice, I run the tests hundreds, if not thousands, of times a day (I use autotest and take very granular steps) and I change whether I use “new” or “create” almost never. Also due to granular steps, new models that appear are quite volatile at first. The valid_thing_attrs approach minimizes the pain from this a bit, but it still means that every new required field means that I have to change valid_thing_attrs.

But if your approach is working for you in practice, then it's good! In fact, I’d strongly recommend that you publish a plugin with generators that produce the examples the way you like them. I’m sure that a lot of people would benefit from that.

Ryan Bates:

Out of curiosity, how often do you use mocks in your tests/specs? Perhaps I'm doing something wrong, but I'm finding it severely limiting. Since switching to rSpec over a month ago, I've been doing what they recommend in the docs where the controller and view layers do not hit the database at all and the models are completely mocked out. This gives you a nice speed boost and makes some things easier, but I'm finding the cons of doing this far outweigh the pros. Since using mocks, my specs have turned into a maintenance nightmare. Specs are meant to test the behavior, not the implementation. I don't care if a method was called I just want to make sure the resulting output is correct. Because mocking makes specs picky about the implementation, it makes simple refactorings (that don't change the behavior) impossible to do without having to constantly go back and "fix" the specs. I'm very opinionated about what a spec/tests should cover. A test should only break when the app breaks. This is one reason why I hardly test the view layer because I find it too rigid. It often leads to tests breaking without the app breaking when changing little things in the view. I'm finding the same problem with mocks. On top of all this, I just realized today that mocking/stubbing a class method (sometimes) sticks around between specs. Specs should be self contained and not influenced by other specs. This breaks that rule and leads to tricky bugs. What have I learned from all this? Be careful where you use mocking. Stubbing is not as bad, but still has some of the same issues.

I took the past few hours and removed nearly all mocks from my specs. I also merged the controller and view specs into one using "integrate_views" in the controller spec. I am also loading all fixtures for each controller spec so there's some test data to fill the views. The end result? My specs are shorter, simpler, more consistent, less rigid, and they test the entire stack together (model, view, controller) so no bugs can slip through the cracks. I'm not saying this is the "right" way for everyone. If your project requires a very strict spec case then it may not be for you, but in my case this is worlds better than what I had before using mocks. I still think stubbing is a good solution in a few spots so I'm still doing that.

I think all three opinions are still completely valid. Ryan and I were struggling with the maintainability of mocking, while David felt the maintenance tradeoff was worth it for the increase in speed.

But these tradeoffs are symptoms of a deeper problem, which David alluded to in 2007: ActiveRecord. The design of ActiveRecord encourages you to create god objects that do too much, know too much about the rest of the system, and have too much surface area. This leads to tests that have too much to test, know too much about the rest of the system, and are either too slow or brittle.

So what's the solution? Separate as much of your application from the framework as possible. Write lots of small classes that model your domain and don't inherit from anything. Each object should have limited surface area (no more than a few methods) and explicit dependencies passed in through the constructor.

With this approach, I've only been writing two types of tests: isolated unit tests, and full-stack system tests. In the isolation tests, I mock or stub everything that is not the object under test. These tests are insanely fast and often don't even require loading the whole Rails environment. The full stack tests exercise the whole system. They are painfully slow and give useless feedback when they fail. I write as few as necessary, but enough to give me confidence that all my well-tested objects integrate well.
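To make this concrete, here is a minimal sketch in plain Ruby. `ThingCreator` and `FakeRepository` are hypothetical names (echoing the `Thing` example in the quotes), not part of Rails: the domain object inherits from nothing, receives its single collaborator through the constructor, and can therefore be tested in isolation with a hand-rolled fake instead of a database.

```ruby
# Hypothetical domain object: plain Ruby, no framework superclass, with its
# single collaborator passed in through the constructor.
class ThingCreator
  def initialize(repository)
    @repository = repository
  end

  # Creates a thing from the given attributes and returns the path to redirect to.
  def call(attributes)
    thing = @repository.create(attributes)
    "/things/#{thing.fetch(:id)}"
  end
end

# Hand-rolled test double standing in for the real persistence layer.
class FakeRepository
  attr_reader :created

  def initialize
    @created = []
  end

  def create(attributes)
    @created << attributes
    attributes.merge(id: @created.size)
  end
end

# An isolated "unit test": no database, no Rails environment.
repo = FakeRepository.new
path = ThingCreator.new(repo).call({ name: "widget" })
p path # => "/things/1"
```

Because `ThingCreator` only depends on something that responds to `create`, how a real repository builds and saves records can change without breaking this test.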

Unfortunately, I can't point you to an example project that does this well (yet). I talk a little about it in my presentation "Why Our Code Smells". Also watch Corey Haines' presentation "Fast Rails Tests", and I highly recommend reading Growing Object-Oriented Software, Guided by Tests.

Ruby if vs end of the line if behave differently?

11 votes

Why doesn't this code work?

h = {k: 1}
v = k if k = h.delete(:k)

Error: undefined local variable or method `k'

But this does:

h = {k: 1}

if k = h.delete(:k)
    v = k
end

Shouldn't they be the same?

This is a very good question. It has to do with the scoping of variables in Ruby.

Here is a post by Matz on the Ruby bug tracker about this:

local variable scope determined up to down, left to right. So a local variable first assigned in the condition of if modifier is not effective in the left side if body. It's a spec.
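A minimal sketch of the rule Matz describes, using three hypothetical helper methods. Each method gets its own local scope, so the parse-time decision about `k` is made independently in each one. The last method shows a common workaround: assign `k` before the modifier line so the parser already knows it is a local variable.

```ruby
# Modifier form: `v = k` is parsed before the condition, so at that point
# `k` is not a known local variable and is parsed as a method call.
def modifier_form(h)
  v = k if k = h.delete(:k)
  v
end

# Statement form: the condition is parsed first, so `k` is already a local
# variable when the body `v = k` is parsed.
def statement_form(h)
  if k = h.delete(:k)
    v = k
  end
  v
end

# Workaround: assign `k` up front so the parser sees it as a local variable.
def predeclared_form(h)
  k = nil
  v = k if k = h.delete(:k)
  v
end

begin
  modifier_form({ k: 1 })
rescue NameError => e
  puts "modifier form: #{e.class}"
end
p statement_form({ k: 1 })   # => 1
p predeclared_form({ k: 1 }) # => 1
```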

What is the formal term for the "#{}" token in Ruby syntax?

10 votes

The Background

I recently posted an answer where I variously referred to #{} as a literal, an operator, and (in one draft) a "literal constructor." The squishiness of this definition didn't really affect the quality of the answer, since the question was more about what it does and how to find language references for it, but I'm unhappy with being unable to point to a canonical definition of exactly what to call this element of Ruby syntax.

The Ruby manual mentions this syntax element in the section on expression substitution, but doesn't really define the term for the syntax itself. Almost every reference to this language element says it's used for string interpolation, but doesn't define what it is.

Wikipedia Definitions

Here are some Wikipedia definitions that imply this construct is (strictly speaking) neither a literal nor an operator.

  1. Literal (computer programming)
  2. Operator (programming)

The Questions

Does anyone know what the proper term is for this language element? If so, can you please point me to a formal definition?

TL;DR

Embedded expression seems the most likely name for this token, based on hints in the source code.

Related Answer

This answer calls attention to the Ruby source, which makes numerous references to embexpr throughout the code base. @Phlip suggests that this variable is an abbreviation for "EMBedded EXPRession." This seems like a reasonable interpretation, but neither the ruby-1.9.3-p194 source nor Google (as of this writing) explicitly references the term embedded expression in association with embexpr in any context, Ruby-related or not.

Additional Research

A scan of the source code with:

ack-grep -cil --type-add=YACC=.y embexpr .rvm/src/ruby-1.9.3-p194 |
    sort -rnk2 -t: |
    sed 's!^.*/!!'

reveals 9 files and 33 lines with the term embexpr:

test_scanner_events.rb:12
test_parser_events.rb:7
eventids2.c:5
eventids1.c:3
eventids2table.c:2
parse.y:1
parse.c:1
ripper.y:1
ripper.c:1

Of particular interest is the inclusion of string_embexpr on line 4,176 of the parse.y and ripper.y bison files. Likewise, TestRipper::ParserEvents#test_string_embexpr contains two references to parsing #{} on lines 899 and 902 of test_parser_events.rb.

The scanner, exercised in test_scanner_events.rb, is also noteworthy. This file defines tests in #test_embexpr_beg and #test_embexpr_end that scan for the token #{expr} inside various string expressions. The tests reference both embexpr and expr, raising the likelihood that "embedded expression" is indeed a sensible name for the thing.
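The standard library's Ripper exposes these same scanner and parser events, so the `embexpr` naming can be checked from Ruby itself. A small sketch; the sample string is arbitrary, and because exact token lists can vary between Ruby versions, only the presence of the event names is relied on:

```ruby
require "ripper"

source = '"a#{1 + 2}c"'

# Scanner events: the #{ and } delimiters are reported as embexpr_beg/_end.
token_types = Ripper.lex(source).map { |token| token[1] }
p token_types

# Parser events: the whole #{...} node is reported as :string_embexpr.
tree = Ripper.sexp(source)
p tree.flatten.include?(:string_embexpr) # => true
```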

Using god only to kill

8 votes

I serve my software using Passenger. It spawns many Ruby processes.

Sometimes one of these rubies becomes bloated and I want it to die.

I was hoping to use god for that. My idea was to monitor all these rubies, and if one of them consumes more than 500MB of memory for 3 cycles, god should try to kill it gracefully. If it remains alive for more than 5 minutes, god should kill it forcefully.

It seems to me that god always tries to run the service again, so it forces us to provide a start command. Is it possible to use god only to kill badly behaved processes and let the Passenger spawner bring them back to life when necessary?

The answer to your question lies in the question itself: you can kill Ruby processes using the god gem, a Ruby process-monitoring framework from the GitHub folks.

Basically, here is how it works:

  1. Configure god to monitor a process. It can be anything from Apache, Passenger, or Mongrel to a simple file doing a long-running task.
  2. Set conditionals in god's configuration file based upon which god will execute some predefined code.

Here is a simple example (taken from the docs). Consider this file a long-running process that runs indefinitely and whose memory usage we want to monitor; let's call it simple.rb:

loop do
  puts 'Hello'
  sleep 1
end

Now, install the god gem and configure it to run as superuser so it can kill and spawn processes, then create a configuration file. Example (also taken from the docs):

God.watch do |w|
  w.name = "simple"
  w.start = "ruby /full/path/to/simple.rb"
  w.keepalive(:memory_max => 500.megabytes)
end

Here, if the process's memory usage goes above 500 megabytes, god will restart it.

Remember that ALL configuration for god is actually legal Ruby code, so you can get creative and do all sorts of things.

Lastly, if you frequently find yourself running long-running processes, I advise you to try JRuby, which works much better with long-running processes thanks to the JVM and can be a LOT faster than MRI.

Why is the splat/unary operator changing the assigned value a when p is called before *a = ""?

7 votes

To give a little context around how I understand the problem.

Using splat collect on a string sends :to_a or :to_ary to the String

class String
  def method_missing method, *args, &block
    p method #=> :to_ary
    p args   #=> []
    p block  #=> nil
  end
end

*b = "b"

So I was thinking that redefining the :to_ary method would be what I'm after.

class String
  def to_ary
    ["to_a"]
  end
end

p *a = "a" #=> "a"
p a        #=> "a"

*b = "b"
p b        #=> ["to_a"]

Now this confuses me to no end.

Printing the result from the *a = "a" changes the value assigned to a?

To demonstrate further

class String
  def to_ary
    [self.upcase!]
  end
end

p *a = "a" #=> "a"
p a        #=> "a"

*b = "b"
p b        #=> ["B"]

Very interesting question! Ruby takes this expression:

 p *a = "a"

and translates it to something like this:

 temp = (a = "a")
 p *temp

So the first thing that happens is that a gets assigned to "a", and then the result of the assignment expression which is "a" gets splatted and sent to p. Since p's default behaviour when sent multiple arguments is just to iterate over and print each one, you only see "a" appear.

In short, it follows an "assign then splat" order of evaluation, so a gets assigned to "a" before the string gets splatted.

When you don't have a function call however, it is interpreted as something like this:

# *a = "a" gets interpreted as:
temp = "a"
a = *temp

This follows a "splat then assign" order of evaluation. So a gets assigned after the string gets splatted.

You can see what's being received by a function by going like this:

def foo *args
  puts args.inspect
end

foo *a = "a"    # outputs ["a"]
a               # outputs "a"

Hope this clears up what's going on!

In short (thanks to Mark Reed):

p *a = "a"    # interpreted as: p(*(a = "a"))
*a = "a"      # interpreted as: a = *("a")
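To see which conversion each form triggers without monkeypatching String, here is a sketch with a hypothetical `Pair` class that defines both `to_a` and `to_ary` with distinguishable results. It relies on the behavior described above: in the method call, `a` is assigned first and the value is then splatted (which calls `to_a`), whereas the bare `*b = obj` multiple assignment converts the right-hand side with `to_ary`, matching the `:to_ary` seen in the question's `method_missing` experiment.

```ruby
# Hypothetical class whose two conversions return distinguishable results.
class Pair
  def to_a
    [:from_to_a]
  end

  def to_ary
    [:from_to_ary]
  end
end

def collect(*args)
  args
end

obj = Pair.new

# Method call: `a = obj` runs first, then its value is splatted via to_a.
received = collect(*a = obj)
p a.equal?(obj) # => true
p received      # => [:from_to_a]

# Bare multiple assignment: the right-hand side is converted with to_ary.
*b = obj
p b             # => [:from_to_ary]
```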

Reference for learning web socket

7 votes

I want to write a WebSocket client in JavaScript and a WebSocket server in Ruby.

Where shall I start? Are there any existing libraries to reduce my work?

I'm lost and confused after googling. Please provide any links on where to start, given that I have knowledge of Ruby, JavaScript, and basic networking in Ruby.

I'm currently using em-websocket:

require 'em-websocket'

EventMachine.run {
  EventMachine::WebSocket.start(:host => "0.0.0.0", :port => 8080) do |ws|
    ws.onopen {
      puts "WebSocket connection open"

      # publish message to the client
      ws.send "Hello Client"
    }

    ws.onclose { puts "Connection closed" }
    ws.onmessage { |msg|
      puts "Received message: #{msg}"
      ws.send "Pong: #{msg}"
    }
  end
}

For more info, see another thread about Ruby and WebSocket.
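em-websocket handles the protocol details for you, but when learning it helps to see the first step of RFC 6455: the server proves it understood the client's opening handshake by hashing the client's `Sec-WebSocket-Key` together with a fixed GUID. A minimal sketch using only the standard library (the method name `websocket_accept` is hypothetical), checked against the sample key from the RFC:

```ruby
require "digest/sha1"

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"

# Hypothetical helper: the value a server sends back in Sec-WebSocket-Accept.
def websocket_accept(client_key)
  # SHA-1 of key + GUID, then Base64 without line breaks ("m0").
  [Digest::SHA1.digest(client_key + WEBSOCKET_GUID)].pack("m0")
end

p websocket_accept("dGhlIHNhbXBsZSBub25jZQ==") # => "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
```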

Understanding ruby quine

6 votes

I have found this code block on Wikipedia as an example of a quine (a program that prints its own source code) in Ruby.

puts <<2*2,2
puts <<2*2,2
2

However, I do not get how it works. Especially, what I do not get is that when I remove the last line, I get this error:

syntax error, unexpected $end, expecting tSTRING_CONTENT or tSTRING_DBEG or tSTRING_DVAR or tSTRING_END

What happens in those lines?

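The <<something syntax begins a here-document, borrowed from UNIX shells via Perl. It's basically a multiline string literal that starts on the line after the << and ends at the first line consisting solely of something. That is also why removing the last line produces the "unexpected $end" error: the here-document is never terminated.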

So structurally, the program is just doing this:

puts str*2,2

... that is, print two copies of str followed by the number 2.

But instead of the variable str, it's including a literal string via a here-document whose ending sentinel is also the digit 2:

puts <<2*2,2
puts <<2*2,2
2

So it prints out two copies of the string puts <<2*2,2, followed by a 2. (And since the method used to print them out is puts, each of those things gets a newline appended automatically.)
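A quick way to convince yourself the program really prints itself is to run it with standard output captured. This sketch (the `capture_stdout` helper and the `QUINE` constant are hypothetical names) also shows the same here-document with a more readable terminator, `END`, to make the `str*2,2` structure visible:

```ruby
require "stringio"

QUINE = "puts <<2*2,2\nputs <<2*2,2\n2\n"

# The same here-document with a readable terminator: str holds one line.
str = <<END
puts <<2*2,2
END

# Hypothetical helper: run a block and return what it printed.
def capture_stdout
  original = $stdout
  $stdout = StringIO.new
  yield
  $stdout.string
ensure
  $stdout = original
end

output = capture_stdout { eval(QUINE) }
p str * 2          # the two copies of the line that puts receives
p output == QUINE  # => true
```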

What is the closest C++ analogue to Ruby's Rack?

6 votes

I'm a big fan of Rack, and I've used it to build several lightweight web apps over the past few years. I've been curious for a while if something similar exists for C++. I've spent quite a bit of time searching Google and come up empty-handed. It doesn't help that I find Rack hard to describe. Its tagline is "A Ruby Webserver Interface". Searching for {c++ "webserver interface"}, I've found things that do much more than I want, like wt, and I've found suggestions to use FastCGI directly. I feel like Rack fits squarely between these two options.

I'm not sure if I'm having trouble finding a C++ analogue to Rack because no such thing exists or because I'm just using poor search terms.

Is there a close C++ analogue to Rack? If not, is there a library or small set of libraries that can do most of the lower-level, error-prone stuff for me, but still leave me with the level of control that Rack does?

Here are the best options I've found so far:

  • cpp-net-lib (Thanks @Managu) - This appears to be close to what I had in mind.
  • fastcgi++ - This appears to offer lots of niceties over straight FastCGI without turning into a full framework, so it is also close to what I had in mind.
  • Mongrel2 - According to Zed, "Mongrel2's protocol also tends to remove the need for any 'middleware' like WSGI or Rack since its protocol is already similar to what those do." This comes from a very different angle, but also looks like it satisfies my general criteria.
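For reference when comparing these candidates, this is the small contract Rack standardizes: an application is any object that responds to `call(env)` and returns `[status, headers, body]`, where `body` responds to `each`. A minimal sketch in plain Ruby (no rack gem is needed to run it); `hello_app` and the `Shout` middleware are hypothetical examples:

```ruby
# A Rack-style application: any object that responds to call(env) and
# returns [status, headers, body], where body responds to each.
hello_app = lambda do |env|
  [200, { "content-type" => "text/plain" }, ["Hello from #{env['PATH_INFO']}"]]
end

# Middleware has the same interface and wraps another app.
class Shout
  def initialize(app)
    @app = app
  end

  def call(env)
    status, headers, body = @app.call(env)
    upcased = []
    body.each { |chunk| upcased << chunk.upcase }
    [status, headers, upcased]
  end
end

app = Shout.new(hello_app)
status, headers, body = app.call({ "REQUEST_METHOD" => "GET", "PATH_INFO" => "/hi" })
p status # => 200
p body   # => ["HELLO FROM /HI"]
```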

HW impossibility?: "Create a rock paper scissors program in ruby WITHOUT using conditionals"

6 votes

I'm in an introductory software development class, and my homework is to create a rock paper scissors program that takes two arguments (e.g. rock, paper) and returns the arg that wins.

Now I would make quick work of this problem if I could use conditionals, but the assignment says everything we need to know is in the first three chapters of the Ruby textbook, and these chapters DO NOT include conditionals! Would it be possible to create this program without them? Or is he just expecting us to be resourceful and use conditionals? It's a very easy assignment with conditionals, though... I'm thinking that I might be missing something here.

EDIT: I'm thinking of that chmod numerical system and think a solution may be possible through that additive system...

def winner(p1, p2)
  wins = {rock: :scissors, scissors: :paper, paper: :rock}
  {true => p1, false => p2}[wins[p1] == p2]
end

winner(:rock, :rock) # => :rock d'oh! – tokland

Per @sarnold, leaving this as an exercise for the student :).

Ruby: What's an elegant way to pick a random line from a text file?

6 votes

I've seen some really beautiful examples of Ruby and I'm trying to shift my thinking to be able to produce them instead of just admire them. Here's the best I could come up with for picking a random line out of a file:

def pick_random_line
  random_line = nil
  File.open("data.txt") do |file|
    file_lines = file.readlines()
    random_line = file_lines[Random.rand(0...file_lines.size())]
  end

  random_line
end

I feel like it's gotta be possible to do this in a shorter, more elegant way without storing the entire file's contents in memory. Is there?

You can do it without storing anything except the current candidate for the random line.

def pick_random_line
  chosen_line = nil
  File.foreach("data.txt").each_with_index do |line, number|
    chosen_line = line if rand < 1.0/(number+1)
  end
  return chosen_line
end

So the first line is chosen with probability 1/1 = 1; the second line is chosen with probability 1/2, so half the time it keeps the first one and half the time it switches to the second.

Then the third line is chosen with probability 1/3 - so 1/3 of the time it picks it, and the other 2/3 of the time it keeps whichever one of the first two it picked. Since each of them had a 50% chance of being chosen as of line 2, they each wind up with a 1/3 chance of being chosen as of line 3.

And so on. At line N, every line from 1-N has an even 1/N chance of being chosen, and that holds all the way through the file (as long as the file isn't so huge that 1/(number of lines in file) is less than epsilon :)). And you only make one pass through the file and never store more than two lines at once.

EDIT: You're not going to get a really concise solution with this algorithm, but you can turn it into a one-liner if you want to:

def pick_random_line
  File.foreach("data.txt").each_with_index.reduce(nil) { |picked,pair| 
    rand < 1.0/(1+pair[1]) ? pair[0] : picked }
end
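The same reservoir-sampling idea, sketched as a method that accepts any Enumerable (for example `File.foreach("data.txt")`) and an injectable random number generator so the result can be reproduced in tests. The method name `pick_random` is hypothetical. Drawing an integer with `rng.rand(index + 1)` and replacing the pick when it is zero gives exactly the 1/N probability without comparing floats, which sidesteps the epsilon caveat above:

```ruby
# Reservoir sampling with a reservoir of one line.
def pick_random(lines, rng: Random.new)
  chosen = nil
  lines.each_with_index do |line, index|
    # The (index + 1)-th line replaces the current pick with probability 1/(index + 1).
    chosen = line if rng.rand(index + 1).zero?
  end
  chosen
end

# Usage with a file (data.txt as in the question):
#   pick_random(File.foreach("data.txt"))
p pick_random(%w[a b c], rng: Random.new(1))
```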

. vs :: (dot vs. double-colon) for calling a method

6 votes

Possible Duplicate:
What is Ruby's double-colon (::) all about?

I am learning Ruby from the Poignant Guide to Ruby and in some of the code examples, I came across uses of the double colon and dot that seem to be used for the same purpose:

File::open( 'idea-' + idea_name + '.txt', 'w' ) do |f|
   f << idea
end

In the above code, the double colon is being used to access the open method of the File class. However, I later came across code that used a dot for the same purpose:

require 'wordlist'
# Print each idea out with the words fixed
Dir['idea-*.txt'].each do |file_name|
   idea = File.read( file_name )
   code_words.each do |real, code|
     idea.gsub!( code, real )
   end
   puts idea
end

This time, a dot is being used to access the read method of the File class. What is the difference between:

File.read()

and

File::open()

It's the scope resolution operator.

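It gives you the ability to access things that the . operator can't, such as constants and nested modules or classes. It can also call class methods, which is why File::open and File.open do the same thing here. A leading :: also removes ambiguity by forcing the lookup to start at the top level (e.g. ::File instead of a nested File).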
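A small sketch showing what each operator can reach, using a hypothetical `Outer` module: constants need `::`, class-level methods can be called with either operator, and a leading `::` forces lookup from the top level even when a nested constant shadows the name.

```ruby
module Outer
  LIMIT = 10

  # A nested class that shadows the top-level String inside Outer.
  class String; end

  def self.greet(name)
    "hello, #{name}"
  end

  def self.string_classes
    # Bare String resolves to Outer::String here; ::String forces top level.
    [String, ::String]
  end
end

p Outer::LIMIT          # => 10 (Outer.LIMIT would raise NoMethodError)
p Outer.greet("dot")    # => "hello, dot"
p Outer::greet("colon") # => "hello, colon" (:: can call methods too)
p Outer.string_classes  # => [Outer::String, String]
```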

Ruby sprintf acting up in 1.9

6 votes

I am experiencing some confusion on the Kernel#sprintf method in Ruby.

Ruby 1.9 handles encoding in a different way than Ruby 1.8.

Here are the results I am after, and how it behaves in Ruby 1.8:

>> RUBY_VERSION
=> "1.8.7"
>> sprintf("%c", 88599)
=> "\027"

This is how it behaves in Ruby 1.9:

1.9.3p194 :001 > RUBY_VERSION
=> "1.9.3" 
1.9.3p194 :002 > sprintf("%c", 88599)
=> "\u{15A17}"

If I use the magic comment to set the encoding to binary (ascii-8bit) I get an error:

1.9.3p194 :001 > RUBY_VERSION
=> "1.9.3" 
1.9.3p194 :002 > # encoding: binary
1.9.3p194 :003 >   sprintf("%c", 88599)
RangeError: 88599 out of char range
from (irb):3:in `sprintf'
from (irb):3
from /Users/lisinge/.rvm/rubies/ruby-1.9.3-p194/bin/irb:16:in `<main>'

I have also tried this with Ruby 1.9.2, so it doesn't seem to be specific to 1.9.3.

Maybe I am doing something wrong? I am not so familiar with the Kernel#sprintf method.

I am using an SMPP library called ruby-smpp, which can be found on GitHub. It is the send_concat_mt method on line #47 that is acting up when I am trying to run it in Ruby 1.9.3.

I would greatly appreciate it if any of you could shed some light on this matter.

The sprintf documentation states:

Field |  Other Format 
------+--------------------------------------------------------------
  c   | Argument is the numeric code for a single character or
      | a single character string itself.

88599 is not a valid numeric code for a single character under Ruby 1.8's default behavior, which, I believe, is to use no encoding. What 1.8 appears to do is take the value you supply mod 256 and then convert it:

% irb
1.9.3-p194 :003 > 88599 % 256 == 027
 => true 

As to whether you are doing something wrong: no. What happened is that allowing out-of-range character codes was a bug, fixed in Ruby 1.9, which now properly raises an exception.
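Assuming the goal is to reproduce the single byte that 1.8 produced (the mod-256 behavior shown above), here is a sketch of two ways to get it in 1.9 and later without relying on `%c`: `Array#pack` with the `"C"` directive keeps only the low 8 bits, and `Integer#chr` on the reduced value does the same explicitly.

```ruby
code = 88599

# 1.9+: with a UTF-8 format string, %c treats the integer as a code point.
utf8_char = format("%c", code)
p utf8_char == "\u{15A17}" # => true

# The single byte 1.8 produced, i.e. the low 8 bits of the code:
low_byte  = [code].pack("C")  # "C" packs an 8-bit unsigned value, keeping the low byte
same_byte = (code % 256).chr  # the same byte via an explicit mod 256
p low_byte.bytes            # => [23]
p low_byte == same_byte     # => true
```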

How to DRY scope methods used in two different classes?

5 votes

I am using Ruby on Rails 3.2.2 and I would like to retrieve / scope associated objects by "specifying" / "filtering on" an attribute value on those associated objects. That is, at this time I am using the following code:

class Article < ActiveRecord::Base
  def self.search_by_title(search)
    where('articles.title LIKE ?', "%#{search}%")
  end
end

class ArticleAssociation < ActiveRecord::Base
  def self.search_by_article_title(search)
    joins(:article).where('articles.title LIKE ?', "%#{search}%")
  end
end

In the above code the where('articles.title LIKE ?', "%#{search}%") clause is repeated twice and so I thought that it may be improved with the DRY principle: is it possible to use the Article.search_by_title method directly in the ArticleAssociation.search_by_article_title method?


Typical use cases are:

  • ArticleAssociation.search_by_article_title("Sample string")
  • Article.search_by_title("Sample string")

Unless you change the code structure completely, no.

You could do some hacking with lambdas, but that would be more code than the code you're DRYing. There is such a thing as good refactoring, and such a thing as bad refactoring. Only when a piece of very complex or long code is used in two or more places should you worry about refactoring. Code conventions are important, but for tiny one-method-call things like that it's a waste and will probably make your code more cryptic.

Though, I know that it's annoying when people don't answer your question, so here:

class Article < ActiveRecord::Base
  SEARCH_BY_TITLE=lambda {|obj, search| obj.where('articles.title LIKE ?', "%#{search}%")}
  def self.search_by_title(search)
    SEARCH_BY_TITLE.call(self, search)
  end
end

class ArticleAssociation < ActiveRecord::Base
  def self.search_by_article_title(search)
    Article::SEARCH_BY_TITLE.call(joins(:article),search)
  end
end

That just makes a lambda as a constant that performs the where call on a specified object. Both methods just wrap that lambda.

Note: Although this may be considered more elegant, it adds some overhead, since the lambda, its closure, and the extra call are not free in a dynamic language like Ruby. But I don't think that's an issue for you.

Rails 3.2.3 namespaced controllers being overridden by global controllers with same name

5 votes

When the global application controller is loaded first, the namespaced application controller does not load when loading pages within that namespace. The application controller looks like this:

class ApplicationController < ActionController::Base
  protect_from_forgery
end

And the namespaced application controller looks like this:

class Admin::ApplicationController < ApplicationController
  def authenticate_admin!
    if current_admin.nil?
      redirect_to new_admin_session_url
    end
  end

  private

  def current_admin
    @current_admin ||= Admin.find(session[:admin_id]) if session[:admin_id]
  end

  helper_method :current_admin
end

When we use the before_filter "authenticate_admin!" like this:

class Admin::AssetsController < Admin::ApplicationController
  before_filter :authenticate_admin!
end

A "NoMethodError in Admin::AssetsController#new" is thrown. This only occurs when we hit the global route before the namespaced route. If the server is restarted and the namespaced route is loaded first everything works properly.

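This is happening because you also have an Admin model (a class) with the same name as your namespace. Once the global ApplicationController has been loaded, Ruby (before 2.5) resolves Admin::ApplicationController by searching the Admin class and its ancestors, which include Object, so it finds the top-level ApplicationController instead of reporting a missing constant. Rails' autoloader is therefore never asked to load the namespaced application controller, and Admin::AssetsController ends up inheriting from the global controller, which has no authenticate_admin! method.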

This Google group thread provides a good explanation of what exactly is happening.

To fix it, I would rename the model to AdminUser or, if that is not possible, rename the namespace.
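The clash between a model class and a namespace can be reproduced in plain Ruby, without Rails. In this sketch an empty `Admin` class stands in for the model, and no namespaced controller has been loaded yet, which is the situation after the global route has been hit. On Ruby versions before 2.5 the lookup returns the top-level `ApplicationController` (with only a warning); Ruby 2.5 removed that top-level fallback, so newer versions raise `NameError` instead. The method name is hypothetical.

```ruby
# Global controller, already loaded (the global route was hit first).
class ApplicationController
end

# Stand-in for the Admin ActiveRecord model: a class, not a module.
class Admin
end

# Resolve Admin::ApplicationController the way the superclass reference in
# Admin::AssetsController would, before the namespaced controller exists.
def resolve_admin_application_controller
  Admin::ApplicationController
rescue NameError
  nil # Ruby >= 2.5: no fallback to the top-level constant
end

resolved = resolve_admin_application_controller
if resolved.equal?(ApplicationController)
  puts "Admin::ApplicationController silently resolved to ::ApplicationController"
else
  puts "Admin::ApplicationController is undefined (no top-level fallback)"
end
```

When Admin is only a namespace module (for example after renaming the model to AdminUser), there should be no class ancestry to fall back through, so the lookup misses and Rails gets the chance to load the namespaced file.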

How to encrypt data in a UTF-8 string using OpenSSL::Cipher?

5 votes

In a Rails 3.0 (Ruby 1.9.2) app I'm trying to encrypt some data using something like this:

cipher = OpenSSL::Cipher.new 'aes-256-cbc'
cipher.encrypt
cipher.key = cipher.random_key
cipher.iv = cipher.random_iv

encrypted = cipher.update 'most secret data in the world'
encrypted << cipher.final

That will go into a UTF-8 database. My problem is that

> encrypted.encoding
 => #<Encoding:ASCII-8BIT>

> encrypted.encode 'utf-8'
Encoding::UndefinedConversionError: "\xF7" from ASCII-8BIT to UTF-8

How can I get a UTF-8 encrypted string?

The solution is to convert the ASCII-8BIT string to Base64 and then encode to UTF-8.

require 'openssl'
require 'base64'

cipher = OpenSSL::Cipher.new 'aes-256-cbc'
cipher.encrypt
key = cipher.random_key   # keep the key and IV: both are needed to decrypt
iv  = cipher.random_iv
cipher.key = key
cipher.iv  = iv

encrypted = cipher.update 'most secret data in the world'
encrypted << cipher.final

encoded = Base64.encode64(encrypted).encode('utf-8')

Once persisted and retrieved from the database, decode it back to raw bytes:

decoded = Base64.decode64 encoded.encode('ascii-8bit')

and finally decrypt it.

PS: If you're curious, decryption looks like this:

cipher = OpenSSL::Cipher.new 'aes-256-cbc'
cipher.decrypt
cipher.key = key   # the same key and IV that were used to encrypt
cipher.iv  = iv

decrypted = cipher.update decoded   # decrypt the Base64-decoded bytes, not the Base64 text
decrypted << cipher.final

> decrypted
 => "most secret data in the world"
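Putting the steps together, here is a minimal round-trip sketch. The method names are hypothetical. It uses `pack('m')` and `unpack('m')`, which is what `Base64.encode64` and `Base64.decode64` do under the hood, so only `openssl` needs to be required. The key and IV are returned to the caller because decryption needs them; the IV is commonly stored next to the ciphertext, while the key should be kept somewhere else.

```ruby
require 'openssl'

# Encrypts +plaintext+ with AES-256-CBC and returns Base64 text labelled
# as UTF-8 (safe for a UTF-8 column), plus the key and IV needed later.
def encrypt_for_utf8_column(plaintext)
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.encrypt
  key = cipher.random_key # random_key also installs the key on the cipher
  iv  = cipher.random_iv  # likewise for the IV
  encrypted = cipher.update(plaintext) + cipher.final # raw ASCII-8BIT bytes
  # Base64 output is pure ASCII, so relabelling it as UTF-8 is always valid.
  encoded = [encrypted].pack('m').force_encoding('UTF-8')
  [encoded, key, iv]
end

# Reverses encrypt_for_utf8_column: Base64 text back to raw bytes, then decrypt.
def decrypt_from_utf8_column(encoded, key, iv)
  decoded = encoded.unpack('m').first # raw ciphertext bytes
  cipher = OpenSSL::Cipher.new('aes-256-cbc')
  cipher.decrypt
  cipher.key = key
  cipher.iv  = iv
  plaintext = cipher.update(decoded) + cipher.final
  plaintext.force_encoding('UTF-8') # the original text was UTF-8
end

encoded, key, iv = encrypt_for_utf8_column('most secret data in the world')
puts encoded.encoding
puts decrypt_from_utf8_column(encoded, key, iv)
```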

Ruby gem for text comparison

5 votes

I am looking for a gem that can compare two strings (in this case paragraphs of text) and gauge the likelihood that they are similar in content (perhaps with only a few words rearranged or changed). I believe that SO uses something similar when users submit questions.

I'd probably use something like Diff::LCS:

>> require "diff/lcs"
>> seq1 = "lorem ipsum dolor sit amet consequtor".split(" ")
>> seq2 = "lorem ipsum dolor amet sit consequtor".split(" ")
>> Diff::LCS.diff(seq1, seq2).length
=> 2

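It uses the longest common subsequence algorithm (the method for using LCS to get a diff is described on the wiki page). The 2 here is the number of change hunks: moving one word shows up as one deletion plus one insertion, so fewer hunks roughly means more similar text.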
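Diff::LCS gives you edit hunks; to get a single similarity number you can normalise the LCS length instead. The sketch below is a swapped-in scoring approach (a Dice-style ratio, 2 × LCS length / total word count), not something Diff::LCS provides directly, and it computes the LCS with a plain dynamic-programming table so it runs without the gem. If you already have the gem loaded, `Diff::LCS.lcs(seq1, seq2).length` should give the same length. The method names are hypothetical.

```ruby
# Length of the longest common subsequence of two arrays, using the
# classic dynamic-programming recurrence with one rolling row.
def lcs_length(a, b)
  prev = Array.new(b.length + 1, 0)
  a.each do |x|
    curr = [0]
    b.each_with_index do |y, j|
      # Match extends the diagonal; otherwise take the better neighbour.
      curr << (x == y ? prev[j] + 1 : [prev[j + 1], curr[j]].max)
    end
    prev = curr
  end
  prev.last
end

# Similarity in 0.0..1.0: 1.0 for identical word sequences,
# 0.0 when the two texts share no words at all.
def word_similarity(text1, text2)
  a = text1.split
  b = text2.split
  return 1.0 if a.empty? && b.empty?
  2.0 * lcs_length(a, b) / (a.length + b.length)
end

p word_similarity("lorem ipsum dolor sit amet consequtor",
                  "lorem ipsum dolor amet sit consequtor")
```

For the example sentences 5 of the 6 words stay in order, so the score is 10/12, about 0.83. Swapping `split` for a character array would give a character-level score instead.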

Select adjacent sibling elements without intervening non-whitespace text nodes

4 votes

Given markup like:

<p>
  <code>foo</code><code>bar</code>
  <code>jim</code> and then <code>jam</code>
</p>

I need to select the first three <code> elements, but not the last. The logic is: "Select all code elements that have a preceding or following sibling element that is also a code, unless one or more text nodes with non-whitespace content lie between them."

Given that I am using Nokogiri (which uses libxml2) I can only use XPath 1.0 expressions.

Although a tricky XPath expression is desired, Ruby code/iterations to perform the same on a Nokogiri document are also acceptable.

Note that the CSS adjacent sibling selector ignores non-element nodes, and so selecting nokodoc.css('code + code') will incorrectly select the last <code> block.

Nokogiri.XML('<r><a/><b/> and <c/></r>').css('* + *').map(&:name)
#=> ["b", "c"]

Edit: More test cases, for clarity:

<section><ul>
  <li>Go to <code>N</code> and
      then <code>Y</code><code>Y</code><code>Y</code>.
  </li>
  <li>If you see <code>N</code> or <code>N</code> then…</li>
</ul>
<p>Elsewhere there might be: <code>N</code></p>
<p><code>N</code> across parents.</p>
<p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
<p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

All the Y above should be selected. None of the N should be selected. The contents of the <code> elements are used only to indicate which should be selected: you may not use the content to determine whether or not to select an element.

The context elements in which the <code> appear are irrelevant. They may appear in <li>, they may appear in <p>, they may appear in something else.

I want to select all the consecutive runs of <code> at once. It is not a mistake that there is a space character in the middle of one of the sets of Y.

Use:

//code
     [preceding-sibling::node()[1][self::code]
    or
      preceding-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      preceding-sibling::node()[2][self::code]
    or
     following-sibling::node()[1][self::code]
    or
      following-sibling::node()[1]
         [self::text()[not(normalize-space())]]
     and
      following-sibling::node()[2][self::code]
     ]

XSLT-based verification:

<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     <xsl:output omit-xml-declaration="yes" indent="yes"/>

     <xsl:template match="/">
      <xsl:copy-of select=
       "//code
             [preceding-sibling::node()[1][self::code]
            or
              preceding-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              preceding-sibling::node()[2][self::code]
            or
             following-sibling::node()[1][self::code]
            or
              following-sibling::node()[1]
                 [self::text()[not(normalize-space())]]
             and
              following-sibling::node()[2][self::code]
             ]"/>
     </xsl:template>
</xsl:stylesheet>

When this transformation is applied to the provided XML document:

<section><ul>
      <li>Go to <code>N</code> and
          then <code>Y</code><code>Y</code><code>Y</code>.
      </li>
      <li>If you see <code>N</code> or <code>N</code> then…</li>
    </ul>
    <p>Elsewhere there might be: <code>N</code></p>
    <p><code>N</code> across parents.</p>
    <p>Then: <code>Y</code> <code>Y</code><code>Y</code> and <code>N</code>.</p>
    <p><code>N</code><br/><code>N</code> elements interrupt, too.</p>
</section>

the contained XPath expression is evaluated and the selected nodes are copied to the output:

<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
<code>Y</code>
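Since the question also accepts Ruby iteration, the same rule can be expressed with sibling navigation. This sketch relies only on `previous_sibling`, `next_sibling`, `element?`, `text?`, `name` and `text`, which Nokogiri nodes provide. Like the XPath above, it tolerates at most one whitespace-only text node between two `<code>` elements, which is normally all a parser produces for a run of whitespace. The method names are hypothetical.

```ruby
# Returns the sibling of +node+ in +direction+ (:previous_sibling or
# :next_sibling), stepping over one whitespace-only text node if present.
def neighbour_skipping_whitespace(node, direction)
  sibling = node.public_send(direction)
  if sibling && sibling.text? && sibling.text.strip.empty?
    sibling = sibling.public_send(direction)
  end
  sibling
end

# True when +node+ exists and is a <code> element.
def code_element?(node)
  !node.nil? && node.element? && node.name == 'code'
end

# True when +node+ is a <code> element with another <code> directly
# before or after it (ignoring whitespace-only text in between).
def in_code_run?(node)
  return false unless code_element?(node)
  [:previous_sibling, :next_sibling].any? do |direction|
    code_element?(neighbour_skipping_whitespace(node, direction))
  end
end
```

With Nokogiri this would be used as `doc.xpath('//code').select { |node| in_code_run?(node) }`. Non-whitespace text or any other element (such as the `<br/>`) between two `<code>` elements breaks the run, matching the N cases above.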