Object-oriented programming (OOP) is Ruby’s bread and butter. It’s often referred to as a purely object-oriented language because everything in Ruby is an object or can be turned into one, and I mean everything. From classes all the way down to numeric literals, Ruby exposes a consistent design that is found in few other languages.
Ruby’s object model was heavily influenced by Smalltalk and is probably a bit different than what you’re used to if you’re coming from languages like C++ or Java. The fact that classes are also objects is enough to send your mind into infinite recursion if you let it. Then there are some gotchas like subclasses not automatically initializing their superclasses and the ambiguity between defining variables and calling setter methods. This chapter tackles these issues and sorts them all out.
Additionally, I’ll explain the way Ruby actually builds inheritance hierarchies when you create classes, subclasses, and mix modules into them. Armed with this information you’ll be able to track down how methods are introduced into your class and where they came from. An important skill when dealing with large frameworks such as Ruby on Rails.
Understanding Ruby’s flavor of OOP will help you make better decisions and avoid long-term design mistakes. Especially when it comes to Ruby’s open and dynamic nature which can be used to create leaky abstractions and ignore encapsulation. Both of which will lead to maintenance nightmares and late night debugging sessions. Something which I’ll help you avoid.
Question: When you send a message to an object, how does Ruby locate the appropriate method? Deceptively simple answer: Through the inheritance hierarchy. This answer is deceptive due to the way in which Ruby constructs inheritance hierarchies behind the scenes. This is one of those rare situations where Ruby goes out of its way to obscure what’s really going on under the covers, often leading to unnecessary confusion. The methods Ruby gives us for discovering which classes are part of a hierarchy don’t always tell the whole truth. Fortunately, the way the Ruby interpreter internally builds the inheritance hierarchy is both consistent and straight forward once you understand a few of its tricks.
Exploring how Ruby searches for methods will provide some good insight into the language’s implementation and give us the perfect environment for unraveling a class’s true inheritance hierarchy. Something you should definitely be aware of. The good news is that by the end of this item you’ll never again be surprised by Ruby’s object model. The bad news is that we’ll need to start by reviewing some classic OOP terminology and how a few general terms have specific definitions in Ruby. Bear with me for a few paragraphs.
• An object is a container of variables. These variables are referred to as instance variables and represent the state of an object. Each object has a special, internal variable that connects it to one and only one class. Because of this connection the object is said to be an instance of this class.
• A class is a container of methods and constants. The methods are referred to as instance methods and represent the behavior for all objects which are instances of the class.
Things get a little confusing and circular here because classes are themselves objects. Therefore each class is also a container of variables called class variables. Classes also have a connection with another class where their methods are kept. Technically, these methods are indistinguishable from instance methods but to avoid excessive confusion they are called class methods. In other words, classes are objects whose variables are called class variables and whose methods are called class methods. (Class objects can also have instance variables, but we’ll get to that in Item 15.)
• A superclass is a fancy name for the parent class in a class hierarchy. If class B
inherits from class A
, then A
is the superclass of B
. Classes have a special, internal variable to keep track of their superclass.
• A module is identical to a class in all respects but one. Like classes, modules are objects and therefore have a connection to a class which they are an instance of. While classes are connected to the Class
class, modules are connected to the Module
class. Internally, Ruby implements modules and classes using the same data structure but limits what you can do with them through their class methods (there’s no new
method) and a more restrictive syntax.
• Modules have many uses in Ruby but for now we’re only concerned with how they contribute to the inheritance hierarchy. Although Ruby doesn’t directly support multiple inheritance, modules can be mixed into a class with the include
method which has a similar effect.
• A singleton class is a much confused term for an anonymous and invisible class in the inheritance hierarchy.
• The confusion around these classes doesn’t necessarily come from their purpose as much as it does from their name. They are sometimes referred to as eigenclasses or metaclasses. Even the source code for the Ruby interpreter uses these terms interchangeable. In this book I’ll always refer to them as singleton classes since that’s the name you’ll see from within Ruby when working with the introspection methods such as singleton_class
and singleton_methods
.
Singleton classes play an important role in Ruby, such as providing a place to store class methods and methods included from modules. Unlike other classes, they’re created dynamically on an as needed basis by Ruby itself. They also come with restrictions. For example, you can’t create an instance of a singleton class. The only thing you really need to take away from this is that singleton classes are just regular classes which don’t have names and are subjected to a couple of limitations.
• A receiver is the object on which a method is invoked. For example, in “customer.name
” the method invoked is name
and the receiver is customer
. While the name
method is executing the self
variable will be set to customer
and any instance variables accessed will come from the customer
object. Sometimes the receiver is omitted from method calls, in which case it’s implicitly set to whatever self
is in the current context.
Phew! Now that we’re done with the vocabulary lesson we can explore Ruby’s object model and how it constructs inheritance hierarchies. Let’s start by putting together a small class hierarchy and use IRB to play with it:
class Person
def name
...
end
end
class Customer < Person
...
end
irb> customer = Customer.new
---> #<Customer>
irb> customer.class
---> Customer
irb> Customer.superclass
---> Person
irb> customer.respond_to?(:name)
---> true
There’s nothing too surprising in this code. If you invoke the name
method on the customer
object the method lookup would happen exactly as you’d expect. First the Customer
class would be checked for a matching instance method. It’s obviously not there so the search would continue up the inheritance hierarchy to the Person
class where it would be found and executed. If the method hadn’t been found there Ruby would continue searching all the way up until it was either found or it hit the root class, BasicObject
. You can see the complete hierarchy in Figure 2-1.
As you know, if the method lookup makes it all the way to the root of the class tree without finding a match it will restart the search where it began, this time looking for the method_missing
method. Item 30 asks you to consider alternatives to defining your own method_missing
method so we won’t consider it here. That is, other than to say there’s a default implementation of method_missing
in the Kernel
module which raises an exception to punish you for calling an undefined method.
Which reminds me, it’s time to take this simple example and throw a wrench into it:
module ThingsWithNames
def name
...
end
end
class Person
include(ThingsWithNames)
end
irb> Person.superclass
---> Object
irb> customer = Customer.new
---> #<Customer>
irb> customer.respond_to?(:name)
---> true
I’ve taken the name
method out of the Person
class and moved it into a module which is then included into the Person
class. Instances of the Customer
class still respond to the name
method as you’d expect, but why? Clearly the ThingsWithNames
module isn’t in the inheritance hierarchy because the superclass of Person
is still Object
, or is it? As it turns out this is where Ruby starts to lie to you and where we need to talk about singleton classes a bit more.
When you use the include
method to mix in a module Ruby is doing something very sneaky behind the scenes. It creates a singleton class and inserts it into the class hierarchy. This anonymous and invisible class is linked to the module so they share instance methods and constants. In the case of the Person
class, when it includes the ThingsWithNames
module, Ruby creates a singleton class and silently inserts it as the superclass of Person
. “But calling the superclass
method on Person
returns Object
, not ThingsWithNames
,” you say. Yep, and for good reason, the singleton class is anonymous and invisible so both the superclass
and class
methods skip over it. Therefore a more accurate class hierarchy needs to include modules too. And that’s exactly what Figure 2-2 contains.
As each module is included into a class it is inserted into the hierarchy immediately above the including class in a last in, first out (LIFO) manner. Everything is connected through the superclass
variable like a singly linked list. The net result is that when Ruby is searching for a method it will visit each module in reverse order, most recently included first. An important point to take away from this is that modules can never override methods from the class which includes them. Since modules are inserted above the including class Ruby always checks the class before moving upward. (Okay, this isn’t entirely true. Be sure to read Item 35 to see how the prepend
method in Ruby 2.0 complicates this.)
Now that you’re comfortable with the interaction between modules and singleton classes it’s my duty to shake things up and introduce yet another weird (but lovely) wrench in the works:
customer = Customer.new
def customer.name
"Leonard"
end
If you haven’t seen this syntax before it can be a little confusing at first. But it’s pretty straight forward. The code above defines a method which exists only for this one object, customer
. This specific method cannot be called on any other object. How do you suppose Ruby implements that? If you’re pointing a finger at singleton classes you’d be correct. In fact, this method is called a singleton method. When Ruby executes this code it will create a singleton class, install the name
method as an instance method, and then insert this anonymous class as the class of the customer
object. Even though the class of the customer
object is now a singleton class, the introspective class
method in Ruby will skip over it and still return Customer
. This obscures things for us but makes life easier for Ruby. When it searches for the name
method it only needs to traverse the class hierarchy. No special logic is needed.
This should also shed some light on why these kinds of classes are known as singleton classes. Most classes service many objects. The Array
class, for example, contains the methods for every array object used in your program. Singleton classes, on the other hand, only serve a single object.
With no more tricks up my sleeve I’m left with only class methods to describe. You’ll be pleasantly surprised to know that you already completely understand them. They’re another form of the singleton method which we just explored.
Notice that when you define a singleton method you specify an object (almost like it’s the receiver) which will be the owner of the method. When you define a class method you do the exact same thing by specifying the class object, either through the class’s name or more commonly by using the self
variable. An object is an object after all, even if that object is a class. Therefore class methods are stored as instance methods on a singleton class. See for yourself:
class Customer < Person
def self.where_am_i?
...
end
end
irb> Customer.singleton_class.instance_methods(false)
---> [:where_am_i?]
irb> Customer.singleton_class.superclass
---> #<Class:Person>
In the end, when Ruby wants to look up a method, it only has to consider the inheritance hierarchy. This works for both instance methods and class methods. Once you understand how Ruby manipulates the inheritance hierarchy the process for looking up a method is pretty easy:
1. Get the class of the receiver, which may actually be a hidden singleton class. (Remember that you can’t actually do this from Ruby code because the class
method skips over singleton classes. You’d have to go straight to the singleton_class
method.)
2. Search the list of instance methods which are stored in the class. If the method is found stop searching and execute the method. Otherwise continue to step 3.
3. Move up the hierarchy to the superclass and repeat step 2. (Again, the superclass
may actually be another singleton class. From within Ruby—the superclass
method specifically—it would be invisible.)
4. Steps 2 and 3 are repeated until Ruby finds the method or reaches the top of the hierarchy.
5. When it reaches the top, Ruby starts back at step 1 with the original receiver, but this time looking for the method_missing
method.
As you can see, when methods are stored in invisible singleton classes it’s not as straight forward to see the entire hierarchy from within Ruby code. You can, however, piece together the real picture using a handful of methods which Ruby provides for introspection purposes.
• The singleton_class
method returns the singleton class for the receiver, creating it if it doesn’t already exist.
• The ancestors
method returns an array of all the classes and modules that make up the inheritance hierarchy. It can only be called on classes or modules and skips over singleton classes.
• The included_modules
method returns the same array as ancestors
with the exception that all the classes have been filtered out.
Thankfully, you’ll rarely need to use these methods. Knowing how the inheritance hierarchy is constructed and searched internally is enough to clarify most situations. That said, Item 35 includes a few examples just in case.
• To find a method Ruby only has to search up the class hierarchy. If it doesn’t find the method it’s looking for it starts the search again, trying to find the method_missing
method.
• Including modules silently creates singleton classes which are inserted into the hierarchy above the including class.
• Singleton methods (class methods and per-object methods) are stored in singleton classes which are also inserted into the hierarchy.
Suppose you’ve written a class which inherits from a base class. This base class defines a method which, in the context of your class, doesn’t quite do everything you need it to. You’ve therefore decided to improve the method. But you don’t want to completely replace the existing method since it already performs 90% of the necessary heavy lifting. You also don’t want to change the base class because that would break other code. Ignoring the fact that composition might be a better choice than inheritance, you plow ahead and override the method. You wind up with something like this:
class Base
def m1 (x, y)
...
end
end
class Derived < Base
def m1 (x)
# How do you call Base.m1?
end
end
You’ve gotten yourself into a pickle. The big question is, how do you call the version of m1
that exists in the superclass? If you try calling m1
from the Derived
class you’ll just go into infinite recursion. That’s not going to help much. This is where super
comes in:
def m1 (x)
super(x, x)
...
end
Now we’re getting somewhere. This version of m1
uses super
to invoke its parent’s copy of m1
, the one from the Base
class. In this case super
sort of stands in for Base#m1
. You can use super
in any way that you’d use Base#m1
directly. This includes passing whichever arguments you want to the Base
copy of m1
through super
. (Just as long as the target method accepts those arguments.) This creates a subtle illusion which can bite you if you’re not careful.
Even though super
is used and acts like a method, it’s actually a language keyword. This difference is important because super
changes its behavior based on something that is supposed to be optional in Ruby, parentheses. Seriously, think about that for a second, choosing to omit parentheses when using super
changes the way it works. It’s even slightly worse than it sounds. To see how parentheses change things we’ll need to review the three ways that super
can be written:
• The first way of using super
is the least surprising, it’s the way we’ve been using it so far. If you give super
at least one argument then it acts like a regular Ruby method and the parentheses are optional. In this form, super
passes along the exact arguments it was given to the target method.
• If no arguments are given and no parentheses are used super
will act in a way that you might not expect. Used this way super
will invoke the target method with all of the arguments which were given to the enclosing method. It also forwards along the method’s block if one is associated with it.
There are a couple of side effects with this form of super
that you should watch out for. First, using super
this way only works if the target method accepts the same number of arguments as the enclosing method. As an example, this wouldn’t be the case for the Base#m1
and Derived#m1
methods we looked at earlier. Trying to use super
with no arguments in Derived#m1
would result in an ArgumentError
exception. You’ll be bitten by the same exception if the overridden method doesn’t expect any arguments but the enclosing method does.
The second thing to watch out for has to do with the value of the arguments given to the overridden method. If you mutate one of the enclosing method’s arguments before calling super
with no arguments, super
will pass their current value to the target method, not their original values. This seems pretty reasonable but something you need to be aware of.
• When you don’t actually want to pass any arguments to the overridden method you need to write super
with an empty set of parentheses, i.e. super()
. This looks especially weird in Ruby. Even those of us who prefer to use parentheses don’t do so when calling a method with no arguments. This use of super
seems unnatural but it’s the only way to call an overridden method with no arguments (and no block).
Here are the same rules from above expressed in code:
class SuperSilliness < SillyBase
def m1 (x, y)
super(1, 2) # Call with 1, 2.
super(x, y) # Call with x, y.
super x, y # Same as above.
super # Same as above.
super() # Call without arguments.
end
end
The next thing we need to concern ourselves with is how super
goes about finding the overridden method. It would appear that super
simply calls the superclass’s copy of the current method, but that’s an oversimplification. In reality super
considers the entire inheritance hierarchy and follows the rules laid out in Item 6, but with a minor change. When you use super
it searches for a method with the same name as the current method, just higher up in the inheritance hierarchy. Therefore it needs to start at the next level up and continue from there.
If you remember our study of the inheritance hierarchy you’ll know that the next level up might be a singleton class. This is great because it means that super
can be used to access overridden methods from included modules. For example:
module CoolFeatures
def feature_a
...
end
end
class Vanilla
include(CoolFeatures)
def feature_a
...
super # CoolFeatures.feature_a
end
end
This example also points out a limitation when using super
. If you want to reach a method which is defined in a superclass and an included module also defines a method with the same name, super
won’t work. Obviously, super
is going to stop at the first matching method it finds, which would be in the module, not the superclass. If you happen to run into this situation you’ve probably made a serious design error though. Consider using composition instead of inheritance.
Finally, the last thing I want to warn you about is how super
interacts with method_missing
. If you think about it, as long as you only ever use super
in a method which has the same name as another method in the inheritance hierarchy it won’t involve method_missing
. On the other hand, if you’re an especially sleepy driver or doing some sort of voodoo metaprogramming you might wind up with a call to super
which eventually invokes method_missing
. In that case you’ll be glad you read this section.
As with normal method lookup, if super
can’t find the method it’s looking for it will call method_missing
to report the error. As you’d expect, Ruby starts its search for method_missing
with the class of the original receiver. All this is fine and good, we even get a useful error message from the resulting exception:
class SuperHappy
def laugh
super
...
end
end
irb> SuperHappy.new.laugh
NoMethodError: super: no superclass method `laugh' for #<SuperHappy>
Internally, Ruby tracks whether or not method_missing
is being invoked due to a bad super
call and provides an extra bit of information along with the exception. But this only works if Ruby doesn’t find any definition of method_missing
in the inheritance hierarchy and therefore uses the default implementation. Once any class in the hierarchy defines a method_missing
method you’ll lose this useful detail. There’s no way to get it back, even if you make it a point to call super
from within your implementation of method_missing
.
Furthermore, method_missing
will be invoked with details about a method that actually exists, just in the wrong place. If the SuperHappy
class above defined a method_missing
method it would get invoked when we call laugh
because that method tries to use super
to call another copy of itself which doesn’t exist. And what’s the first argument to method_missing
? That’s right, it’s the name of the missing method which in this case would be :laugh
. But SuperHappy
does have a laugh
method, confusing huh?
If you’re counting reasons to avoid defining a method_missing
method make sure to write this one down too (along with those mentioned in Item 30). In this case the NoMethodError
exception from the default implementation of method_missing
is demonstrably better than anything we could do in our own method_missing
.
• When you override a method from the inheritance hierarchy the super
keyword can be used to call the overridden method.
• Using super
with no arguments and no parentheses is equivalent to passing it all of the arguments which were given to the enclosing method.
• If you want to use super
without passing the overridden method any arguments, you must use empty parentheses, i.e. super()
.
• Defining a method_missing
method will discard helpful information when a super
call fails. See Item 30 for alternatives to method_missing
.
Classes in Ruby don’t have traditional OOP-style constructors. If we’re interested in controlling the initial state of our objects we need to write a method named initialize
and do any necessary work there. This method is called at the right time by Class::new
just after a new object has been freshly allocated. If you don’t write your own initialize
method your class will inherit the default implementation from the BasicObject
class. It’s not all that useful though. In fact, it’s a completely empty method which doesn’t do anything at all. It’s merely there so Class::new
has something to call if you don’t define your own. Fortunately, there are plenty of times when BasicObject#initialize
will suit your needs just fine. However, when you do need to define an initialize
method there’s a slight catch to be aware of.
It’s easy to forget that initialize
is just a regular old private instance method and therefore it follows all the normal method lookup rules. For example, if you wanted you could define a reset
method which simply called initialize
to reset all instance variables to their initial states. The choice to treat initialize
as a regular method comes with a surprising consequence, however. When you write an initialize
method you also override all other definitions of initialize
which are higher up in the inheritance hierarchy. If you’re used to a language which has official constructors you might expect that all copies in the hierarchy are chained together instead of overriding one another. That’s not the case in Ruby. Consider this:
class Parent
attr_accessor(:name)
def initialize
@name = "Howard"
end
end
class Child < Parent
attr_accessor(:grade)
def initialize
@grade = 8
end
end
Looking at Parent#initialize
, it’s clear that if you create a new Parent
object the @name
instance variable will be initialized with a default value. It’s also clear from Child#initialize
that creating a new Child
object will set up a @grade
instance variable. The potentially unclear question is whether creating a Child
object would also result in @name
being set. Playing with the code in IRB clears things up.
irb> adult = Parent.new
---> #<Parent @name="Howard">
irb> youngster = Child.new
---> #<Child @grade=8>
irb> youngster.name
---> nil
Ouch. Ruby doesn’t automatically call overridden methods, not even initialize
. In this case the initialize
method in the Parent
class isn’t called when using Child::new
because the Child
class has its own initialize
method which overrides the one from Parent
. Rewriting these initialize
methods to take arguments will make this even clearer.
class Parent
def initialize (name)
@name = name
end
end
class Child < Parent
def initialize (grade)
@grade = grade
end
end
Now you can see the dilemma. Ruby doesn’t provide a way for us to specify the relationship between an initialize
method in a subclass and the one from its superclass. It therefore has no way of knowing how to automatically call initialize
in a superclass and pass the correct arguments. So it doesn’t and leaves the task for us to do manually. This can be quite surprising for new Ruby programmers and easy to forget even for those with experience.
Since Ruby doesn’t initialize parent classes for us, how do we do it ourselves? The solution comes from the fact that initialize
is just like any other overridden method. That is, we can use the general purpose super
keyword to call a method with the same name higher up in the hierarchy.
class Child < Parent
def initialize (name, grade)
super(name) # Initialize Parent.
@grade = grade
end
end
irb> youngster = Child.new("Abigail", 8)
---> #<Child @name="Abigail", @grade=8>
irb> youngster.name
---> "Abigail"
What we loose in automatic construction behavior we gain in flexibility. Using super
in the initialize
methods of subclasses gives us fine grained control over how and when we initialize superclasses. Should the superclass be initialized before the subclass? Do we need to set some instance variables before initializing superclasses? We’re free to mix and match behaviors as we see fit. Just make sure you remember to use super
and take a moment to review some of its quirks which are discussed in Item 7.
Before we wrap up here I should remind you that initialize
isn’t the only way to set up a new object. Ruby lets us create copies of objects using the dup
and clone
methods. When you use one of these methods the newly created copy is given a chance to perform any special copying logic by defining an initialize_copy
method. If you override the initialize_copy
method from a superclass you’ll definitely want to use super
to let it set itself up correctly.
• Ruby doesn’t automatically call the initialize
method in a superclass when creating objects from a subclass. Instead, normal method lookup rules apply to initialize
and only the first matching copy is invoked.
• When writing an initialize
method for a class which explicitly uses inheritance, use super
to initialize the parent class. The same rule applies when you define an initialize_copy
method.
When it comes to method naming rules, Ruby gives us a lot of freedom. While not as liberal as languages such as Lisp, Ruby does let us use one of three non-alphanumeric characters at the end of method names: “?
”, “!
”, and “=
”. Two of those characters are purely aesthetic but one of them has special internal meaning to Ruby.
As you know, tacking a question mark on the end of a method name doesn’t change anything about the method nor does it make Ruby treat it any differently. It’s merely a naming convention adopted by Ruby programmers to indicate that a method returns a Boolean value. (Or what Ruby considers to be a Boolean value. See Item 1 for details.) This naming convention isn’t enforced and methods ending in “?
” can still return whatever value they want. The exclamation point is similar, if not just a bit more vague. It usually means that the method will mutate the receiver but can also warn you about a potentially harmful action. Both are loose guidelines that Ruby programmers tend to follow. Ending a method name with an equal sign, on the other hand, is something entirely different.
Appending “=
” to the end of a method’s name turns it into a setter method and allows you to invoke that method with some nifty syntax. Typically such methods take a single argument, update some internal state, and return their argument. This means that setter methods can also be used as lvalues (the left-hand side of an assignment).
class SetMe
def initialize
@value = 0
end
def value # "Getter"
@value
end
def value= (x) # "Setter"
@value = x
end
end
irb> x = SetMe.new
---> #<SetMe @value=0>
irb> x.value = 1 # Call setter.
---> 1
irb> x
---> #<SetMe @value=1>
Even though “=
” is technically part of the method’s name, Ruby lets us place whitespace between it and the rest of the name. It looks like variable assignment but it’s actually just a plain old method call. You can see this more clearly when you add parentheses and remove the whitespace.
irb> x.value=(1)
---> 1
You may have never defined one of these setter methods by hand but most assuredly you’ve done it indirectly. That’s because Ruby has helper methods that write them for us. Both attr_writer
and attr_accessor
define setter methods which are exactly like the value=
example above. They add a little bit of indirection which is why I bring them up. As long as you remember what these helper methods do, and the advice that follows, you’ll steer clear of any trouble.
Speaking of trouble, we need to consider the ambiguity between assignment and setters. Because setter method invocation looks just like variable assignment it’s easy to confuse the two, even for experienced Ruby programmers. Consider this:
class Counter
attr_accessor(:counter)
def initialize
counter = 0
end
...
end
It’s not unreasonable to assume that the body of the initialize
method is calling the counter=
method, and it’s an assumption that a lot of people make. It’s not correct of course, it’s just a simple variable assignment. The initialize
method creates a new local variable called counter
, sets it to 0
, and then throws away a reference to it when the scope ends. Not at all what we wanted to do but obvious when you think about it.
There’s a parsing ambiguity between variable assignment and calling setter methods. Ruby resolves it by requiring that setter methods be called with an explicit receiver. In order to invoke a setter method instead of creating a variable you need to prepend a receiver to the method name. Therefore, to call the counter=
method from within an instance method you need to use self
as the receiver.
class Counter
attr_accessor(:counter)
def initialize
self.counter = 0
end
...
end
With self
used as the receiver Ruby will parse our code correctly and invoke the counter=
setter method instead of creating a new variable. You might think that you could avoid using self
if you invoked the method with no whitespace and used parentheses (i.e. counter=(0)
) but that doesn’t work either. We’re stuck with using self
as an explicit receiver. Which leads to another problem.
Programmers bitten by this parsing rule tend to overcompensate by littering their code with unnecessary self
receivers. While there’s nothing technically wrong with putting self
in front of every method call, it sure is ugly. It’s a matter of taste for sure, but one which is fairly universal among Ruby programmers. See if you can spot the unnecessary uses of self
in the following code:
class Name
attr_accessor(:first, :last)
def initialize (first, last)
self.first = first
self.last = last
end
def full
self.first + " " + self.last
end
end
Clearly the full
method is using self
redundantly. With no setter methods involved you can safely remove the use of self
and rely on some simple parsing rules. When Ruby comes across an identifier like first
or last
it checks to see if there’s a variable in the current scope with a matching name. When it doesn’t find one it tries to look up the identifier as a method. In the case of the two attributes from the Name
class, Ruby finds the getter methods defined by the earlier use of attr_accessor
. Here’s another version of the full
method which does the same thing as before but with less noise:
def full
first + " " + last
end
There should be no doubt in your mind at this point that setter methods are a special kind of animal. If you don’t want any surprises then you need to remember to call them with a receiver. But don’t let that fool you into thinking that all methods need receivers, especially when called from within instances methods. For every other type of method Ruby will automatically use self
as the receiver if the method call is missing it.
• Setter methods can only be called with an explicit receiver. Without a receiver they will be parsed as variable assignments.
• When invoking a setter method from within an instance method use self
as the receiver.
• You don’t need to use an explicit receiver when calling non-setter methods. In other words, don’t litter self
all over your code.
Hash tables are very useful, general-purpose data structures which are employed heavily by Ruby programmers. The Hash
class provides a simple interface for working with hash tables and is such a deep part of Ruby that, like arrays, it has its own dedicated syntax for creating new instances. When it comes to working with key-value pairs, Hash
is definitely the go to class.
In fact, Ruby programmers use hashes all the time. Even method keyword arguments are implemented with the Hash
class and a pinch of syntactic sugar. They’re so general that hashes can be used to emulate many other types such as arrays, sets, and even basic objects. When working with structured data in an OOP language we often have better choices than hashes, and Ruby is no exception. Let’s look at a typical use of the Hash
class and then consider replacing it with something more appropriate.
Say you’re interested in exploring annual weather data from a local weather station. Armed with a CSV file from the National Oceanic and Atmospheric Administration (NOAA) you plan to load the data into an array and play around with it. Each row in the CSV file contains temperature statistics for a single month. You’ve decided to extract the interesting columns from each row and store them in a Hash
. Consider this:
require('csv')
class AnnualWeather
def initialize (file_name)
@readings = []
CSV.foreach(file_name, headers: true) do |row|
@readings << {
:date => Date.parse(row[2]),
:high => row[10].to_f,
:low => row[11].to_f,
}
end
end
end
There’s nothing out of the ordinary here. Each row read from the CSV file is transformed into a hash which is then inserted into an array. After the initialize
method is complete you’ll have an array of uniform hashes, that is, all of the hashes will have the same keys but different values. Essentially this array of hashes represents a collection of objects, except that you can’t access their attributes though getter methods. You have to use the hash index operator instead. This might be a minor issue but it’s one which will have an impact on the AnnualWeather
class’ interface.
You shouldn’t allow this array of hashes to leak through the public interface. That’s because the keys for each hash are an internal implementation detail. Without class-level documentation, other programmers would have to read the entire definition of the initialize
method in order to know which keys are populated with which CSV columns. It might not be much of a burden when initialize
is the only method setting keys but as the class matures this might not be the case anymore. This is a weakness when using hashes to emulate objects and one which isn’t limited to the public interface.
Every time you want to work with these hashes internally you will need to go back to the initialize
method to remind yourself which keys are available. Again, as long as the keys are set in a single method the burden is fairly low. Let’s consider a situation where you’re tempted to create a new key. Each month in the @readings
array has high and low temperatures. You’d like to know the mean temperature of the year which means you’ll also need to know the mean temperature for each month.
def mean
return 0.0 if @readings.size.zero?
total = @readings.reduce(0.0) do |sum, reading|
sum + (reading[:high] + reading[:low]) / 2.0
end
total / @readings.size.to_f
end
Calculating the mean temperature for each month is pretty simple. Even so, it would be better if each object in the @readings
array responded to a mean
method so you could abstract away the logic. It’s possible to shoehorn such a method onto each of the hashes but at that point it should be clear that you’ve pushed the limits of this design. (We’re not working with JavaScript after all.)
Using hashes to stand in for really simple classes tends to happen a lot in Ruby. Sometimes it’s completely fine but more often than not we really should be creating dedicated types for these sort of objects. If you’re lazy like me, the thought of creating a new class for something so simple seems like an unnecessary chore. Fortunately, that’s exactly what the Struct
class is for.
On the surface, using the Struct
class is a lot like creating a new struct
type in C++. However, if you dig deeper you’ll see that it’s actually more similar to a class generator than a data structure. Using Struct
is simple and only requires a call to the Struct::new
method with a list of attributes. The return value from this method is virtually identical to a new class which has getter and setter methods for each of the attributes. This new class also has an initialize
method which accepts initial values for each of the attributes and sets them accordingly.
To replace the hashes we’ve been using so far we only need to make some minor changes to the initialize
method.
class AnnualWeather
# Create a new struct to hold reading data.
Reading = Struct.new(:date, :high, :low)
def initialize (file_name)
@readings = []
CSV.foreach(file_name, headers: true) do |row|
@readings << Reading.new(Date.parse(row[2]),
row[10].to_f,
row[11].to_f)
end
end
end
As you can see, it’s common practice to assign the return value from the Struct::new
method to a constant. This allows you to treat the constant like a class and create objects from it. This single line of code also makes it clear which methods you can expect objects of this new class to respond to. That’s already a big improvement over the individual hashes. Let’s see how this change affects the mean
method:
def mean
return 0.0 if @readings.size.zero?
total = @readings.reduce(0.0) do |sum, reading|
sum + (reading.high + reading.low) / 2.0
end
total / @readings.size.to_f
end
The existing mean
method required few changes but now has a much better OOP feel to it. Accessing the high and low attributes through getter methods has a subtle side effect too. Typos in attribute names will now raise a NoMethodError
exception. The same isn’t the case when using hashes. Attempting to access an invalid key in a hash doesn’t raise an exception but instead returns nil
. This usually means that you’ll wind up with a more obscure TypeError
exception later in the code. Another improvement with Struct
is that we can now do something which we wanted to do earlier, define a mean
method for each month.
The Struct
class is more powerful than it might first appear. In addition to a list of attributes, the Struct::new
method can optionally take a block which is then evaluated in the context of the new class. That’s a mouthful but it means we can define instance and class methods inside the block. For example, here’s how you’d define the mean
method:
Reading = Struct.new(:date, :high, :low) do
def mean
(high + low) / 2.0
end
end
For those times when it seems too heavy-handed to create a new class, Struct
can be very useful. And unlike a bunch of unrelated yet uniform hashes, Struct
lets you define instance and class methods. That’s perfect for when you need to add a few simple behaviors to these otherwise boring objects. They’re also properly typed with their own public interface making them more suitable for users of the AnnualWeather
class. The reservations I had before about exposing an array of hashes through the AnnualWeather
interface has been resolved, opening the way to an attr_reader
for the @readings
array. I’d say that’s an all around improvement.
• When dealing with structured data which doesn’t quite justify a new class prefer using Struct
to Hash
.
• Assign the return value of Struct::new
to a constant and treat that constant like a class.
Imagine you’re working on an application for ordering custom notebooks (the old fashion paper kind). Customers can choose among many binding styles such as metal spirals or the more traditional glue. You decide to create a class to represent binding styles and all the options that go along with them. Unfortunately, things don’t seem to be working as planned. What’s wrong with the following class definition?
class Binding
def initialize (style, options={})
@style = style
@options = options
end
end
At first glance everything seems to be correct. There are no syntax errors, yet if you were to play with this class in IRB you’d see that something is really wrong. While the code above appears to create a new class, that’s not actually what happens if you execute it. “If it doesn’t define a class what does this code do?” That’s a mighty fine question.
As you know, classes in Ruby are mutable, you can add and replace methods at any time. The syntax for defining a class is the same syntax which is used to modify one. In this case it just so happens that the class name you wanted to use has already been taken. Binding
is a class which is part of the Ruby core library. Instead of defining a new class the code above reopens and modifies an existing class. Not exactly what you had in mind.
Any nontrivial application is going to run into this problem sooner or later. It’s an even bigger problem for library authors. What happens if more than one library defines the same class? How can you use both of the libraries at the same time? Thankfully, nearly every modern general purpose programming language has a solution to this problem, including Ruby. The way we isolate our names from one another is done with a process called namespacing.
Namespaces are a way to qualify a constant so it becomes unique. At the most basic level namespaces let you create new scopes where you can define constants that would otherwise conflict with one another. Since class and module names are both constants, namespaces can be used to isolate them as well.
All of the core Ruby classes are said to be in the “global” namespace. We’ll see what this actually means a bit later, but for now it’s enough to say that each of these class names can be used without any sort of qualification. In other words, if you fire up IRB and type Array
, you mean “the Array
class in the global namespace.” When you define a class without specifying a namespace you are placing that class into the global namespace. At the same time you’re running the risk of clashing with an existing name.
Creating a new class and placing it into a custom namespace is as easy as nesting the class definition inside a module:
module Notebooks
class Binding
...
end
end
Nesting the definition inside a module makes the Binding
class distinct from, and avoids any confusion with, the core class of the same name. To reference this new class you need to include the module’s name along with the “class path separator” which is two colons:
style = Notebooks::Binding.new
Using modules to create namespaces isn’t limited to protecting classes, this technique can be used to place other constants and module functions inside a namespace as well. You can also nest modules within other modules to create arbitrarily deep namespaces. This is helpful in large applications and libraries as a way to keep code compartmentalized and modular.
When using namespaces it’s common practice is to mirror the namespace within the directory structure of the project on the file system. For example, under this scheme the class above would be found in the bindings.rb
file located in a notebooks
directory. In other words, the Notebooks::Bindings
name maps to the notebooks/bindings.rb
file.
Sometimes nesting classes in modules is cumbersome and causes unnecessary indentation. There’s an alternative syntax for creating classes inside a namespace. It only works if the namespace already exists, that is, the module which creates the namespace was previously defined. In this form you use the module name and class path separator directly in the class definition:
class Notebooks::Binding
...
end
Typically you’d use this style of class definition by first defining the namespacing module in a main source file and then load all remaining source files. But watch out, attempting to use this syntax without first defining the namespace will result in a NameError
exception. This is the exception Ruby raises if it can’t find a referenced constant. Nesting constants inside one another adds some complexity to how you qualify the constants in your program but understanding how Ruby searches for them will clear things up.
Ruby uses two techniques for finding constants. First it examines the current lexical scope and all of the enclosing lexical scopes. (We’ll explore lexical scopes shortly.) If the constant can’t be found there Ruby searches the inheritance hierarchy. This is why you can use a constant defined in a superclass from within a subclass. As we’ll see later, this is also why we can use the so called “global” constants.
When dealing with namespaces we’re mostly concerned with lexical scopes, the actual nested locations where a constant is defined or referenced. They’re easier to see than read about so consider the following:
module SuperDumbCrypto
KEY = "password123"
class Encrypt
def initialize (key=KEY)
...
end
end
end
The module definition creates a lexical scope. Since the KEY
constant and the Encrypt
class are both defined in the same lexical scope the initialize
method can use the KEY
constant without qualifying it (i.e. giving the full class path to the constant). It’s important to understand that the lexical scope is different than the namespace created by the module. It has to do with the physical location where the constants are defined and used. If we change things up a bit you can see the difference:
module SuperDumbCrypto
KEY = "password123"
end
class SuperDumbCrypto::Encrypt
def initialize (key=KEY) # raises NameError
...
end
end
This time a namespace and a lexical scope are created with the module definition, but immediately after creating the KEY
constant both are closed. The class is defined in the correct namespace but it doesn’t share the same lexical scope as the KEY
constant and therefore can’t use it unqualified. This code raises a NameError
exception because Ruby can’t find the KEY
constant in the lexical scope or through the inheritance hierarchy. The fix is simple, just qualify the constant:
class SuperDumbCrypto::Encrypt
def initialize (key=SuperDumbCrypto::KEY)
...
end
end
It might seem a bit weird but that’s how it works my friend. Even stranger still, now that the constant is fully qualified it’s actually found through the inheritance hierarchy and not the lexical scope. The SuperDumbCrypto
constant can be considered a global constant, but as it turns out, Ruby doesn’t actually have a global namespace. Instead all top-level constants are stored inside the Object
class. Since almost everything in Ruby inherits from Object
it can find top-level constants through the inherits hierarchy. And that explains why Ruby looks in two different places: the current lexical scope and the inheritance hierarch.
There’s one last way you can get stuck when using namespaces. Consider this:
module Cluster
class Array
def initialize (n)
@disks = Array.new(n) {|i| "disk#{i}"}
# Oops, wrong Array! SystemStackError!
end
end
end
The code above defines a Cluster::Array
class which needs to use the top-level Array
class. The lookup rules for constants say, that in this context, an unqualified Array
constant means Cluster::Array
, which isn’t what we want. The solution is to fully qualify the Array
constant. Since it’s a top-level constant and we know those are kept in the Object
class, the fully qualified constant name would be Object::Array
. This looks a bit strange, so Ruby lets us abbreviate at it as ::Array
. The following code works as expected:
module Cluster
class Array
def initialize (n)
@disks = ::Array.new(n) {|i| "disk#{i}"}
end
end
end
While namespacing adds a bit of complexity—particularly for unqualified constants—the feature is well worth the cost. Any nontrivial Ruby project is bound to be using them, especially libraries which are packaged as a Ruby Gem. Such libraries are expected to place all their constants under a namespace which matches the library name. Place nice with others, create and use namespaces.
• Use modules to create namespaces by nesting definitions inside them.
• Mirror your namespace structure and your directory structure.
• Use “::
” to fully qualify a top-level constant when its use would be ambiguous (e.g. ::Array
).
Take a look at the following IRB session and ask yourself: Why does the equal?
method return a different answer than the “==
” operator?
irb> "foo" == "foo"
---> true
irb> "foo".equal?("foo")
---> false
As it turns out, there are actually four different ways in Ruby to check if objects are equal to one another. Sometimes the various methods overlap and compare the same way, but as you can see from above, that’s not always the case. Depending on the comparison method and the objects involved, you can get surprising results. They won’t be surprising much longer, however.
It might seem like having four different ways to compare objects for equality is a bit of an overkill. For most objects that’s certainly true and all four methods end up doing the same thing. But some give us several flavors of equality which often have subtle (and not so subtle) differences. For example, the various classes which represent numbers silently convert to one another when compared using the “==
” operator:
irb> 1 == 1.0
---> true
Each time you define a class you inherit the four different equality testing methods. Understanding how they’re expected to work is important if you want to override any of them. This is especially useful if you plan on implementing the full range of comparison operators as discussed in Item 13. We’ll start with an easy one, the method that you’re supposed to leave alone.
It may have surprised you in the example above when the equal?
method didn’t return the same result as the “==
” operator. Obviously they’re testing two different things. I would venture so far as to say that equal?
is misnamed. Instead of comparing two objects based on their contents or values it’s actually checking to see if the two objects are the exact same object. That is, do they both have the same object_id
. (Internally, the implementation of equal?
checks to see if the two objects are both pointers to the same chuck of memory.)
Even though the two strings above have identical contents, they’re not the exact same object. They’re two different objects that just happen to have the same characters. Each time Ruby evaluates a string literal it allocates a brand new String
object, even if the same exact string already exists somewhere else in memory. Of course, that’s what you want to happen. You wouldn’t want to mutate a string only to find out that you’ve accidentally mutated every other string in your program which has the same contents. (If you’re left wishing for copy-on-write strings you’re not alone. Ruby 2.1, however, has immutable string literals which share memory with other, identical strings. See Item 47 for details.)
An important point about the equal?
method is that its behavior should be the same for all objects. That is, you shouldn’t override this method and give it a different implementation. Several existing classes have come to rely on how this method works and changing it is sure to break things in strange ways. If you want to compare objects then using equal?
probably isn’t the method you want to use anyways. More often than not you’re interested in the “==
” operator.
When comparing objects, “==
” does what you think it’s going to do, and sometimes pleasantly surprises you. While each class is free to redefine “==
”, the agreed upon behavior is that it will return true
if two objects represent the same value. That explains the earlier string comparison and the test between 1
and 1.0
. (Which, by the way, are represented by two different classes, Fixnum
and Float
.) This is probably more inline with what you would consider to be a comparison of equality.
If you don’t write your own implementation of “==
” you’ll inherit the default implementation which does same thing as the equal?
method. This probably isn’t very useful behavior. Obviously some objects should be considered equal even if they’re not stored in the same memory location. Instead, you want the “==
” operator to consider the contents of the objects being compared. Maybe it’s as simple as comparing object attributes, record IDs, or delegating the comparison to another object. Either way, the “==
” operator should be smarter than the equal?
method. But resist the temptation to define “==
” directly.
Item 13 explains how you can get “==
” (and other operators) for free by defining the “<=>
” operator and mixing in the Comparable
module. If you’re interested in defining ordering operators like “>
” you should definitely consider the advice in that item.
The next method on our equality tour is ambiguously named and a bit obscure. Nevertheless, it’s quite important. The eql?
method is heavily used by the Hash
class when comparing objects used as keys. You don’t want to insert the same key in a hash more than once and defining a reasonable eql?
method gets you halfway there. We’ll look at the other half shortly.
By default eql?
does the same comparison as equal?
. That’s probably a bit stricter than what you want. If you define a class and use instances of that class as hash keys you’ll be in for a surprise if you don’t override the default implementation. Since the Hash
class will be strictly comparing keys based on their object_id
you’re likely to wind up with big hashes that don’t behave the way you’d expect them to. For example, consider the Color
class below.
class Color
def initialize (name)
@name = name
end
end
Look what happens if we create two instances of that class with the exact same values, then attempt to use them as hash keys:
irb> a = Color.new("pink")
---> #<Color @name="pink">
irb> b = Color.new("pink")
---> #<Color @name="pink">
irb> {a => "like", b => "love"}
---> {#<Color @name="pink">=>"like", #<Color @name="pink">=>"love"}
Without knowing about the eql?
method you’d probably expect the final hash to only have one key, not two. In order to get the hash to collapse both keys into one we need to define an eql?
method along with another important method, hash
. When objects are used as keys the Hash
class needs to compute where in the data structure the object will reside. It does this by calling the hash
method on the key object. Two objects which represent the same key should always return the same value from the hash
method. But it’s also okay if dissimilar objects return the same hash
value. That is, they don’t have to be unique. When this happens it’s called a collision and is resolved by further comparing the key objects with eql?
. Objects which have the same hash
value and result in true
when compared with eql?
are consider to be the same key.
Therefore, if we want two similar Color
objects to represent the same hash key we need to implement the hash
method and the eql?
method. For such a simple class we’ll just manually delegate both to the @name
attribute:
class Color
attr_reader(:name)
def initialize (name)
@name = name
end
def hash
name.hash
end
def eql? (other)
name.eql?(other.name)
end
end
irb> a = Color.new("pink")
---> #<Color @name="pink">
irb> b = Color.new("pink")
---> #<Color @name="pink">
irb> {a => "like", b => "love"}
---> {#<Color @name="pink">=>"love"}
In most situations you probably want to implement the “<=>
” operator as described in Item 13, then simply make eql?
an alias for “==
”. “Why even bother having an eql?
method if it’s just going to be an alias for the equality operator?” That’s a fair question and let me answer it by reminding you of the type conversion done by the “==
” operator in the numeric classes. Look again at how “==
” and eql?
differ for them.
irb> 3 == 3.0
---> true
irb> 3.eql?(3.0)
---> false
irb> {3 => "I'm Three!"}[3.0]
---> nil
It might be fine to consider 1
equal to 1.0
when using the “==
” operator but not when using the eql?
method, and hence we have both. When defining your own classes you’ll have to decide how objects compare with one another when used as hash keys. If you choose to implement a sloppy version of the “==
” operator you’ll need to define a stricter version of the eql?
method. Otherwise alias eql?
to “==
”. I said it earlier but it’s worth repeating, be aware that if don’t define eql?
you’ll pick up the default implementation which uses the same logic as the equal?
method.
Finally, let’s take a look at an operator that you use all the time, maybe without even realizing it, the case equality operator. This operator is written with three equal signs (“===
”) but it’s most often used indirectly, through the case
expression. See for yourself:
case command
when "start" then start
when "stop", "quit" then stop
when /^cds+(.+)$/ then cd($1)
when Numeric then timer(command)
else raise(InvalidCommandError, command)
end
There are two different ways to use a case
expression. The one we’re concerned with is chosen by Ruby when you supply an expression to the case
keyword (the command
variable above). This flavor of case
uses the “===
” operator to compare the given expression to each of the when
clauses. It’s easier to see what’s going on if you remove the syntactic sugar and reveal the underlying if
expression:
if "start" === command then start
elsif "stop" === command then stop
elsif "quit" === command then stop
elsif /^cds+(.+)$/ === command then cd($1)
elsif Numeric === command then timer(command)
else raise(InvalidCommandError, command)
end
Notice that the expression given to the case
keyword always ends up as the right operand to “===
”. That’s important. In Ruby the left operand becomes the receiver of the message and the right operand is the sole argument to the method call. It also means that operators in Ruby are not commutative because their behavior is decided by the left operand. Combining these two facts means you can rewrite the use of any operator as a normal method call:
irb> 1 + 2
---> 3
irb> 1.+(2)
---> 3
What does this have to do with the case equality operator? Well for one thing it’s important to know which object will wind up being the receiver. Knowing that the expression given to a when
clause becomes the left operand to the “===
” operator—and thus the receiver—tells you which implementation of the operator you’re dealing with. Not surprisingly, Ruby core classes have very useful variants of the case equality operator.
The default implementation of “===
” is rather boring and passes its operands on to “==
”. This is the version you’ll inherit in your classes. Things get interesting, however, when you look at classes like Regexp
. The Regexp
class defines a “===
” operator which returns true
if its argument is a string which matches the receiver (the regular expression). You have to use the regular expression as the left operand otherwise you won’t be using the “===
” implementation from the Regexp
class.
irb> /er/ === "Tyler" # Regexp#===
---> true
irb> "Tyler" === /er/ # String#===
---> false
Class and modules also get into the act with class method versions of the “===
” operator. They all share a common implementation which returns true
if the right operand is an instance of the left operand. It’s basically an operator version of the is_a?
method with the receiver and argument reversed. This gives us the ability to use class and module names as arguments to the when
clause. It’s good to know how this works if you strip away the case
syntax. Consider the similarities between is_a?
and “===
”:
irb> [1,2,3].is_a?(Array)
---> true
irb> [1,2,3] === Array # Array#===
---> false
irb> Array === [1,2,3] # Array::===
---> true
It’s not very likely that you’ll need to define the “===
” operator directly since the default implementation from Object
is sufficient and can be found through inheritance. Still, if you want your objects or classes to have special behavior when used as an argument to the when
clause in a case
expression you’ll know which operator to override. Just remember which operand becomes the receiver and which the argument.
• Never override the equal?
method. It’s expected to strictly compare objects and return true
only if they’re both pointers to the same object in memory (i.e. they both have the same object_id
).
• The Hash
class uses the eql?
method to compare objects used as keys. The default implementation probably doesn’t do what you want. Follow the advice in Item 13 and then alias eql?
to “==
” and then write a sensible hash
method.
• Use the “==
” operator to test if two objects represent the same value. Some classes like those representing numbers have a sloppy equality operator which performs type conversion.
• case
expressions use the “===
” operator to test each when
clause. The left operand is the argument given to when
and the right operand is the argument given to case
.
Item 12 discusses the four ways in which objects can be tested for equality. If you’re also interested in sorting and comparing objects then you’ll need to go one step further and define the remaining comparison operators. Unlike with the equality operators, classes don’t inherit default implementations of the other comparison operators. Fortunately, Ruby gives us a shortcut, one we’ll dig into shortly.
First, let’s make this interesting and implement a class which has unusual ordering. As programmers we’re so used to software version numbers that the strange notation doesn’t seem to bother us much. But to the uninitiated it’s unusual indeed. How does one compare “10.10.3” to “10.9.8”? The answer is obvious to us, yet if we compared them using lexicographical ordering these version numbers would come out wrong. To get it right you need to consider each component separately. That’s exactly what we’ll do in our Version
class.
To keep things reasonable let’s work with version numbers that only have three parts (just like those above). The first part is the major version number, then the minor version number, and finally the patch level. We’ll also only deal with well-formed version numbers so we can focus on comparing them. That makes it easy to parse a version string into its individual components.
class Version
attr_reader(:major, :minor, :patch)
def initialize (version)
@major, @minor, @patch =
version.split('.').map(&:to_i)
end
end
An important point that I want to repeat is that classes don’t automatically inherit comparison operators, with one exception. We’ll see later that you only really need to define one comparison operator, namely “<=>
”. This particular operator is in fact inherited from Object
, but the implementation we get for free is incomplete. See what happens if we try to sort an array of Version
objects:
irb> vs = %w(1.0.0 1.11.1 1.9.0).map {|v| Version.new(v)}
---> [ #<Version @major=1, @minor=0, @patch=0>,
#<Version @major=1, @minor=11, @patch=1>,
#<Version @major=1, @minor=9, @patch=0> ]
irb> vs.sort
ArgumentError: comparison of Version with Version failed
That’s not very helpful. The default implementation of the “<=>
” comparison operator is to blame here. It only considers whether two objects are equal to one another (using equal?
and “===
”) and doesn’t do the full range of test which our implementation will be required to do. If the two objects being compared aren’t equal to one another it returns nil
, which signals to the sort
method that the comparison is invalid. That’s okay, you can’t expect much from a general purpose implementation. Let’s write our own.
Implementing the full range of comparison operators is done in two steps. The hardest part is writing a sensible “<=>
” operator (informally referred to as the “spaceship” operator). Remember that in Ruby binary operators turn into methods calls where the left operand is the receiver and the right operand is the method’s first and only argument. When writing a comparison operator it’s common practice to name the argument “other” since it will be the other object you’re comparing the receiver with.
The “<=>
” operator can stand in for the full range of comparison operators due to the flexibility of its return value. It can return one of four values:
• When it doesn’t make sense to compare the receiver with the argument then the comparison operator should return nil
. It’s possible that the argument is an instance of another class or even nil
. For some classes it might be useful to convert the argument to the correct type before doing the comparison, but more often it’s best to return nil
if the receiver and argument are not instances of the same class. That’s what we’ll do with the Version
class.
• If the receiver is less than the argument, return -1
. In other words, if comparing the left and right operands with “<
” should return true
then you want “<=>
” to return negative one.
• If the receiver is greater than the argument, return 1
. This comparison is for the “>
” operator. If comparing the left and right operands with “>
” should return true
then make sure “<=>
” returns positive one.
• If the receiver is equal to the argument, return 0
. The “==
” operator will return true
only when “<=>
” returns zero.
We want the comparison operator for the Version
class to work the way it does in the numeric classes. As a matter of fact, we’ll be able to implement our version in terms of the numeric one. As you can see, it follows the rules from above:
irb> 9 <=> "9"
---> nil
irb> 9 <=> 10
---> -1
irb> 10 <=> 9
---> 1
irb> 10 <=> 10
---> 0
When writing the “<=>
” operator it’s often possible to delegate the comparison to an object’s instance variables. The three variables in the Version
class are all instances of Fixnum
which has a working implementation of the “<=>
” operator. This greatly simplifies our work. To compare version numbers we need to consider the instance variables in the receiver (left operand) and those from the argument (right operand) in order from major number to patch level. We can stop comparing as soon as one of the variables from the receiver doesn’t equal the corresponding variable from the argument. In other words, if the two versions we’re comparing have different major numbers we don’t need to compare the minor numbers or patch levels to know which is greater than the other. If both major numbers are the same, however, we’ll need to repeat the comparison with the minor numbers, and so on to the patch levels. When all components are equal to one another our comparison operator should return 0
to indicate the equality of both version objects. Otherwise we just need to return the result of using “<=>
” for the first pair of variables which don’t match. Consider the Version
implementation of the comparison operator:
def <=> (other)
return nil unless other.is_a?(Version)
[ major <=> other.major,
minor <=> other.minor,
patch <=> other.patch,
].detect {|n| !n.zero?} || 0
end
Each component is compared separately and the results are stored in an array. All we need to do then is find the first non-zero element in the array (the first pair of components which aren’t equal). If all components are equal to one another then detect
will return nil
, in which case the method will return 0
. Now we can sort an array of Version
objects.
irb> vs = %w(1.0.0 1.11.1 1.9.0).map {|v| Version.new(v)}
---> [ #<Version @major=1, @minor=0, @patch=0>,
#<Version @major=1, @minor=11, @patch=1>,
#<Version @major=1, @minor=9, @patch=0> ]
irb> vs.sort
---> [ #<Version @major=1, @minor=0, @patch=0>,
#<Version @major=1, @minor=9, @patch=0>,
#<Version @major=1, @minor=11, @patch=1> ]
There’s one more piece we’ll need add in order for Version
objects to be fully comparable. Beyond just sorting, we want to be able to use operators such as “>
” and “>=
” with these objects. As a matter of fact, there are actually five other operators which make up the full set of ordering operators, they are: “<
”, “<=
”, “==
”, “>
”, and “>=
”. You’ll be happy to hear that we don’t have to implement them by hand. Instead, all we need to do is include the Comparable
module.
class Version
include(Comparable)
...
end
That’s it. Now we can use all of the ordering operators and even one additional helper method:
irb> a = Version.new('2.1.1')
---> #<Version @major=2, @minor=1, @patch=1>
irb> b = Version.new('2.10.3')
---> #<Version @major=2, @minor=10, @patch=3>
irb> [a > b, a >= b, a < b, a <= b, a == b]
---> [false, false, true, true, false]
irb> Version.new('2.8.0').between?(a, b)
---> true
There are few final considerations when using the Comparable
module. First, for some classes you might want to implement your own copy of the “==
” operator so it’s more fuzzy than the one from Comparable
. A good example was presented in Item 12 which showed that the numeric classes perform type conversions before making comparisons. If you want similar behavior you’ll have to write your own equality operator or change the conditions that cause “<=>
” to return 0
. Which option you choose depends on how you want the other comparison operators to be affected.
If you want “>=
”, “<=
”, and “==
” to return consistent answers you should change the way “<=>
” calculates equality. If they don’t need to be consistent with one another you can simply override “==
” and make it less strict than the other comparison operators. For most classes, however, you’ll probably want to make all the operators consistent with one another.
Finally, if you want instances of your class to be usable as hash keys you’ll need to do two more things. The eql?
method should become an alias for “==
”. The default implementation of eql?
does the same thing as equal?
which doesn’t make sense if you have a “<=>
” operator defined for your class. The alias will make the Hash
class use the “==
” operator defined by the Comparable
module.
You’ll also need to define a hash
method which returns a Fixnum
. To get the best performance from the Hash
class you should ensure that different objects return different hash
values. Below is a simple example implementation for the Version
class. Writing an optimized version of hash
is outside the scope of this book.
class Version
...
alias_method(:eql?, :==)
def hash
[major, minor, patch].hash
end
end
• Implement object ordering by defining a “<=>
” operator and including the Comparable
module.
• The “<=>
” operator should return nil
if the left operand can’t be compared with the right.
• If you implement “<=>
” for a class you should consider aliasing eql?
to “==
”, especially if you want instances to be usable as hash keys. In which case you should also override the hash
method.
One of the major tenets of OOP is encapsulation, the idea that an object’s internal implementation will be just that, internal and hidden from the outside. This allows us to build a firewall. On one side we have the public interface which we advertise for others to use. Hidden on the other side is the private implementation which we’re free to change without fear of breaking anything external to the class. In Ruby this barrier is more theoretical than concrete, but it’s there nonetheless.
Take instance variables for example, they’re private by default. Without resorting to any backdoor metaprogramming they’re only accessible from within instance methods. If you want to expose them to the outside world you need to define so-called accessor methods. This is good news, you can safely store internal state in instance variables without worrying that it will accidentally become part of the public interface. Most of the time this is all fine and good but there’s one situation where this sort of encapsulation gets in the way. Sometimes one object needs to access the internal state of another. Consider this:
class Widget
def overlapping? (other)
x1, y1 = @screen_x, @screen_y
x2, y2 = other.instance_eval {[@screen_x, @screen_y]}
...
end
end
The Widget#overlapping?
method checks to see if the receiver and another object overlap one another on the screen. The public interface for the Widget
class doesn’t expose these screen coordinates, they’re an internal implementation detail. It would seem that our only alternative is to break the wall of encapsulation using some metaprogramming. This approach has drawbacks, however.
Usually, when designing classes with internal state, it’s a good idea to reduce the amount of code which accesses the state directly to the smallest number of methods possible. This makes it easier to test and reduces the amount of breakage when things change. Hiding state behind private methods also facilitates optimizations such as memoization (see Item 48). It’s better for maintainability if the Widget
class keeps the screen coordinates private, even to the overlapping?
method. This might tempt you into writing a private method to act as a gateway for the screen coordinates, but that won’t work in this situation.
Private methods behave differently in Ruby than they do in other OOP languages. Ruby only places one restriction on private methods, they can’t be called with an explicit receiver. It doesn’t matter where you are in the inheritance hierarchy, as long as you don’t use a receiver you can call private methods defined in superclasses. What you can’t do, however, is call private methods on another object, even when the caller and receiver have the same class. If you make the screen coordinates accessible through a private method you still won’t be able to use them in the overlapping?
method. Thankfully Ruby has an answer to this, protected methods.
Where private methods can’t be used with a receiver, protected methods can. There’s just one catch though, and it’s a slightly complicated one at that. The method being invoked on the receiver must also be in the inheritance hierarchy of the caller. As long as both the caller and receiver both share the same instance method through inheritance then the caller can invoke that method on the receiver. The caller and receiver don’t necessarily have to be instances of the same class, but they both have to share the superclass where the method is defined.
For the Widget
class, we can define a method which exposes the private screen coordinates without making them part of the public interface. By declaring this method as protected we maintain encapsulation while allowing the overlapping?
method to access both its own coordinates and those of the other widget. No metaprogramming black magic required.
class Widget
def overlapping? (other)
x1, y1 = screen_coordinates
x2, y2 = other.screen_coordinates
...
end
protected
def screen_coordinates
[@screen_x, @screen_y]
end
end
This is exactly what protected methods were designed for, sharing private information between related classes.
• Share private state through protected methods.
• Protected methods can only be called with an explicit receiver from objects of the same class or when they inherit the protected methods from a common superclass.
Ruby has two types of “at” variables, instance variables and class variables. Every object in a running Ruby application has its own private set of instance variables, you know, the ones whose names begin with “@
”. Setting an instance variable in one object doesn’t affect variables in another. (Mutating a variable is a different story though, two objects sharing a reference to a third can certainly affect one another. See Item 16 for more detail.)
Class variables (those which begin with “@@
”) are handled differently. Instead of being associated with a single object, they’re attached to a class and visible to all instances of that class. Setting a class variable in one object is immediately visible to another. Of course, this isn’t surprising, most object-oriented languages have these class-wide shared variables. What might be surprising, however, is how class variables and inheritance interact.
A common use for class variables—but certainly not the only use—is implementing the singleton pattern (not to be confused with Ruby singleton classes). Sometimes in larger applications you only want a single instance of a particular class and you want all of the code to access the same instance. Take a configuration class for example. You’d like to parse your configuration file once and then share it with the entire application, but without having to pass it around as an argument to every method that might need it. Basically, the singleton pattern is a glorified global variable.
Setting aside concurrency for a moment, let’s write a class which implements the singleton pattern and from which we can derive other classes:
class Singleton
private_class_method(:new, :dup, :clone)
def self.instance
@@single ||= new
end
end
By their very definition singleton classes shouldn’t have more than one instance, so it’s pretty typical to make the new
method private. This prevents anyone from creating a new object from the outside. We do the same thing with dup
and clone
to ensure copies of the one true instance can’t be made. The only way to access this official object is through the instance
class method. Its job is to make sure new
is only called once and it does this by storing the singleton object in a class variable named @@single
. If that variable is nil
then the instance
method will call new
and assign the result to @@single
. Otherwise it simply returns the value of @@single
. Let’s create two classes which inherit from the Singleton
class so that they can use the same logic:
class Configuration < Singleton
...
end
class Database < Singleton
...
end
So far so good. An application using these classes now has a way to access a single configuration object and a single database connection from anywhere in the code. Now, I wouldn’t be writing this if there wasn’t a problem, so let’s see what happens when we actually try to use these classes:
irb> Configuration.instance
---> #<Configuration>
irb> Database.instance
---> #<Configuration>
Oops. What happened? Obviously both classes are sharing the same class variable. When Configuration::instance
is called the @@single
variable is nil
and therefore new
is called and the resulting objected is cached. When Database::instance
is called @@single
already has a value (the configuration object) and that’s returned instead of calling new
. The fix is easy but requires a little explanation.
Class variables in a superclass are shared between it and all of its subclasses. To make things a little messier, any instance of one of these classes can access the shared class variables and modify them. Instance methods and class methods all share the same class variables, making them practically as undesirable as global variables. Instances should therefore stick to working with instance variables. And therein lies the solution. Classes are objects, so they have instance variables too. Every class is a unique object and has its own private set of instance variables which can only be accessed from its class methods. Changing the @@single
variable from a class variable to a class instance variable fixes the problem:
def self.instance
@single ||= new
end
irb> Configuration.instance
---> #<Configuration>
irb> Database.instance
---> #<Database>
If you haven’t seen instance variables inside a class method before it can be a bit strange. It’s also easy to confuse them with instance variables from one of their objects. But they do in fact belong to the class itself. This is made clearer if you remember that class methods are really an illusion. Because classes are objects, what we call “class methods” are actually instance methods for the class object. Class methods exist in name only.
Besides breaking the sharing relationship between a class and its subclasses, class instance variables also provide more encapsulation. While class variables (“@@
”) can be accessed and modified directly from any instance or class method, class instance variables (“@
”) are only accessible at the class definition level or from inside class methods. Therefore, instances of the class and other external code can only access these variables if you provide class methods for that purpose (e.g. Singleton::instance
). You get to protect them just like any other instance variable, exposing access to them as necessary.
Oh, by the way, earlier I asked that we set aside any issues relating to concurrency. That’s because class variables and class instance variables have all of the same problems that global variables do. If your application has multiple threads of control then altering any of these variables without using a mutex isn’t safe. Thankfully, the Singleton
module which is part of the Ruby standard library correctly implements the singleton pattern in a thread-safe way. Using it is simple:
require('singleton')
class Configuration
include(Singleton)
end
• Prefer class instance variables to class variables.
• Classes are objects and therefore have their own private set of instance variables.