When developing applications we often have methods that run slowly. Perhaps they need to query the database, or hit an external service, both of which can cause them to slow down. We could call the method every time we need that data and just accept the overhead, but if performance is a concern we have some options.
For one, we can assign the data to a variable and re-use it, which would speed up the process. While a possible solution, manually managing that variable could quickly become tedious.
But, what if instead, the method doing this "slow work" could just handle that variable for us? This would allow us to call the method in the same way, but have the method save and re-use the data. This is exactly what memoization does.
Put simply, memoization is saving a method's return value so it does not have to be recomputed each time. As with all caching, you are effectively trading memory for time (i.e. you give up the memory required to store the value, but you save the time required to process the method).
How to Memoize a Value
Ruby provides a very clean idiom for memoizing values with the or-equals operator: ||=
. This uses a logical OR (||
) between left and right values, then assigns the result to the variable on the left. In action:
value ||= expensive_method(123)
#logically equivalent to:
value = (value || expensive_method(123))
How Does Memoization Work
To understand how this works you need to grasp two concepts: "falsey" values and lazy evaluation. We'll start with truthy-falsey first.
Truthy and Falsey
Ruby (like almost all other languages) has built-in keywords for boolean true
and false
values. They work exactly as you'd expect:
if true
#we always run this
end
if false
# this will never run
end
However, Ruby (and many other languages) also has the concept of "truthy" and "falsey" values. This means values can be treated "as if" they were true
or false
. In Ruby only nil
and false
are falsey. All other values (including zero) are treated as true
(note: other languages make different choices. For example C treats zero as false
). Re-using our example from above, we could also write:
value = "abc123" # a string
if value
# we always run this
end
value = nil
if value
# this will never run
end
Lazy Evaluation
Lazy evaluation is a form of optimization that is very common in programming languages. It allows the program to skip operations that are not necessary.
The logical OR operator (||
) returns true if either left or right-hand sides are true. This means that if the left-hand argument is true there is no point evaluating the right-hand side since we already know the result will be true.
If we were to implement this ourselves we might end up with something like this:
def logical_or (lhs, rhs)
return lhs if lhs
rhs
end
If lhs
and rhs
were functions (e.g. lamdas) then you can see rhs
will only execute if lhs
is falsey.
Or-Equals
Combining these two concepts of truthy-falsey values and lazy evaluation shows us what the ||=
operator is doing:
value #defaults to nil
value ||= "test"
value ||= "blah"
puts value
=> test
We start with value being nil
because it was not initialized. Next, we encounter our first ||=
operator. value
is falsey at this stage so we evaluate the right-hand side ("test"
) and assign the result to value
.
Now we hit the second ||=
operator, but this time value
is truthy as it has the value "test"
. We skip the evaluation of the right-hand side and continue with value
untouched.
Deciding When to Use Memoization
When using memoization there are some questions we need to ask ourselves: How often is the value accessed? What causes it to change? How often does it change?
If the value is only accessed once then caching the value is not going to be very useful, the more often the value is accessed, the more benefit we can get from caching it.
When it comes to what causes it to change we need to look at what values are used in the method. Does it take arguments? If so the memoization probably needs to take this into account. Personally, I like using the memoist gem for this as it handles arguments for you.
Lastly, we need to consider how often the value changes. Are there instance variables that cause it to change? Do we need to clear the cached value when they change? Should the value be cached at the object level or the class level?
To answer these questions let's look at a simple example and step through the decisions:
class ProfitLossReport
def initialize(title, expenses, invoices)
@expenses = expenses
@invoices = invoices
@title = title
end
def title
"#{@title} #{Time.current}"
end
def cost
@expenses.sum(:amount)
end
def revenue
@invoices.sum(:amount)
end
def profit
revenue - cost
end
def average_profit(months)
profit / months.to_f
end
end
Calling code is not shown here, but it's a good guess that the title
method is probably only called once, it also uses Time.current
so memoizing it could mean the value instantly becomes stale.
The revenue
and cost
methods are hit several times even within this class. Given that they both require hitting the database, they would be prime candidates for memoizing if performance became an issue. Assuming we memoize these, then profit
shouldn't need to be memoized, otherwise, we're just adding caching on top of caching for minimal gains.
Finally, we have average_profit
. The value here relies on the argument so our memoizing has to take this into account. For a simple case like revenue
we can just do this:
def revenue
@revenue ||= @invoices.sum(:amount)
end
For average_profit
though, we need a different value for each argument that is passed in. We could use memoist for this, but for the sake of clarity we'll roll our own solution here:
def average_profit(months)
@average_profit ||= {}
@average_profit[months] ||= profit / months.to_f
end
Here we're using a hash to keep track of our computed values. First we ensure @average_profit
has been initialized, then we use the argument passed in is as the hash key.
Memoizing at the Class Level or Instance Level
Most of the time memoization is done at the instance level, meaning we use an instance variable to hold the computed value. This also means that whenever we create a new instance of the object it does not benefit from the "cached" value. Here's a very simple illustration:
class MemoizedDemo
def value
@value ||= computed_value
end
def computed_value
puts "Crunching Numbers"
rand(100)
end
end
Using this object we can see the results:
demo = MemoizedDemo.new
=> #<MemoizedDemo:0x00007f95e5d9d398>
demo.value
Crunching Numbers
=> 19
demo.value
=> 19
MemoizedDemo.new.value
Crunching Numbers
=> 93
We can change this by simply using a class-level variable (with @@
) for our memoized value:
def value
@@value ||= computed_value
end
The results then become:
demo = MemoizedDemo.new
=> #<MemoizedDemo:0x00007f95e5d9d398>
demo.value
Crunching Numbers
=> 60
demo.value
=> 60
MemoizedDemo.new.value
=> 60
You may not want class-level memoization often, but it is there as an option. However, if you need a value to be cached at this level it is probably worth looking at caching the value with an external store like Redis or memcached.
Common Memoization Use Cases in Ruby on Rails Applications
In Rails applications, the most common use-case I see for memoization is reducing database calls, particularly when a value is not going to change within a single request. "Finder" methods for looking up records in controllers are a good example of this kind of database call such as:
def current_user
@current_user ||= User.find(params[:user_id])
end
Another common place is if you use some type of decorator/presenter/view-model type of architecture for rendering views. Methods in these objects often have good candidates for memoization because they only persist for the life of the request, the data is normally not mutated, and some methods are probably hit multiple times when rendering the views.
Memoization Gotchas
One of the biggest gotchas is memoizing things when it is not really necessary. Things like string interpolation can look like easy candidates for memoization, but in reality, they are unlikely to be causing any noticeable impact on your site's performance (unless of course you are using exceptionally large strings or doing a very large amount of string manipulation), for example:
def title
# memoization here is not going to have much of an impact on our performance
@title ||= "#{@object.published_at} - #{@object.title}"
end
Another thing to watch for is our old friend cache invalidation, particularly if your memoized value depends on the state of the object. One way to help prevent this is to cache at the lowest level you can. Instead of caching a method computing a + b
it may be better to cache the a
and b
methods individually.
# Instead of this
def profit
# anyone else calling 'revenue' or 'losses' is not benefitting from the caching here
# and what happens if the 'revenue' or 'losses' value changes, will we remember to update profit?
@profit ||= (revenue - losses)
end
# try this
def profit
# no longer cached, but subtraction is a fast calculation
revenue - losses
end
def revenue
@revenue ||= Invoice.all.sum(:amount)
end
def losses
@losses ||= Purchase.all.sum(:amount)
end
The last gotcha is due to how lazy evaluation works - you will have to do something a bit more custom if you need to memoize a falsey value (i.e nil or false), as the ||=
idiom will always execute the right-hand side if your saved value is falsey. In my experience, it's not often you need to cache these values, but if you do, you may need to add a boolean flag to indicate it's already been computed, or use another caching mechanism.
def last_post
# if the user has no posts, we will hit the database every time this method is called
@last_post ||= Post.where(user: current_user).order_by(created_at: :desc).first
end
# As a simple workaround we could do something like:
def last_post
return @last_post if @last_post_checked
@last_post_checked = true
@last_post ||= Post.where(user: current_user).order_by(created_at: :desc).first
end
When Memoization is Not Enough
Memoization can be a cheap and effective way to improve performance in parts of your application, but it's not without its drawbacks. One big one is persistence; for common instance-level memoization, the value is only saved for that one particular object. This makes memoization great for saving values for the life of a web request, but doesn't give you the full benefits of caching if you have values that would be the same for multiple requests and are being re-computed each time.
Class-level memoization could help with this, but it becomes more difficult to manage cache invalidation. Not to mention that if your server reboots those cached values will be lost, and they can't be shared among multiple web servers.
In the next issue in this series on caching we'll look at Rails' solution to these problems - low-level caching. Allowing you to cache values to an external store that can be shared among servers, and manage cache invalidation with expiry timeouts and dynamic cache keys.