A Rubyist's Guide to Big-O Notation

Big-O notation gives you crucial insight into why your apps aren't as fast as you'd like them to be. In this post we'll uncover the meaning of things like O(N^2) and show how to use these concepts to speed up your apps and your database queries.

I don't have a degree in computer science. Lots of us Rubyists don't. So for a long time I avoided learning about big-O notation. It looks a little too much like higher math. I mean: O(N^2). Come on.

Instead, I learned rules of thumb:

  • Finding a specific item in a Hash is faster than in an Array
  • Avoid nested loops
  • Watch out for accidental DB queries when generating lists in your views

These rules are nice, but unless you understand WHY they work, you're going to find yourself making mistakes and having inexplicable performance problems.

Why does it matter?

Big-O notation is just a fancy way of describing how your code's performance depends on the amount of data it's processing.

Performance means one of two things: speed or RAM usage. In computer science class you'd refer to these as "time complexity" and "space complexity." Big-O notation is used for both, but we're going to focus on speed here, since that seems to be the more common usage.

You would expect that processing an array of 100 items would be slower than an array of 10 items. But by how much? 10x? 100x? 1000x?

It's not a big deal with small datasets, but if your app gets slower with every row added to the database, it soon becomes a problem.

Before we get into the details, I made a quick chart showing common big-Os with emoji showing how they'll make you feel as your data scales.

Big O       Rank  Meaning
O(1)        😎    Speed doesn't depend on the size of the dataset
O(log n)    😁    10x the data means roughly 2x more time
O(n)        😕    10x the data means 10x more time
O(n log n)  😖    10x the data means about 20x more time
O(n^2)      😫    10x the data means 100x more time
O(2^n)      😱    The dilithium crystals are breaking up!

So when someone says Array#bsearch is better than Array#find because it's O(log n) vs. O(n), you can just compare 😁 to 😕 and see that they might be onto something.
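You can see the difference for yourself. In the sketch below, both lookups find the first number greater than or equal to 5,000 in a sorted array, but `find` scans left to right while `bsearch` halves the search space at each step (note that `bsearch` only works on sorted arrays):

```ruby
sorted = (1..10_000).to_a

# O(n): checks elements one by one until the block is true
linear = sorted.find    { |x| x >= 5_000 }   # => 5000

# O(log n): binary search, only ~14 comparisons for 10,000 items
binary = sorted.bsearch { |x| x >= 5_000 }   # => 5000
```

Same answer, wildly different amounts of work as the array grows.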

For something a little more robust, check out the Big-O Cheat Sheet.

Deciphering the Notation

You don't have to memorize all of the different Big-O values, so long as you understand how the notation works.

Take, for example, the horrible horrible O(2^n). If we were to express that in Ruby, it might look like this:

# O(2^n) translated to Ruby
def o(n)
  2 ** n  # This is Ruby for 2^n
end

Still not obvious? How about if I rename the method and argument to something more user-friendly?

# O(2^n) translated to prettier Ruby
def execution_time(size_of_dataset)
  2 ** size_of_dataset
end

You can do this for all of them:

# O(1)
def o1_execution_time(size_of_dataset)
  1
end

# O(n)
def on_execution_time(size_of_dataset)
  size_of_dataset
end

# O(n^2)
def on2_execution_time(size_of_dataset)
  size_of_dataset * size_of_dataset
end
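Even the friendlier ones from the chart fit the same pattern, like O(log n):

```ruby
# O(log n)
def olog_n_execution_time(size_of_dataset)
  Math.log2(size_of_dataset)
end

olog_n_execution_time(1024)  # => 10.0
```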

...etc

Now that you know how the notation works, let's take a look at some typical Ruby code and see how it relates.

O(1)

When we say that something is O(1) it means that its speed doesn't depend on the size of its data set.

For example, hash lookup times don't depend on the hash size:

# These should all take the same amount of time
hash_with_100_items[:a]
hash_with_1000_items[:a]
hash_with_10000_items[:a]

This is why we tend to say that hashes are faster than arrays for lookups in larger datasets.
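You don't have to take that on faith. A quick (unscientific) benchmark sketch with Ruby's standard library makes the gap visible; on large collections the hash lookups finish far faster than the array scans:

```ruby
require 'benchmark'

array = (0...100_000).to_a
hash  = array.each_with_object({}) { |i, h| h[i] = true }

Benchmark.bm(15) do |bm|
  # O(n) per lookup: scans the array until it finds a match
  bm.report("Array#include?") { 1_000.times { array.include?(99_999) } }

  # O(1) per lookup: jumps straight to the bucket
  bm.report("Hash#key?")      { 1_000.times { hash.key?(99_999) } }
end
```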

O(n)

In contrast, Array#find is O(n). That means Array#find's speed depends linearly on the number of items in the array. An array with 100 items can take 100 times longer to search than an array with one item.

A lot of code that iterates over arrays follows the O(n) pattern.

(0..9).each do |i|
  puts i
end

# This example takes ~2x as long as the previous one because it contains 2x the items.
(0..19).each do |i|
  puts i
end

O(n^2)

Code that fits an O(n^2) profile tends to involve nested loops. That makes sense if you think about it. One loop gives you O(n), a second nested loop gives you O(n^2). If — for some unholy reason — you had a 5-level nested loop, it'd be O(n^5).

data = (0..100)
data.each do |d1|
  data.each do |d2|
    puts "#{ d1 }, #{ d2 }"
  end
end
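Counting the iterations makes the quadratic growth concrete. This small helper (hypothetical, just for illustration) counts how many times the inner block of a nested loop runs:

```ruby
def pair_count(n)
  count = 0
  # Nested loops: the inner block runs n * n times
  n.times { n.times { count += 1 } }
  count
end

pair_count(10)   # => 100
pair_count(100)  # => 10000 -- 10x the data, 100x the work
```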

O(n log n)

O(n log n) code is often the result of someone finding a clever way to reduce the amount of work that an otherwise O(n^2) algorithm would do.

You wouldn't be able to just look at a piece of code and tell it's O(n log n). This is where the higher math comes in, and it's also where I bow out.

But it is important to know about O(n log n), because it describes a lot of common sorting algorithms. Ruby's Array#sort uses the venerable quicksort algorithm, which on average is O(n log n) and worst-case is O(n^2).

If you're not familiar with quicksort, check out this excellent demonstration.
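A common pattern is to pay the O(n log n) cost of a sort once, so that later lookups can use O(log n) binary search instead of O(n) scans:

```ruby
names = %w[carol alice bob dave]

# One O(n log n) sort up front...
sorted = names.sort  # => ["alice", "bob", "carol", "dave"]

# ...buys O(log n) lookups afterward, instead of O(n) scans.
sorted.bsearch { |n| n >= "carol" }  # => "carol"
```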

Putting it all together: Your Database

One of the most common problems with new web applications is that they're fast on the developer's computer, but in production they get slower and slower.

This happens because the number of records in your database grows over time, but your code is asking the DB to do operations that don't scale well, i.e. O(n) or worse.

For example, did you know that in Postgres, count queries are always O(n)?

# This makes the DB iterate over every row of Users
# ...unless you're using a Rails counter cache. 
User.count

We can see this by using the Postgres explain command. Below, I use it to get the query plan for a count query. As you can see, it plans on doing a sequential scan (that means looping) over all 104,701 rows in the table.

# explain select count(*) from users;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=6920.89..6920.90 rows=1 width=0)
   ->  Seq Scan on users  (cost=0.00..6660.71 rows=104701 width=0)
(2 rows)

Lots of common Rails idioms can trigger unintentional sequential scans, unless you specifically optimize the database to prevent them.

# This asks the DB to sort the entire `products` table
Product.order("price desc").limit(1)

# If `hobby` isn't indexed, the DB loops through each row of Users to find it. 
User.where(hobby: "fishing")

You can use the explain command to see that as well. Here we see we're doing a sort (probably quicksort) on the whole table. If there are memory constraints, it may have chosen a different sorting algorithm with different performance characteristics.

# explain select * from users order by nickname desc limit 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=7190.07..7190.07 rows=1 width=812)
   ->  Sort  (cost=7190.07..7405.24 rows=104701 width=812)
         Sort Key: nickname
         ->  Seq Scan on users  (cost=0.00..6606.71 rows=104701 width=812)

The answer to all of these problems is, of course, indexing. Telling the database to use an index is a little like using a hash lookup (O(1)) in Ruby instead of an array search (O(n)).
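For the `hobby` query above, the fix might look like the following Rails migration (a sketch; the column name comes from the earlier example and the migration version is assumed):

```ruby
# Hypothetical migration: indexes the `hobby` column so that
# User.where(hobby: "fishing") can use an index scan instead
# of a sequential scan over the whole users table.
class AddIndexToUsersHobby < ActiveRecord::Migration[7.0]
  def change
    add_index :users, :hobby
  end
end
```

Re-running `explain` after adding the index should show an index scan in place of the `Seq Scan`.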

That's all, folks

I hope this has been a useful introduction to Big-O notation and how it impacts you as a Ruby developer! Please ping me @StarrHorne if you have any questions.

    Starr Horne

    Starr Horne is a Rubyist and Chief JavaScripter at Honeybadger.io. When she's not neck-deep in other people's bugs, she enjoys making furniture with traditional hand-tools, reading history and brewing beer in her garage in Seattle.
