A Rubyist's Guide to Big-O Notation

Big-O notation gives you crucial insight into why your apps aren't as fast as you'd like them to be. In this post we'll uncover the meaning of things like O(N^2) and show how to use these concepts to speed up your apps and your database queries.

I don't have a degree in computer science. Lots of us Rubyists don't. So for a long time I avoided learning about big-O notation. It looks a little too much like higher math. I mean: O(N^2). Come on.

Instead, I learned rules of thumb:

  • Finding a specific item in a Hash is faster than in an Array
  • Avoid nested loops
  • Watch out for accidental DB queries when generating lists in your views

These rules are nice, but unless you understand WHY they work, you're going to find yourself making mistakes and having inexplicable performance problems.

Why does it matter?

Big-O notation is just a fancy way of describing how your code's performance depends on the amount of data it's processing.

Performance means one of two things: speed or RAM usage. In computer science class you'd refer to these as "time complexity" and "space complexity." Big-O notation is used for both, but we're going to focus on speed here, since that seems to be the more common usage.

You would expect that processing an array of 100 items would be slower than an array of 10 items. But by how much? 10x? 100x? 1000x?

It's not a big deal with small datasets, but if your app gets slower with every row added to the database, it soon becomes a problem.

Before we get into the details, I made a quick chart showing common big-Os with emoji showing how they'll make you feel as your data scales.

Big O       Rank  Meaning
O(1)        😎    Speed doesn't depend on the size of the dataset
O(log n)    😁    10x the data means roughly 2x more time
O(n)        😕    10x the data means 10x more time
O(n log n)  😖    10x the data means about 20x more time
O(n^2)      😫    10x the data means 100x more time
O(2^n)      😱    The dilithium crystals are breaking up!

So when someone says Array#bsearch is better than Array#find because it's O(log n) vs. O(n), you can just compare 😁 to 😕 and see that they might be onto something.
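You can see the difference for yourself. In the sketch below, both lookups find the first number greater than or equal to 5,000 in a sorted array, but `find` scans left to right while `bsearch` halves the search space at each step (note that `bsearch` only works on sorted arrays):

```ruby
sorted = (1..10_000).to_a

# O(n): checks elements one by one until the block is true
linear = sorted.find    { |x| x >= 5_000 }   # => 5000

# O(log n): binary search, only ~14 comparisons for 10,000 items
binary = sorted.bsearch { |x| x >= 5_000 }   # => 5000
```

Same answer, wildly different amounts of work as the array grows.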

For something a little more robust, check out the Big-O Cheat Sheet.

Deciphering the Notation

You don't have to memorize all of the different Big-O values, so long as you understand how the notation works.

Take, for example, the horrible horrible O(2^n). If we were to express that in Ruby, it might look like this:

# O(2^n) translated to Ruby
def o(n)
  2 ** n  # This is Ruby for 2^n
end

Still not obvious? How about if I rename the method and argument to something more user-friendly?

# O(2^n) translated to prettier Ruby
def execution_time(size_of_dataset)
  2 ** size_of_dataset
end

You can do this for all of them:

# O(1)
def o1_execution_time(size_of_dataset)
  1
end

# O(n)
def on_execution_time(size_of_dataset)
  size_of_dataset
end

# O(n^2)
def on2_execution_time(size_of_dataset)
  size_of_dataset * size_of_dataset
end
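Even the friendlier ones from the chart fit the same pattern, like O(log n):

```ruby
# O(log n)
def olog_n_execution_time(size_of_dataset)
  Math.log2(size_of_dataset)
end

olog_n_execution_time(1024)  # => 10.0
```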

...etc

Now that you know how the notation works, let's take a look at some typical Ruby code and see how it relates.

O(1)

When we say that something is O(1) it means that its speed doesn't depend on the size of its data set.

For example, hash lookup times don't depend on the hash size:

# These should all take the same amount of time
hash_with_100_items[:a]
hash_with_1000_items[:a]
hash_with_10000_items[:a]

This is why we tend to say that hashes are faster than arrays for lookups in larger datasets.
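You don't have to take that on faith. A quick (unscientific) benchmark sketch with Ruby's standard library makes the gap visible; on large collections the hash lookups finish far faster than the array scans:

```ruby
require 'benchmark'

array = (0...100_000).to_a
hash  = array.each_with_object({}) { |i, h| h[i] = true }

Benchmark.bm(15) do |bm|
  # O(n) per lookup: scans the array until it finds a match
  bm.report("Array#include?") { 1_000.times { array.include?(99_999) } }

  # O(1) per lookup: jumps straight to the bucket
  bm.report("Hash#key?")      { 1_000.times { hash.key?(99_999) } }
end
```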

O(n)

In contrast, Array#find is O(n). That means Array#find's speed depends linearly on the number of items in the array. An array with 100 items can take 100 times longer to search than an array with one item.

A lot of code that iterates over arrays follows the O(n) pattern.

(0..9).each do |i|
  puts i
end

# This example takes ~2x as long as the previous one because it contains 2x the items.
(0..19).each do |i|
  puts i
end

O(n^2)

Code that fits an O(n^2) profile tends to involve nested loops. That makes sense if you think about it. One loop gives you O(n), a second nested loop gives you O(n^2). If — for some unholy reason — you had a 5-level nested loop, it'd be O(n^5).

data = (0..100)
data.each do |d1|
  data.each do |d2|
    puts "#{ d1 }, #{ d2 }"
  end
end
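Counting the iterations makes the quadratic growth concrete. This small helper (hypothetical, just for illustration) counts how many times the inner block of a nested loop runs:

```ruby
def pair_count(n)
  count = 0
  # Nested loops: the inner block runs n * n times
  n.times { n.times { count += 1 } }
  count
end

pair_count(10)   # => 100
pair_count(100)  # => 10000 -- 10x the data, 100x the work
```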

O(n log n)

O(n log n) code is often the result of someone finding a clever way to reduce the amount of work that an otherwise O(n^2) algorithm would do.

You wouldn't be able to just look at a piece of code and tell it's O(n log n). This is where the higher math comes in, and it's also where I bow out.

But it is important to know about O(n log n), because it describes a lot of common sorting algorithms. Ruby's Array#sort uses the venerable quicksort algorithm, which on average is O(n log n) and worst-case is O(n^2).

If you're not familiar with quicksort, check out this excellent demonstration.
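A common pattern is to pay the O(n log n) cost of a sort once, so that later lookups can use O(log n) binary search instead of O(n) scans:

```ruby
names = %w[carol alice bob dave]

# One O(n log n) sort up front...
sorted = names.sort  # => ["alice", "bob", "carol", "dave"]

# ...buys O(log n) lookups afterward, instead of O(n) scans.
sorted.bsearch { |n| n >= "carol" }  # => "carol"
```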

Putting it all together: Your Database

One of the most common problems with new web applications is that they're fast on the developer's computer, but in production they get slower and slower.

This happens because the number of records in your database grows over time, but your code is asking the DB to do operations that don't scale well, i.e. O(n) or worse.

For example, did you know that in Postgres, count queries are always O(n)?

# This makes the DB iterate over every row of Users
# ...unless you're using a Rails counter cache. 
User.count

We can see this by using the Postgres explain command. Below, I use it to get the query plan for a count query. As you can see, it plans on doing a sequential scan (that means looping) over all 104,701 rows in the table.

# explain select count(*) from users;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=6920.89..6920.90 rows=1 width=0)
   ->  Seq Scan on users  (cost=0.00..6660.71 rows=104701 width=0)
(2 rows)

Lots of common Rails idioms can trigger unintentional sequential scans, unless you specifically optimize the database to prevent them.

# This asks the DB to sort the entire `products` table
Product.order("price desc").limit(1)

# If `hobby` isn't indexed, the DB loops through each row of Users to find it. 
User.where(hobby: "fishing")

You can use the explain command to see that as well. Here we see we're doing a sort (probably quicksort) on the whole table. If there are memory constraints, it may have chosen a different sorting algorithm with different performance characteristics.

# explain select * from users order by nickname desc limit 1;
                               QUERY PLAN
-------------------------------------------------------------------------
 Limit  (cost=7190.07..7190.07 rows=1 width=812)
   ->  Sort  (cost=7190.07..7405.24 rows=104701 width=812)
         Sort Key: nickname
         ->  Seq Scan on users  (cost=0.00..6606.71 rows=104701 width=812)

The answer to all of these problems is, of course, indexing. Telling the database to use an index is a little like using a hash lookup (O(1)) in Ruby instead of an array search (O(n)).
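For the `hobby` query above, the fix might look like the following Rails migration (a sketch; the column name comes from the earlier example and the migration version is assumed):

```ruby
# Hypothetical migration: indexes the `hobby` column so that
# User.where(hobby: "fishing") can use an index scan instead
# of a sequential scan over the whole users table.
class AddIndexToUsersHobby < ActiveRecord::Migration[7.0]
  def change
    add_index :users, :hobby
  end
end
```

Re-running `explain` after adding the index should show an index scan in place of the `Seq Scan`.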

That's all, folks

I hope this has been a useful introduction to Big-O notation and how it impacts you as a Ruby developer! Please ping me @StarrHorne if you have any questions.

    Starr Horne

    Starr Horne is a Rubyist and Chief JavaScripter at Honeybadger.io. When she's not neck-deep in other people's bugs, she enjoys making furniture with traditional hand-tools, reading history and brewing beer in her garage in Seattle.
