Using ActiveRecord's #update_counters to Prevent Race Conditions

Rails is a large framework with a lot of handy tools built-in for specific situations. In this series, we're taking a look at some of the lesser-known tools hidden in Rails' large codebase.

In this installment, we're going to take a look at ActiveRecord's update_counters method. Along the way, we'll cover the common trap of "race conditions" in multithreaded programs and how this method can prevent them.

Threads

When programming, we have several ways to run code concurrently, including processes, threads, and, more recently (in Ruby), fibers and Ractors. In this article, we're only going to worry about threads, as they are the most common form that Rails developers will encounter. For example, Puma is a multithreaded server, and Sidekiq is a multithreaded background job processor.

We won't do a deep dive into threads and thread safety here. The main thing to know is that when two threads are operating on the same data, the data can easily get out of sync. This is what is known as a "race condition".

Race conditions

A race condition occurs when two (or more) threads are operating on the same data at the same time, meaning a thread could end up using stale data. It is called a "race condition" because it is like the two threads are racing each other, and the final state of the data may be different depending on which thread "won the race". Perhaps worst of all, race conditions are very difficult to reproduce because they typically only occur if the threads "take turns" in a particular order and at a particular point in the code.

An example

A common scenario used to show a race condition is updating a bank balance. We'll create a simple test class within a basic Rails application so that we can see what happens:

class UnsafeTransaction
  def self.run
    account = Account.find(1)
    account.update!(balance: 0)

    threads = []
    4.times do
      threads << Thread.new do
        balance = account.reload.balance
        account.update!(balance: balance + 100)

        balance = account.reload.balance
        account.update!(balance: balance - 100)
      end
    end

    threads.map(&:join)

    account.reload.balance
  end
end

Our UnsafeTransaction is pretty simple; we just have one method that looks up an Account (a stock-standard Rails model with a BigDecimal balance attribute). We reset the balance to zero to make re-running the test simpler.

The inner loop is where things get a bit more interesting. We're creating four threads that will grab the current balance of the account, add 100 to it (e.g., a $100 deposit), and then immediately subtract 100 (e.g., a $100 withdrawal). We're even using reload both times to be extra sure we have the up-to-date balance.

The remaining lines are just some tidying up. Calling join on each thread means we wait for all of the threads to finish before proceeding, and then we return the final balance at the end of the method.

If we ran this with a single thread (by changing the loop to 1.times do), we could happily run it a million times and be sure the final account balance will always be zero. Change it to two (or more) threads, though, and things are less certain.

Running our test once in a console will probably give us the correct answer:

UnsafeTransaction.run
=> 0.0

However, what if we ran it over and over? Let's say we ran it ten times:

(1..10).map { UnsafeTransaction.run }.map(&:to_f)
=> [0.0, 300.0, 300.0, 100.0, 100.0, 100.0, 300.0, 300.0, 100.0, 300.0]

In case the syntax here is not familiar, (1..10).map {} will run the code in the block 10 times, with the result of each run put into an array. The .map(&:to_f) at the end is just to make the numbers more human-readable, as BigDecimal values are normally printed in exponential notation like 0.1e3.

Remember, our code takes the current balance, adds 100, and then immediately subtracts 100, so the final result should always be 0.0. These 100.0 and 300.0 entries, then, are proof that we have a race condition.

An annotated example

Let's zoom in on the problem code here and see what's happening. We'll separate out the changes to balance for even more clarity.

threads << Thread.new do
  # Thread could be switching here
  balance = account.reload.balance
  # or here...
  balance += 100
  # or here...
  account.update!(balance: balance)
  # or here...

  balance = account.reload.balance
  # or here...
  balance -= 100
  # or here...
  account.update!(balance: balance)
  # or here...
end

As the comments show, the threads could be swapping at almost any point in this code. If Thread 1 reads the balance and the computer then starts executing Thread 2, it's quite possible that Thread 1's data will be out of date by the time it calls update!. Put another way, Thread 1, Thread 2, and the database all hold a copy of the balance, and those copies are getting out of sync with each other.

The example here is deliberately trivial so that it is easy to dissect. In the real world, though, race conditions can be harder to diagnose, particularly because they usually cannot be reproduced reliably.
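To make one of these unlucky orderings concrete, here is a small sketch that replays it by hand, with no threads and no database. FakeAccount and replay_lost_update are hypothetical names made up for this illustration; they simply stand in for the account row and for two threads that each try to deposit 100.

```ruby
# Hypothetical stand-in for a row in the accounts table; no database involved.
FakeAccount = Struct.new(:balance)

# Replays, step by step, one interleaving of two threads that each deposit 100.
def replay_lost_update
  row = FakeAccount.new(0)

  thread_1_balance = row.balance        # Thread 1 reads 0
  thread_2_balance = row.balance        # switch: Thread 2 also reads 0

  row.balance = thread_1_balance + 100  # Thread 1 writes 100
  row.balance = thread_2_balance + 100  # Thread 2 overwrites it with 100

  row.balance
end

puts replay_lost_update # prints 100: two deposits were made, but only one survived
```

Both "threads" did exactly what they were told; the problem is only that the second write was calculated from a value that was already stale.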

Solutions

There are a few options for preventing race conditions, but nearly all of them revolve around a single idea: making sure that only one entity is changing the data at any given time.

Option 1: Mutex

The simplest option is a "mutual exclusion lock", commonly known as a mutex. You can think of a mutex as a lock with only one key. If one thread is holding the key, it can run whatever is in the mutex. All other threads will have to wait until they can hold the key.

Applying a mutex to our example code could be done like so:

class MutexTransaction
  def self.run
    account = Account.find(1)
    account.update!(balance: 0)

    mutex = Mutex.new

    threads = []
    4.times do
      threads << Thread.new do
        mutex.lock
        balance = account.reload.balance
        account.update!(balance: balance + 100)
        mutex.unlock

        mutex.lock
        balance = account.reload.balance
        account.update!(balance: balance - 100)
        mutex.unlock
      end
    end

    threads.map(&:join)

    account.reload.balance
  end
end

Here, every time we read and write to account, we first call mutex.lock, and then once we are done, we call mutex.unlock to allow the other threads to have a turn. We could just call mutex.lock at the start of the block and mutex.unlock at the end; however, this would mean the threads are no longer running concurrently, which somewhat negates the reason for using threads in the first place. For performance, it's best to keep code inside a mutex as small as possible, as it allows threads to execute as much of the code in parallel as possible.

We've used .lock and .unlock for clarity here, but Ruby's Mutex class provides a nice synchronize method that takes a block and handles this for us, so we could have done the following:

mutex.synchronize do
  balance = ...
  ...
end
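As a rough, runnable sketch of the synchronize form, here is the same deposit-and-withdraw idea in plain Ruby. It assumes an in-memory balance variable rather than our Account model, and deposit_and_withdraw_with_mutex is a made-up name for this illustration. The Thread.pass calls are only there to invite a thread switch in the middle of each update.

```ruby
# Plain-Ruby sketch: an in-memory balance stands in for the Account record.
def deposit_and_withdraw_with_mutex(thread_count: 4)
  balance = 0
  mutex = Mutex.new

  threads = Array.new(thread_count) do
    Thread.new do
      # synchronize locks, runs the block, and then unlocks,
      # even if the block raises an exception.
      mutex.synchronize do
        current = balance
        Thread.pass # invite a thread switch mid-update
        balance = current + 100
      end

      mutex.synchronize do
        current = balance
        Thread.pass
        balance = current - 100
      end
    end
  end

  threads.each(&:join)
  balance
end

puts deposit_and_withdraw_with_mutex # prints 0
```

One practical advantage over separate lock and unlock calls is that synchronize still releases the mutex when the block raises, so a failed update does not leave every other thread waiting forever.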

Ruby's Mutex does what we need, but as you can probably imagine, it's fairly common in Rails applications to need to lock a particular database row, and ActiveRecord has us covered for this scenario.

Option 2: ActiveRecord Locks

ActiveRecord provides a few different locking mechanisms, and we won't do a deep dive into them all here. For our purposes, we can just use lock! to lock a row that we want to update:

class LockedTransaction
  def self.run
    account = Account.find(1)
    account.update!(balance: 0)

    threads = []
    4.times do
      threads << Thread.new do
        Account.transaction do
          account = account.reload
          account.lock!
          account.update!(balance: account.balance + 100)
        end

        Account.transaction do
          account = account.reload
          account.lock!
          account.update!(balance: account.balance - 100)
        end
      end
    end

    threads.map(&:join)

    account.reload.balance
  end
end

Whereas a Mutex "locks" the section of code for a particular thread, lock! locks the particular database row. This means that the same code can execute in parallel on multiple accounts (e.g., in a bunch of background jobs). Only threads that need to access the same record would have to wait. ActiveRecord also provides a handy #with_lock method that lets you do the transaction and lock in one go, so the updates above could be written a bit more succinctly as follows:

account = account.reload
account.with_lock do
  account.update!(balance: account.balance + 100)
end
...
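To picture the difference between locking a section of code and locking a single row, here is a loose plain-Ruby analogy. This is not how ActiveRecord implements it (the real lock is taken by the database on the row itself); FakeAccounts, with_row_lock, and run_row_lock_demo are hypothetical names for this sketch, which keeps one Mutex per account id so that only threads touching the same account wait for each other. It assumes CRuby, where these simple Hash writes are safe.

```ruby
# Hypothetical in-memory "table" (account id => balance). Not ActiveRecord.
class FakeAccounts
  def initialize(balances)
    @balances = balances
    @row_locks = {}
    @row_locks_guard = Mutex.new
  end

  # Loose analogue of account.with_lock { ... }: one lock per id, so
  # threads working on different accounts never block each other.
  def with_row_lock(id)
    row_lock = @row_locks_guard.synchronize { @row_locks[id] ||= Mutex.new }
    row_lock.synchronize { yield }
  end

  def balance(id)
    @balances.fetch(id)
  end

  def write_balance(id, value)
    @balances[id] = value
  end

  def to_h
    @balances.dup
  end
end

def run_row_lock_demo
  accounts = FakeAccounts.new({ 1 => 0, 2 => 0 })

  # Two deposits into account 1 and two into account 2.
  threads = [1, 1, 2, 2].map do |id|
    Thread.new do
      accounts.with_row_lock(id) do
        current = accounts.balance(id)
        Thread.pass # invite a thread switch mid-update
        accounts.write_balance(id, current + 100)
      end
    end
  end

  threads.each(&:join)
  accounts.to_h
end

p run_row_lock_demo # each account ends up with both of its deposits
```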

Option 3: Atomic methods

An 'atomic' method (or function) cannot be stopped mid-way through execution. For example, the common += operation in Ruby is not atomic, even though it looks like a single operation:

value += 10

# equivalent to:
value = value + 10

# Or even more verbose:
temp_value = value + 10
value = temp_value

If the thread suddenly "sleeps" between working out what value + 10 is and writing the result back to value, then it opens the possibility of a race condition. However, let's imagine that Ruby did not allow threads to sleep during this operation. If we could say, with certainty, that a thread will never sleep (e.g., the computer will never switch execution to a different thread) during this operation, then it could be considered an "atomic" operation.

Some languages ship atomic versions of primitive values for exactly this kind of thread safety (e.g., Java's AtomicInteger and AtomicLong). Ruby's core library doesn't, but that doesn't mean we have no "atomic" operations available to us as Rails developers. One example is ActiveRecord's update_counters method.

Although this method is intended more for keeping counter caches up to date, nothing is stopping us from using it in our applications. For more information on counter caches, you can check out my earlier article on caching.

Using the method is incredibly simple:

class CounterTransaction
  def self.run
    account = Account.find(1)
    account.update!(balance: 0)

    threads = []
    4.times do
      threads << Thread.new do
        Account.update_counters(account.id, balance: 100)

        Account.update_counters(account.id, balance: -100)
      end
    end

    threads.map(&:join)

    account.reload.balance
  end
end

No mutexes, no locks, just two lines of Ruby; update_counters takes the record ID as the first argument, and then we tell it which column to change (balance:) and how much to change it by (100 or -100). The reason this works is that the read-update-write cycle now happens in the database in a single SQL call. This means that our Ruby thread can't interrupt the operation; even if it sleeps, it won't matter because the database is doing the actual calculation.

The actual SQL being produced comes out like this (at least for PostgreSQL on my machine):

Account Update All (1.7ms)  UPDATE "accounts" SET "balance" = COALESCE("balance", 0) + $1 WHERE "accounts"."id" = $2  [["balance", "100.0"], ["id", 1]]

This way also performs much better, which is unsurprising, as the calculation happens fully in the database; we never have to reload the record to get the latest value. This speed comes at a price, though. Because we are doing this in raw SQL, we are bypassing the Rails model, which means any validations or callbacks will not be executed (meaning, among other things, no change to the updated_at timestamp).
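To make the "single SQL call" idea a little more tangible, here is a plain-Ruby sketch that mirrors the UPDATE statement shown above. FakeAccountsTable and run_fake_counter_demo are hypothetical names invented for this illustration, not part of Rails; the internal mutex is an assumption that stands in for the database applying each UPDATE as one indivisible step. The (value || 0) expression plays the role of COALESCE, which treats a NULL balance as zero.

```ruby
# Hypothetical in-memory stand-in for the accounts table. The internal mutex
# plays the role of the database applying each UPDATE as one indivisible step.
class FakeAccountsTable
  def initialize(rows)
    @rows = rows
    @lock = Mutex.new
  end

  # Mirrors: SET column = COALESCE(column, 0) + delta WHERE id = ?
  def update_counters(id, deltas)
    @lock.synchronize do
      row = @rows.fetch(id)
      deltas.each do |column, delta|
        row[column] = (row[column] || 0) + delta # nil is treated as 0
      end
    end
  end

  def read(id, column)
    @lock.synchronize { @rows.fetch(id)[column] }
  end
end

def run_fake_counter_demo
  # Start from a nil balance to show the COALESCE-style fallback.
  table = FakeAccountsTable.new({ 1 => { balance: nil } })

  threads = Array.new(4) do
    Thread.new do
      # Callers say how much to change, never what the new value should be,
      # so no thread ever holds a stale copy of the balance.
      table.update_counters(1, balance: 100)
      table.update_counters(1, balance: -100)
    end
  end

  threads.each(&:join)
  table.read(1, :balance)
end

puts run_fake_counter_demo # prints 0
```

The design point is that the caller only ever sends a relative change. Because the current value is read and written in one place, there is no window in which our own code can be working from an outdated number.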

Conclusion

Race conditions could very well be the Heisenbug poster child. They are easy to let in, often impossible to reproduce, and difficult to foresee. Ruby and Rails, at least, give us some helpful tools to squash these issues once we find them.

For general Ruby code, Mutex is a good option and probably the first thing most developers think of when hearing the term "thread safety".

With Rails, more likely than not, the data are coming from ActiveRecord. In these cases, lock! (or with_lock) is straightforward to use and allows more throughput than a mutex, as it only locks the relevant rows in the database.

I'll be honest here, I'm not sure I'd reach for update_counters much in the real world. It is uncommon enough that other developers may not be familiar with how it behaves, and it does not make the intention of the code particularly clear. If faced with thread-safety concerns, ActiveRecord's locks (either lock! or with_lock) are both more common and more clearly communicate the intention of the coder.

However, if you have a lot of simple 'add or subtract' jobs backing up, and you need raw pedal-to-the-metal speed, update_counters can be a useful tool in your back pocket.