Cross-cluster Associations in Rails 7

In this article, Julie Kent discusses using associations in Rails when the underlying data model spans multiple databases. We didn't even know Rails could do this!

One of the beauties of the Rails framework is the ability to utilize associations in your models. These associations allow you to access collections of records in your code with pleasant syntax, abstracting away the need to write underlying SQL queries. For example, let's consider this basic relationship of objects:

# table doctors in database professionals 
class Doctor < ProfessionalsRecord 
  has_many :appointments, through: :patients 
  has_many :patients 
end 

# table patients in database patients
class Patient < PatientsRecord
  belongs_to :doctor
  has_many :appointments
end

# table appointments in database default
class Appointment < ApplicationRecord
  belongs_to :patient
  has_one :doctor, through: :patient
end

As you can imagine, there might be a page in our application that shows a list of appointments for a doctor’s office. To accomplish this, in the backend, we could simply write doctor.appointments. This would be fine if all of these relations were in the same database. Rails loads associations lazily, which means that it executes the join when the data is actually requested. In this case, the SQL that would be performed is as follows:

SELECT appointments.* FROM appointments INNER JOIN patients ON appointments.patient_id = patients.id WHERE patients.doctor_id = ?

However, if we tried to run this with the data spread across multiple databases, Rails would throw an ActiveRecord::StatementInvalid error, because a single SQL join cannot span separate database connections. While workarounds exist, they're typically less performant and require a lot of extra work to write custom SQL queries that are error-prone and more difficult to maintain.

Why Multiple Databases?

Before we go further into how to use cross-cluster associations, let's briefly discuss why you might use multiple databases with Rails. One of the most common reasons is redundancy: you want a replica of your database in your infrastructure. Another is that, as your application scales, you may want to split your database into multiple smaller databases that each have their own responsibility (e.g., one for users/accounts and one for logging). This allows you to store more critical data on a high-availability database cluster while keeping the less critical data on a cheaper configuration. Finally, if you are migrating from an old schema to a new one, it can be easier and safer to set up a new database for this.

While Active Record could connect to multiple databases prior to Rails 6, it was with the Rails 6 upgrade that users gained built-in tools to manage them. You can read more about that here.

How to Utilize Cross-Cluster Associations

Let's go back to our original error from the first section. With Rails 7, there is an easy way to solve this issue! All you have to do is add disable_joins: true in your model associations. Going back to our original example, we would have the following:

class Doctor < ProfessionalsRecord 
  has_many :appointments, through: :patients, disable_joins: true 
  has_many :patients 
end 

What exactly is going on here? This tells Active Record not to build a join for this association. Instead, it performs two separate queries to fetch the records, one against each database:

SELECT "patients"."id" FROM "patients" WHERE "patients"."doctor_id" = ?  [["doctor_id", 1]]
SELECT "appointments".* FROM "appointments" WHERE "appointments"."patient_id" IN (?, ?, ?)  [["patient_id", 1], ["patient_id", 2], ["patient_id", 3]]
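To make the two steps concrete, here is a minimal plain-Ruby sketch that mimics them without Rails. The PATIENTS_DB and APPOINTMENTS_DB constants, the sample rows, and the appointments_for_doctor helper are all hypothetical stand-ins for the two databases and for what Active Record does on your behalf; this illustrates the idea and is not Rails' actual implementation.

```ruby
# Hypothetical in-memory stand-ins for two separate databases.
PATIENTS_DB = [
  { id: 1, doctor_id: 1, name: "Ana" },
  { id: 2, doctor_id: 1, name: "Ben" },
  { id: 3, doctor_id: 2, name: "Cy" }
].freeze

APPOINTMENTS_DB = [
  { id: 10, patient_id: 1, starts_at: "2022-03-01" },
  { id: 11, patient_id: 3, starts_at: "2022-03-02" },
  { id: 12, patient_id: 2, starts_at: "2022-03-03" }
].freeze

# Mimics the two queries: first collect the patient ids from one store,
# then filter appointments in the other store with an IN-style lookup.
def appointments_for_doctor(doctor_id)
  patient_ids = PATIENTS_DB
    .select { |patient| patient[:doctor_id] == doctor_id }
    .map { |patient| patient[:id] }

  APPOINTMENTS_DB.select { |appt| patient_ids.include?(appt[:patient_id]) }
end
```

Calling appointments_for_doctor(1) in this sketch returns the appointments belonging to patients 1 and 2. Neither step needs to know about the other store, which is why the approach works across database boundaries.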

Caveats

There are a few caveats, though. Because we are now performing two queries instead of one, there may be performance implications. However, this would be the case regardless of whether you used the disable_joins feature or wrote the SQL manually, since either way multiple queries run across multiple databases. Additionally, since a join is no longer being performed, a query with an order or limit will now be sorted in memory. This is because the order from one table cannot be applied to another table. Finally, this setting must be added to every association for which you want joins disabled; Rails cannot guess when you do and don't want this feature enabled.
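The ordering caveat can be illustrated with another small plain-Ruby sketch. The sample rows, the ordered_patient_ids list, and the order_and_limit helper are hypothetical; they stand in for rows already fetched by the second query and for the id order produced by the first. The point is that reordering and limiting happen on an array in application memory rather than in the database.

```ruby
# Hypothetical rows returned by the second query, in arbitrary order.
appointments = [
  { id: 10, patient_id: 3 },
  { id: 11, patient_id: 1 },
  { id: 12, patient_id: 2 },
  { id: 13, patient_id: 1 }
]

# Patient ids in the order produced by the first query
# (for example, patients ordered by name).
ordered_patient_ids = [2, 1, 3]

# Reorders the loaded rows in memory to follow the patient order,
# then applies the limit to the resulting array.
def order_and_limit(rows, ordered_ids, limit)
  rows_by_patient = rows.group_by { |row| row[:patient_id] }
  ordered_ids.flat_map { |id| rows_by_patient.fetch(id, []) }.take(limit)
end

first_three = order_and_limit(appointments, ordered_patient_ids, 3)
```

Note that in this sketch every matching row is loaded before the limit is applied, so a large association may use noticeably more memory than the equivalent single-database query with LIMIT.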

Conclusion

Being able to utilize the built-in association syntax that Rails provides without having to write custom SQL queries for objects that span multiple databases may not seem like a huge win, but it's not a trivial feature. For example, companies like GitHub have 30 databases configured in their Rails monolith, so being able to avoid maintaining all of that custom SQL likely saved them a considerable amount of engineering effort. You can read more about GitHub's use of cross-cluster associations here.

Further Reading

To learn more about cross-cluster associations, I recommend the following:

  • PR that added disable_joins option to has_many relation
  • PR that added disable_joins option to the has_one relation
  • Official docs from Ruby on Rails about disable joins.
    Julie Kent

    Julie is an engineer at Stitch Fix. In her free time, she likes reading, cooking, and walking her dog.
