If you have the pleasure of working with a Rails app that uses a lot of data like logs, events, and metrics, you’ve probably run into performance issues at some point due to large tables. Deleting high quantities of rows can bring your database server to its knees or slow down queries to unacceptable latencies. PostgreSQL table partitioning is an excellent solution to these sorts of problems. Partitioning is splitting one huge table into smaller, more manageable chunks behind the scenes, so your app keeps humming even as data piles up.

PostgreSQL has supported native partitioning since PostgreSQL version 10, making it straightforward to set up and support over time. In this guide, we'll show you how partitioning in PostgreSQL works, how to use it with Rails, and how to automate it. Let's get started!

Why do people use partitioning in PostgreSQL?

As your application usage grows, so does your data. At some point, that growth starts to hurt your database, and in turn, your application performance (in which case, you will hopefully be alerted by your Ruby application monitoring tool). Massive database tables in PostgreSQL can cause all sorts of headaches. At some point, you'll notice slower queries, bloated indexes, and even long-running DELETE operations. You can buy yourself a lot of time by adding the right indexes and tuning your queries, but eventually, the sheer volume of rows can still degrade performance over time.

We use partitioned tables at Honeybadger in our primary PostgreSQL database to efficiently expire old data, since deleting a bunch of data from our huge tables was causing database performance to suffer.

Partitioning is the de facto method of breaking up a large table into smaller, more manageable pieces. Under the hood, each partition is a separate table, but PostgreSQL now abstracts this away so that your application can treat it like a single table.

Partitioning shines in situations like ours where you're often inserting data and simultaneously expiring old data. Partitioning in PostgreSQL doesn't just make queries faster, it also makes ongoing maintenance more predictable and less risky.

How Postgres partitioning has changed over time

Partitioned tables in PostgreSQL have come a long way. Before version 10, there was no built-in support for table partitioning. You had to do some workarounds to make it happen. Subsequent major versions of Postgres introduced improvements to partitioning, while tools like pg_partman helped automate much of the setup.

It worked by using PostgreSQL's table inheritance to create child tables of the entire table that is to be partitioned and using triggers to insert data into the child tables rather than inserting data into the parent table. That extension worked well for us, but it wasn't exactly what we needed.

PostgreSQL 10 introduced native partitioning, and with each new version, it’s become more powerful and easier to use. You can now declare a partitioned table with straightforward SQL, and PostgreSQL handles routing under the hood. Maintenance is simpler, and the syntax is much cleaner.

Our partitioning use case is fairly simple: we partition tables based on time, creating a new PostgreSQL partition for every day, week, or month, depending on how many rows we want to store across all partitions and on how long we want to retain the data. All of our partitioned tables have a created_at column that will be used to determine which partition stores each row. For example, we might have a table defined like this:

create table events (
  project_id integer,
  data jsonb,
  created_at timestamp
)
partition by range (created_at);

And if we wanted to have weekly partitions, they would look like this:

create table events_p2019_10_28 partition of events for values from ('2019-10-28') to ('2019-11-04');
create table events_p2019_11_04 partition of events for values from ('2019-11-04') to ('2019-11-11');

With a time-based partitioning scheme, deleting old data is as simple as dropping one of the partitions. The regular maintenance we need to do, then, is to create new partitions for date ranges as we approach them and delete old partitions containing data we no longer want. To make that maintenance a little easier, I have created the pg_partition_manager gem. Naturally, it's inspired by my experience with the pg_partman extension, which has served us so well.

Let's take a look at how you'd use this gem, given the events table and partitioning scheme described above. You would create a script or rake task that looks like this:

require "pg_partition_manager"

PgPartitionManager::Time.process([{parent_table: "public.events", period: "week", premake: 1, retain: 3}])

The parent_table is defined as schema.table_name (public, the default schema, is often the only one Rails developers end up using). The period can be day, week, or month. You can choose how many tables you want to be created in advance (after the current period) with premake, and how many tables you want to keep (prior to the current period) with retain. The gem defaults to pre-creating 4 tables if you don't specify premake, and it defaults to retaining data for 7 days, 4 weeks, and 6 months if you don't specify retain.

Invoke that script/task with a daily cron job, and you're all set. It will create partitions, creating and dropping the tables as needed.

Different approaches to PostgreSQL table partitioning

PostgreSQL supports three primary partitioning strategies: range, list, and hash. Each of these strategies serves a different purpose, and it’s worth understanding what each type is good for.

Range partitioning is the most common partitioning strategy, especially if you’re partitioning out time-series data. It lets you define partitions that each cover a range of values for a given column—usually a timestamp like created_at. For example, you might use range partitioning if you have a partition for each week of data, each month, or even each day, depending on your data volume and retention requirements.

List partitioning in PostgreSQL is done by assigning specific values to each partition. It's most often useful when you want to split your data based on discrete categories, like types of subscriptions. For example, you might have a users table with separate partitions for "free", "pro", and "enterprise" users.

Hash partitioning spreads data across your partitions based on the hash of a particular column’s value. This strategy is the right choice when you want to distribute data evenly but don’t care too much about how it gets grouped.

Many Rails apps that are starting to need partitioning are working with time-based data. With this sort of use case, range partitioning on a timestamp column is a practical approach.

Making partitioned tables

Setting up partitioned tables in PostgreSQL starts with defining a parent table, followed by creating the child partitions. The parent table acts as a logical container, and each partition handles a subset of the data based on the strategy you choose.

For example, we might want an events table that stores time-series data. We want to partition it by the created_at column using range partitioning, with one partition per week.

Here’s how we’d define the parent table:

create table events (
  project_id integer,
  data jsonb,
  created_at timestamp
)
partition by range (created_at);

You’ll then need to create partitions for specific date ranges before inserting data. If we wanted to have weekly partitions, they would look like this:

create table events_p2019_10_28 partition of events for values from ('2019-10-28') to ('2019-11-04');
create table events_p2019_11_04 partition of events for values from ('2019-11-04') to ('2019-11-11');

Each partition is its own literal table under the hood, but Postgres lets you pull data as if it were all one table.

Postgres table partitioning diagram

Getting data from PostgreSQL partitioned tables in Rails

One of my favorite things about native PostgreSQL partitioning is that it works so well with the Rails framework. Once your tables are partitioned, you can just keep using ActiveRecord as usual. There's no need to call special methods or interface with the child tables. From the perspective of your Rails app (and you!), a partitioned table is just a table.

All ActiveRecord queries act just as they would with a non-partitioned table, so there's nothing you need to change in your code. That is, Event.create, Event.where, etc., will work just as they always have, with PostgreSQL putting the data in the appropriate partition for you when you insert it. There is one change you may notice if you have lots of data, though... when you include created_at in your queries, PostgreSQL won't have to scan all the partitions. It can just scan the ones that cover the range you specify in your where clause.

When you run a query against a partitioned table, PostgreSQL's query planner automatically figures out the appropriate partitions to scan based on the filter conditions. If your events table is partitioned by week on created_at like in our example above, Postgres will only search the relevant partitions for the specified time range.

To get this performance boost, you should always include the partition key in your queries. If you're using partitioning on created_at like our example, that means including created_at in your query.

Some partitioning tips for Rails

Once you've got partitioned tables initially set up and you're comfortable managing those partitions, there are a few Rails-specific considerations that can help you get the most out of it in terms of performance.

If your app writes data anywhere near the edge of a partition window (like at midnight), it's important that new partitions are created in advance.

The pg_partition_manager gem’s premake setting is designed to handle this—just make sure your cron job or scheduled task runs before the new period starts, so you'll have to run the job often enough.

Will you partition your data?

For apps with lots of data, PostgreSQL table partitioning is one of the most impactful things you can do for performance. Table partitioning in PostgreSQL can drastically improve your SQL query performance and help you manage data retention.

PostgreSQL’s native support for partitioned tables makes it easy to adopt without too much hassle, and tools like the pg_partition_manager gem let you automate the boring parts.

If you're noticing bloated tables causing slow deletes or increasing your query latency, table partitioning in PostgreSQL should be something you reach for. Your future self will thank you.

Get the Honeybadger newsletter

Each month we share news, best practices, and stories from the DevOps & monitoring community—exclusively for developers like you.
    author photo
    Benjamin Curtis

    Ben has been developing web apps and building startups since '99, and fell in love with Ruby and Rails in 2005. Before co-founding Honeybadger, he launched a couple of his own startups: Catch the Best, to help companies manage the hiring process, and RailsKits, to help Rails developers get a jump start on their projects. Ben's role at Honeybadger ranges from bare-metal to front-end... he keeps the server lights blinking happily, builds a lot of the back-end Rails code, and dips his toes into the front-end code from time to time. When he's not working, Ben likes to hang out with his wife and kids, ride his road bike, and of course hack on open source projects. :)

    More articles by Benjamin Curtis
    An advertisement for Honeybadger that reads 'Move fast and fix things.'

    "We love Honeybadger. Not only is it a fab tool and very affordable, they feel like a partner - whenever we needed help we got FIRST CLASS service."

    Fix errors, eliminate performance bottlenecks, and dig into the details faster than ever before. With support for Ruby, Elixir, and 8 other languages, Honeybadger is the best way to gain real-time insights into your application’s health and performance.

    Get started for free
    Simple 5-minute setup — No credit card required