The Case of the Flaky Test Suite

In this story, Jason Swett of The Ruby Testing Podcast discusses the pitfalls of external dependencies in your test suites, and how to avoid them.

I recently worked on a test suite that was really frustrating to use due to its unreliability.

The application that the test suite was targeting was a Rails API-only application. The tests were written in JavaScript using a framework called Chakram, "an API testing framework designed to perform end to end tests on JSON REST endpoints".

The problem with the test suite was that it wasn't deterministic. A function or program is deterministic if, given the same inputs, it always gives the same outputs.
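
A trivial Ruby illustration of the difference:

    # Deterministic: the same input always produces the same output.
    def add_tax(amount)
      (amount * 1.06).round(2)
    end

    # Not deterministic: the result also depends on outside state (the clock),
    # so two identical calls can return different answers.
    def greeting
      Time.now.hour < 12 ? "Good morning" : "Good afternoon"
    end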

This particular test suite would pass on the first run, pass on the second run, then fail on the third run, without changes having been made to either the application code or the test code.

I discovered that I could bring the test suite back to passing by doing a rake db:reset. The tests, which operated on the Rails application's development environment and not on the test environment, depended on the Rails application's database being in a certain state when the test suite started running.

Sometimes I could start with a freshly seeded database, run the test suite, and then successfully run the test suite a second time. I might even be able to run the test suite three or more times. But quite often, the test suite would somehow mess up the data and I'd have to do another rake db:reset to get the data back to a state that the test suite could successfully use.
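
To make the failure mode concrete, here's a hypothetical sketch (written as an RSpec request spec in Ruby, though the real suite was JavaScript, and the Order model and endpoint are invented) of the kind of test that silently depends on pre-seeded data and then mutates it:

    # Hypothetical example of an order-dependent test.
    require "rails_helper"

    RSpec.describe "Orders API", type: :request do
      it "cancels an order" do
        order = Order.first # implicit dependency on data left by rake db:reset

        post "/orders/#{order.id}/cancel"

        expect(response).to have_http_status(:ok)
        # The record is now cancelled in the shared database, so the next run
        # of this test (or another test) finds different data than this one did.
        expect(order.reload.status).to eq("cancelled")
      end
    end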

This is of course not how things should be. A test suite should only ever fail for one reason: the code it's testing stops working properly.

What should have been done instead

So if this test suite was set up in a problematic way, what would have been the right way to do it?

The root of the problem was that the test suite depended on the database reset being done manually by a human. Instead, each individual test in the test suite should have automatically put the database in a clean and ready state before running. (It's also typical for a test to wipe the database after running so it doesn't "leak" its data into other tests. This isn't strictly necessary if every single test clears the database before running, but it's a good protective measure.)
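
In Ruby land, a minimal sketch of that setup, assuming RSpec and the database_cleaner-active_record gem (the real suite was JavaScript, so the specifics would differ), could look like this:

    # spec/support/database_cleaner.rb -- a sketch, not the original project's code
    require "database_cleaner/active_record"

    RSpec.configure do |config|
      config.before(:suite) do
        # Start every run from an empty database.
        DatabaseCleaner.clean_with(:truncation)
      end

      config.around(:each) do |example|
        # Wrap each test so it starts clean and nothing it creates
        # leaks into the next test.
        DatabaseCleaner.cleaning { example.run }
      end
    end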

The test suite should also have targeted the Rails application's test environment rather than its development environment. A developer's actions in the development environment shouldn't interfere with a test suite's work, and vice versa.

Other types of problematic dependencies

In my story, the problematic dependency was a database that wasn't getting cleaned properly.

Another type of problematic dependency is a network request. Imagine you have a test suite that hits the Twilio API. You run the test suite on a Tuesday and it passes. Then on Wednesday you run the same test suite and it fails. Unbeknownst to you, Twilio is having an outage, and that's why your test suite failed. A few minutes later the outage is resolved and your test suite passes again.

This goes back to the idea that a test should only fail for one reason: your application code has stopped working.

If you need to write tests for code that interacts with the network, a better way to do it (if you're using Ruby/Rails) is to use a tool like VCR that lets you record network requests and then replay them later without actually using the internet.
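
A minimal sketch of how that can look, assuming RSpec plus the vcr and webmock gems (the TwilioClient class here is hypothetical):

    # spec/support/vcr.rb -- a sketch, assuming the vcr and webmock gems
    require "vcr"

    VCR.configure do |config|
      config.cassette_library_dir = "spec/cassettes"
      config.hook_into :webmock
    end

    # In a spec: the first run records the real HTTP interaction to a cassette
    # file; subsequent runs replay the cassette with no network access at all.
    RSpec.describe TwilioClient do
      it "sends an SMS" do
        VCR.use_cassette("twilio_send_sms") do
          response = TwilioClient.new.send_sms(to: "+15555550100", body: "Hello")
          expect(response).to be_success
        end
      end
    end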

Cases when outside dependencies are okay

It could be argued that if an application depends on the Twilio API and the Twilio API went down, then the application really did break and so the test really should fail. How do we reconcile this idea with the idea that tests shouldn't depend on outside conditions?

The way to reconcile these two things is to make a distinction between integration tests, which test multiple layers or systems together, and unit tests, which test a single piece of functionality in isolation.

If a team were to make a conscious decision that a certain set of their tests were going to hit external services across the network so multiple systems could be tested together, then there's nothing "wrong" about that. (The alternative would be to decide not to try to test those systems together at all.) In this scenario the team would just have to be aware that their network-dependent integration test suite is not guaranteed to be deterministic and will probably flake out sometimes. A test suite like this should live separately from the set of tests developers run on their machines every day to check for regressions.
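
One way to draw that line in an RSpec suite, as a sketch, is to tag the network-hitting tests and exclude them from the default run:

    # spec/spec_helper.rb -- a sketch using RSpec metadata filtering
    RSpec.configure do |config|
      # Everyday runs skip anything tagged :external.
      config.filter_run_excluding :external
    end

    # Network-dependent integration test, tagged so it only runs on demand.
    RSpec.describe "Twilio integration", :external do
      it "sends a real SMS through the live API" do
        # ...exercise the real service here...
      end
    end

    # Everyday regression suite:  rspec
    # External suite as well:     rspec --tag external
    #   (command-line tags take precedence over the configured exclusion)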

While there's an exception to the rule that tests shouldn't depend on outside conditions, most of the time the rule should be observed, giving your test suite the benefit of being deterministic and reliable.

    Jason Swett

    Ruby on Rails Trainer, Software Engineer, Speaker, Writer, and host of The Ruby Testing Podcast
