Why is URI.join so counterintuitive?

We just reached a milestone here at Honeybadger. Our sales pages are no longer part of our main Rails app. It's been on my wish list for years, but not exactly top priority.

As part of this migration, I found myself using URI.join to construct particular redirect links. But I quickly ran into a problem. URI.join wasn't behaving as I expected.

I expected it to take a bunch of path fragments and string them together like so:

# This is what I was expecting. It didn't happen.
URI.join("https://www.honeybadger.io", "plans", "change")
=> "https://www.honeybadger.io/plans/change"

What the join method actually did was much stranger. It dropped "plans" entirely and kept only the last fragment, "change".

# This is what happened.
URI.join("https://www.honeybadger.io", "plans", "change")
=> #<URI::HTTPS https://www.honeybadger.io/change>

So why the heck does it work like this?

The misunderstanding

It turns out that I was expecting URI.join to behave similarly to a specialized version of Array#join, taking URL fragments and combining them to make a whole URL.

That's not what it does. Big surprise.

If we take a look at the join method's code, we see that it converts the first argument to a URI and then folds the remaining arguments into it, calling merge with each one in turn.

# File uri/rfc2396_parser.rb, line 236
def join(*uris)
  uris[0] = convert_to_uri(uris[0])
  uris.inject :merge
end

The merge method does two things:

1. It converts your string, like "plans", into a relative URI object.
2. It resolves that relative URI against the base URI, exactly as specified in RFC 2396, Section 5.2.
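To make that concrete, here is a small sketch that spells out the chain of merge calls by hand and compares it with URI.join. The variable names (base, step_one, step_two, joined) are just labels for this example.

```ruby
require "uri"

base = URI.parse("https://www.honeybadger.io")

# Each merge resolves a relative reference against the result so far.
step_one = base.merge("plans")      # resolves to https://www.honeybadger.io/plans
step_two = step_one.merge("change") # "change" replaces the last segment, "plans"

# URI.join does the same thing in a single call.
joined = URI.join("https://www.honeybadger.io", "plans", "change")

puts step_one
puts step_two
puts joined == step_two
```

Both routes end at the same URI, which is why the "plans" fragment disappears.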

So that's cool, but how does it explain the unexpected behavior I mentioned before?

URI.join("https://www.honeybadger.io", "plans", "change")
=> #<URI::HTTPS https://www.honeybadger.io/change>

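Let's step through it. The first merge resolves "plans" against the base and produces "https://www.honeybadger.io/plans". So by the time "change" comes along, the code above is equivalent to: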

URI.parse("https://www.honeybadger.io/plans").merge("change")

This resolves the relative URI "change" against the absolute URI "https://www.honeybadger.io/plans".

To do this, it follows step 6 of RFC 2396, Section 5.2, which states:

a) All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.

b) The reference's path component is appended to the buffer string.

Let's play along:

1. Copy everything but the final path segment of the absolute URL. Dropping "plans" gives me "https://www.honeybadger.io/"
2. Append the relative path, resulting in "https://www.honeybadger.io/change"
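The two steps can be sketched in a few lines of Ruby. The merge_relative_path helper below is hypothetical, written only for this illustration. It is a simplification rather than what the uri library actually runs: it assumes the base path starts with a slash and ignores "." and ".." segments, queries, and fragments.

```ruby
require "uri"

# Hypothetical helper: a simplified take on steps (a) and (b) from the RFC.
def merge_relative_path(base_path, reference_path)
  # (a) Keep everything up to and including the right-most slash.
  last_slash = base_path.rindex("/")
  buffer = last_slash ? base_path[0..last_slash] : ""
  # (b) Append the reference's path to the buffer.
  buffer + reference_path
end

without_slash = merge_relative_path("/plans", "change")
with_slash    = merge_relative_path("/plans/", "change")

puts without_slash
puts with_slash

# The real library agrees: a trailing slash on the base keeps "plans".
puts URI.join("https://www.honeybadger.io/plans/", "change")
```

The trailing slash matters because it makes "plans" a directory rather than the final segment, so step (a) no longer throws it away.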

The world makes sense again!

Conclusion

While URI.join can be used to build URLs from various path fragments, that's not really what it's designed to do. It's designed to do something a little more complicated: resolve each URI reference against the result of the previous one, following the rules specified in the RFC.

As for my personal project, building URLs to use in redirects to our new sales pages, well, I just used Array#join instead. :)
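A minimal sketch of the Array#join approach might look like this. The build_url helper is hypothetical, a name made up for this example and not the exact code behind the redirects. It assumes the fragments are plain path segments that need no escaping.

```ruby
# Hypothetical helper: glue a base URL and path fragments together with "/".
def build_url(base, *fragments)
  # Trim slashes at the joints so "plans/" + "/change" doesn't produce "//".
  trimmed = fragments.map { |fragment| fragment.gsub(%r{\A/+|/+\z}, "") }
  [base.sub(%r{/+\z}, ""), *trimmed].join("/")
end

url = build_url("https://www.honeybadger.io", "plans", "change")
puts url
```

Unlike URI.join, this treats every argument as one more path segment, which is the behavior I was expecting in the first place.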

EDIT 8/12/2016: After publishing this article, I received a couple of tweets suggesting I use File.join for this purpose. This has the benefit of avoiding double slashes, i.e., /my//path. Keep in mind that File.join is meant for filesystem paths rather than URLs. In Ruby it joins with File::SEPARATOR, which is a forward slash on every platform, Windows included.
