Why is URI.join so counterintuitive?
We just reached a milestone here at Honeybadger. Our sales pages are no longer part of our main Rails app. It's been on my wish list for years, but not exactly top priority.
As part of this migration, I found myself using URI.join to construct particular redirect links. But I quickly ran into a problem: URI.join wasn't behaving as I expected.
I expected it to take a bunch of path fragments and string them together like so:
# This is what I was expecting. It didn't happen.
URI.join("https://www.honeybadger.io", "plans", "change")
=> "https://www.honeybadger.io/plans/change"
What the join method actually did is much stranger. It dropped one of my path fragments and used only the last one, "change".
# This is what happened.
URI.join("https://www.honeybadger.io", "plans", "change")
=> "https://www.honeybadger.io/change"
So why the heck does it work like this?
It turns out that I was expecting URI.join to behave like a specialized version of Array#join, taking URL fragments and combining them to make a whole URL.
That's not what it does. Big surprise.
If we take a look at the join method's code, we see that it just iterates over all of its arguments and calls merge on each.
# File uri/rfc2396_parser.rb, line 236
def join(*uris)
  uris = convert_to_uri(uris)
  uris.inject :merge
end
The merge method does two things:
- It converts a string like "pages" into a relative URI object.
- It tries to resolve the relative URI against the base URI. It does this in exactly the way specified in RFC2396, Section 5.2.
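To make those two steps concrete, here is a small sketch that uses only Ruby's standard uri library. The variable names are just for illustration.

```ruby
require "uri"

# Step 1: a bare string such as "pages" parses into a relative URI object.
relative = URI.parse("pages")
puts relative.relative?    # true

# Step 2: merge resolves that relative URI against an absolute base URI.
base = URI.parse("https://www.honeybadger.io")
puts base.merge(relative)  # https://www.honeybadger.io/pages

# merge also accepts the plain string and does the conversion itself.
puts base.merge("pages")   # https://www.honeybadger.io/pages
```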
So that's cool, but how does it explain the unexpected behavior I mentioned before?
URI.join("https://www.honeybadger.io", "plans", "change")
=> "https://www.honeybadger.io/change"
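Let's step through it. The call above is equivalent to merging "plans" into the base URI and then merging "change" into the result, that is, URI.parse("https://www.honeybadger.io").merge("plans").merge("change").
The first merge produces "https://www.honeybadger.io/plans". The second merge then attempts to resolve the relative URI "change" against the absolute URI "https://www.honeybadger.io/plans".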
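Here is a rough sketch of that step-through, with each merge pulled out into its own variable so the intermediate result is visible. The names step_one and step_two are made up for this example.

```ruby
require "uri"

base = URI.parse("https://www.honeybadger.io")

# First merge: "plans" is resolved against the bare host.
step_one = base.merge("plans")
puts step_one  # https://www.honeybadger.io/plans

# Second merge: "change" is resolved against the result of the first merge.
step_two = step_one.merge("change")
puts step_two  # https://www.honeybadger.io/change

# URI.join with the same arguments ends up in the same place.
joined = URI.join("https://www.honeybadger.io", "plans", "change")
puts joined    # https://www.honeybadger.io/change
```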
To do this, it follows RFC2396, Section 5.2.6, which states:
a) All but the last segment of the base URI's path component is copied to the buffer. In other words, any characters after the last (right-most) slash character, if any, are excluded.
b) The reference's path component is appended to the buffer string.
Let's play along:
- Copy everything but the final segment of the absolute URL. That gives me "https://www.honeybadger.io/".
- Append the relative path, resulting in "https://www.honeybadger.io/change".
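The two steps above can be sketched by hand with plain string operations. Note that resolve_path is a hypothetical helper written for this post, not part of Ruby's uri library, and it covers only steps (a) and (b); the RFC's handling of "." and ".." segments is left out.

```ruby
require "uri"

# Hypothetical helper: applies only steps (a) and (b) of RFC2396, Section 5.2.6.
def resolve_path(base_path, reference_path)
  # a) Keep everything up to and including the right-most slash.
  last_slash = base_path.rindex("/")
  buffer = last_slash ? base_path[0..last_slash] : ""
  # b) Append the reference's path to the buffer.
  buffer + reference_path
end

puts resolve_path("/plans", "change")   # /change
puts resolve_path("/plans/", "change")  # /plans/change

# For this simple case the real merge agrees with the hand-rolled version.
puts URI.parse("https://www.honeybadger.io/plans").merge("change").path  # /change
```

The second call hints at why a trailing slash changes the outcome: with "/plans/" the final segment is empty, so nothing is thrown away in step (a).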
The world makes sense again!
While URI.join can be used to build URLs from various path fragments, that's not really what it's designed to do. It's designed to do something a little more complicated: merge URIs one after another, following the resolution rules specified in the RFC.
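If you do want URI.join to string fragments together, the RFC rules suggest one way to coax it: end every intermediate fragment with a slash, so its final segment is empty and nothing gets dropped. A small sketch of that idea, reusing the URLs from this post:

```ruby
require "uri"

# Trailing slashes keep each fragment as a "directory", so later ones append.
with_slashes = URI.join("https://www.honeybadger.io/", "plans/", "change")
puts with_slashes     # https://www.honeybadger.io/plans/change

# Without them, each merge replaces the last segment of the previous result.
without_slashes = URI.join("https://www.honeybadger.io", "plans", "change")
puts without_slashes  # https://www.honeybadger.io/change

# A leading slash makes the reference an absolute path, which replaces the whole path.
absolute_path = URI.join("https://www.honeybadger.io/plans/", "/change")
puts absolute_path    # https://www.honeybadger.io/change
```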
As for my personal project — building URLs to use in redirects to our new sales pages — well, I just used Array#join instead. :)
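For the curious, an Array#join approach could look something like the sketch below. The build_url helper is hypothetical, not the exact code from our app; it strips stray slashes from each fragment before joining so the result doesn't end up with doubled slashes.

```ruby
# Hypothetical helper: joins a base URL and path fragments with single slashes.
def build_url(base, *fragments)
  parts = [base, *fragments].map do |part|
    # Trim leading and trailing slashes so join("/") adds exactly one between parts.
    part.to_s.gsub(%r{\A/+|/+\z}, "")
  end
  parts.reject(&:empty?).join("/")
end

puts build_url("https://www.honeybadger.io", "plans", "change")
# https://www.honeybadger.io/plans/change

puts build_url("https://www.honeybadger.io/", "/plans/", "change")
# https://www.honeybadger.io/plans/change
```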
EDIT 8/12/2016: After publishing this article I received a couple of tweets suggesting I use File.join for this purpose. This has the benefit of avoiding double slashes, i.e. /my//path. Keep in mind that File.join is meant for filesystem paths. Ruby's File.join always inserts a forward slash, even on OSs like Windows, so it does work here, but it is still a file helper being borrowed for URLs.
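A quick sketch of the difference, comparing File.join with a naive Array#join on fragments that carry their own slashes:

```ruby
fragments = ["https://www.honeybadger.io/", "/plans", "change"]

# File.join collapses the extra slashes where two fragments meet.
file_joined = File.join(*fragments)
puts file_joined   # https://www.honeybadger.io/plans/change

# A plain Array#join keeps every slash, doubled ones included.
array_joined = fragments.join("/")
puts array_joined  # https://www.honeybadger.io///plans/change
```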