Why do mobile browsers share canonical URLs?

Published Jan 10, 2023 · Topics: Browsers, Interaction design

It did it again the other day.

I had tucked some research leads from Google Books — a website that lets you perform searches in the content of digitized books, many very old — into Safari's Reading List on my phone, for later reference. A while later, when it came to following the breadcrumbs I had left for myself back to the insight they were meant to help germinate, lo and behold — all links to Google Books were missing my search queries. What had I found in these now unfamiliar books?

Since it was not the first occurrence of the sort, I've had a low-key suspicion for a while that when you share or bookmark a URL in iOS Safari, it takes into account the page's canonical URL whenever it finds one. I've been content to just shrug off the occasional mishap and move on, but this time I took the opportunity to dig into the subject.

Some background on canonical URLs

The canonical URL is meant to convey, as per the HTML spec, the preferred URL for the current document. The introductory paragraphs from RFC6596: The Canonical Link Relation tell us most of what we need to know about its intent:

The canonical link relation specifies the preferred IRI from resources with duplicative content. Common implementations of the canonical link relation are to specify the preferred version of an IRI from duplicate pages created with the addition of IRI parameters (e.g., session IDs) or to specify the single-page version as preferred over the same content separated on multiple component pages.

In regard to the link relation type, "canonical" can be described informally as the author's preferred version of a resource. More formally, the canonical link relation specifies the preferred IRI from a set of resources that return the context IRI's content in duplicated form. Once specified, applications such as search engines can focus processing on the canonical, and references to the context (referring) IRI can be updated to reference the target (canonical) IRI.

A canonical URL can be defined either through a Link HTTP header or, more commonly, via a <link> HTML element:

<!doctype html>
<html lang='en'>
<head>
	<title>Welcome to my website</title>
	<link rel='canonical' href='https://danburzo.ro/'>
</head>
</html>
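
For comparison, the header form of the same declaration would look something like Link: <https://danburzo.ro/>; rel="canonical". As a rough sketch, and not a complete parser for the Link header syntax, a hypothetical helper named canonicalFromLinkHeader (my own name, not part of any library) could pull the canonical target out of a header value:

```javascript
// Hypothetical helper: extracts the rel="canonical" target from a Link header value.
// Simplified on purpose: it only looks at the rel parameter and assumes
// link-values are separated by a comma followed by "<".
function canonicalFromLinkHeader(headerValue) {
	// Split '<a>; rel=x, <b>; rel=y' into individual link-values.
	const linkValues = headerValue.split(/,\s*(?=<)/);
	for (const linkValue of linkValues) {
		const match = linkValue.match(/^\s*<([^>]*)>\s*(.*)$/);
		if (!match) continue;
		const [, target, params] = match;
		// rel may be quoted (and hold several space-separated types) or unquoted.
		const rel = params.match(/;\s*rel\s*=\s*(?:"([^"]*)"|([^\s;,"]+))/i);
		if (!rel) continue;
		const relTypes = (rel[1] ?? rel[2]).toLowerCase().split(/\s+/);
		if (relTypes.includes('canonical')) return target;
	}
	return null;
}

console.log(canonicalFromLinkHeader('<https://danburzo.ro/>; rel="canonical"'));
// → https://danburzo.ro/
```

The function returns null when no link-value carries the canonical relation, so a caller can fall back to the <link> element or to the page's own URL.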

How did browsers end up using canonical URLs?

As described in the introduction to RFC6596, one major use case for canonical URLs is to help search engines make sense of several URLs that point to the same underlying content. This is also the angle Google uses to present the benefits of canonical URLs.

Sometime around 2017, this mechanism meant for machines was co-opted by browsers to address a very different problem: nudging users away from Google's AMP Viewer, and towards the original web pages it was caching. Safari 11 for iOS was soon followed by Chrome 64 for Android in favoring a page's canonical URL for certain interactions, such as sharing and bookmarking.

This change worked pretty well for that singular purpose: most article pages proxied by the AMP Viewer could be unambiguously traced back to their original URL.

However, this was released as a general mechanism that affects any page that uses <link rel=canonical>. That's a lot of pages, more than half of the pages analyzed by HTTP Archive for this year's Web Almanac, including crowd favorite Wikipedia.org.

In the general case, the results are more of a mixed bag.

The pros and cons of sharing canonical URLs

Some effects are decidedly positive: various pieces of user tracking gunk are stripped from URLs before they're passed to friends, as these are generally not featured in the page's canonical URL. By design or chance, among the things removed from URLs is the odd personally-identifying piece of information, so you could build the case that the feature helps protect the user's privacy.

But things that are beneficial to a search engine don't always match user needs. In fact, they can even clash.

Take filters on an e-commerce website as an example. For search engines, it's useful to know that various filtering criteria and sorting options, most often expressed as query parameters in the URL, point to the same underlying content. It makes sense, from an SEO perspective, to lump all combinations together under a single canonical URL.

For a user, on the other hand, the specific filtering criteria and sorting options are kind of the whole point, aren't they? When I browse my favorite bookstore's website for foreign books, in English, available in stock, sorted by most recent first, and commit that URL to my bookmarks for quick access, my intent is for the URL to be preserved at that exact level of specificity.
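
To make the trade-off concrete, here is a minimal sketch that lists the query parameters that go missing when the original URL is swapped for the canonical one. The bookstore URLs and the droppedParams helper are made up for illustration; they don't correspond to any real website or browser code.

```javascript
// Lists the query parameters present in the original URL
// but absent from the canonical one.
function droppedParams(originalHref, canonicalHref) {
	const original = new URL(originalHref);
	const canonical = new URL(canonicalHref);
	const dropped = [];
	for (const [name, value] of original.searchParams) {
		if (!canonical.searchParams.getAll(name).includes(value)) {
			dropped.push(`${name}=${value}`);
		}
	}
	return dropped;
}

// Hypothetical URLs: three filters the user cares about, plus one tracking parameter.
const bookmarkedHref = 'https://books.example/foreign?lang=en&stock=1&sort=newest&utm_source=newsletter';
const canonicalHref = 'https://books.example/foreign';

console.log(droppedParams(bookmarkedHref, canonicalHref));
// → [ 'lang=en', 'stock=1', 'sort=newest', 'utm_source=newsletter' ]
```

The tracking parameter and the filters are dropped alike: the canonical URL has no way of telling the gunk from the parts the user meant to keep.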

Even if you do manage to devise a pattern that fulfills both user needs and SEO goals, you're not yet out of the woods. When using canonical URLs in any form on your website, you're implicitly signing up for:

remaining vigilant about updating the <link rel=canonical> element in response to all operations that may alter the URL, such as with the History API via pushState() or replaceState();
depending on the browser implementation, waving goodbye to using fragment identifiers. Since it's exclusive to the client, the URL fragment can't be reflected by the server in the canonical URL. It will, therefore, be stripped when sharing or bookmarking the page — unless the browser is proactive in keeping it, or the fragment is kept in sync on the canonical link with client-side JavaScript.
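
The second point can be sketched as follows. This is a minimal, hedged example of keeping the fragment in sync on the canonical link with client-side JavaScript; canonicalWithFragment is a name of my own choosing, and whether a given browser reads the updated element at the moment of sharing is an assumption worth testing on your own pages.

```javascript
// Returns the canonical URL with the fragment of the current URL copied onto it.
function canonicalWithFragment(canonicalHref, currentHref) {
	const canonical = new URL(canonicalHref);
	canonical.hash = new URL(currentHref).hash;
	return canonical.href;
}

// Browser-only wiring; skipped where there is no DOM (for example, in Node).
if (typeof document !== 'undefined') {
	const link = document.querySelector('link[rel="canonical"]');
	if (link) {
		// Remember the server-provided value, so each update starts from it.
		const baseCanonical = link.href;
		const sync = () => {
			link.href = canonicalWithFragment(baseCanonical, location.href);
		};
		// hashchange fires when the user follows an in-page link.
		// pushState() and replaceState() fire no event at all, so any code
		// calling them would need to call sync() itself afterwards.
		window.addEventListener('hashchange', sync);
		sync();
	}
}
```

With the demo URLs used later in this article, canonicalWithFragment('https://danburzo.ro/demos/canonical-link-canonical.html', 'https://danburzo.ro/demos/canonical-link.html?hello=world#a') returns https://danburzo.ro/demos/canonical-link-canonical.html#a.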

Speaking of browser implementations, let's see how they stack up at the moment of writing.

Current browser behavior

Note: I'm using this demo page to test browser behavior, where the original URL is, depending on the things you click on, of the form https://danburzo.ro/demos/canonical-link.html?hello=world#a. Its canonical URL is defined as https://danburzo.ro/demos/canonical-link-canonical.html via the <link rel=canonical> element. Notice the different HTML file name and lack of query string and fragment.

Desktop browsers seem to be doing a good job of using the original URL when sharing and bookmarking a web page. Across the major mobile platforms and browsers, the situation is more diverse.

On Android:

Samsung Internet 19 and Firefox Android 108 share and bookmark the original, intact URL.
Chrome Android 108 bookmarks the original URL. When sharing the web page, it uses the canonical URL, but picks up the fragment identifier from the original URL. This is an enhancement introduced around 2021 [Chromium#1038187] that makes sharing canonical URLs a little less problematic.
Edge Android 108: no data, as I couldn't find an .apk to load on the device (test data appreciated!)

On iOS:

Firefox iOS 108 shares and bookmarks the original URL.
Safari iOS 16.2 uses the canonical URL for both sharing and bookmarking but loses the fragment identifier from the original URL [WebKit#250317], including the text fragment links for which Safari 16.1 has recently added support.
Chrome iOS 108, in an effort to match Safari [Chromium#1323782], works a bit worse than its Android counterpart: it shares the canonical URL, but without copying over the fragment identifier from the original URL. When bookmarking, the original URL is used. (Add to Reading List is weird: it's made to match Safari's bookmarking behavior since it's accessed from the same sharing panel, even though the page gets added to Chrome's own Reading List.)
Edge iOS 108, like Safari and Chrome, shares the canonical URL without copying over the fragment identifier from the original URL. When bookmarking, the original URL is used.

A summary of mobile browser behavior with regard to sharing and bookmarking URLs in the presence of a canonical link, January 2023.
Platform	Browser	Shared URL	Bookmarked URL
Android	Firefox	Original	Original
Android	Samsung Internet	Original	Original
Android	Chrome	Canonical, fragment kept	Original
Android	Edge	Missing data	Missing data
iOS	Firefox	Original	Original
iOS	Safari	Canonical, fragment lost	Canonical, fragment lost
iOS	Chrome	Canonical, fragment lost	Depends, see notes
iOS	Edge	Canonical, fragment lost	Original

How do browser vendors feel about the status quo?

Chrome seems to get a steady stream of issue reports about the "wrong URL" being shared or copied to the clipboard (see, for example, Chromium#799955 and Chromium#924309), to which the resolution is invariably "works as intended", along with a note that the team is reconsidering its approach to using canonical URLs in light of its tradeoffs. The conversation is framed more around erroneous usage of <link rel=canonical> by authors (see Chromium#988497, Chromium#1202789, Chromium#1306663) than around any fundamental flaw or limitation of the approach itself, but it's a start.

Over at Mozilla, arguments for or against using the canonical URL for sharing [Mozilla#1794879] and bookmarking [Mozilla#502418] have not yet truly taken off.

Possible solutions

I don't have much in the way of alternative solutions, but I tend to err on the side of user agency.

For each browser on each mobile platform, the sharing UI is flexible enough to accommodate a persistent user preference. Pictured below, Safari 16.2's sharing panel already has an Options section that could very well let you tweak how a link gets shared or bookmarked.

[Figure: The sharing panel in Safari 16.2. The panel floats above the web page content; in its header, underneath the page title and domain name, there is a button labeled Options.]

Finally, an invitation: I've tried to find and link all browser issues relevant to the topic of user-facing canonical URLs. If you have an opinion or a piece of anecdata, either in support of or against their usage, please share it with browser vendors.

Thanks to Šime for the useful feedback.

Miscellaneous bits and bobs

In the process of testing browser behaviors, I also logged issues for a couple of things I noticed in macOS Safari: that a page whose defined canonical URL differs from its actual URL will never be marked as visited [WebKit#250319], and that clicking an in-page link to a text fragment does not update the URL fragment [WebKit#250320].