The announcement from Yahoo!, Live & Google that they will be supporting a new "canonical url tag" to help webmasters and site owners eliminate self-created duplicate content in the index is, in my opinion, the biggest change to SEO best practices since the emergence of Sitemaps. It's rare that we cover search engine announcements or "news items" here on SEOmoz, as this blog is devoted more towards tactics than breaking headlines, but this certainly demands attention and requires quick education.
To help new and experienced SEOs better understand this tag, I've created the following Q+A (please feel free to print, email & share with developers, webmasters and others who need to quickly ramp up on this issue):
How Does it Operate?
The tag is part of the HTML header on a web page, the same section you'd find the Title attribute and Meta Description tag. In fact, this tag isn't new, but like nofollow, simply uses a new rel parameter. For example:
This would tell Yahoo!, Live & Google that the page in question should be treated as though it were a copy of the URLwww.seomoz.org/blogand that all of the link & content metrics the engines apply should technically flow back to that URL.
The Canonical URL tag attribute is similar in many ways to a 301 redirect from an SEO perspective. In essence, you're telling the engines that multiple pages should be considered as one (which a 301 does), without actually redirecting visitors to the new URL (often saving your dev staff considerable heartache). There are some differences, though:
Whereas a 301 redirect re-points all traffic (bots and human visitors), the Canonical URL tag is just for engines, meaning you can still separately track visitors to the unique URL versions.
A 301 is a much stronger signal that multiple pages have a single, canonical source. While the engines are certainly planning to support this new tag and trust the intent of site owners, there will be limitations. Content analysis and other algorithmic metrics will be applied to ensure that a site owner hasn't mistakenly or manipulatively applied the tag, and we certainly expect to see mistaken use of the tag, resulting in the engines maintaining those separate URLs in their indices (meaning site owners would experience the same problems noted below).
301s carry cross-domain functionality, meaning you can redirect a page at domain1.com to domain2.com and carry over those search engine metrics. This is NOT THE CASE with the Canonical URL tag, which operates exclusively on a single root domain (it will carry over across subfolders and subdomains).
Over time, I expect we'll see more differences, but since this tag is so new, it will be several months before SEOs have amassed good evidence about how this tag's application operates. Previous rollouts like nofollow, sitemaps and webmaster tools platforms have all had modifications in their implementation after launch, and there's no reason to doubt that this will, too.
How, When & Where Should SEOs Use This Tag?
In the past, many sites have encountered issues with multiple versions of the same content on different URLs. This creates three big problems:
Search engines don't know which version(s) to include/exclude from their indices
Search engines don't know whether to direct the link metrics (trust, authority, anchor text, link juice, etc.) to one page, or keep it separated between multiple versions
Search engines don't know which version(s) to rank for query results
When this happens, site owners suffer rankings and traffic losses and engines suffer lowered relevancy. Thus, in order to fix these problems, we, as SEOs and webmasters, can start applying the new Canonical URL tag whenever any of the following scenarios arise:
While these examples above represent some common applications, there are certainly others, and in many cases, they'll be very unique to each site. Talk with your internal SEOs or SEO consultants to help determine whether, how & where to apply this tag.
What Information Have the Engines Provided About the Canonical URL Tag?
Quite a bit, actually. Check out a few important quotes fromGoogle:
Is rel="canonical" a hint or a directive?
It's a hint that we honor strongly. We'll take your preference into account, in conjunction with other signals, when calculating the most relevant page to display in search results.
Can I use a relative path to specify the canonical, such as<link rel="canonical" href="product.php?item=swedish-fish" />?
Yes, relative paths are recognized as expected with the<link>tag. Also, if you include a<base>link in your document, relative paths will resolve according to the base URL.
Is it okay if the canonical is not an exact duplicate of the content?
We allow slight differences, e.g., in the sort order of a table of products. We also recognize that we may crawl the canonical and the duplicate pages at different points in time, so we may occasionally see different versions of your content. All of that is okay with us.
What if the rel="canonical" returns a 404?
We'll continue to index your content and use a heuristic to find a canonical, but we recommend that you specify existent URLs as canonicals.
What if the rel="canonical" hasn't yet been indexed?
Like all public content on the web, we strive to discover and crawl a designated canonical URL quickly. As soon as we index it, we'll immediately reconsider the rel="canonical" hint.
Can rel="canonical" be a redirect?
Yes, you can specify a URL that redirects as a canonical URL. Google will then process the redirect as usual and try to index it.
What if I have contradictory rel="canonical" designations?
Our algorithm is lenient: We can follow canonical chains, but we strongly recommend that you update links to point to a single canonical page to ensure optimal canonicalization results.
• The URL paths in the <link> tag can be absolute or relative, though we recommend using absolute paths to avoid any chance of errors.
• A <link> tag can only point to a canonical URL form within the same domain and not across domains. For example, a tag on http://test.example.com can point to a URL on http://www.example.com but not on http://yahoo.com or any other domain.
• The <link> tag will be treated similarly to a 301 redirect, in terms of transferring link references and other effects to the canonical form of the page.
• We will use the tag information as provided, but we’ll also use algorithmic mechanisms to avoid situations where we think the tag was not used as intended. For example, if the canonical form is non-existent, returns an error or a 404, or if the content on the source and target was substantially distinct and unique, the canonical link may be considered erroneous and deferred.
• The tag is transitive. That is, if URL A marks B as canonical, and B marks C as canonical, we’ll treat C as canonical for both A and B, though we will break infinite chains and other issues.
This tag will be interpreted as a hint by Live Search, not as a command. We'll evaluate this in the context of all the other information we know about the website and try and make the best determination of the canonical URL. This will help us handle any potential implementation errors or abuse of this tag.
You can use relative or absolute URLs in the “href” attribute of the link tag.
The page and the URL in the “href” attribute must be on the same domain. For example, if the page is found on “http://mysite.com/default.aspx”, and the ”href” attribute in the link tag points to “http://mysite2.com”, the tag will be invalid and ignored.
However, the “href” attribute can point to a different subdomain. For example, if the page is found on “http://mysite.com/default.aspx” and the “href” attribute in the link tag points to “http://www.mysite.com”, the tag will be considered valid.
Live Search expects to implement support for this feature sometime in the near future.
What Questions Still Linger?
A few things remain somewhat murky around the Canonical URL tag's features and results. These include:
The degree to which the tag will be trusted by the various engines - will it only work if the content is 100% duplicate 100% of the time? Is there some flexibility on the content differences? How much?
Will this pass 100% of the link juice from a given page to another? More or less than a 301 redirect does now? Note that Google's official representative from the web spam team, Matt Cutts, said today that it passes link juice akin to a 301 redirect but also noted (when SEOmoz's ownGillian Muessigasked specifically) that "it loses no more juice than a 301," which suggests that there is some fractional loss when either of these are applied.
The extent of the tag's application on non-English language versions of the engines. Will different levels of content/duplicate analysis and country/language-specific issues apply?
Will the engines all treat this in precisely the same fashion? This seems unlikely, as they'd need to share content/link analysis algorithms to do that. Expect anecdotal (and possibly statistical) data in the future suggesting that there are disparities in interpretation between the engines.
Yahoo! strongly recommends using absolute paths for this (and, although we've yet to implement it, SEOmoz does as well, based on potential pitfalls with relative URLs), but the other engines are more agnostic - we'll see what the standard recommendations become.
Yahoo! also mentions the properties are transitive (which is great news for anyone who's had to do multiple URL re-architectures over time), but it's not clear if the other engines support this?
Live/MSN appears to have not yet implemented support for the tag, so we'll see when they formally begin adoption.
Are the engines OK with SEOs applying this for affiliate links to help re-route link juice? We'd heard at SMX East from a panel of engineers that using 301s for this was OK, so I'm assuming it is, but many SEOs are still skeptical as to whether the engines consider affiliate links as natural or not.
Looking to optimize your site for a better ranking? Have a second thought while writing your Title Tag. Title Tags are used to label your web pages. The title tag helps your web page to be listed in search results. So the more relevant your title tag, higher will be the ranking of your site.
Keywords: Use some of your keywords to write your title tag. Also arrange your keywords in the order of their importance. The most important keyword should come first in the title tag. Eg: if your keywords are Outsourcing SEO, Internet marketing, PPC Services ,
Your title tag can be: <TITLE>Outsourcing SEO services, PPC Services India </TITLE>
Avoid Keyword Stuffing: Do not stuff your Title Tag with keywords.
Word Limit: Try to write the title in 6 to 12 words. Search engines require your title to be not more than 65 characters (including spaces). If you have a longer Title, the text gets cut off.
Common Words: Use words which a common man would type while searching for your site. Words which are too technical should be avoided.
Content: Your title tag should be catchy and as unique as possible. Each web page of your site should have a different Title Tag.
Inner pages or subpages are the pages that are included inside your website and they generally talk in detail about some specific product or service that you provide. Till lately SEO experts were focusing only on the Home / Landing page of your site. With Search Engines now giving more importance to inner pages for the ranking of your site, optimizing the inner pages have also become important.
General rules which apply to Landing page optimization are also applicable to inner pages.
Content: The content is crucial. It is here where the visitor can know more about your particular product/service. Ensure it is grammatically correct and written in a language which your visitor can understand.
Keywords: Being an inner page has an advantage. As this page talks only about a specific product/service, you can use more keywords describing that particular product/service. This would give you greater visibility.
Links: Link your inner pages to other pages of the site for easy navigation. It makes it easier for visitors to move inside your site.
Sitemap: Submit a sitemap directly to search engines. This enables the spiders to crawl on all the pages of your site.
Blog: Regularly update your blog and link this blog to your inner pages. This would drive traffic to the inner pages.
Visual Appeal: Ensure that you inner pages are appealing to the visitor and in sync with the Home Page and your Brand Image.