http://www.thegooglecache.com/white-hat-seo/measuring-link-index-freshness/
Disclosure: I work for Moz.
One of the hardest metrics to compare link indexes on is their percentage of “live links”. At first, it seems like a simple endeavor – just download the backlinks and check if they are live. It turns out that the process is far more complicated for a number of reasons.
- Redirect chains may not include the target domain in the URL, and there is no way to know which outlink begins the redirect chain without following every outlink on that linking page.
- Links aren’t always links – does the index count form action as a link? iframe? embed src?
- Linking sites might be temporarily down. Do you count that as a missing link or ignore it? How do you know if the outage is temporary or permanent?
- Different indexes have different crawler types. Ahrefs gets some JavaScript, which means you need to check with a headless browser, but the browser setup can affect what is rendered.
- Link indexes offer a variety of depths – Majestic Historical and Majestic Fresh, Ahrefs Historical, Ahrefs Fresh and Ahrefs Live. Moz simply has Link Explorer.
- The API sorts determine which link you pull from a root linking domain. This can be very problematic.
Ultimately, it means that a lot of data scrubbing needs to be done in order to get to a trustworthy list of links and a lot of caveats need to be described in order to get as close to apples-to-apples as you can.
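To make the sort caveat concrete, here is a minimal sketch of picking one link per root linking domain by a score. The field names (`root_domain`, `score_a`, `score_b`) and the scores themselves are hypothetical stand-ins for whatever each API sorts by; this is not either vendor's actual logic.

```python
def top_link_per_domain(links, score_key):
    """Keep only the highest-scoring link for each root linking domain."""
    best = {}
    for link in links:
        domain = link["root_domain"]
        if domain not in best or link[score_key] > best[domain][score_key]:
            best[domain] = link
    return best


# Hypothetical data: both indexes hold the same two links,
# but each index scores the pages differently.
links = [
    {"root_domain": "domain.com", "url": "http://domain.com/article",
     "score_a": 40, "score_b": 55},
    {"root_domain": "domain.com", "url": "http://domain.com/category",
     "score_a": 60, "score_b": 30},
]

pick_a = top_link_per_domain(links, "score_a")["domain.com"]["url"]
pick_b = top_link_per_domain(links, "score_b")["domain.com"]["url"]
print(pick_a)  # the category page wins under score_a
print(pick_b)  # the article page wins under score_b
```

The two indexes contain identical links, yet the sample drawn from each differs, so a live-link check on the sample can reward one index and punish the other.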
Today I completed a very modest study of 30 domains’ backlinks in Ahrefs Live (which powers their API) and Moz. Ahrefs Live has the highest percentage of live links of the Ahrefs offerings because a strict recrawl requirement governs which links make their way into that particular index. In order to complete this test, I had to establish several flags and filters both with Moz and Ahrefs.
- I excluded any redirect chains
- I only included followed links
- I sorted by PA in Moz and by AhrefsRank in Ahrefs and selected 1 link per domain from each. This is a huge caveat because the sort determines which link you get. For example, if Ahrefs last saw a link on domain.com/article as well as domain.com/category and thinks the category page has a higher AhrefsRank, then it will return the category link. If Moz has the exact same index, but thinks the article page has a higher PA, it will return the article link. If the link moves off the category page, Ahrefs would be punished and Moz would be rewarded even though both have the same links in their index.
- I only counted <a> links
- I set a timeout of 30 seconds and a connection timeout of 15 seconds. Any URL that failed was thrown out of the test – the index was not punished.
- I set the flag to not_deleted on the Moz API in order to get the list of links we believe are live, in the same way that Ahrefs Live believes their links are live.
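The filters above can be sketched roughly as follows. This is not the script I ran, just an illustration of the rules: only `<a>` tags count, nofollow links are skipped, and a page that could not be fetched is thrown out rather than counted against the index. The names `FollowedAnchorFinder`, `host_matches` and `classify` are made up for this example, fetching is left out (pass `None` for a page that timed out or errored), and redirect chains are not followed.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class FollowedAnchorFinder(HTMLParser):
    """Collects hrefs of <a> tags that are not marked rel="nofollow"."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":  # ignore form action, iframe, embed src, etc.
            return
        attr = dict(attrs)
        href = attr.get("href")
        rel = (attr.get("rel") or "").lower().split()
        if href and "nofollow" not in rel:
            # Resolve relative hrefs against the linking page's URL.
            self.hrefs.append(urljoin(self.base_url, href))


def host_matches(url, target_domain):
    """True if the URL's host is the target domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    target = target_domain.lower()
    return host == target or host.endswith("." + target)


def classify(page_url, html, target_domain):
    """Return 'live', 'missing', or 'discarded' for one linking page."""
    if html is None:  # fetch failed or timed out: the index is not punished
        return "discarded"
    finder = FollowedAnchorFinder(page_url)
    finder.feed(html)
    found = any(host_matches(h, target_domain) for h in finder.hrefs)
    return "live" if found else "missing"
```

For example, `classify("http://blog.example/post", page_html, "target.com")` returns `"live"` only when the fetched HTML contains a followed `<a>` whose href points at target.com or one of its subdomains.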
So, what were the results of this small test on 30 sites? Well, they were unsurprising, in my opinion.
- Ahrefs Live has a higher percentage of live links than Moz Link Explorer, with 79% and 69% respectively.
- Moz has a higher number of live links in its index than Ahrefs Live, with an estimated 9124 live root linking domains in Moz vs. 6171 live root linking domains in Ahrefs Live. This estimate was made by multiplying the live percentage rate by the number of reported root linking domains according to each index.
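The estimate described above is simple arithmetic, sketched here with made-up inputs. The function names and the sample numbers are hypothetical; the only rules taken from the test are that discarded URLs are excluded from the rate and that the rate is multiplied by the reported root linking domain count.

```python
def live_rate(outcomes):
    """Share of checked links that are live; discarded URLs are ignored."""
    live = outcomes.count("live")
    missing = outcomes.count("missing")
    checked = live + missing
    if checked == 0:
        raise ValueError("no checkable links")
    return live / checked


def estimate_live_domains(rate, reported_root_domains):
    """Live rate multiplied by the index's reported root linking domains."""
    return round(rate * reported_root_domains)


# Hypothetical sample: 3 live, 1 missing, 1 fetch failure (thrown out).
sample = ["live", "live", "live", "missing", "discarded"]
rate = live_rate(sample)
estimate = estimate_live_domains(rate, 1000)
print(rate, estimate)
```

Because the rate comes from one sampled link per domain, the result is an estimate of live root linking domains, not an exact count.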
This is to be expected. If an index intentionally restricts its membership to recently crawled URLs, it is going to have a higher rate of live links than one that does not. And, on the flip side, if an index has a looser restriction on recentness of crawl, then it will have more total live links because some live links would be excluded from the tighter index just because they hadn’t been recrawled (assuming all other things are equal).
It is exciting to see the differences between the indexes because it means there is value to be gained from both. I’ll say the same thing I have been saying for years – you should use all the link indexes you can afford.