Citation Labs Process and Tools for Large Scale Broken Link Building

I designed my “scraper suite” of tools (not the link prospector) as standalone pieces of a Broken Link Building tool that just launched. What we found was that my process, and the nature of the “dead web” itself, make it quite difficult to processitize large scale BLB in an “add-URLs-and-click-go” tool. There are so many small decisions and potential insights along the way that automation obscures.

The process below is not elegant, but it does help you cover a lot of ground – you can find and analyze thousands of resources and root out the dead pages and sites quite quickly. I’ve been building 40+ links a month with this method for one of my clients, and selling BLB prospecting as a service to others. As I’ve said before, link rot is your friend.

Here’s the process and the tools that I use (video coming soon). I’m sharing them in the hopes that in a few weeks you can try them out, find the flaws and help make me much smarter and faster :)

1) Search for 10-100s of links pages
You guessed it – I use my link prospector for this. I use prospecting phrases that categorically represent the topic area I’m promoting and plug those right into the tool. If you don’t use the link prospector you could try the prospecting queries mentioned here… and tildes… lots of tildes :)

What I’m looking for are curated links pages where my content could fit – with light scrutiny – whether we found a broken link or not. This of course presupposes you’re promoting good quality, long form content for which there are hundreds of “curators.” If you search and all of the top 10 pages are good prospects then you’re in a good space for this effort.

I rarely spend much time “qualifying” the links pages these days. I used to go through hundreds of pages and pick only those I thought had the highest relevance. Now I rely on my tools and common sense and just pass a big ol’ batch of links pages right on over to… step 2.

2) Scrape outbound links from the pages
Next, using my outbound link scraper (one of the 4 tools in my scraper suite), I scrape the outbound links from the batch of links pages I’ve found. Recently, from a batch of 427 links pages I extracted 12,635 resource URLs. Were each of the links and resources pages perfectly relevant to my subject matter? They were close enough. I needed to cover lots of ground!
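
If you don’t have the outbound link scraper handy, here is a minimal sketch of what this step does, using only Python’s standard library. It is not how the scraper suite works internally; the names `LinkCollector` and `outbound_links` are made up for illustration, and you would still need to fetch each page’s HTML yourself.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def outbound_links(page_url, html):
    """Return the links that point to a different host, in page order."""
    collector = LinkCollector()
    collector.feed(html)
    page_host = urlparse(page_url).netloc.lower()
    seen, result = set(), []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolve relative links
        parts = urlparse(absolute)
        if parts.scheme not in ("http", "https"):
            continue  # skip mailto:, javascript:, etc.
        if parts.netloc.lower() == page_host:
            continue  # internal link, not an outbound one
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

One simplification to be aware of: this compares hosts exactly, so a page on example.com linking to www.example.com would count as outbound.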

3) Check the status of each OBL
Once you’ve extracted the outbound links from the resource pages it’s time to check status. And, yup, we’ve got a tool for that in the scraper suite – the URL Status Checker. This tool checks and reports on the status of each URL, just as the name suggests. The tool DOES split large sets up into batches of 1000. This means the 12,000+ URL project will be spread across a dozen or more separate CSVs. Yes, that’s a PITA, but that’s the row you’ve chosen to hoe here so put your back into it. Merge the CSVs from this tool and sort them to isolate the dead and non-responding pages from your set. This obviously can take some time.
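
The merge-and-sort part can be scripted. Below is a rough sketch that assumes each batch CSV has a header row with `url` and `status` columns; those column names are my assumption for the example, so adjust them to whatever your export actually contains. It treats 4xx/5xx codes and anything non-numeric (blank, “timeout”) as dead or non-responding.

```python
import csv


def load_status_rows(paths):
    """Read every batch CSV and return one combined list of row dicts."""
    rows = []
    for path in paths:
        with open(path, newline="", encoding="utf-8") as handle:
            rows.extend(csv.DictReader(handle))
    return rows


def is_dead(status):
    """4xx/5xx codes and non-numeric values (timeouts, blanks) count as dead."""
    status = (status or "").strip()
    if not status.isdigit():
        return True
    return int(status) >= 400


def dead_rows(rows):
    """Keep dead or non-responding rows, sorted by status and then URL."""
    dead = [row for row in rows if is_dead(row.get("status"))]
    return sorted(dead, key=lambda row: (row.get("status") or "", row["url"]))
```

You could feed it the batch files with something like `load_status_rows(glob.glob("batches/*.csv"))`, where the folder name is again just a placeholder.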

And then…

4) Recheck the status of each dead OBL for final verification
Recheck because the tool’s not 100% accurate, and it’s heartbreaking further on in the process to THINK a site with 30k unique linking domains is dead when really it’s just responding slowly. So recheck everything that the tool reports dead. Sometimes I check three times, but that’s only when I’m feeling especially obsessive.
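
As a sketch of the recheck idea: only call a URL dead if it fails on every one of several attempts, with a generous timeout so slow servers get a fair chance. The function names, the 30 second timeout, the 5 second pause and the browser-like User-Agent below are all assumptions for illustration, not settings from the URL Status Checker.

```python
import http.client
import time
import urllib.error
import urllib.request


def fetch_status(url, timeout=30):
    """Return the HTTP status code, or None if the server never answers."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, just with a 4xx/5xx
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, refused connection, timeout, bad URL


def confirm_dead(urls, check=fetch_status, attempts=3, pause=5):
    """Return only the URLs that look dead on every single attempt."""
    confirmed = []
    for url in urls:
        for attempt in range(attempts):
            status = check(url)
            if status is not None and status < 400:
                break  # it answered after all, so it is not dead
            if attempt < attempts - 1:
                time.sleep(pause)  # give a slow server a moment
        else:
            confirmed.append(url)  # never answered successfully
    return confirmed
```

The `check` parameter is there so you can swap in a different status function (or a fake one for testing) without touching the retry logic.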

5) Gather metrics for the dead/nonresponding pages
Here’s where our new Metrics tool (utilizing Linkscape) comes into play. You will have to purchase “rows” in order to use this tool. Alternately you can use MajesticSEO’s bulk backlink checker, though they limit you to 300 rows. That’s one thing that’s awesome about our metrics suite – no limits ;). To access our metrics hover over “Metrics” and then click “Get Metrics.” Name your set and paste your dead sites and pages right in there. Poof – in minutes you now know which of the dead pages or sites are worth digging into.

6) Review dead pages by hand
Here’s where the intuition starts, and the slow combing through URLs. And yes, this is where automation falls apart, at least at our current levels of coding and expertise. So, first things first, sort that list by number of inbound links to the page. This will give you a sense of the opportunities with the most links. You’ll need to add some columns to your spreadsheet so here are my thoughts on columns that will help your reviewing.

  • How is it “Dead?”: For many gov sites, for example, the non-www doesn’t redirect to the www. subdomain. And yet they sometimes have hundreds of linking domains pointed to the nonwww. Other times the site is a parked domain now. Still other times it’s just plain gone. All of this information is important for further on when you’re writing your outreach emails.
  • Topic: What is this dead page about, according to Archive.org? Does the site or page actually have anything to do with the site I’m promoting? I also use MajesticSEO at this point to check out anchor text, as that gives a good clue if Archive doesn’t have info.
  • How Big is the Op: is the whole site dead? just one section? just one page? All this needs to be recorded so you can figure out how to best pull backlinks for the opportunity.
  • What are we promoting? Is this a 1-1 replacement op (these are the best, whether you write it as you find it dead or already have something written)? Or is it a fix suggestion + similar-resource link request?

Be sure to save any and all dead sites you find with 100 or more links – even if you can’t use them now. You can always come back later or possibly come up with an angle down the road.
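
The review itself is hand work, but setting up the sheet is not. Here is a small sketch that sorts the metrics rows by inbound links, adds blank review columns for the four questions above, and flags anything with 100 or more links to keep for later. The `links` field name and the review column names are placeholders I picked for the example, not the headers from the metrics export.

```python
REVIEW_COLUMNS = ["how_dead", "topic", "size_of_op", "what_we_promote"]


def build_review_sheet(rows, save_threshold=100):
    """Sort by inbound links (highest first) and add blank review columns."""
    sheet = []
    for row in sorted(rows, key=lambda r: int(r["links"]), reverse=True):
        entry = dict(row)
        entry["links"] = int(row["links"])
        for column in REVIEW_COLUMNS:
            entry.setdefault(column, "")  # filled in by hand during review
        # Keep big opportunities on file even if there is no angle yet.
        entry["save_for_later"] = entry["links"] >= save_threshold
        sheet.append(entry)
    return sheet
```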

7) Pull and qualify dead backlinks
With our metrics tool you can also pull backlinks for the dead ops you’d like to pursue. Ah but I wish I could tell you it was as easy as taking all the links and heading over to the contact finder (one of the tools in our scraper suite). Nope – you’ve got to scrub this data or you’ll end up wasting your time. First and foremost I have to tell you to NOT get excited about apparently large lists of opportunities. I’ve found it quite common to start with 16000 unique linking domains and boil those down to 50 actual opportunities for outreach. So don’t tell clients how many ops you have for them UNTIL you’ve done your distilling – you will end up profoundly overselling what you have.

You can use our metrics suite for Linkscape data if you like and input each URL you’d like backlinks for. You then designate how many backlinks you want us to pull and pay us per backlink (1 backlink = a row). I like to use MajesticSEO for this too as they seem quite a bit fresher and more comprehensive. That said, MJ’s comprehensiveness comes with a great deal more “scrubbing” required. Whether you use Linkscape data sourced through our metrics suite or Majestic you can use our scraper suite tools for boiling them down further.

  • URL Filter Tool: this is a regex tool. I use it to isolate the links and resource pages as these are often the best opportunities for what I promote. Input your backlinks into the input field and then add this string: (resources?|links?|faqs|(web([^/]*)?)sites?|references?) to the “Match Any” box. This will extract the URLs that are most likely to be links pages.
  • 1 URL Per Domain: once you have your probable links and resource pages you should double check that you have 1 per domain. Rarely is it “good form” to send 2 separate emails to a webmaster requesting links on 2 separate pages with broken links.
  • Backlink Checker: especially when using Majestic’s data I like to double check that the pages still exist, and that the links we’re suggesting be changed are still actually on the pages. You can do this with the backlink checker that’s a part of the scraper suite. Just copy in the domain of the dead page and paste in your list of suspects. The tool will show you which pages the link still lives on.
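
To make the first two filters concrete, here is a sketch in Python that applies the same regex string and then keeps one URL per domain. A few assumptions on my part: the match is case-insensitive and runs against the whole URL (I’m not claiming the URL Filter Tool behaves exactly this way), and “domain” here just means the hostname with a leading www. stripped, so subdomains count as separate sites.

```python
import re
from urllib.parse import urlparse

# The same string that goes in the "Match Any" box.
LINKS_PAGE_PATTERN = re.compile(
    r"(resources?|links?|faqs|(web([^/]*)?)sites?|references?)", re.IGNORECASE
)


def likely_links_pages(urls):
    """Keep URLs whose text matches the links/resources pattern."""
    return [url for url in urls if LINKS_PAGE_PATTERN.search(url)]


def one_url_per_domain(urls):
    """Keep only the first URL seen for each hostname (ignoring 'www.')."""
    seen, kept = set(), []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[4:]
        if host and host not in seen:
            seen.add(host)
            kept.append(url)
    return kept
```

Run them in that order, `one_url_per_domain(likely_links_pages(backlinks))`, so the URL you keep for each domain is one that already looks like a links page. Because the pattern runs on the full URL, a hostname that happens to contain “link” will match too, so eyeball the output.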

WOW. That’s right. Your list of prospects is now significantly diminished. I’m pretty happy when I still have 50-80 real linking prospects to a now-dead resource. I’ve found upwards of 500 before. But even so I wasn’t ready to outreach… I still had to…

8) Pull contact info
Yes the scraper suite DOES have a contact finder… but it won’t solve all your problems. In fact I find it only around 25-30% accurate for contact info on links and resource pages (much better than nothing for sure…). It’s great for blog contact finding, but the old, custom-CMS kinds of sites where links pages are commonly published don’t always have standard ways of displaying contact info. Further, there can be tens or even hundreds of email addresses discovered with the tool for a given domain. This is why I have 1 team member whose sole job is to process the contact finder reports. She has to select the best email address or contact page and, when the tool doesn’t find anything, she goes onsite to look for contact info. Lastly she delivers a spreadsheet with two columns: URL where the dead link appears and contact info.

Then, at last, we move on to…

9) Outreach
Much has been written about broken link building outreach (see the interviews in the BLB resource list). I like to keep things short and sweet. I tell them where the broken link is and provide code for fixing it. I only mention 1 dead link on the page. If we’re affiliated I say as much, though I often assert that the persona had some hand in writing the content.

As for conversions… I’ll simply say that I’m pleased by a 5% conversion. I expect/plan/prospect for a 2% conversion. I got 15% conversion once on a small run, exact 1-1 replacement campaign in which the dead site had gotten parked. My outreach email told webmasters they were linking to a spam site. This is quite rare, but do note that the closer you can get to a 1:1 replacement the better your results will be. I’ve not yet done content recreation as a service so can’t speak to the effectiveness of that approach (I think it would be good, just haven’t yet set up campaigns in this way).
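
Those conversion rates are handy for working backwards from a link target to the number of scrubbed, contactable prospects you need. A tiny sketch of the arithmetic follows; the 40-link target is only an example figure, and `prospects_needed` is a name I made up.

```python
import math


def prospects_needed(target_links, conversion_percent):
    """Outreach prospects required to hit a link target at a given rate."""
    return math.ceil(target_links * 100 / conversion_percent)


# Example: planning for 40 links at different conversion rates.
for percent in (2, 5, 15):
    print(percent, "% conversion:", prospects_needed(40, percent), "prospects")
```

Planning at 2% rather than 5% is the conservative choice: it more than doubles the prospecting you have to do, but it keeps you from overpromising.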

10) …but amidoinitrite?
Could be that I’m not, or that I could be more effective at certain stages. I’m coming nowhere near Napoleon Suarez’s 8-12% (what the hell happened to him anyways, are you out there Napoleon? BLB needs you!). It could be that by scaling, and not going links-page by links-page I’m sacrificing conversion rate for larger scale. Not sure. I’d love feedback and look forward to reading YOUR broken link building article down the road :)

Also, see 48 Broken Link Building Resources.
