Israel Science and Technology Directory

Internet Linkcheck

Analyzing Linkcheck output

This page follows the previous page, which gave instructions on how to generate a file containing Linkcheck output.

The Linkcheck report provides a list of internal and external links that were rejected or redirected. By examining this list, we can decide to modify or delete links in scanned files. However, this process is not straightforward, because not all the links that appear in the list are broken.

The information presented on this page has three major objectives:

  1. To present the format of Linkcheck output.
  2. To present a list of error codes and their implication for the usability of the link.
  3. To recommend a systematic approach for review of Linkcheck reports.

Format of the Linkcheck output

I attached a file that includes a sample output of the Linkcheck report for the Astronomy directory of this website. Each Linkcheck report includes a list of pages that have broken hyperlinks. The list of broken URLs appears under the URL of each page that was scanned.

The error code for the broken URL appears at the end of each URL in parentheses. If the URL is redirected to a different URL, then this URL appears in the next line.
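The layout described above can be read mechanically. The following is a minimal Python sketch, not a parser for the exact Linkcheck format. It assumes that the URL of each scanned page starts at the beginning of a line, that each broken URL is indented beneath it and ends with its code in parentheses, and that a redirect target appears on the following indented line. The sample text is made up to illustrate that layout, and the names `parse_report` and `BrokenLink` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# A URL followed by a code in parentheses at the end of the line,
# e.g. "https://example.net/missing (404)". Assumed layout, see lead-in.
ENTRY = re.compile(r"^(?P<url>\S+)\s+\((?P<code>[^)]+)\)$")


@dataclass
class BrokenLink:
    page: Optional[str]              # page on which the link was found
    url: str                         # the rejected or redirected URL
    code: str                        # text found inside the parentheses
    redirect: Optional[str] = None   # redirect target, if one was listed


def parse_report(text: str) -> List[BrokenLink]:
    links: List[BrokenLink] = []
    page: Optional[str] = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if not raw[0].isspace():
            page = line  # unindented line: URL of a scanned page
            continue
        match = ENTRY.match(line)
        if match:
            links.append(BrokenLink(page, match["url"], match["code"]))
        elif links and links[-1].page == page and links[-1].redirect is None:
            # Indented line without a code: treat it as the redirect target
            # of the entry directly above it.
            links[-1].redirect = line
    return links


# Made-up sample in the assumed layout (not real Linkcheck output).
SAMPLE = """\
https://example.org/astronomy/
    https://example.com/old-page (301)
    https://example.com/new-page
    https://example.net/missing (404)
"""

for link in parse_report(SAMPLE):
    print(link.page, link.url, link.code, link.redirect)
```

If the real report uses different indentation or punctuation, only the `ENTRY` pattern and the indentation test need to change.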

Error codes in the Linkcheck report

To facilitate the explanation of error codes, I describe the communication between a client browser and the distant server that provides the requested information.

When a user clicks a link on a website, the browser (client, e.g., Chrome, Firefox) sends an HTTP request to the distant website server. This request includes the ID of the browser, formally called "user-agent", and the information (page, document, etc.) requested by the client.

The distant server responds by returning an "HTTP response" that includes three components: a status line, response headers, and an optional message body. The status line includes a three-digit number that indicates the result of the request (e.g., 200 for "OK", 404 for "Not Found").
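To make the status line concrete, here is a small Python sketch that splits a status line into its parts and looks up the standard reason phrase for a code. The function names `parse_status_line` and `describe` are hypothetical; `http.HTTPStatus` is part of the Python standard library.

```python
from http import HTTPStatus
from typing import Tuple


def parse_status_line(line: str) -> Tuple[str, int, str]:
    """Split an HTTP/1.x status line into (version, code, reason)."""
    version, code, *reason = line.strip().split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def describe(code: int) -> str:
    """Return the standard reason phrase for a status code, if Python knows it."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        return "Unknown"


print(parse_status_line("HTTP/1.1 404 Not Found"))
print(describe(200), "/", describe(410))
```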

Linkcheck scans each file for hyperlinks and sends an HTTP request to the address of each link to check the availability of the resource. As with a browser request, the server returns an "HTTP response" that includes a three-digit status code indicating the result of the request.
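The same kind of check can be sketched in a few lines of Python using only the standard library. This illustrates the general idea and is not how Linkcheck itself is implemented. The function name `check_link`, the use of a HEAD request, and the user-agent string are assumptions made for the example.

```python
import urllib.error
import urllib.request
from typing import Optional


def check_link(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the HTTP status code for url, or None if no response arrived."""
    request = urllib.request.Request(
        url,
        method="HEAD",  # assumption: ask for headers only, not the body
        headers={"User-Agent": "example-link-checker/0.1"},  # hypothetical ID
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered with an error status, e.g. 404
    except OSError:
        return None  # DNS failure, refused connection, or timeout


if __name__ == "__main__":
    print(check_link("https://archive.org/"))
```

Note that `urllib` follows redirects automatically, so this sketch reports the status of the final address rather than the 3xx code of the original link. Some servers also reject HEAD requests, in which case a GET request may be needed.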

Below, I list a series of status codes that appear in my Linkcheck reports.

Priority of revisions

If the website includes many directories with many files, I recommend setting up a schedule for checking the sub-directories of the website. If the Linkcheck report includes many lines, I recommend first revising the links that are easily corrected:

Secondly, URLs receiving a 404 or 410 code should be deleted. These codes are good examples of link-rot. If the link has important content that you wish to preserve, you can look up the address in the Wayback Machine at https://archive.org/ and, if appropriate, update the link.

Thirdly, you can manually check the other redirected URLs and modify or delete them. URLs giving a "connection failed" message should be checked last, as noted above.
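The order of review described above can be expressed as a small sorting rule. The Python sketch below is one possible reading of it and covers only the groups spelled out in this section: 404 and 410 codes, other redirects, and failed connections. The function name `review_priority` and the use of `None` for a failed connection are assumptions made for the example.

```python
from typing import Optional, Tuple


def review_priority(status: Optional[int]) -> Tuple[int, str]:
    """Map a status code to (rank, suggested action); lower rank comes first."""
    if status in (404, 410):
        return 2, "delete, or replace with an archived copy"
    if status is not None and 300 <= status < 400:
        return 3, "check manually, then modify or delete"
    if status is None:  # assumption: None stands for "connection failed"
        return 4, "check last"
    return 5, "no action described in this section"


# Made-up (URL, status) pairs, sorted into the suggested review order.
results = [
    ("https://example.com/a", None),
    ("https://example.com/b", 302),
    ("https://example.com/c", 404),
]
for url, status in sorted(results, key=lambda item: review_priority(item[1])[0]):
    print(url, status, review_priority(status)[1])
```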

Validity of Linkcheck results

To examine the validity of Linkcheck results, I checked the same directories using Xenu Link Sleuth in Windows. Xenu is an outstanding link checker for use in Windows. The results and the responses I got were similar or identical to the Linkcheck results.

Final thoughts

Linkcheck is a very fast link checker. Its output is well organized. I recommend it to webmasters who are concerned about preventing link-rot in their websites. I hope that this page will be useful in your website maintenance.
