- Analyzing Linkcheck output
- Format of the Linkcheck output
- Error codes in the Linkcheck report
- Priority of revisions
- Validity of Linkcheck results
- Final thoughts
Analyzing Linkcheck output
This page follows the previous page, which explained how to generate a file containing Linkcheck output.
The Linkcheck report lists the internal and external links that were rejected or redirected. By examining this list, we can decide whether to modify or delete links in the scanned files. However, this process is not straightforward, because not every link in the list is broken.
The information presented on this page has three major objectives:
- To present the format of Linkcheck output.
- To present a list of error codes and their implications for the usability of each link.
- To recommend a systematic approach for reviewing Linkcheck reports.
Format of the Linkcheck output
I attached a file that includes a sample Linkcheck report for the Astronomy directory of this website. Each Linkcheck report lists the scanned pages that have broken hyperlinks; the broken URLs found on each page are listed under that page's URL.
The error code for each broken URL appears in parentheses at the end of its line. If the URL was redirected, the redirect path (each intermediate address with its code) appears on the following lines.
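When a report is long, it can help to load it into a script before reviewing it. The Python sketch below parses entries in the format of the sample entry listed under code 200 on this page (`- (163:401) 'Israel I..' => https://www.osh.org.il/ (HTTP 200)`). The names `parse_report` and `BrokenLink` are hypothetical, the pattern is inferred from that single sample rather than from Linkcheck's documentation, and reading the two numbers as line and column is an assumption.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# One report entry, modeled on the sample on this page:
#   - (163:401) 'Israel I..' => https://www.osh.org.il/ (HTTP 200)
# Assumption: the two numbers are the line and column of the link in the page source.
ENTRY = re.compile(
    r"^-\s*\((?P<line>\d+):(?P<col>\d+)\)\s*"  # "- (163:401)"
    r"'(?P<text>.*?)'\s*=>\s*"                  # "'Israel I..' =>"
    r"(?P<url>\S+)\s*"                          # target URL
    r"\((?P<result>[^)]*)\)"                    # "(HTTP 200)" or "(connection failed)"
)


@dataclass
class BrokenLink:
    page: Optional[str]  # URL of the scanned page the link was found on
    line: int
    column: int
    text: str            # link text as shown in the report
    url: str
    result: str          # text inside the final parentheses

    @property
    def status(self) -> Optional[int]:
        """Numeric HTTP status, or None when there was no HTTP response."""
        m = re.fullmatch(r"HTTP (\d{3})", self.result)
        return int(m.group(1)) if m else None


def parse_report(report: str) -> List[BrokenLink]:
    links: List[BrokenLink] = []
    page: Optional[str] = None
    for raw in report.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue
        m = ENTRY.match(stripped)
        if m:
            links.append(BrokenLink(page, int(m["line"]), int(m["col"]),
                                    m["text"], m["url"], m["result"]))
        elif not raw.startswith((" ", "\t", "-")):
            # An unindented line that is not an entry is taken to be the
            # URL of the next scanned page (an assumption about the layout).
            page = stripped
        # Other lines (e.g. "redirect path:" and its hops) are skipped.
    return links
```

Each entry yields one record; the redirect-path lines beneath it are ignored by this sketch.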
Error codes in the Linkcheck report
To explain the error codes, I first describe the communication between a client browser and the remote server that provides the requested information.
When a user clicks a link on a website, the browser (the client, e.g., Chrome or Firefox) sends an HTTP request to the remote web server. This request includes the identity of the browser, formally called the "user-agent", and the information (page, document, etc.) requested by the client.
The remote server responds by returning an "HTTP response" that has three components: a status line, response headers, and an optional message body. The status line includes a three-digit number that indicates the result of the request (e.g., 200 for "OK", 404 for "Not Found").
Linkcheck scans each file for hyperlinks and sends an HTTP request to the address of each link to check whether the resource is available. As with a browser request, the server returns an "HTTP response" whose status line contains a three-digit status code indicating the result of the request.
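The request/response exchange described above can be sketched with Python's standard `http.client` module. This is a minimal illustration, not how Linkcheck itself is implemented: `fetch_status` is a hypothetical helper, the user-agent string `example-link-checker/0.1` is made up, and redirects are deliberately not followed so that a 301 is reported as-is.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_status(url: str,
                 user_agent: str = "example-link-checker/0.1",
                 timeout: float = 10.0) -> Tuple[Optional[int], Optional[str]]:
    """Send one GET request and return (status code, Location header).

    Redirects are not followed, so a 301 is reported as 301 together with
    the address it points to. (None, None) means no HTTP response arrived.
    """
    parts = urlsplit(url)
    conn_class = (http.client.HTTPSConnection if parts.scheme == "https"
                  else http.client.HTTPConnection)
    conn = conn_class(parts.netloc, timeout=timeout)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        # The User-Agent header tells the server which client is asking.
        conn.request("GET", target, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        # response.status is the three-digit code from the status line.
        return response.status, response.getheader("Location")
    except (OSError, http.client.HTTPException):
        # Refused, timed out, or malformed reply: comparable to "connection failed".
        return None, None
    finally:
        conn.close()
```

The sketch uses GET rather than HEAD because some servers answer HEAD requests differently; which method Linkcheck uses is not covered on this page.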
Below, I list the status codes that appear in my Linkcheck reports.
Code: connection failed
This is not an HTTP status code: it means that Linkcheck received no HTTP response at all. Most links with this result are functioning URLs. The lack of response to Linkcheck's request may be due to the fact that Linkcheck is not a browser. Links with this result should not be deleted before a second, manual check.
Code: 200
This code indicates that the request succeeded. In the report, a link with a 200 code usually appears because the client (Linkcheck) was redirected one or more times before reaching an address that returned 200, as shown below:
- (163:401) 'Israel I..' => https://www.osh.org.il/ (HTTP 200)
  - redirect path:
    - https://www.osh.org.il/ (301)
    - https://www.osh.org.il/heb/main (301)
    - https://www.osh.org.il/heb/main/ (200)
Code: 301, 302, 307
These codes mean that the item has moved to a different URL (301 marks a permanent move; 302 and 307 mark temporary ones). Linkcheck follows these redirects. However, they do not always lead to a valid address. On some websites, the redirects form a short chain. Each redirect adds an extra request and slows down access to the page, so such URLs should be updated.
Code: 403
Formally, this code means that the client does not have access rights to the content. When entered manually in a browser, URLs that receive this response display a brief "security verification" window and can then be accessed. Therefore, URLs with a 403 code should not be deleted without a manual check.
Code: 404
This code indicates that the server cannot find the requested resource. URLs that give this code should be modified or deleted.
Code: 410
This code is rare. It indicates that the requested content has been permanently deleted from the server. URLs that give this code should be deleted.
Code: 500
This code indicates an internal server error: the server encountered a problem while processing the request.
Code: 503
This code indicates that the server is not available. In my experience, some URLs that give this response look fine in a browser.
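The recommendations above can be collected into a small lookup table. This Python sketch only summarizes the advice on this page; the action labels and the name `suggested_action` are hypothetical, and the fallback for codes without a recommendation here (such as 500) is an assumption.

```python
from typing import Optional

# Suggested next step per status, summarizing the recommendations on this page.
# The action strings are illustrative labels, not Linkcheck output.
ACTIONS = {
    200: "update to the final address of the redirect path",
    301: "update to the new address",
    302: "update to the new address",
    307: "update to the new address",
    403: "check manually in a browser before deleting",
    404: "modify or delete",
    410: "delete",
    503: "check manually in a browser",
}


def suggested_action(status: Optional[int]) -> str:
    if status is None:
        # "connection failed": no HTTP response, so no status code at all.
        return "check manually before deleting"
    # Codes without a recommendation on this page (e.g. 500) fall back to
    # a manual review; this fallback is an assumption.
    return ACTIONS.get(status, "review manually")
```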
Priority of revisions
If the website includes many directories with many files, I recommend setting up a schedule for checking its sub-directories. If the Linkcheck report includes many lines, I recommend first revising the links that are easiest to correct:
- Redirection from an address that starts with http://... to a secure https://... address.
- Redirection from a URL that starts with https://www... to the same URL without "www" (https://...).
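Both easy cases can be detected by comparing a link with the final URL of its redirect path. This Python sketch assumes that a fix counts as "easy" only when the path and query string stay the same; `easy_fixes` is a hypothetical helper.

```python
from typing import List
from urllib.parse import urlsplit


def easy_fixes(original: str, final: str) -> List[str]:
    """List the 'easy' changes between a link and the end of its redirect path.

    Returns an empty list when anything else changed (path, query, or a
    different host), meaning the link needs a manual look instead.
    """
    a, b = urlsplit(original), urlsplit(final)
    # Treat "https://example.com" and "https://example.com/" as the same path.
    if (a.path or "/", a.query) != (b.path or "/", b.query):
        return []
    fixes: List[str] = []
    if a.scheme == "http" and b.scheme == "https":
        fixes.append("http -> https")
    elif a.scheme != b.scheme:
        return []
    host_a, host_b = a.hostname or "", b.hostname or ""
    if host_a == "www." + host_b:
        fixes.append("drop www")
    elif host_a != host_b:
        return []
    return fixes
```

For the sample entry on this page, https://www.osh.org.il/ redirects to https://www.osh.org.il/heb/main/, so the path changes and `easy_fixes` returns an empty list: that link belongs in a later, manual step.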
Secondly, URLs receiving a 404 or 410 code should be deleted. These codes are typical examples of link-rot. If the linked page had important content that you wish to preserve, you can look up its address in the Wayback Machine at https://archive.org/ and, if appropriate, update the link.
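The Wayback Machine lookup can also be scripted. The sketch below assumes the Internet Archive's availability endpoint `https://archive.org/wayback/available?url=...` and a JSON response of the form `{"archived_snapshots": {"closest": {"available": true, "url": ...}}}`; check the Internet Archive's API documentation before relying on it. The helper names are hypothetical.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint of the Wayback Machine availability API.
AVAILABILITY_API = "https://archive.org/wayback/available"


def availability_query(url: str) -> str:
    """Build the query URL that asks for the closest snapshot of `url`."""
    return AVAILABILITY_API + "?" + urlencode({"url": url})


def closest_snapshot(response_text: str) -> Optional[str]:
    """Return the archived copy's URL, or None if no snapshot is available."""
    data = json.loads(response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def find_archived_copy(url: str, timeout: float = 10.0) -> Optional[str]:
    # Network call; the archived copy still needs a manual look before
    # the link is updated.
    with urlopen(availability_query(url), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```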
Thirdly, you can manually check the other redirected URLs and modify or delete them. URLs with a "connection failed" result should be checked last; as noted above, most of them are functioning URLs.
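To see where a redirected URL actually ends before updating it, the redirect path can be reproduced with Python's `urllib`, which follows redirects on its own. This sketch records each hop in a form similar to Linkcheck's redirect path; `RecordingRedirectHandler`, `redirect_path`, and the user-agent string are hypothetical, and results may differ from Linkcheck's because servers can respond differently to different clients.

```python
import urllib.error
import urllib.request
from typing import List, Optional, Tuple


class RecordingRedirectHandler(urllib.request.HTTPRedirectHandler):
    """Follows redirects as urllib normally does, but records each hop."""

    def __init__(self) -> None:
        super().__init__()
        self.hops: List[Tuple[str, Optional[int]]] = []
        self.current: Optional[str] = None  # latest address requested

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        self.hops.append((req.full_url, code))
        self.current = newurl
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def redirect_path(url: str,
                  user_agent: str = "example-link-checker/0.1",
                  timeout: float = 10.0) -> List[Tuple[str, Optional[int]]]:
    """Return [(address, status), ...] ending with the final address."""
    handler = RecordingRedirectHandler()
    opener = urllib.request.build_opener(handler)  # replaces the default handler
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    status: Optional[int]
    try:
        with opener.open(request, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as err:  # e.g. a 404 at the end of the path
        status = err.code
    except OSError:  # no HTTP response: comparable to "connection failed"
        status = None
    return handler.hops + [(handler.current or url, status)]
```

A status of None in the last element means that no HTTP response arrived at all.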
Validity of Linkcheck results
To examine the validity of Linkcheck results, I checked the same directories using Xenu's Link Sleuth, an outstanding link checker for Windows. The results and response codes I got were similar or identical to the Linkcheck results.
Final thoughts
Linkcheck is a very fast link checker, and its output is well organized. I recommend it to webmasters who want to prevent link-rot on their websites. I hope that this page will be useful in your website maintenance.