Find dead links everywhere in the shop and on the website

Website Monitoring Magazine

Links are the heart of HTML. They connect pages with each other, both internally and externally. That makes them important for at least two reasons:

  • Users are quickly annoyed: When you click on a link, you want to know what's behind it. If you then get the "page not found" response, you'll think twice about trusting the creators of the page.

  • SEO: Google and co. need to recognize the connections between pages in order to evaluate their importance. On the web, those connections are links. So if you want subpages to be found, they have to be linked cleanly.

koality.io Crawler

koality.io can continuously check your most important pages for dead links. This catches almost all technical errors and prevents the worst case; editorial errors, however, are rarely found this way.

That is also why koality.io provides very extensive crawls, which check up to 500 subpages for issues. We have now supplemented these crawls with a "dead link" crawler which, once started, automatically searches for links that lead nowhere.

Server responses - status codes

We find dead links. But what actually is a dead link? In our context, it is any link that points to a page the user cannot use. Technically, we determine this from the HTTP status code.

  • Page not found (404 - Not Found): This is the classic and will probably occur most often: pages that do not exist or have been deleted. It often happens after a page has been moved, or with external links.

  • Server error (500 - Internal Server Error): These are the most interesting errors from a technical point of view. For some reason, the application crashed completely when the page was requested. Such errors should be reported to a technician or the agency immediately.

  • The service is currently unavailable (503 - Service Unavailable): This error often occurs when the web server is under too much load. In other words, too many people are on the site at the same time and the servers can no longer answer all the requests.

  • Too many simultaneous requests (429 - Too Many Requests): Some servers have built-in protection mechanisms, such as a WAF (Web Application Firewall). These notice when, for example, an attack is coming from outside and close the connection as a precaution. Since a crawl can look similar to such an attack from the firewall's point of view (many requests from the same user), this mechanism can occasionally trigger here as well.

  • No answer (0 - No Answer): Sometimes the pages you link to simply don't exist anymore, and there is no server left to answer. We map this case to an artificial status code of 0.
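To make the classification above concrete, here is a minimal sketch in Python, not koality.io's actual implementation: a link is checked with a plain HTTP request, a 4xx/5xx answer keeps its status code, and a request that gets no answer at all is mapped to the artificial code 0.

```python
# Sketch of a dead-link classifier based on the status codes above.
# Not koality.io's real code; function names are illustrative.
import urllib.error
import urllib.request

DEAD_STATUSES = {404, 429, 500, 503}  # the codes discussed above

def is_dead(status: int) -> bool:
    """A link is dead if no server answered (0) or the code is in the list."""
    return status == 0 or status in DEAD_STATUSES

def link_status(url: str, timeout: float = 10.0) -> int:
    """Fetch url and return its HTTP status code, or 0 if nothing answered."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code   # the server answered, but with 4xx/5xx
    except urllib.error.URLError:
        return 0            # DNS failure, refused connection, timeout
```

In a real crawler you would likely also treat redirects and other edge cases explicitly; this sketch only mirrors the five cases listed above.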

If you like this article, please subscribe to our newsletter. That way you won't miss any of our articles about monitoring and agencies.

Yes, I want to subscribe to your newsletter

Technical details

Technically, the koality.io crawler makes cURL requests against all URLs it finds on a page. This is done in parallel, but with a maximum of 3 simultaneous requests. This way we simulate three users clicking quickly through the page.
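The scheduling described above can be sketched with a thread pool capped at three workers. This is an assumed illustration, not the real koality.io code; `check_one` stands in for an actual HTTP request.

```python
# Check many URLs in parallel, but never with more than 3 simultaneous
# requests -- the "three fast users" described above. Illustrative only.
from concurrent.futures import ThreadPoolExecutor

def check_one(url: str) -> int:
    # placeholder for a real cURL/HTTP request; always reports 200 here
    return 200

def check_links(urls: list[str]) -> dict[str, int]:
    # max_workers=3 caps the number of requests in flight at any moment
    with ThreadPoolExecutor(max_workers=3) as pool:
        return dict(zip(urls, pool.map(check_one, urls)))
```

Capping the worker count is what keeps the crawl gentle: no matter how many links a page has, the target server never sees more than three open connections from the crawler at once.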

We render the pages that we subject to analysis using a Chrome browser. This way we can ensure that even pages that rely heavily on JavaScript cleanly reveal their links.
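Once the browser has rendered a page, extracting the links from the final HTML is the easy part. The sketch below uses Python's stdlib `HTMLParser` for that step; how the real crawler extracts links is an assumption, not shown here.

```python
# Collect the href of every <a> tag from rendered HTML.
# Illustrative sketch, not koality.io's actual extraction code.
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Records the href attribute of every anchor tag it encounters."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def extract_links(rendered_html: str) -> list[str]:
    parser = LinkExtractor()
    parser.feed(rendered_html)
    return parser.links
```

The important point is the input: because the HTML comes from a real browser after JavaScript has run, links that are injected client-side show up here just like static ones.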

Pro tip: Log Files

While the crawler is running, it can be very useful to keep the log files open and watch what is happening. Modern applications often suppress errors so as not to irritate the user or to keep the system stable. The log files show the whole truth.

Caution!

Crawls like this one create load on the server. That is normal and in the nature of crawling a website, and koality.io tries to be as gentle as possible. Nevertheless, we recommend performing large crawls only at off-peak times, when you can be sure that the server infrastructure can withstand the load. You should therefore run the first crawl at an unimportant time and watch how the system behaves.

It's nice that you are reading our magazine, but it would be even nicer if you tried our service. koality.io offers extensive website monitoring especially for web projects: uptime, performance, SEO, security, content, and tech.

I would like to try koality.io for free