Title: How Crawl Errors Affect Website Ranking
What does a crawl error mean?
- Crawl errors occur when a search engine tries to reach a page on your site but fails. Because of these errors, the bots that index your pages cannot read your content.
- In the legacy version of Google Search Console, crawl errors are reported in a report called Crawl Errors.
- The Crawl Errors report is made up of two main sections
- Site errors: these errors prevent Googlebot from accessing your entire site.
- URL errors: these errors prevent Googlebot from accessing a specific URL.
- In the current version of Google Search Console, errors are displayed per URL under the Index Coverage report.
Site errors
- The new Search Console Index Coverage section also shows how much of your site has been indexed over time, including
- Issues Google has run into and whether you have resolved them
- Valid pages in Google's index
- Pages not indexed by Google
- Valid pages that Google indexed despite finding some errors
- Let's now elaborate on the types of errors in the crawl error report.
Site errors block your entire website from being accessed by the search engine bot. The most common reasons are described below.

If this happens, the search engine simply cannot communicate with your website, for instance because it is down. Most of the time this issue is temporary: if Google doesn't manage to crawl your site right away, it will try again later. If you see crawl errors in your Google Search Console, Google has probably tried a couple of times and still hasn't been able to crawl your site.
If your Search Console results show server errors, it means the bot couldn't access your website. A timeout may have occurred: the website did not load quickly enough, so the search engine gave up and reported an error. The page may also fail to load because of flaws in your code, or the server may be overwhelmed by all the requests coming in to your site.
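To get a rough idea of what the bot runs into, you can request a page yourself and look at the status code and at timeouts. Below is a minimal sketch, assuming Python 3; https://www.example.com/ is a placeholder you would replace with your own URL.

    # Minimal sketch: distinguish server errors (5xx) from timeouts.
    import socket
    import urllib.error
    import urllib.request

    URL = "https://www.example.com/"  # placeholder

    try:
        with urllib.request.urlopen(URL, timeout=10) as response:
            print(URL, "answered with status", response.status)
    except urllib.error.HTTPError as err:
        if 500 <= err.code < 600:
            print(URL, "returned a server error:", err.code)  # e.g. 500, 502, 503
        else:
            print(URL, "returned an error:", err.code)  # e.g. 403, 404
    except (urllib.error.URLError, socket.timeout) as err:
        print(URL, "could not be reached (timeout or connection problem):", err)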
Before crawling your website, Googlebot reads your robots.txt file to find out whether there are any parts of your website you don't want indexed. If the bot can't reach the robots.txt file, the crawl will be postponed, so make sure it is always accessible.
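A quick way to confirm that the file is reachable is to request it directly and inspect the response. A minimal sketch, again assuming Python 3 and the placeholder domain www.example.com:

    # Minimal sketch: verify that robots.txt is reachable.
    import urllib.error
    import urllib.request

    ROBOTS_URL = "https://www.example.com/robots.txt"  # placeholder

    try:
        with urllib.request.urlopen(ROBOTS_URL, timeout=10) as response:
            body = response.read().decode("utf-8", errors="replace")
            print("robots.txt is reachable, status", response.status)
            print(body[:500])  # show the first rules
    except urllib.error.HTTPError as err:
        # A 404 is usually harmless (no robots.txt means nothing is blocked),
        # but a 5xx answer can make the crawler postpone crawling your site.
        print("robots.txt returned status", err.code)
    except urllib.error.URLError as err:
        print("robots.txt could not be fetched:", err.reason)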
That covers site-wide crawl errors in a bit more detail. We will now look at how specific pages can cause crawl errors.
URL errors
In a nutshell, URL errors are crawl errors that occur when bots attempt to spider a particular webpage. When we talk about URL errors, we usually begin with 404 Not Found errors. Check for these errors frequently (using Google Search Console or Bing Webmaster Tools) and fix them. If the page or subject has been removed from your website and is never expected to return, serve a 410 (Gone) status for it. If similar content exists on another page, use a 301 redirect to that page instead. Also make sure your sitemap is up to date and that your internal links are working.
By the way, internal links are the most common cause of these URL errors, so you are responsible for many of these issues yourself. If you remove a page from your site at some point, adjust or remove the internal links pointing to it as well; they are no longer relevant or useful. If such a link stays in place, a bot will still find and follow it, but the request will fail (404 Not Found) and the error will show up for your site. Keep your internal links up to date!
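As an illustration, you can walk through a list of internal URLs (for instance the ones in your sitemap, or a list of pages you link to) and report any that no longer resolve. A minimal sketch, assuming Python 3; the URLs are placeholders:

    # Minimal sketch: report internal URLs that return 404 (or other errors).
    import urllib.error
    import urllib.request

    INTERNAL_URLS = [  # placeholders; feed it your own pages or sitemap entries
        "https://www.example.com/",
        "https://www.example.com/about/",
        "https://www.example.com/removed-page/",
    ]

    for url in INTERNAL_URLS:
        try:
            with urllib.request.urlopen(url, timeout=10) as response:
                print("OK ", response.status, url)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                print("404", url, "-> fix the link, redirect (301) or serve a 410")
            else:
                print("ERR", err.code, url)
        except urllib.error.URLError as err:
            print("ERR", url, err.reason)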
Very specific URL errors
Some URL errors appear only on certain websites. To discuss them separately, I've listed them below
- URL errors specific to mobile devices
- Virus and malware errors
Mobile-specific crawl errors are errors that occur on particular pages when they are crawled on a mobile device. These errors usually do not surface on responsive websites; you may simply have some Flash content you want to remove, for instance. If you maintain a separate mobile subdomain such as m.example.com, you may encounter more errors: your desktop site might be redirecting to your mobile site through a faulty redirect, for instance, or parts of the mobile site might be blocked by a robots.txt file.
If you encounter malware errors in your webmaster tools, it means that Google or Bing has discovered malicious software on that URL. This could mean software has been found that is being used, for example, to gather sensitive data or to disrupt operations (Wikipedia). You will need to find and remove the malware on that page.
- Google News errors
There are also some specific Google News errors. If your website is included in Google News, you may receive these crawl errors. Google documents them quite well. The errors range from a missing title to the fact that no news article seems to be present on a page. Make sure to examine your site for such errors.
How do you fix a crawl error?
1. Using a robots meta tag that prevents the page from being indexed
In this case the search bot will not even look at your page's content and will move directly to the next page. You can detect this issue by checking whether your page contains a robots meta tag with a noindex (or none) value, as in the sketch after item 2 below.
2. Nofollow links
In this case, the crawler will index the content of your page, but it will not follow the links on it.
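The sketch below, assuming Python 3, fetches a single page (the URL is a placeholder) and reports whether it carries a noindex robots meta tag and which of its links are marked rel="nofollow":

    # Minimal sketch: detect a noindex robots meta tag and rel="nofollow" links.
    import urllib.request
    from html.parser import HTMLParser

    class RobotsDirectiveParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.noindex = False
            self.nofollow_links = []

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
                content = (attrs.get("content") or "").lower()
                # e.g. <meta name="robots" content="noindex, nofollow">
                if "noindex" in content or "none" in content:
                    self.noindex = True
            if tag == "a" and "nofollow" in (attrs.get("rel") or "").lower():
                self.nofollow_links.append(attrs.get("href") or "")

    URL = "https://www.example.com/some-page/"  # placeholder
    with urllib.request.urlopen(URL, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")

    parser = RobotsDirectiveParser()
    parser.feed(html)
    print("noindex meta tag present:", parser.noindex)
    print("nofollow links:", parser.nofollow_links)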
3. Blocking pages from indexing through robots.txt
The robots start by looking at your robots.txt file, and this is where some of the most frustrating findings appear. The whole site may be blocked (Disallow: /), so none of the website's pages will be indexed. Or only some pages or sections may be blocked, for example a Disallow rule on the Products subfolder, so that no product descriptions are indexed in Google (a quick check is sketched at the end of this item).
Broken links adversely affect users as well as crawlers. A search engine spends crawl budget every time it indexes a page (or tries to index it). Broken links mean the bot wastes that time on dead ends instead of reaching relevant, quality pages.
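The sketch promised above checks which of your important URLs robots.txt actually blocks for Googlebot, using Python's standard robotparser module; the domain and paths are placeholders:

    # Minimal sketch: check whether robots.txt blocks important URLs for Googlebot.
    import urllib.robotparser

    parser = urllib.robotparser.RobotFileParser()
    parser.set_url("https://www.example.com/robots.txt")  # placeholder
    parser.read()

    important_urls = [  # placeholders
        "https://www.example.com/",
        "https://www.example.com/products/blue-widget/",
    ]

    for url in important_urls:
        if parser.can_fetch("Googlebot", url):
            print("allowed:", url)
        else:
            print("BLOCKED by robots.txt:", url)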
4. Problems with the URL
The most common cause of URL errors is a typo in a URL you add to a page. Check all your links to make sure they are typed and spelled correctly.
5. Out-of-date URLs
It's important to double-check this issue if you have recently migrated to a new website, removed bulk data, or changed the URL structure. Make sure that none of your website's pages reference deleted or old URLs.
6. Restricted pages
If many of your website's pages return, for instance, a 403 error code, there is a chance that these pages are accessible only to registered users. Mark such links as nofollow so that crawl budget is not wasted on them.
7. Problems with the server
If several 500-type errors (for example, 502) occur, there may be server problems. They can be fixed by the person responsible for the development and maintenance of the website; provide that person with the list of pages returning errors, and he or she will handle the bugs or site configuration issues that lead to them.
8. Limited server capacity
An overloaded server may be unable to handle all the requests from users and bots. When this occurs, a "Connection timed out" message is displayed. Only a website maintenance specialist can solve this problem, by estimating whether, and how much, additional server capacity is needed.
9. Misconfigured web server
This issue can be quite complex. You, as a human, may see the site perfectly well, while the site crawlers receive an error message and none of the pages get crawled. Certain server configurations can cause this: some web application firewalls, for example, block Googlebot and other search bots by default. To summarize, this problem, with all its related aspects, needs to be solved by a specialist.
Crawlers base their first impression of your site on the sitemap and robots.txt. By providing a sitemap, you are telling search engines how you would like them to index your web pages. Here are a few things that can go wrong when the search engine processes your sitemap(s).
10. Errors in the sitemap format
A format error can be caused by an invalid URL or a missing tag, for instance. The sitemap file may also be blocked by robots.txt (as mentioned at the very beginning), in which case the bots cannot access the sitemap's content at all.
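A rough way to catch format problems is to fetch the sitemap, make sure it parses as XML, and confirm that every <loc> entry looks like an absolute URL. A minimal sketch, assuming Python 3 and a standard sitemap at a placeholder address:

    # Minimal sketch: parse sitemap.xml and flag entries that do not look like URLs.
    import urllib.request
    import xml.etree.ElementTree as ET

    SITEMAP_URL = "https://www.example.com/sitemap.xml"  # placeholder
    NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

    with urllib.request.urlopen(SITEMAP_URL, timeout=10) as response:
        data = response.read()

    try:
        root = ET.fromstring(data)  # fails loudly on malformed XML
    except ET.ParseError as err:
        raise SystemExit("Sitemap is not valid XML: %s" % err)

    locs = [el.text.strip() for el in root.findall(".//sm:loc", NS) if el.text]
    print("URLs listed in the sitemap:", len(locs))
    for loc in locs:
        if not loc.startswith(("http://", "https://")):
            print("Suspicious entry (not an absolute URL):", loc)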
11. Sitemap contains incorrect pages
Getting to the point, let's go over the content. Even if you aren't a web developer, you can still judge how relevant the URLs in a sitemap are. Review your sitemap very carefully and ensure that each URL in it is relevant, current, and correct (no typos or misspellings). If bots cannot crawl the entire website because of a limited crawl budget, the sitemap can guide them towards the most valuable pages. Don't put misleading instructions in the sitemap: make sure that robots.txt or meta directives are not preventing the bots from indexing the URLs listed there.
The next category of problems, issues in the site architecture, is the most challenging to resolve, so we suggest you complete the previous steps before proceeding. These problems can disorient or block the crawlers.
12. Problems with internal linking
- In a correctly structured website, the pages form an unbroken chain, so the crawlers can easily reach every page.
- On a poorly structured site, no other page on the website may link to the page you want to rank; search bots will then be unable to find and index it.
- There may be too many steps from the main page to the page you want ranked: if reaching it takes more than four links, there is a possibility the bot will not find it.
- A single page may carry more than 3,000 links, which is too many for a crawler to follow (a simple count is sketched below).
- Links may be hidden behind inaccessible site elements: forms to fill out, frames, plugins (Java and Flash first of all).
- There is rarely a quick fix for an internal linking problem; the site structure has to be examined in depth, together with the website developers.
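As a very rough check for the "too many links on one page" problem, you can simply count the anchor tags on a page. A minimal sketch, assuming Python 3; the URL is a placeholder and the 3,000 threshold follows the rule of thumb above:

    # Minimal sketch: count the links on a page and warn when there are too many.
    import urllib.request
    from html.parser import HTMLParser

    class LinkCounter(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = 0

        def handle_starttag(self, tag, attrs):
            if tag == "a" and any(name == "href" for name, _ in attrs):
                self.links += 1

    URL = "https://www.example.com/"  # placeholder
    with urllib.request.urlopen(URL, timeout=10) as response:
        html = response.read().decode("utf-8", errors="replace")

    counter = LinkCounter()
    counter.feed(html)
    print(URL, "contains", counter.links, "links")
    if counter.links > 3000:
        print("Too many links on one page for a crawler to follow")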
13. Incorrect redirects
- A redirect is needed to send visitors to a more appropriate page (or, rather, the page the website owner considers appropriate). Here are some things you may overlook when setting up redirects
- Using 302 and 307 (temporary) redirects instead of permanent ones signals the crawler to keep coming back to the page, wasting crawl budget. If the original page no longer needs to be indexed, use a 301 (permanent) redirect instead.
- Two pages may redirect to each other, forming a redirect loop. The bot gets caught in the loop and the crawl budget is wasted. Look for such mutual redirections and remove them if they exist (a simple chain check is sketched below).
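You can follow redirects one hop at a time and look both at the status codes (301/308 versus 302/307) and for loops. A minimal sketch, assuming Python 3; the starting URL is a placeholder:

    # Minimal sketch: follow a redirect chain hop by hop, flagging temporary
    # redirects and loops.
    import urllib.error
    import urllib.parse
    import urllib.request

    class NoRedirect(urllib.request.HTTPRedirectHandler):
        def redirect_request(self, req, fp, code, msg, headers, newurl):
            return None  # do not follow automatically; we inspect every hop

    opener = urllib.request.build_opener(NoRedirect)

    url = "https://www.example.com/old-page/"  # placeholder
    seen = []
    while url:
        if url in seen:
            print("Redirect loop detected:", " -> ".join(seen + [url]))
            break
        seen.append(url)
        try:
            response = opener.open(url, timeout=10)
            print(response.status, url, "(final page)")
            break
        except urllib.error.HTTPError as err:
            if err.code not in (301, 302, 307, 308) or "Location" not in err.headers:
                print(err.code, url)
                break
            kind = "permanent" if err.code in (301, 308) else "temporary (wastes crawl budget)"
            target = urllib.parse.urljoin(url, err.headers["Location"])
            print(err.code, kind + ":", url, "->", target)
            url = target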
14. Slow loading time
The faster your pages load, the faster the crawler gets through them; every millisecond counts, and load speed also correlates with a website's position on the SERP. Check your website's speed with Google PageSpeed Insights. If the load speed is deterring users, it can be affected by a number of factors. Performance can be slow due to server-side factors, for instance when the available bandwidth is no longer adequate; consult your hosting plan description to find out how much bandwidth you have. Another very common issue is inefficient front-end code: you are at risk if the website contains a large number of scripts or plug-ins. Also check regularly that your photos, videos, and related content load quickly and do not slow the page down.
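PageSpeed Insights gives the most complete picture, but as a quick sanity check you can also time how long your server takes to deliver a page. A minimal sketch, assuming Python 3; the URL and the 3-second threshold are placeholders:

    # Minimal sketch: time how long it takes to download one page.
    # This measures raw server/network time only, not full rendering.
    import time
    import urllib.request

    URL = "https://www.example.com/"  # placeholder

    start = time.perf_counter()
    with urllib.request.urlopen(URL, timeout=30) as response:
        body = response.read()
    elapsed = time.perf_counter() - start

    print("Downloaded", len(body), "bytes in %.2f seconds" % elapsed)
    if elapsed > 3:  # placeholder threshold
        print("Slow response - check server capacity and front-end weight")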
15. Poor website architecture leading to duplicate pages
- 11 Most Common On-site SEO Issues by SEMrush reveals that duplicate content is the cause of 50% of site issues. This is one of the main reasons you run out of crawl budget: Google gives a website only a certain amount of time, so it makes no sense to spend it indexing the same content over and over. Additionally, the site crawlers don't know which copy to trust, so the wrong copy may be given priority unless you use canonicals to point them in the right direction.
- There are several ways to fix the problem by identifying duplicate pages and keeping them out of the crawl (a rough duplicate check is sketched below)
- Eliminate duplicate pages
- Add the necessary parameters to robots.txt
- Add the necessary parameters to your meta tags
- Put a 301 redirect in place
- Make use of rel="canonical"
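As a rough way to spot candidate duplicates, you can strip the markup from a set of URLs and compare fingerprints of the remaining text. A minimal sketch, assuming Python 3; the URL list is a placeholder, and a real audit would normalize the text more carefully:

    # Minimal sketch: fingerprint the text of pages to spot likely duplicates.
    import hashlib
    import urllib.request
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.parts = []

        def handle_data(self, data):
            self.parts.append(data)

    def fingerprint(url):
        with urllib.request.urlopen(url, timeout=10) as response:
            html = response.read().decode("utf-8", errors="replace")
        extractor = TextExtractor()
        extractor.feed(html)
        text = " ".join("".join(extractor.parts).split()).lower()
        return hashlib.md5(text.encode("utf-8")).hexdigest()

    URLS = [  # placeholders
        "https://www.example.com/product?color=red",
        "https://www.example.com/product?color=blue",
        "https://www.example.com/product",
    ]

    seen = {}
    for url in URLS:
        digest = fingerprint(url)
        if digest in seen:
            print("Likely duplicate content:", url, "and", seen[digest])
        else:
            seen[digest] = url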
16. Misuse of JavaScript and CSS
Google's official statement in 2015 was that as long as you do not block Googlebot from crawling your JavaScript or CSS, your web pages will generally be rendered and understood the same way a modern web browser renders them. This does not apply to other search engines (Yahoo, Bing, etc.), though, and "generally" implies that indexation is not guaranteed in every case.
17. Content created in Flash
The use of Flash can be problematic both for SEO (most mobile devices do not support Flash files) and for user experience. Text content and links inside Flash elements are unlikely to be indexed by crawlers. We therefore recommend that you don't use Flash on your website.
18. Frames in HTML
If your site uses frames, there is both good and bad news. The good news is that it is probably a sign of a mature site; the bad news is that HTML frames are extremely outdated and poorly indexed, so you should replace them as soon as possible.