Last updated: Jan 05, 2023
The process of indexing website pages can be monitored from Google Search Console. During this process, issues can appear on pages, including the "indexed, though blocked by robots.txt" warning. In this guide, we will help you understand what this issue means and how to resolve it. Check out the full discussion below.
A robots.txt file contains a collection of instructions used by web crawlers as a guide in the process of crawling a website. You can use robots.txt to tell web crawlers which pages to visit or not to visit.
This file can be used if your website has several pages that are created only for users. In other words, you don't want web crawlers to find and display them on the SERP. Examples of these are checkout pages or those that require payment access that can only be accessed after logging in.
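For example, an online store that wants to keep its checkout and account pages out of the crawl might use a robots.txt like this (the paths here are hypothetical):

User-agent: *
Disallow: /checkout/
Disallow: /account/

Every crawler that respects robots.txt will then skip any URL that starts with /checkout/ or /account/.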
As explained above, robots.txt can block web crawlers from accessing the pages you specify, keeping them out of search results. Even so, there are times when a blocked page still ends up in Google's index, for example because other websites link to it.
You can find out about this issue through Google Search Console. A warning will appear in GSC with the words "indexed, though blocked by robots.txt" or "indexed, even though blocked by robots.txt".
If this warning appears, it means that Google has indexed a URL that is blocked in robots.txt. Google displays the warning when it is not sure whether you want the page to be indexed or not.
This can be a problem, especially if the page displays private information or data. Therefore, it is important to know how to fix "indexed, though blocked by robots.txt".
The way search engines work is by using a search robot, or web crawler. This robot will browse every website on the internet, save the data to their database or index, and display it to searchers.
The role of robots.txt is very important for managing how web crawlers browse the website. Crawlers visit this file first, before crawling any other pages on the website, and use it as a guide.
With robots.txt, you can provide various instructions, such as the 'Disallow' directive to block robot access to a URL and the 'Allow' directive to explicitly permit robots to crawl it.
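As a sketch, here is how the two directives can be combined (again with hypothetical paths):

User-agent: *
Disallow: /private/
Allow: /private/overview.html

Google applies the most specific matching rule, so everything under /private/ is blocked except the single page that is explicitly allowed. Note that support for 'Allow' can vary between crawlers.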
If you want to create a robots.txt file, you can simply use the robots.txt generator tools from cmlabs. With this tool, you can create a robots.txt file easily and quickly.
To better understand how to fix "indexed, though blocked by robots.txt", let's first understand the process of indexing a page. The index is the search engine's database: a repository of information about all the websites its robots have found.
Websites in the search engine index have gone through the indexing process, in which the contents of their pages are stored. The index does not only contain website URLs but also all the text, images, videos, tags, and attributes in the HTML code of each page.
The indexing process also analyzes the contents of the stored data, such as the language used, country of origin, page role, and so on. The web crawler will also analyze whether a page is a duplicate or not.
Search engines can decide whether a page will be indexed or not. There are several reasons why a search engine may decide not to index a page, for example because the page is a duplicate of another page, carries a 'noindex' directive, or is considered low quality.
Now that you know what robots.txt is for and how the indexing process works, it's time for you to learn how to fix "indexed, though blocked by robots.txt" in GSC.
Before learning how to fix "indexed, though blocked by robots.txt", you first have to determine whether the affected page really needs to be indexed or not.
This is because the actions you will take in these two conditions will be different. If you don't want the URL to be indexed by Google, then here are some things you can do:
The easiest way to prevent a page from being crawled is to check your website's robots.txt file. Make sure that the page you want to block has a disallow statement.
Although this seems trivial, website managers often forget to add a disallow statement for the page they want to block.
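If you prefer to verify this programmatically, here is a minimal sketch using Python's standard urllib.robotparser module; the domain and path are placeholders:

import urllib.robotparser

# Load the site's robots.txt (hypothetical domain)
parser = urllib.robotparser.RobotFileParser()
parser.set_url("https://www.example.com/robots.txt")
parser.read()

# can_fetch() returns False when the URL is disallowed for the given user agent
url = "https://www.example.com/checkout/"
print("Blocked for Googlebot:", not parser.can_fetch("Googlebot", url))

If the script reports that the page is not blocked even though you expected it to be, the disallow statement is missing or does not match the URL.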
You need to know that crawling and indexing are two different processes. Crawling refers to the discovery process carried out by web crawlers to find every page on a website, while indexing is the process of analyzing and storing a page.
If a page keeps getting an "indexed, though blocked by robots.txt" warning, you can use the 'noindex' meta tag instead. With this meta tag, search engines will drop the page from the index after crawling it. Note that the crawler must be able to fetch the page to see the tag, so the page must not also be blocked in robots.txt.
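The tag goes in the <head> section of the page you want removed from the index:

<meta name="robots" content="noindex">

If you cannot edit the page's HTML, Google also supports sending the equivalent instruction as an HTTP response header, X-Robots-Tag: noindex.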
A page may be indexed even if it has been blocked by robots.txt when other websites link to it, because Google discovers the URL through those links without crawling the page. To handle this, first check the backlinks that point to the pages you don't want indexed and request their removal.
If a page gets an "indexed, though blocked by robots.txt" warning and it's a page you intended to index, then you should check the crawling settings on that page.
This is because the page that you intend to appear on the SERP is not actually crawled by Google, even though it is already indexed. You can check your crawl settings in the following ways:
First, check the settings in the robots.txt file, whether the page you want to index is actually blocked in robots.txt. You can access the robots.txt file by typing in domainname.com/robots.txt.
After that, you can find out if the page you want to index has a disallow statement. The form of the disallow statement looks like this:
Disallow: /
A disallow statement is grouped under a user agent named on the line above it; an asterisk means the rule applies to all crawlers:
User-agent: *
Disallow: /
Pages covered by a disallow statement will not be crawled by robots, so you have to remove the rule or replace it with an 'Allow' directive for that page.
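For example, if the blocking rule looks like this (hypothetical path):

User-agent: *
Disallow: /blog/important-page.html

you can either delete the Disallow line entirely or replace it with an explicit permission:

User-agent: *
Allow: /blog/important-page.html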
Websites can block access for a user agent such as Googlebot or AhrefsBot so that it cannot crawl them. When this happens, you may still be able to find your pages on other search engines.
However, your website will not be found when using Google or Ahrefs because those user agents are blocked. This problem can be caused by a block in one of several layers of the website, such as .htaccess, the CDN, a firewall, or the server configuration.
The best way you can solve this problem is to contact your hosting provider or CDN to find out where the blocking is coming from and how you can fix it.
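Before contacting them, you can run a quick test from your own machine with curl (assuming it is installed) to see whether the block depends on the user-agent string. Fetch the same URL twice, once with a normal client and once with a Googlebot-like user agent:

curl -I https://www.example.com/
curl -I -A "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" https://www.example.com/

If the first request succeeds but the second one is refused or times out, something in the stack (.htaccess, firewall, CDN) is filtering by user agent. Keep in mind that some systems also verify the requester's IP address, so this test may trigger a block that would not affect the real Googlebot.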
An intermittent block is a condition where a page is sometimes blocked and sometimes not, which makes the cause difficult to determine. To solve this, you need to check the history of your website's robots.txt.
Use a tool like GSC robots.txt Tester to look at the previous version of the file and check for any incorrect instructions in that version. Solutions to this issue may vary and depend on the cause.
One cause that often occurs is caching: an old test version of robots.txt that blocks the page may still be served from the cache, even though the live version of the file allows crawling.
To solve this, clear the cached test version of robots.txt so that crawlers always receive the live file.
If you have checked the three things above and found no problems, then the cause could be an IP address block.
The solution to this problem is to contact your hosting provider or CDN. IP blocks are very difficult to track down yourself, so you need their help to find the source of the block and how to resolve it.
That concludes our explanation of how to fix "indexed, though blocked by robots.txt". Hopefully, with this guide, you can solve the crawling problems on your web pages.
If you need further help, you can use SEO services whose professional teams assist with the SEO optimization process, including website crawlability.