Robots.txt

Last updated: Aug 12, 2022

https://cmlabs.co/en/seo-terms/robots-txt


WHAT IS ROBOTS.TXT?
Robots.txt is a plain-text file on your website that tells search engine crawlers which pages they may visit. In certain cases, web developers want a page to be public for human users but not for search engines such as Google, Bing, and Yahoo.
The file implements the Robots Exclusion Protocol, a de facto standard for communication between websites and non-human visitors.
The Robots Exclusion Protocol, or robots.txt, allows web developers to decide which parts, files, or folders of their website can be accessed by a bot or crawler.

How Robots.txt Works

Robots.txt provides instructions for bots. A web crawler will try to fetch robots.txt first, before crawling any other page on a domain. The crawler then carries out the instructions written in the file, such as which paths it is allowed or disallowed to crawl.
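
The "check robots.txt first" behaviour described above can be sketched with Python's standard `urllib.robotparser` module. This is only an illustration: the rules, the `ExampleBot` name, the `/private/` path, and the `crawl_plan` helper are assumptions made for this example, not part of any real site's configuration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would download it first.
robots_lines = [
    "User-agent: *",
    "Disallow: /private/",
]

parser = RobotFileParser()
parser.parse(robots_lines)


def crawl_plan(parser, agent, urls):
    """Keep only the URLs that the robots.txt rules let `agent` fetch."""
    return [url for url in urls if parser.can_fetch(agent, url)]


urls = [
    "https://www.cmlabs.co/",
    "https://www.cmlabs.co/private/report",
    "https://www.cmlabs.co/en/seo-terms/robots-txt",
]
# The /private/report URL is filtered out before any crawling happens.
print(crawl_plan(parser, "ExampleBot", urls))
```

A well-behaved crawler would run a check like this before requesting each page; robots.txt itself does not enforce anything.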

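The crawler bot will follow the most specific set of instructions in the robots.txt file. It first picks the group whose user-agent name matches it most closely. If instructions within that group contradict each other, the bot follows the more specific one, which for Google means the rule with the longest matching path.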
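
The "most specific rule wins" idea can be sketched as a longest-prefix match. This is a simplified illustration, assuming plain path prefixes with no wildcards; the `is_allowed` helper and the `/blog` paths are hypothetical, and real crawlers may differ in details.

```python
def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, prefix) pairs such as ("disallow", "/blog").
    The longest matching prefix wins; on a tie, "allow" wins.
    """
    best_len = -1
    allowed = True  # no matching rule means the path is allowed
    for directive, prefix in rules:
        if not prefix or not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and directive == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = directive == "allow"
    return allowed


# Two contradictory rules: the longer, more detailed path decides.
rules = [("disallow", "/blog"), ("allow", "/blog/public")]
print(is_allowed(rules, "/blog/draft"))        # False
print(is_allowed(rules, "/blog/public/post"))  # True
print(is_allowed(rules, "/about"))             # True
```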

Robots.txt Function

The functions of robots.txt are as follows:

To Control Crawler/User-agent Activity

One of the main functions of robots.txt is to control crawler activity on the website. Without a robots.txt file, crawlers will crawl all pages, including duplicate content. If you don't want bots to crawl certain pages, you need to add instructions for them to robots.txt.

Blocking Pages From Appearing in the SERP

There are times when you don't want a page on your website to appear in the SERP. For instance, you may not be targeting product subcategory pages to appear in the SERP. In that case, you can instruct the crawler not to crawl them.

Samples of Codes or Robots.txt Syntax

user-agent: Googlebot
disallow: /login

user-agent: Googlebot-news
disallow: /media

user-agent: Googlebot-image

Based on the syntax sample above, here is the explanation:

The Googlebot user-agent is not allowed to crawl the /login folder.
The Googlebot-news user-agent is not allowed to crawl the /media folder.
The Googlebot-image user-agent is allowed to crawl all of the folders on the www.cmlabs.co website without any limitations.
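
The explanation above can be checked with a small parser sketch. It is deliberately minimal and rests on several assumptions: it only understands `user-agent` and `disallow` lines, matches user-agent names exactly, treats each `user-agent` line as its own group, and uses plain prefix matching. The `parse_groups` and `can_crawl` helpers are hypothetical names for this illustration.

```python
SAMPLE = """\
user-agent: Googlebot
disallow: /login

user-agent: Googlebot-news
disallow: /media

user-agent: Googlebot-image
"""


def parse_groups(text):
    """Map each lowercased user-agent name to its list of disallowed prefixes."""
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            current = groups.setdefault(value.lower(), [])
        elif field == "disallow" and current is not None and value:
            current.append(value)
    return groups


def can_crawl(groups, agent, path):
    """True if no disallowed prefix in the agent's own group matches `path`."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    return not any(path.startswith(prefix) for prefix in rules)


groups = parse_groups(SAMPLE)
print(can_crawl(groups, "Googlebot", "/login"))        # False
print(can_crawl(groups, "Googlebot-news", "/media"))   # False
print(can_crawl(groups, "Googlebot-image", "/login"))  # True
```

Each bot is matched against its own group only, which is why Googlebot-image, whose group has no rules, is free to crawl every folder.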

Sample and Implementation of robots.txt URL

In general, a robots.txt file is NOT VALID for a different subdomain, protocol, or port than the one it is served from. However, it is VALID for all files in all of the sub-directories on the same host, protocol, and port.

The examples below show which URLs a robots.txt file served at http://cmlabs.co/robots.txt does and does not cover:

Valid Example

http://cmlabs.co/
http://cmlabs.co/folder/file

Invalid Example

http://other.cmlabs.co/ (different subdomain)
https://cmlabs.co/ (different protocol)
http://cmlabs.co:8181/ (different port)
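
The host, protocol, and port rule can be expressed as a short check using Python's standard `urllib.parse`. This is a sketch under stated assumptions: the `scope`, `robots_url`, and `covers` helpers are hypothetical names, and default ports (80 for http, 443 for https) are filled in so that an omitted port compares equal to an explicit default one.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def scope(url):
    """Return the (protocol, host, port) triple that a robots.txt file is tied to."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port or DEFAULT_PORTS.get(scheme)
    return (scheme, (parts.hostname or "").lower(), port)


def robots_url(page_url):
    """Build the robots.txt location at the root of the page's own host."""
    parts = urlsplit(page_url)
    return f"{parts.scheme}://{parts.netloc}/robots.txt"


def covers(robots_txt_url, page_url):
    """True if the robots.txt at `robots_txt_url` applies to `page_url`."""
    return scope(robots_txt_url) == scope(page_url)


robots = "http://cmlabs.co/robots.txt"
print(covers(robots, "http://cmlabs.co/folder/file"))  # True
print(covers(robots, "http://other.cmlabs.co/"))       # False
print(covers(robots, "https://cmlabs.co/"))            # False
print(covers(robots, "http://cmlabs.co:8181/"))        # False
```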

Important note

When this page was published (on May 21st, 2020), the definition and implementation of robots.txt described here applied only to Google. In other words, other search engines such as Bing, Yahoo, and Yandex do not always follow the same standard.

However, global standardization has been under discussion in the international community.

Misunderstanding

Robots.txt is not the right file to use for hiding a file or page from search engines.

So what should we do to hide files from Google? The right answer is to add a noindex tag, for example through the X-Robots-Tag response header.

RESPONSE HEADER

HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
(…)
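
As a rough illustration of how a server could send this header, here is a minimal WSGI sketch using only Python's standard library. The `noindex_app` function, its response body, and the capturing `start_response` helper are assumptions made for this example; a real site would set the header in its own web server or framework configuration.

```python
from wsgiref.util import setup_testing_defaults


def noindex_app(environ, start_response):
    """Hypothetical WSGI app that asks search engines not to index its response."""
    body = b"Private report\n"
    headers = [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
        ("X-Robots-Tag", "noindex"),  # the directive shown in the header above
    ]
    start_response("200 OK", headers)
    return [body]


# Call the app directly, without opening a network socket, to inspect the headers.
captured = {}


def start_response(status, headers):
    captured["status"] = status
    captured["headers"] = dict(headers)


environ = {}
setup_testing_defaults(environ)
body = b"".join(noindex_app(environ, start_response))
print(captured["status"], captured["headers"]["X-Robots-Tag"])
```

Note that a crawler can only see this header if it is allowed to fetch the page, so the page should not also be blocked in robots.txt.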

Changes in Protocol Standards

On July 1st, 2019, Google announced through its official blog that it was working to make the robots.txt protocol an Internet standard. This means that search engines would be expected to agree on the same provisions.

Robots Exclusion Protocol Draft

Related Terms

User-agent / bot

A user-agent (bot) is a robot that search engines use to crawl websites on the internet.
