Written by cmlabs

Robots.txt

Last updated: Mar 15, 2021
Disclaimer: Our team is constantly compiling and adding new terms that are known throughout the SEO community and in Google terminology. You may be directed to or from SEO Terms on cmlabs.co through third-party sites or links. We do not investigate or check such external links for accuracy and reliability, and we do not assume responsibility for the accuracy or reliability of any information offered by third-party websites.

CMLABS' SEO TERMS

WHAT IS ROBOTS.TXT

Definition of robots.txt

Robots.txt is a file that search engine crawlers read on your website to learn which pages they may visit. In certain cases, web developers want a page to be PUBLIC for users but not for search engines such as Google, Bing, and Yahoo.

This file implements the robots exclusion protocol. It is a de facto standard for communication between websites and non-human visitors, and it marks the boundary of what those visitors may access.

The robots exclusion protocol, or robots.txt, allows web developers to decide which parts, files, or folders of their website can be accessed by a bot or crawler.

Sample Code or Robots.txt Syntax

user-agent: googlebot
disallow: /login

user-agent: googlebot-news
disallow: /media

user-agent: googlebot-image

Based on the syntax sample above, here is the explanation:

  • The Googlebot user-agent is prohibited from crawling the /login folder.
  • The Googlebot-news user-agent is prohibited from crawling the /media folder.
  • The Googlebot-image user-agent is allowed to crawl all folders on the www.cmlabs.co website without any limitations.
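
The way these groups are read can be illustrated with a short Python sketch. This is a simplified illustration and not how any search engine actually implements it: the helper names `parse_groups` and `is_allowed` are hypothetical, only `disallow` rules with plain prefix matching are handled, and a crawler is matched to a group by its exact user-agent name.

```python
SAMPLE = """\
user-agent: googlebot
disallow: /login

user-agent: googlebot-news
disallow: /media

user-agent: googlebot-image
"""

def parse_groups(text):
    """Map each user-agent name to its list of disallowed path prefixes."""
    groups = {}
    current_agents = []
    seen_rule = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if seen_rule:  # a rule line closed the previous group
                current_agents = []
                seen_rule = False
            current_agents.append(value.lower())
            groups.setdefault(value.lower(), [])
        elif field == "disallow":
            seen_rule = True
            if value:  # an empty disallow value restricts nothing
                for agent in current_agents:
                    groups[agent].append(value)
    return groups

def is_allowed(groups, agent, path):
    """Assumption: exact group name match, falling back to '*' if present."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    return not any(path.startswith(prefix) for prefix in rules)

groups = parse_groups(SAMPLE)
for agent, path in [("googlebot", "/login"),
                    ("googlebot-news", "/media/photo.jpg"),
                    ("googlebot-image", "/login")]:
    print(agent, path, is_allowed(groups, agent, path))
```

Real crawlers also support `allow` rules and wildcards, so treat this only as a reading aid for the three bullets above.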

Sample and Implementation of robots.txt URL

In general, a robots.txt file applies only to the host, protocol, and port from which it is served. It is NOT VALID for other subdomains, protocols, or ports. However, it is VALID for all files in all sub-directories on the same host, protocol, and port.

The following samples show where a robots.txt file on the website server is and is not valid:

VALID EXAMPLES
http://robots.co/robots.txt
http://robots.co/folder/file/robots.txt

INVALID EXAMPLES
http://other.cmlabs.co/robots.txt
https://cmlabs.co/robots.txt
http://cmlabs.co:8181/robots.txt
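
The scoping rule (same host, protocol, and port) can be checked with a few lines of Python. This is a minimal sketch under stated assumptions: the robots.txt file is assumed to live at `http://cmlabs.co/robots.txt`, and the helper names `origin` and `covers` are hypothetical, introduced only for illustration.

```python
from urllib.parse import urlsplit

# Default ports, so that http://host/ and http://host:80/ compare equal.
DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url):
    """Return the (protocol, host, port) triple that scopes a robots.txt file."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def covers(robots_url, page_url):
    """True when page_url shares protocol, host, and port with robots_url."""
    return origin(robots_url) == origin(page_url)

ROBOTS_URL = "http://cmlabs.co/robots.txt"  # assumed location for this example

for page in ("http://cmlabs.co/folder/file",
             "http://other.cmlabs.co/",
             "https://cmlabs.co/",
             "http://cmlabs.co:8181/"):
    print(page, covers(ROBOTS_URL, page))
```

Only the first URL shares all three parts with the assumed robots.txt location; the others differ in subdomain, protocol, or port.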

Important note

At the time this page was published (May 21, 2020), the definition and implementation of robots.txt described here apply only to Google. In other words, other search engines such as Bing, Yahoo, and Yandex do not always use the same standard.

However, global standardization has been under discussion among international communities.

Misunderstanding

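Robots.txt is not the right tool for hiding a file or page from search engines. A URL blocked in robots.txt can still appear in search results if other pages link to it.

So what should we do to hide pages from Google? The right answer is to add a noindex tag to the page, or to send noindex in the response header.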

<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">

RESPONSE HEADER

HTTP/1.1 200 OK
(…)
X-Robots-Tag: noindex
(…)
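
To verify that a page actually carries one of these signals, a small check along the following lines may help. It is a hedged sketch using only the Python standard library: `RobotsMetaParser` and `is_noindex` are hypothetical names, the page HTML and response headers are assumed to have been fetched already, and user-agent-specific header values (such as `googlebot: noindex`) are not handled.

```python
from html.parser import HTMLParser

class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {key: (value or "") for key, value in attrs}
        name = attr.get("name", "").lower()
        if name in ("robots", "googlebot"):
            tokens = {t.strip().lower() for t in attr.get("content", "").split(",")}
            self.directives.setdefault(name, set()).update(tokens)

def is_noindex(html, headers):
    """True if the X-Robots-Tag header or a robots meta tag says noindex."""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":
            if "noindex" in {t.strip().lower() for t in value.split(",")}:
                return True
    parser = RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in tokens for tokens in parser.directives.values())

page = '<html><head><meta name="robots" content="noindex"></head></html>'
print(is_noindex(page, {}))
print(is_noindex("<p>hello</p>", {"X-Robots-Tag": "noindex"}))
print(is_noindex("<p>hello</p>", {}))
```

The response header is useful for non-HTML files such as PDFs, where a meta tag cannot be added.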

Changes of Protocol Standards

On July 1, 2019, Google announced through its official blog that the robots.txt protocol was being prepared to become an Internet standard. This means that search engines would be expected to follow the same specification.

Related Terms

User-agent / bot

A user-agent is the name that identifies a robot (bot) used by search engines to crawl websites on the internet.

The definition of a term may be imprecise and may need to be adjusted. If so, don't hesitate to contact us via email at hello@cmlabs.co. We thank you for your suggestions and input, which help us improve the quality and relevance of our content.