cmlabs logo

Supervene Search Odyssey

cmlabs Jakarta Jl. Pluit Kencana Raya No.63, Pluit, Penjaringan, Jakarta Utara, DKI Jakarta, 14450, Indonesia

(+62) 21-666-04470
COST-EFFECTIVE FEES, UP TO 5%!

WE ARE OPEN TO PARTNERSHIP WITH VARIOUS NICHES

Franchise Organizations|Educational Institutions|Professional Services Firms|Startup Incubators / Accelerators|…and 34 more

ServicesAll-in-One Digital ServicesDigital MarketingCreative ServicesWeb & App DevelopmentSee All Services
CompanyAbout cmlabsContact UsCareerPress ReleaseWhistleblower Protection
InformationNotification CenterClient's TestimonyFAQ of cmlabs Services
LegalTerms & ConditionsPrivacy PolicyTerms of Services

Copyright © 2019-2026 PT CMLABS INDONESIA DIGITAL

Artificial Intelligence

Open AI Launched GPTBot as A Web Crawler

By Tati Khumairoh
Published at Aug 23, 2023, 08:38
Published at Sep 22, 2023, 07:50 By Tati Khumairoh

Disclaimer: We offer ad-free and organic news content to our readers.

The AI platform ChatGPT relies on information gathered from the internet to answer users' questions. OpenAI, its developer, recently announced GPTBot, a web crawler that can visit any website available on the internet.

As an AI-based information provider platform, ChatGPT uses a website crawler to display results. Read about this bot and how to limit it here.

Key Takeaways
  • OpenAI has introduced its own website crawler, called GPTBot.
  • The crawler explores the internet and gathers information for use in ChatGPT.
  • Webmasters can prevent this bot from visiting their sites with a "Disallow" directive in the robots.txt file.
  • Google is exploring a protocol to complement robots.txt in anticipation of AI web crawlers like this one.

Earlier this month, OpenAI, the company behind ChatGPT, launched a web crawler called GPTBot to collect information from the internet for use on its platform. How are search engines responding, and how can you limit the crawler's access to your content? Find the answers below.

 

What is GPTBot?

In a move to advance artificial intelligence (AI) technology, OpenAI has introduced GPTBot, a web crawler designed to collect data that may be used to improve future AI models, such as the successors to GPT-4.

GPTBot can be identified by the user agent token "GPTBot" and the following full user agent string:

Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; GPTBot/1.0; +https://openai.com/gptbot)
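
As a rough illustration of how a site owner might spot this crawler in request logs, the sketch below checks a User-Agent header for the "GPTBot" token. The function name is_gptbot and the matching pattern are assumptions made for this example, not something taken from OpenAI's documentation.

```python
import re

# User agent string quoted in the article.
GPTBOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; "
    "compatible; GPTBot/1.0; +https://openai.com/gptbot)"
)

# Hypothetical pattern: match the token "GPTBot" as a whole word, ignoring case.
_GPTBOT_TOKEN = re.compile(r"\bGPTBot\b", re.IGNORECASE)


def is_gptbot(user_agent: str) -> bool:
    """Return True if the User-Agent header contains the GPTBot token."""
    return bool(_GPTBOT_TOKEN.search(user_agent or ""))


if __name__ == "__main__":
    print(is_gptbot(GPTBOT_UA))
    print(is_gptbot("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/115.0"))
```

Keep in mind that any client can set its own User-Agent header, so a check like this identifies requests that claim to be GPTBot rather than proving where they came from.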

 

Function and Limitations of GPTBot

According to OpenAI's documentation, GPTBot scans and collects data from web pages, and that data may be used to improve future AI models.

OpenAI also says it applies strict filtering to protect the quality and ethics of its data sources. Pages crawled by GPTBot are filtered to remove websites that require payment, websites known to collect personally identifiable information (PII), and text that violates OpenAI's policies.

 

Controlling GPTBot Interactions on Websites

On the WebmasterWorld discussion forum, many webmasters have reported frequent visits from GPTBot to their websites. This has made them more cautious and prompted some to disallow the crawler.

If you believe this AI crawler should not gather information from your site, here's how to limit it.

Add specific directives to your site's robots.txt file. If you want to completely prevent GPTBot from accessing any part of your site, include the following:

 

User-agent: GPTBot
Disallow: /

 

By adding these directives, you tell GPTBot not to crawl any part of your site.
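
One way to sanity-check such a rule before deploying it is with Python's standard urllib.robotparser module. The sketch below is only an illustration: the helper name can_crawl and the example.com URLs are assumptions, and other crawlers may interpret robots.txt slightly differently than Python's parser does.

```python
from urllib.robotparser import RobotFileParser

# The rule from the article: block GPTBot from the whole site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # example.com is a placeholder domain used only for illustration.
    print(can_crawl(ROBOTS_TXT, "GPTBot", "https://example.com/any-page"))
    print(can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/any-page"))
```

Because the file contains rules only for GPTBot, other user agents are unaffected by it.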

However, if you want to give GPTBot access to only certain parts of your site, you can use directives like these:

 

User-agent: GPTBot
Allow: /directory-1/
Disallow: /directory-2/

 

In this scenario, GPTBot is allowed to access content in "directory-1," but explicitly prevented from crawling "directory-2." This control allows you to customize GPTBot's access according to your site's preferences and priorities.
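
The effect of this partial-access configuration can also be checked with urllib.robotparser. This is a minimal sketch under stated assumptions: the helper gptbot_may_fetch, the example.com domain, and the sample paths are all hypothetical and serve only to illustrate the rules above.

```python
from urllib.robotparser import RobotFileParser

# The partial-access rules from the article.
RULES = [
    "User-agent: GPTBot",
    "Allow: /directory-1/",
    "Disallow: /directory-2/",
]

parser = RobotFileParser()
parser.parse(RULES)


def gptbot_may_fetch(path: str) -> bool:
    """Return True if the rules above let GPTBot fetch the given path."""
    # example.com is a placeholder domain used only for illustration.
    return parser.can_fetch("GPTBot", "https://example.com" + path)


if __name__ == "__main__":
    for path in ("/directory-1/page.html", "/directory-2/page.html", "/blog/"):
        print(path, gptbot_may_fetch(path))
```

Paths that match neither rule, such as /blog/ here, stay crawlable, so add further Disallow lines for any other areas you want to keep GPTBot out of.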

 

Search Engine Responses

Google, meanwhile, appears to be exploring alternative ways for publishers to control AI crawlers such as GPTBot. In its blog, Google announced its intention to develop a complementary protocol to robots.txt, a standard that has been in use for nearly three decades.

The motivation behind this step is the emergence of new AI technology, not only from Google but also from other industry players.

The discussion comes amid reports of OpenAI leveraging internet content for its ChatGPT service. Note that Google's announcement is not final: its statement emphasizes a commitment to discussion and collaboration with the online community.

In a statement released on Twitter, Google said:

 

"Today, we're starting a public discussion, inviting members of the web and AI community to provide input on the approach to the complementary protocol. We want to engage a variety of voices from different web publishers, civil society, academics, and other fields worldwide to join this discussion, and we'll gather those interested to participate in the coming months."

That concludes the overview of GPTBot, its current operation, and how you can limit its interactions on your website.

Article Source

As a dedicated news provider, we are committed to accuracy and reliability. We go the extra mile by attaching credible sources to support the data and information we present.

  1. OpenAI’s Documentation: https://platform.openai.com/docs/gptbot 
  2. Google Blog: https://blog.google/technology/ai/ai-web-publisher-controls-sign-up/
  3. Google Search Liaison Twitter: https://twitter.com/searchliaison/status/1677055071274795009 
  4. Forum Discussion on Webmaster World: https://www.webmasterworld.com/search_engine_spiders/5091131.htm

For more information about cmlabs and our services, please contact us at:

E-mail: marketing@cmlabs.co

Phone : (0341) 475665

Disclaimer: All news published by cmlabs has undergone a strict verification and data processing process based on the cmlabs News Publication Guidelines. However, the data or core news we write may undergo changes, reductions, or additions. Consequently, cmlabs assumes no liability for any losses or damages that may arise from the use of this information. We encourage readers to conduct additional verification before making decisions based on the information written on this page.

Tati Khumairoh

An experienced content writer who is eager to create engaging and impactful written pieces across various industries, using an SEO approach to deliver high-quality content that captivates readers.
