GPTBot — Definition

GPTBot

TL;DR: GPTBot is OpenAI’s training crawler — it fetches web content used to train OpenAI’s models. It honors robots.txt, publishes its IP ranges, and is separate from OpenAI’s search and user-fetch bots.

What it means

GPTBot is the user-agent OpenAI uses when it crawls the web to gather content for training its models. It is specifically OpenAI’s training crawler: it is not the bot that powers ChatGPT search (that is OAI-SearchBot), and not the one that opens a page when a user asks ChatGPT about it (that is ChatGPT-User). GPTBot identifies itself with the user-agent token GPTBot (the version in its full user-agent string is currently GPTBot/1.3), and OpenAI publishes the IP ranges it crawls from, so site owners can verify that a request really comes from GPTBot rather than from a scraper borrowing its name.
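The verification step can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's own method: the function name `is_verified_gptbot`, the sample user-agent string, and the CIDR blocks are all assumptions made for the example. The CIDR blocks are reserved documentation ranges, so substitute the ranges OpenAI actually publishes before using anything like this.

```python
import ipaddress

# Hypothetical CIDR blocks for illustration only (reserved documentation ranges).
# Replace them with the ranges OpenAI currently publishes for GPTBot.
EXAMPLE_GPTBOT_RANGES = ["192.0.2.0/24", "198.51.100.0/28"]


def is_verified_gptbot(user_agent, remote_ip, cidr_ranges=EXAMPLE_GPTBOT_RANGES):
    """Return True only if the request claims to be GPTBot AND its IP is in a listed range."""
    # Step 1: the user-agent must mention the GPTBot token (case-insensitive).
    if "gptbot" not in user_agent.lower():
        return False
    # Step 2: the source address must parse as a valid IPv4/IPv6 address.
    try:
        ip = ipaddress.ip_address(remote_ip)
    except ValueError:
        return False
    # Step 3: the address must fall inside at least one published range.
    return any(ip in ipaddress.ip_network(cidr) for cidr in cidr_ranges)


# Sample user-agent string, shortened for the example (an assumption, not the exact string).
SAMPLE_UA = "Mozilla/5.0 (compatible; GPTBot/1.3)"
```

A request whose user-agent says GPTBot but whose address is outside the listed ranges returns False, which is the "scraper using its name" case described above.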

Why it matters

GPTBot is the crawler most site owners decide about first, because training is the use that gives nothing back — no citation, no referral traffic. Blocking it is the canonical “I don’t want to train models for free” move, and roughly a quarter of top sites now do. Crucially, GPTBot respects robots.txt, so a Disallow line genuinely stops it. Blocking GPTBot does not remove your brand from ChatGPT search — that’s governed by OAI-SearchBot, which you can allow independently. (One caveat: blocking GPTBot stops future training crawls; content already in past training sets stays.)

How it works / examples

To block GPTBot from your whole site:

User-agent: GPTBot
Disallow: /

To allow ChatGPT search while still blocking training, add a second group for OAI-SearchBot:

User-agent: OAI-SearchBot
Allow: /

Because GPTBot is compliant, robots.txt is enough: you don’t need a firewall rule for OpenAI specifically. You do need one for non-compliant scrapers such as Bytespider (see glossary/bytespider).
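One way to sanity-check these rules before deploying them is to run them through Python's standard-library `urllib.robotparser`. The sketch below is only a local approximation: it shows how Python's parser reads the file, not how OpenAI's crawlers do. The helper name `allowed` and the example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# The combined rules: block the training crawler, allow the search crawler.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /
"""


def allowed(robots_txt, agent, url):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse in memory, no network request
    return parser.can_fetch(agent, url)


gptbot_ok = allowed(ROBOTS_TXT, "GPTBot", "https://example.com/pricing")
searchbot_ok = allowed(ROBOTS_TXT, "OAI-SearchBot", "https://example.com/pricing")
```

Here `gptbot_ok` is False and `searchbot_ok` is True. Agents with no matching group are allowed by default, so these two groups leave every other crawler unaffected.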

Related

glossary/ai-crawler: the three AI crawler types and where GPTBot fits among them
glossary/ccbot: the other major training crawler to decide about
seo/ai-crawler-access: the full bot directory and access policy

Sources

OpenAI — Bots / Crawlers documentation