Understanding AI Bot Taxonomy
Learn to classify AI bots into Training Bots and Search Agents. Understand the Value Exchange Principle and why treating all bots the same is a strategic mistake.
Key Takeaways
- The difference between Training Bots and Search Agents
- The Value Exchange Principle for bot management
- Complete catalog of major AI bots and their purposes
- Why a blanket "block all bots" strategy is harmful
The Agentic Web Has Arrived
The internet is no longer just crawled; it is "read" and "acted upon" by autonomous agents. When someone asks an AI assistant to "find me a good project management tool," an agent reads your website, evaluates your product, and recommends either you or your competitor.
The Critical Insight: Not all AI bots are equal. Some take your content to train their models (low value to you). Others index your content to recommend you in real-time queries (high value). Treat them accordingly.
Training Bots: The Takers
Training Bots download bulk data from your website to train Large Language Models. They take your intellectual property without driving traffic back. Their value exchange is low or negative.
Training Bots to Block:
- GPTBot (OpenAI) - Trains GPT models, no traffic return
- CCBot (Common Crawl) - Open repository for anyone to train on
- Google-Extended - A robots.txt control token (not a separate crawler) that governs whether Google may use your content for Gemini/Vertex AI models
- ClaudeBot (Anthropic) - Trains Claude models
- Bytespider (ByteDance) - Aggressive scraper for LLM training
- Amazonbot - Content harvesting, limited traffic return
Search Agents: The Traffic Drivers
Search Agents index content to surface in real-time queries. When someone asks Perplexity a question and your content is cited, you get traffic. Their value exchange is high.
Search Agents to Allow:
- OAI-SearchBot (OpenAI) - Indexes for ChatGPT search (launched as SearchGPT) citations
- Googlebot - Standard search indexing (essential)
- PerplexityBot - Answer generation with citations and links
- GoogleAgent-Mariner - Executes purchases for users (very high value)
- AmazonBuyForMe - Automates shopping tasks
- Google-Shopping - Product listing indexing
Value Exchange Principle: Before allowing or blocking any bot, ask: "Does this bot give me traffic, or just take my content?" Manage your robots.txt accordingly.
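As a minimal sketch of how the principle could translate into a robots.txt file, the Python below generates one group per user-agent token. The token names are copied from the lists in this lesson and should be checked against each vendor's current documentation. The function name build_robots_txt is hypothetical. The agentic shopping tokens are left out on the assumption that user-triggered agents may not consult robots.txt at all. Note that robots.txt is advisory: it only affects crawlers that choose to honor it.

```python
# Tokens copied from the lesson's lists; confirm each one against the
# vendor's current documentation before deploying.
TRAINING_BOTS = ["GPTBot", "CCBot", "Google-Extended", "ClaudeBot",
                 "Bytespider", "Amazonbot"]
SEARCH_AGENTS = ["OAI-SearchBot", "Googlebot", "PerplexityBot"]


def build_robots_txt(blocked, allowed):
    """Return robots.txt text with one group per user-agent token.

    Hypothetical helper: blocked tokens get "Disallow: /",
    allowed tokens get "Allow: /".
    """
    groups = []
    for token in blocked:
        groups.append(f"User-agent: {token}\nDisallow: /")
    for token in allowed:
        groups.append(f"User-agent: {token}\nAllow: /")
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(TRAINING_BOTS, SEARCH_AGENTS))
```

Keeping the two lists as data rather than hand-editing the file makes it easier to move a bot between categories when its value exchange changes.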
The Yellow Zone: Monitor Bots
Some bots have mixed value or uncertain purposes. These should be monitored with throttled access rather than immediately blocked or allowed.
Bots to Monitor:
- Applebot-Extended - May affect Siri results, unclear value
- Anthropic-AI - Different from ClaudeBot, unclear purpose
- New undocumented AI agents - Assess before full access
- High-frequency fetchers - May be legitimate or abusive
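One way to start monitoring is to sort the User-Agent strings from your access logs into the three zones described in this lesson. The sketch below is an illustration under assumptions: the POLICY table, the classify and tally helpers, and the "unlisted" label are all hypothetical, and the sample strings are simplified rather than real vendor User-Agent values. User-Agent headers can also be spoofed, so treat the result as a first pass, not as verification.

```python
from collections import Counter

# Lowercased tokens mapped to the lesson's three zones (hypothetical table).
# Google-Extended is omitted: it is a robots.txt control token and does not
# appear as its own User-Agent in request logs.
POLICY = {
    "gptbot": "block",
    "ccbot": "block",
    "claudebot": "block",
    "bytespider": "block",
    "amazonbot": "block",
    "oai-searchbot": "allow",
    "googlebot": "allow",
    "perplexitybot": "allow",
    "applebot-extended": "monitor",
    "anthropic-ai": "monitor",
}


def classify(user_agent):
    """Return 'block', 'allow', 'monitor', or 'unlisted' for a UA string."""
    ua = user_agent.lower()
    # Check longer tokens first so a more specific token wins over a
    # shorter one that happens to be contained in it.
    for token in sorted(POLICY, key=len, reverse=True):
        if token in ua:
            return POLICY[token]
    # No known token matched: leave the decision to a human reviewer.
    return "unlisted"


def tally(user_agents):
    """Count how many requests fall into each zone."""
    return Counter(classify(ua) for ua in user_agents)


if __name__ == "__main__":
    sample = [
        "Mozilla/5.0 (compatible; GPTBot/1.0)",
        "Mozilla/5.0 (compatible; PerplexityBot/1.0)",
        "anthropic-ai",
        "ExampleNewAgent/0.1",
    ]
    print(tally(sample))
```

Requests that come back as "unlisted" are the ones to review by hand, since that bucket mixes ordinary browsers with new, undocumented agents.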
Practitioner assets
Turn this lesson into a repeatable GEO workflow
Use the checklist, sources, templates, and assessment prompts to move from theory to a client-ready diagnostic or implementation step.
- High: Define the prompt set, user intent, market, persona or vertical scenario for this lesson.
- High: Capture current AI answer evidence with provider, date, excerpt, citations and competitor mentions.
- High: Identify the likely root cause: content gap, authority gap, technical access, source inconsistency, review signal or policy risk.
- Medium: Create the visible page, proof block, profile update, policy clarification or report artifact that resolves the gap.
- Medium: Assign owner, due date, expected impact and remeasurement window before calling the work complete.
- Google Search Central: Robots.txt introduction (Google Search Central, 2025)
- Google Search Central: Intro to structured data (Google Search Central, 2025)
- Schema.org vocabulary (Schema.org, 2025)
- AI Bot Taxonomy Work Product Template: A repeatable worksheet for applying AI Bot Taxonomy to a real brand or client account.
- Before/After Answer Proof: A reporting format for showing how AI answer quality changed after the improvement shipped.
This lesson includes 5 assessment questions to reinforce the concepts before you apply them to a real GEO audit.