Robots.txt Generator

Control 35+ AI & search bots with classified categories. Block AI training, allow citations, or use AI to generate the perfect robots.txt.


Quick Presets

🤖 AI Training

- GPTBot (OpenAI): Trains GPT-4o/GPT-5 models
- anthropic-ai (Anthropic): Bulk training data for Claude
- Google-Extended (Google): Gemini AI model training
- Applebot-Extended (Apple): Apple AI training data
- cohere-ai (Cohere): Language model training data
- Bytespider (ByteDance): TikTok AI training & recommendations
- CCBot (Common Crawl): Open AI training datasets
- Diffbot (Diffbot): Structured ML pipeline data

🔍 AI Search / Citation

- OAI-SearchBot (OpenAI): ChatGPT search & citations
- ChatGPT-User (OpenAI): User-triggered ChatGPT browsing
- ClaudeBot (Anthropic): Claude chat citations & browsing
- claude-web (Anthropic): Claude web-focused fetch
- PerplexityBot (Perplexity): Perplexity search indexing
- Perplexity-User (Perplexity): Human-triggered Perplexity visits
- DuckAssistBot (DuckDuckGo): DuckAssist AI answers
- YouBot (You.com): You.com AI search assistant
- MistralAI-User (Mistral): Le Chat AI citations
- GoogleAgent-Mariner (Google): Google Project Mariner agent

🔬 Research

- AI2Bot (Allen Institute): Semantic Scholar & AI research
- omgili (Omgili): Forum & discussion indexing
- Timpibot (Timpi): Decentralised search crawler

📱 Social Media

- FacebookBot (Meta): Facebook/Instagram link previews
- meta-externalagent (Meta): Backup Meta fetch agent
- LinkedInBot (LinkedIn): LinkedIn link preview extraction
- Twitterbot (X/Twitter): Twitter/X card previews

🌐 Traditional Search

- Googlebot (Google): Google Search indexing
- Bingbot (Microsoft): Bing Search & Copilot
- Applebot (Apple): Siri & Spotlight indexing
- DuckDuckBot (DuckDuckGo): DuckDuckGo search
- YandexBot (Yandex): Yandex Search
- Baiduspider (Baidu): Baidu Search (China)
- Amazonbot (Amazon): Alexa & Amazon AI features
- Slurp (Yahoo): Yahoo Search crawler
- ia_archiver (Internet Archive): Wayback Machine archiving


Free Robots.txt Generator

The robots.txt file is one of the first things search engine crawlers check when visiting your site. It acts as a set of instructions telling bots which pages to crawl and which to skip. A properly configured robots.txt file protects private admin areas, prevents duplicate content from being crawled, and directs crawlers to your XML sitemap. A misconfigured one can accidentally block your entire site from Google.

This free Robots.txt Generator creates a valid robots.txt file with correct syntax. Set rules for all bots or specific crawlers (Googlebot, Bingbot, etc.), specify allowed and disallowed paths, set your crawl delay, and add your XML sitemap URL — all without writing any code manually.

Copy the generated output and upload it to the root of your website as robots.txt. Always verify that it loads at yourdomain.com/robots.txt, and check the robots.txt report in Google Search Console to confirm your rules are parsed as intended.
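Before uploading, you can sanity-check your rules locally with Python's standard-library parser. The sketch below uses hypothetical rules (a GPTBot block plus an /admin/ block with one exception); note that Python's parser applies the first matching rule in a group, so Allow lines should precede the broader Disallow they carve out:

```python
from urllib import robotparser

# Hypothetical rules: block GPTBot everywhere; for all other bots,
# block /admin/ but carve out /admin/help/ as an exception.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# GPTBot is blocked from everything.
rp.can_fetch("GPTBot", "https://example.com/blog/post")       # False
# Other bots can crawl normal pages but not /admin/ ...
rp.can_fetch("Googlebot", "https://example.com/blog/post")    # True
rp.can_fetch("Googlebot", "https://example.com/admin/")       # False
# ... except the explicitly allowed subpath.
rp.can_fetch("Googlebot", "https://example.com/admin/help/")  # True
```

This catches syntax mistakes and rule-ordering surprises before a bad file goes live, though it does not replicate Google's longest-match precedence exactly.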

Key Robots.txt Directives Explained

User-agent. Specifies which bot the following rules apply to. Use * for all bots, or a specific bot name like Googlebot to target a single crawler.

Disallow. Tells the specified bot not to crawl the given path. Disallow: / blocks crawling of the entire site. Disallow: /admin/ blocks only the /admin/ section.

Allow. Explicitly permits crawling of a path even when a parent path is disallowed. Useful for allowing specific pages within a blocked directory.

Sitemap. Points bots to your XML sitemap URL. All major search engines respect this directive and use the sitemap to discover pages.

Crawl-delay. Asks crawlers to wait a specified number of seconds between requests. Useful for servers with limited capacity, though Google does not officially support this directive.
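Taken together, a file using all five directives might look like the sketch below (domain and paths are placeholders). Note that Sitemap applies to the whole file, not to any one User-agent group:

```txt
# Rules for all bots
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

# Block one specific crawler entirely
User-agent: GPTBot
Disallow: /

# File-wide: sitemap location
Sitemap: https://www.example.com/sitemap.xml
```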

Common Robots.txt Mistakes to Avoid

Blocking the whole site. Disallow: / with User-agent: * blocks all crawlers from your entire site. This is the most common and catastrophic robots.txt error.

Blocking CSS and JavaScript. Blocking stylesheets and scripts prevents Google from rendering pages correctly, which hurts rankings. Never disallow /wp-content/themes/ or similar asset directories.

Relying on robots.txt for security. robots.txt is publicly visible and respected only by legitimate bots. Malicious crawlers ignore it entirely. Do not use it to hide sensitive content — use proper authentication instead.

Confusing crawl blocking with index blocking. A page blocked in robots.txt can still appear in Google's index if linked from other pages. Use noindex meta tags to prevent indexing.
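For that last point, the fix is to let crawlers fetch the page and serve a noindex signal instead of blocking it. A minimal sketch, either as a meta tag in the page or as a response header (the header variant assumes your server lets you set custom headers):

```html
<!-- In the page's <head>: tells compliant crawlers not to index it.
     The page must NOT be blocked in robots.txt, or this tag is never seen. -->
<meta name="robots" content="noindex">

<!-- Equivalent HTTP response header, useful for PDFs and other non-HTML files:
     X-Robots-Tag: noindex -->
```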


Frequently Asked Questions

What is robots.txt?

A text file at your domain root that tells crawlers which pages to access or skip.

Does it block indexing?

No — it blocks crawling. Use noindex meta tags to prevent actual indexing.

What is Disallow?

A directive telling a bot not to crawl a specific path or section.

Is this free?

Yes. Completely free, no sign-up needed.