Robots.txt Generator

Control 35+ AI & search bots with classified categories. Block AI training, allow citations, or use AI to generate the perfect robots.txt.


Quick Presets

🤖 AI Training

- GPTBot (OpenAI): Trains GPT-4o/GPT-5 models
- anthropic-ai (Anthropic): Bulk training data for Claude
- Google-Extended (Google): Gemini AI model training
- Applebot-Extended (Apple): Apple AI training data
- cohere-ai (Cohere): Language model training data
- Bytespider (ByteDance): TikTok AI training & recommendations
- CCBot (Common Crawl): Open AI training datasets
- Diffbot (Diffbot): Structured ML pipeline data

🔍 AI Search / Citation

- OAI-SearchBot (OpenAI): ChatGPT search & citations
- ChatGPT-User (OpenAI): User-triggered ChatGPT browsing
- ClaudeBot (Anthropic): Claude chat citations & browsing
- claude-web (Anthropic): Claude web-focused fetch
- PerplexityBot (Perplexity): Perplexity search indexing
- Perplexity-User (Perplexity): Human-triggered Perplexity visits
- DuckAssistBot (DuckDuckGo): DuckAssist AI answers
- YouBot (You.com): You.com AI search assistant
- MistralAI-User (Mistral): Le Chat AI citations
- GoogleAgent-Mariner (Google): Google Project Mariner agent

🔬 Research

- AI2Bot (Allen Institute): Semantic Scholar & AI research
- omgili (Omgili): Forum & discussion indexing
- Timpibot (Timpi): Decentralised search crawler

📱 Social Media

- FacebookBot (Meta): Facebook/Instagram link previews
- meta-externalagent (Meta): Backup Meta fetch agent
- LinkedInBot (LinkedIn): LinkedIn link preview extraction
- Twitterbot (X/Twitter): Twitter/X card previews

🌐 Traditional Search

- Googlebot (Google): Google Search indexing
- Bingbot (Microsoft): Bing Search & Copilot
- Applebot (Apple): Siri & Spotlight indexing
- DuckDuckBot (DuckDuckGo): DuckDuckGo search
- YandexBot (Yandex): Yandex Search
- Baiduspider (Baidu): Baidu Search (China)
- Amazonbot (Amazon): Alexa & Amazon AI features
- Slurp (Yahoo): Yahoo Search crawler
- ia_archiver (Internet Archive): Wayback Machine archiving


Free Robots.txt Generator

The robots.txt file is one of the first things search engine crawlers check when visiting your site. It acts as a set of instructions telling bots which pages to crawl and which to skip. A properly configured robots.txt file protects private admin areas, prevents duplicate content from being crawled, and directs crawlers to your XML sitemap. A misconfigured one can accidentally block your entire site from Google.

This free Robots.txt Generator creates a valid robots.txt file with correct syntax. Set rules for all bots or specific crawlers (Googlebot, Bingbot, etc.), specify allowed and disallowed paths, set your crawl delay, and add your XML sitemap URL — all without writing any code manually.

Copy the generated output and upload it to the root of your website as robots.txt. Always verify that it loads at yourdomain.com/robots.txt, and check the robots.txt report in Google Search Console to confirm your rules are parsed as intended.
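Before uploading, you can sanity-check your rules locally with Python's standard-library parser. The sketch below uses hypothetical rules (a GPTBot block plus an /admin/ block with one exception); note that Python's parser applies the first matching rule in a group, so Allow lines should precede the broader Disallow they carve out:

```python
from urllib import robotparser

# Hypothetical rules: block GPTBot everywhere; for all other bots,
# block /admin/ but carve out /admin/help/ as an exception.
RULES = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /admin/help/
Disallow: /admin/
"""

rp = robotparser.RobotFileParser()
rp.parse(RULES.splitlines())

# GPTBot is blocked from everything.
rp.can_fetch("GPTBot", "https://example.com/blog/post")       # False
# Other bots can crawl normal pages but not /admin/ ...
rp.can_fetch("Googlebot", "https://example.com/blog/post")    # True
rp.can_fetch("Googlebot", "https://example.com/admin/")       # False
# ... except the explicitly allowed subpath.
rp.can_fetch("Googlebot", "https://example.com/admin/help/")  # True
```

This catches syntax mistakes and rule-ordering surprises before a bad file goes live, though it does not replicate Google's longest-match precedence exactly.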

Key Robots.txt Directives Explained

User-agent. Specifies which bot the following rules apply to. Use * for all bots, or a specific bot name like Googlebot to target a single crawler.

Disallow. Tells the specified bot not to crawl the given path. Disallow: / blocks crawling of the entire site. Disallow: /admin/ blocks only the /admin/ section.

Allow. Explicitly permits crawling of a path even when a parent path is disallowed. Useful for allowing specific pages within a blocked directory.

Sitemap. Points bots to your XML sitemap URL. All major search engines respect this directive and use the sitemap to discover pages.

Crawl-delay. Asks crawlers to wait a specified number of seconds between requests. Useful for servers with limited capacity, though Google does not officially support this directive.
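Taken together, a file using all five directives might look like the sketch below (domain and paths are placeholders). Note that Sitemap applies to the whole file, not to any one User-agent group:

```txt
# Rules for all bots
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Crawl-delay: 10

# Block one specific crawler entirely
User-agent: GPTBot
Disallow: /

# File-wide: sitemap location
Sitemap: https://www.example.com/sitemap.xml
```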

Common Robots.txt Mistakes to Avoid

Blocking the whole site. Disallow: / with User-agent: * blocks all crawlers from your entire site. This is the most common and catastrophic robots.txt error.

Blocking CSS and JavaScript. Blocking stylesheets and scripts prevents Google from rendering pages correctly, which hurts rankings. Never disallow /wp-content/themes/ or similar asset directories.

Relying on robots.txt for security. robots.txt is publicly visible and respected only by legitimate bots. Malicious crawlers ignore it entirely. Do not use it to hide sensitive content — use proper authentication instead.

Confusing crawl blocking with index blocking. A page blocked in robots.txt can still appear in Google's index if linked from other pages. Use noindex meta tags to prevent indexing.
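For that last point, the fix is to let crawlers fetch the page and serve a noindex signal instead of blocking it. A minimal sketch, either as a meta tag in the page or as a response header (the header variant assumes your server lets you set custom headers):

```html
<!-- In the page's <head>: tells compliant crawlers not to index it.
     The page must NOT be blocked in robots.txt, or this tag is never seen. -->
<meta name="robots" content="noindex">

<!-- Equivalent HTTP response header, useful for PDFs and other non-HTML files:
     X-Robots-Tag: noindex -->
```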


Frequently Asked Questions

What is robots.txt?

A text file at your domain root that tells crawlers which pages to access or skip.

Does it block indexing?

No — it blocks crawling. Use noindex meta tags to prevent actual indexing.

What is Disallow?

A directive telling a bot not to crawl a specific path or section.

Is this free?

Yes. Completely free, no sign-up needed.