Robots.txt Generator
Control 35+ AI and search bots, grouped into clear categories. Block AI training, allow citations, or use AI to generate the perfect robots.txt.
Quick Presets
🤖 AI Training
Trains GPT-4o/GPT-5 models
Bulk training data for Claude
Gemini AI model training
Apple AI training data
Language model training data
TikTok AI training & recommendations
Open AI training datasets
Structured ML pipeline data
🔍 AI Search / Citation
ChatGPT search & citations
User-triggered ChatGPT browsing
Claude chat citations & browsing
Claude web-focused fetch
Perplexity search indexing
Human-triggered Perplexity visits
DuckAssist AI answers
You.com AI search assistant
Le Chat AI citations
Google Project Mariner agent
🔬 Research
Semantic Scholar & AI research
Forum & discussion indexing
Decentralised search crawler
📱 Social Media
Facebook/Instagram link previews
Backup Meta fetch agent
LinkedIn link preview extraction
Twitter/X card previews
🌐 Traditional Search
Google Search indexing
Bing Search & Copilot
Siri & Spotlight indexing
DuckDuckGo search
Yandex Search
Baidu Search (China)
Alexa & Amazon AI features
Yahoo Search crawler
Wayback Machine archiving
Custom Rules
Free Robots.txt Generator
The robots.txt file is one of the first things search engine crawlers check when visiting your site. It acts as a set of instructions telling bots which pages to crawl and which to skip. A properly configured robots.txt file protects private admin areas, prevents duplicate content from being crawled, and directs crawlers to your XML sitemap. A misconfigured one can accidentally block your entire site from Google.
This free Robots.txt Generator creates a valid robots.txt file with correct syntax. Set rules for all bots or specific crawlers (Googlebot, Bingbot, etc.), specify allowed and disallowed paths, set your crawl delay, and add your XML sitemap URL — all without writing any code manually.
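The AI presets produce per-bot rules. As a sketch, a file that blocks common AI training crawlers while leaving normal search untouched might look like the following (user-agent tokens such as GPTBot, ClaudeBot, Google-Extended, and CCBot are the names those crawlers publish as of writing — verify them against each vendor's current documentation):

```text
# Block AI training crawlers
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

# All other bots: crawl everything
User-agent: *
Disallow:
```

Note that an empty `Disallow:` line means "nothing is disallowed", so the final group explicitly permits all other crawlers.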
Copy the generated output and upload it to the root of your website as robots.txt. Always verify it at yourdomain.com/robots.txt after uploading, and check the robots.txt report in Google Search Console (which replaced the old robots.txt Tester) to confirm Google can fetch and parse it.
Key Robots.txt Directives Explained
User-agent. Specifies which bot the following rules apply to. Use * for all bots, or a specific bot name like Googlebot to target a single crawler.
Disallow. Tells the specified bot not to crawl the given path. Disallow: / blocks crawling of the entire site. Disallow: /admin/ blocks only the /admin/ section.
Allow. Explicitly permits crawling of a path even when a parent path is disallowed. Useful for allowing specific pages within a blocked directory.
Sitemap. Points bots to your XML sitemap URL. All major search engines respect this directive and use the sitemap to discover pages.
Crawl-delay. Asks crawlers to wait a specified number of seconds between requests. Useful for servers with limited capacity, though Google does not officially support this directive.
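Put together, the five directives above might appear in a single file like this (the paths, domain, and sitemap URL are placeholders for illustration):

```text
# Rules for all bots
User-agent: *
Disallow: /admin/          # block the /admin/ section
Allow: /admin/help.html    # but permit this one page inside it
Crawl-delay: 10            # ignored by Google, honoured by Bing and Yandex

# Extra rule applied only to Google's crawler
User-agent: Googlebot
Disallow: /drafts/

# Sitemap location (absolute URL; may appear anywhere in the file)
Sitemap: https://www.example.com/sitemap.xml
```

A bot obeys the most specific User-agent group that matches it, so Googlebot follows only the Googlebot group here, not the `*` rules.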
Common Robots.txt Mistakes to Avoid
Blocking the whole site. Disallow: / with User-agent: * blocks all crawlers from your entire site. This is the most common and catastrophic robots.txt error.
Blocking CSS and JavaScript. Blocking stylesheets and scripts prevents Google from rendering pages correctly, which hurts rankings. Never disallow /wp-content/themes/ or similar asset directories.
Relying on robots.txt for security. robots.txt is publicly visible and respected only by legitimate bots. Malicious crawlers ignore it entirely. Do not use it to hide sensitive content — use proper authentication instead.
Confusing crawl blocking with index blocking. A page blocked in robots.txt can still appear in Google's index if linked from other pages. Use noindex meta tags to prevent indexing.
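To actually keep a page out of search results, let crawlers fetch it and serve a noindex signal instead — either as a meta tag in the page's HTML or as an HTTP response header (the header snippet below assumes an Apache-style configuration; other servers have equivalent settings):

```text
<!-- In the page's <head> -->
<meta name="robots" content="noindex">

# Or as an HTTP response header (Apache example)
Header set X-Robots-Tag "noindex"
```

Remember that a noindex directive only works if the page is crawlable: if robots.txt blocks the URL, the crawler never sees the tag.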
Related Tools
- XML Sitemap Generator – Create a sitemap to include in your robots.txt.
- Meta Tag Generator – Generate noindex and other meta directives.
- HTTP Headers Checker – Verify X-Robots-Tag headers.
Frequently Asked Questions
What is robots.txt?
A text file at your domain root that tells crawlers which pages to access or skip.
Does it block indexing?
No — it blocks crawling. Use noindex meta tags to prevent actual indexing.
What is Disallow?
A directive telling a bot not to crawl a specific path or section.
Is this free?
Yes. Completely free, no sign-up needed.