§ · free tool

Robots.txt generator. AI crawlers included.

Build a clean robots.txt in one form. Pick a platform preset, toggle every AI crawler (training + retrieval) as a separate switch, add custom rules, and copy the output. No signup. Nothing leaves the browser.

Browser-only · nothing leaves this device
§ 01 · platform preset
§ 02 · inputs
§ 03 · AI crawlers (training)

These bots pull pages into future model training. Block one and you are excluded from that model's knowledge permanently.

§ 04 · AI crawlers (retrieval)

These bots fetch pages in real time to answer a user's prompt. Block one and your brand cannot be cited in that tool's answers.

§ 05 · robots.txt output

Copy-ready.


Stays in browser · nothing saved server-side
§ 06 · what robots.txt does

Robots.txt is a polite request, not a firewall.

Robots.txt is a plain-text file at the root of your domain that tells well-behaved crawlers which paths they may or may not fetch. Every major search engine (Google, Bing, DuckDuckGo) and every mainstream AI crawler (OpenAI's GPTBot, Anthropic's ClaudeBot, Perplexity's PerplexityBot) checks it before requesting your pages. It is a voluntary protocol: malicious crawlers ignore it entirely. Use robots.txt to guide crawl efficiency and signal intent, not to protect confidential pages.

The file has four main directives: User-agent names the crawler the rules that follow apply to (* matches every bot), Allow and Disallow list URL path patterns, and Sitemap points to your XML sitemap. A Crawl-delay directive is honored by Bing and Yandex but ignored by Google, which manages crawl rate through Search Console settings. The generator above ships the global User-agent: * block first, then bot-specific overrides below; a crawler obeys the most specific group that names it, regardless of where it sits in the file, so the ordering is there for readability.
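A minimal sketch of that structure, in the shape the generator emits (the paths, crawl delay, and domain are placeholders, not recommendations):

User-agent: *
Disallow: /admin/
Disallow: /cart

User-agent: Bingbot
Crawl-delay: 5

Sitemap: https://yourdomain.com/sitemap.xml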

The AI-crawler split matters because training and retrieval are different decisions. Training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, Applebot-Extended) pull pages into the next model version. Blocking them removes your brand from tomorrow's world-knowledge permanently, which is a slow-decay problem you notice two model versions later. Retrieval crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot, DuckAssistBot) fetch pages in real time when a user asks a question. Blocking them makes you invisible in ChatGPT Search, Claude with web search, and Perplexity answers today. The generator exposes every bot as a separate toggle because the choice is rarely binary; most brands keep both groups open.
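As a sketch, this is how the split looks in the file if a site blocks training but keeps retrieval open (an illustration of the toggles, not a recommendation):

User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /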

Tools in the same cluster: XML sitemap generator to build the sitemap you reference here. Schema markup generator for structured data on the pages that pass through. Hreflang generator for multi-locale sites.

§ 07 · anatomy

Every directive, explained.

01 · User-agent
User-agent: *

Names which bot the following rules apply to. The asterisk matches every bot. Specific names (GPTBot, Googlebot, Bingbot) override the wildcard for that bot only: a crawler that finds a group naming it follows that group and ignores the wildcard block entirely.

02 · Disallow
Disallow: /admin

Asks compliant crawlers not to fetch paths that begin with the given pattern. Wildcards are supported (/*.pdf$ blocks PDFs). An empty Disallow means "allow everything".

03 · Allow
Allow: /wp-admin/admin-ajax.php

Carves an exception inside a Disallow block. Useful for WordPress AJAX endpoints or Shopify variant URLs that live inside broadly-disallowed paths.

04 · Sitemap
Sitemap: https://site.com/sitemap.xml

Absolute URL to your XML sitemap. Can appear multiple times for multi-sitemap sites. Place at the bottom of the file by convention; order does not affect parsing.

§ 08 · questions

Six answers.

Should I block AI crawlers in robots.txt?

Usually no. There are two groups of AI crawlers and they do different things. Training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, Applebot-Extended) pull your pages into the next model version. Retrieval crawlers (OAI-SearchBot, Claude-SearchBot, PerplexityBot, DuckAssistBot, ChatGPT-User) fetch pages to answer a user query in real time. Blocking retrieval crawlers makes your brand invisible in ChatGPT Search, Claude, and Perplexity answers today. Blocking training crawlers excludes you from tomorrow's model knowledge permanently. Most brands benefit from allowing both; the generator above defaults to allow, and the toggles let you block individually if your content policy demands it.

Where do I upload the generated robots.txt?

The file must live at the root of your domain, served at https://yourdomain.com/robots.txt with a 200 response. On Shopify, robots.txt.liquid is editable in the theme code editor since 2021 and ships at the root automatically. On WordPress, use the Yoast or Rank Math settings pane, or upload a physical file that overrides the dynamic one. On Webflow, add it in Project Settings, SEO, robots.txt. On Next.js, put it in the app directory as a robots.ts file or in public as robots.txt. Verify with curl after deploy: curl -I https://yourdomain.com/robots.txt should return 200 with Content-Type text/plain.
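A quick post-deploy check, assuming your domain is yourdomain.com (the exact headers vary by host, but the status and content type should match):

curl -I https://yourdomain.com/robots.txt
HTTP/2 200
content-type: text/plain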

What is the difference between Disallow and noindex?

Disallow tells compliant crawlers not to fetch the URL. It does not remove the URL from the search index if the URL was already indexed or if other sites link to it. Google explicitly warns that Disallow is not a guarantee of exclusion from search results. For reliable exclusion, use a meta robots noindex tag on the page itself or an X-Robots-Tag HTTP header. A common mistake is disallowing a page in robots.txt, which then prevents the crawler from seeing the noindex tag inside the page, leaving the URL in the index indefinitely. Pick one method and use it correctly.
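For reference, the two exclusion mechanisms named above, as they normally appear (standard values; where you set them depends on your stack):

<meta name="robots" content="noindex"> in the page's head, or
X-Robots-Tag: noindex sent as an HTTP response header on the document or file.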

How does Cloudflare affect my robots.txt?

Cloudflare began default-blocking AI crawlers on new domains in July 2025 via its AI Scrapers and Crawlers bot-management rule. If your apex sits behind Cloudflare with that rule on, your robots.txt Allow directives are overridden at the edge: the crawler receives a 403 before robots.txt is read. To allow AI crawlers while keeping the Cloudflare proxy on, go to Security, Bots, AI Scrapers and Crawlers and turn the block off, or allowlist specific user agents in the WAF. Verify after: curl -A "Mozilla/5.0 (compatible; GPTBot/1.0)" https://yourdomain.com/ should return 200.
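A before/after sketch of that check (the user-agent string is quoted so the shell passes it intact; the codes shown are illustrative and depend on your Cloudflare config):

curl -sI -A "Mozilla/5.0 (compatible; GPTBot/1.0)" https://yourdomain.com/ | head -1
HTTP/2 403 while the AI Scrapers and Crawlers rule is still on
HTTP/2 200 once the rule is off or GPTBot is allowlisted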

Does a larger Disallow list hurt my SEO?

No, provided the rules are intentional. A tight Disallow list on admin paths, internal search endpoints, tag archives, and checkout pages is standard and recommended. Over-blocking legitimate pages hurts crawl efficiency and can erase rankings. Common mistakes: disallowing /wp-content/ on WordPress blocks the CSS and JS Google needs to render the page, triggering mobile-usability warnings; disallowing /products/ or entire collection paths on Shopify removes indexable revenue pages. The platform presets above ship with defaults that avoid these traps.
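As an illustration of the safe WordPress shape the presets aim for (a sketch, not the full preset): block the admin path, carve out the AJAX endpoint, block internal search, and leave /wp-content/ alone so CSS and JS stay renderable.

User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
Disallow: /?s=

Sitemap: https://yourdomain.com/sitemap.xml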

Does this tool save my data?

No. Every value you enter lives in memory for this browser tab only. Nothing is transmitted to a server, stored in a database, or synced. Close the tab and the data is gone. The Copy button puts the text on your clipboard and the Download button writes a local file; those are the only output paths. The tool uses localStorage to remember your last platform preset, which you can clear by wiping browser data.

§ 09 · ship crawl policy site-wide

One file. Not enough.

Our SEO engagements ship a full crawl policy: robots.txt, XML sitemaps, canonical tags, meta robots, X-Robots-Tag headers, hreflang clusters, and an AI-crawler allow-list audit. Written plan in 2 weeks.