§ 01 · free tool

Robots.txt tester. Live RFC 9309 matcher.

Paste a robots.txt and pick a user agent, then test one URL or up to 50 at once. The tool runs the RFC 9309 longest-match algorithm in your browser and returns Allow or Disallow per URL, names the exact matching rule, and explains the precedence. Nothing is fetched; paste the robots.txt content directly.

Sources used by this matcher

The matcher runs entirely in JavaScript on your device. No robots.txt content is sent anywhere; the source links above open in a new tab as reference documentation only.


§ 02 · how the matcher decides

Three rules. In this order.

Rule 1 — UA group selection. A robots.txt contains one or more "groups", each starting with one or more User-agent: lines. For the UA you're testing, the matcher picks the most specific UA-named group that matches; if no UA-named group matches, it falls back to the wildcard User-agent: * group. UA matching is substring-based per Google's spec: Googlebot-Image matches a group named Googlebot as well as one named Googlebot-Image, and the more specific name wins.
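
A minimal TypeScript sketch of the group-selection step (names and types are illustrative, not the tool's actual source):

  type Rule = { kind: "allow" | "disallow"; pattern: string };
  type Group = { agents: string[]; rules: Rule[] };

  // Pick the group for a user agent: among groups whose name is a
  // case-insensitive substring of the UA token, take the longest
  // (most specific) name; fall back to the "*" group if none match.
  function selectGroup(groups: Group[], userAgent: string): Group | null {
    const ua = userAgent.toLowerCase();
    let best: Group | null = null;
    let bestLen = -1;
    let wildcard: Group | null = null;
    for (const group of groups) {
      for (const name of group.agents) {
        const token = name.toLowerCase();
        if (token === "*") { wildcard = group; continue; }
        // "googlebot" is a substring of "googlebot-image", so both a
        // Googlebot group and a Googlebot-Image group match that UA;
        // the longer name wins.
        if (ua.includes(token) && token.length > bestLen) {
          best = group;
          bestLen = token.length;
        }
      }
    }
    return best ?? wildcard;
  }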

Rule 2 — Longest-match. Inside the selected group, the matcher walks every Allow and Disallow rule, finds those whose pattern matches the URL path, and picks the longest. Disallow: / (length 1) loses to Allow: /products/ (length 10). Wildcards (*) match any sequence including zero characters; the dollar sign ($) anchors to the end of the path.
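
The pattern test can be sketched the same way; translating each pattern to a regular expression is one way to implement the RFC's wildcard semantics (a simplification: $ is only honored at the end of a pattern):

  // Return the pattern's length (its precedence weight) if it matches
  // the URL path, or -1 if it does not.
  function matchLength(pattern: string, path: string): number {
    const anchored = pattern.endsWith("$");
    const body = anchored ? pattern.slice(0, -1) : pattern;
    const escaped = body
      .replace(/[.+?^${}()|[\]\\]/g, "\\$&") // escape regex metacharacters
      .replace(/\*/g, ".*");                 // then expand robots wildcards
    const re = new RegExp("^" + escaped + (anchored ? "$" : ""));
    return re.test(path) ? pattern.length : -1;
  }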

Rule 3 — Tie-breaker. If Allow and Disallow patterns match at exactly the same length, Allow wins. This is the IETF spec; some older parsers got this wrong, but RFC 9309 (published 2022) and Google's open-sourced parser both implement Allow-wins-on-tie.
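
Putting the pieces together, the verdict step reduces to a few lines (continuing the sketch above, with Rule and matchLength as defined there):

  // Longest matching pattern wins; on an exact length tie, Allow beats
  // Disallow. No matching rule at all means the path is allowed.
  function decide(
    rules: Rule[],
    path: string
  ): { verdict: "allow" | "disallow"; rule: Rule | null } {
    let best: Rule | null = null;
    let bestLen = -1;
    for (const rule of rules) {
      if (rule.pattern === "") continue; // empty Disallow: is a no-op
      const len = matchLength(rule.pattern, path);
      if (len < 0) continue;
      if (len > bestLen || (len === bestLen && rule.kind === "allow")) {
        best = rule;
        bestLen = len;
      }
    }
    return best ? { verdict: best.kind, rule: best } : { verdict: "allow", rule: null };
  }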

Edge cases the matcher handles: an empty Disallow: (no rule body) means "no path is disallowed for this group"; the rule is a no-op. Disallow: / blocks the entire site. Allow: / in a group with no Disallow rules is redundant but harmless. Comments (lines starting with #, or trailing after a value) are stripped before parsing. Crawl-delay: is parsed but ignored; it's a non-standard directive that Google doesn't honor and the IETF spec doesn't include.

The Sitemap: directive is global to the file (not group-scoped) and gets surfaced separately at the bottom of the result panel. Search engines read sitemap URLs from robots.txt as a discovery hint independent of any UA group.
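
An illustrative file (hypothetical paths) that exercises these edge cases:

  # comments are stripped before parsing
  User-agent: *
  Disallow:                  # empty value: nothing is disallowed
  Disallow: /tmp/            # blocks /tmp/ and everything under it
  Crawl-delay: 10            # parsed but ignored

  Sitemap: https://example.com/sitemap.xml   # global, not group-scoped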

§ 03 · when to use this

Four jobs this tester does.

Job 1: Pre-deploy check. Before pushing a robots.txt change to production, paste the new content and walk a representative sample of URLs through it. Common mistake: a too-broad Disallow: /search that also blocks /search/sitemap.xml — caught here in 2 seconds. Add the explicit Allow: /search/sitemap.xml and re-test.
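
The fix, sketched with hypothetical paths (pattern length decides):

  User-agent: *
  Disallow: /search              # 7 chars: blocks /search/sitemap.xml too
  Allow: /search/sitemap.xml     # 19 chars: the longer match wins back the sitemap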

Job 2: Multi-UA audit. A site might want to allow Googlebot but block GPTBot, allow Bingbot but block Bytespider. The matcher's UA dropdown lets you flip through the user agents in seconds. The companion Robots.txt Generator builds the syntax; this tool tests it.
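
One way such a per-bot policy can look (the UA names are real crawler tokens; the paths are illustrative):

  User-agent: Googlebot
  User-agent: Bingbot
  Allow: /

  User-agent: GPTBot
  User-agent: Bytespider
  Disallow: /

  User-agent: *
  Disallow: /private/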

Job 3: Migration regression. When migrating from one platform to another (Magento → Shopify, WordPress → Webflow), the new platform's default robots.txt almost always differs. Paste the old one, paste the new one, run both with the same URL set — diff the verdicts. Anything that flipped from Allow to Disallow on a high-value URL needs fixing before cutover.
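
A minimal sketch of that diff, assuming a verdict helper along the lines of the § 02 sketches (all names hypothetical):

  declare const oldRobots: string, newRobots: string; // the two pasted files
  declare function verdictFor(robotsTxt: string, ua: string, path: string):
    "allow" | "disallow";

  const sample = ["/", "/products/", "/products/widget", "/checkout"];
  for (const path of sample) {
    const before = verdictFor(oldRobots, "Googlebot", path);
    const after = verdictFor(newRobots, "Googlebot", path);
    if (before === "allow" && after === "disallow") {
      console.warn(`regression: ${path} flipped to Disallow after migration`);
    }
  }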

Job 4: Indexation triage. Search Console reports "Submitted URL blocked by robots.txt" for a page you thought was crawlable. Paste the live robots.txt and the URL here; the matcher tells you which rule blocks it. Fix the rule, redeploy, re-fetch in Search Console.

Sibling tools: Robots.txt Generator for building new files, Lighthouse Score Checker for the live audit, Website Audit for the four-category scorecard including indexation checks.

§ 04 · questions

Six questions users ask.

What matching algorithm does this use?

RFC 9309 longest-match. For a given user agent, the matcher finds the most-specific group (UA-named beats wildcard *), collects Allow and Disallow rules, then for the test URL: longest matching rule wins; if Allow and Disallow have equal length, Allow wins. Wildcards (*) match any sequence including zero characters; the dollar sign ($) anchors to the end of the path. This is the same algorithm Googlebot uses (Google open-sourced their implementation in 2019).

Why test in-browser instead of using Google Search Console?

Three reasons. First: you can test a robots.txt before publishing; Search Console only tests the live, deployed file. Second: you can test arbitrary user agents (GPTBot, ClaudeBot, Bytespider, Bingbot) with one form, not just Googlebot. Third: you can test multiple URLs in a single session without re-pasting anything. The trade-off: we don't access your Search Console data, so we can't tell you what Google has actually fetched.

Which user agents are pre-loaded?

17 common UAs across three categories: search engine crawlers (Googlebot, Googlebot-Image, Googlebot-News, Bingbot, DuckDuckBot, Yandex, Baiduspider); AI training crawlers (GPTBot, ClaudeBot, Google-Extended, CCBot, Bytespider, Applebot-Extended); AI retrieval crawlers (PerplexityBot, OAI-SearchBot, ChatGPT-User, Claude-User). You can also type a custom UA name. The matcher handles the standard precedence: a UA-named group beats a wildcard * group, regardless of order.

How do wildcards work?

Asterisk * matches any sequence of zero or more characters. Disallow: /search/* blocks /search/, /search/foo, /search/foo/bar. Disallow: /*.pdf$ blocks any URL ending in .pdf (the $ anchors to the end of the path). Disallow: /*?* matches any URL containing a query string. Unlike a shell glob, a robots.txt * does span path segments: it matches any characters, including slashes, and both Google's implementation and RFC 9309 treat it as fully greedy. Test edge cases here before deploying; the algorithm has subtleties.
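
Worked through the matchLength sketch from § 02 (the return value is the pattern length, or -1 on no match):

  matchLength("/search/*", "/search/foo/bar");   //  9: "*" spans "/"
  matchLength("/*.pdf$", "/whitepaper.pdf");     //  7: "$" anchors the end
  matchLength("/*.pdf$", "/whitepaper.pdf?v=2"); // -1: query string breaks the anchor
  matchLength("/*?*", "/search?q=robots");       //  4: any path with a query string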

Does this tool fetch anything?

No. You paste the robots.txt content directly. To get a site's live robots.txt, visit https://yoursite.com/robots.txt in any browser tab and copy the content. The matcher then runs entirely in JavaScript on your device. No fetch, no proxy, no log on Digital Heroes servers.

Can I test how my robots.txt affects an entire crawl?

Yes — paste up to 50 URLs (one per line) in the URL box. The matcher runs each through the rule set and returns a verdict per URL plus a summary count. This is faster than testing one URL at a time when auditing a CMS or migrated site for accidental Disallow patterns.
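
The batch mode in sketch form, reusing the same hypothetical helper:

  declare function verdictFor(robotsTxt: string, ua: string, path: string):
    "allow" | "disallow";
  declare const robotsTxt: string, urlList: string[]; // up to 50 paths

  const verdicts = urlList.map((p) => verdictFor(robotsTxt, "GPTBot", p));
  const blocked = verdicts.filter((v) => v === "disallow").length;
  console.log(`${blocked} of ${urlList.length} URLs disallowed for GPTBot`);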

§ 06 · need a real engagement

Indexation issue? 30-min call.

Robots.txt + sitemap + canonical + indexation audits sit at the intersection of SEO and engineering. A 30-minute call walks through the full sequence and ends with a fixed-price quote.