recommended · Technical · discoverability

Sitemap declared in robots.txt

robots.txt should declare your sitemap URL via 'Sitemap: <absolute-url>'. Ensures agents that read robots first can find the sitemap without guessing.

6 min read · Updated 2026-04-25

What is the sitemap declared in robots.txt directive?

The sitemap_declared_in_robots check verifies that your robots.txt file explicitly declares the location of your sitemap using a Sitemap: directive. Even if you host a sitemap at the conventional /sitemap.xml path, this directive tells crawlers and agents exactly where to find your structured inventory of pages without forcing them to guess or probe common locations.

The Sitemap: record comes from the sitemaps.org protocol rather than from the core robots.txt rules; RFC 9309 §2.2.4 names it as an example of an additional record that crawlers may interpret. Each sitemap is declared on its own line, with the keyword Sitemap: followed by an absolute URL. Major crawlers match the keyword case-insensitively. Multiple Sitemap: directives are permitted, so you can declare separate sitemaps for different content types, languages, or update frequencies. The directive is independent of the User-agent blocks in your robots.txt: it applies to all automated agents.

Why does declaring your sitemap in robots.txt matter for AI agents?

When a new AI agent encounters your domain, it typically fetches robots.txt first to understand crawl policies. If your sitemap is declared there, the agent has an immediate, authoritative roadmap of your content structure—no need to guess /sitemap.xml, /sitemap_index.xml, or other conventions. This matters for agents like ChatGPT's web browsing plugin, Perplexity's citation crawlers, and emerging shopping agents that need to index product catalogs quickly and completely. An undeclared sitemap means the agent either misses pages entirely or wastes quota probing for common sitemap paths, potentially triggering rate limits or WAF blocks.
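
The discovery order described here can be sketched with Python's standard urllib.robotparser, whose site_maps() method (Python 3.8+) returns the declared sitemap URLs or None. The function name and the fallback path list are illustrative assumptions, not a description of how any particular agent behaves.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical fallback list: paths an agent might probe when nothing is declared.
FALLBACK_PATHS = ["/sitemap.xml", "/sitemap_index.xml"]


def candidate_sitemaps(robots_txt: str, origin: str) -> list[str]:
    """Return declared sitemap URLs, or guessed URLs if none are declared."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    declared = parser.site_maps()  # list of URLs, or None when absent
    if declared:
        return declared
    # No declaration: the agent has to guess, costing one request per path.
    return [urljoin(origin, path) for path in FALLBACK_PATHS]
```

With a declaration present, the agent gets its answer from the one file it has already fetched; without one, every entry in the fallback list is an extra request that may or may not succeed.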

For agentic commerce flows—where a shopping assistant needs to enumerate available products—an explicit sitemap declaration can be the difference between full catalog coverage and a partial, stale index. Agents operating under tight token or request budgets will prioritize sites that advertise their structure up front. If your competitor declares their sitemap and you don't, citation-based search tools may favor the competitor's pages simply because they were easier to discover and index comprehensively.

This check is recommended for most sites. The robots.txt sitemap directive is a well-established convention supported by all major search engines and increasingly by AI agents. It's not strictly required—agents can fall back to probing common paths—but omitting it creates unnecessary friction and reduces discoverability.

The recommendation strengthens if you use non-standard sitemap paths, host multiple sitemaps, or operate in a competitive space where agent citation rate directly impacts traffic. It's less critical (though still beneficial) if your entire site is a single static page with no crawlable subpages, but even then, an explicit declaration costs nothing and future-proofs your setup as the site grows.

What RFC 9309 says about the sitemap directive in robots.txt

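RFC 9309 §2.2.4 lets crawlers interpret records outside the core protocol and names sitemaps as its example. The directive's syntax comes from the sitemaps.org protocol, with the following rules:

  • Keyword: Sitemap: (matched case-insensitively by major crawlers)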
  • Value: Must be an absolute URL (including protocol and domain)
  • Placement: On its own line, independent of User-agent blocks
  • Multiplicity: Multiple Sitemap: directives are allowed
  • Format: URL should point to a valid sitemap file (XML, text, or index)

Minimum valid example:

User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml

Multiple sitemaps:

User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap-pages.xml
Sitemap: https://example.com/sitemap-products.xml
Sitemap: https://example.com/sitemap-blog.xml

The directive does not affect crawl permissions—it's purely informational. Agents still respect Disallow rules when crawling the declared sitemap's URLs.
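
As a rough illustration of these rules, the following Python sketch pulls Sitemap lines out of a robots.txt body and separates absolute URLs from everything else. The function name is hypothetical, and two simplifications are assumed: the keyword is matched case-insensitively, and anything after a # is treated as a comment (which would truncate a URL containing a fragment).

```python
from urllib.parse import urlsplit


def extract_sitemaps(robots_txt: str) -> tuple[list[str], list[str]]:
    """Return (valid_urls, rejected_values) from Sitemap lines in robots.txt."""
    valid, rejected = [], []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if not sep or key.strip().lower() != "sitemap":
            continue  # not a sitemap line; User-agent groups are ignored here
        value = value.strip()
        parts = urlsplit(value)
        # Absolute means both a scheme and a host are present.
        if parts.scheme in ("http", "https") and parts.netloc:
            valid.append(value)
        else:
            rejected.append(value)
    return valid, rejected
```

Because the loop never tracks which User-agent group it is in, a Sitemap line is picked up wherever it appears in the file, which mirrors the placement rule.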

What good sitemap declaration in robots.txt looks like

Companies with strong agent discoverability typically declare sitemaps clearly in robots.txt. Shopify-hosted stores and major content platforms routinely include these directives because their SEO infrastructure assumes crawlers read robots.txt first.

Example from a typical e-commerce site:

User-agent: *
Disallow: /cart
Disallow: /checkout
Disallow: /account

Sitemap: https://store.example.com/sitemap.xml

Example from a multi-region content site:

User-agent: *
Disallow: /private/

Sitemap: https://example.com/sitemap-index.xml
Sitemap: https://example.com/sitemap-en-us.xml
Sitemap: https://example.com/sitemap-en-gb.xml
Sitemap: https://example.com/sitemap-de.xml

The key pattern: absolute URLs, clear separation from disallow rules, and one sitemap per line for maintainability.

How do I add a sitemap directive to my robots.txt file?

  1. Locate or generate your sitemap. Hosted platforms such as WordPress and Shopify generate sitemaps automatically; with a headless CMS such as Contentful, the front end usually has to build one. For static sites, use a build plugin:

    • Next.js: next-sitemap package
    • Gatsby: gatsby-plugin-sitemap
    • Eleventy: @quasibit/eleventy-plugin-sitemap
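  2. Edit robots.txt in your site root. If using a framework, check whether it generates robots.txt for you. In Next.js, for example, you can commit a static public/robots.txt or generate the file from app/robots.ts in the App Router, and next-sitemap can also emit one.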

  3. Add the directive below any existing User-agent blocks:

    User-agent: *
    Disallow: /private/
    
    Sitemap: https://yourdomain.com/sitemap.xml
    

    Replace https://yourdomain.com/sitemap.xml with your actual sitemap URL. Use the full absolute URL, not a relative path.

  4. Deploy and verify. Ensure the robots.txt is served at https://yourdomain.com/robots.txt with Content-Type: text/plain.

  5. If using a CDN (Cloudflare, Fastly), confirm the CDN doesn't cache a stale robots.txt. Purge the cache after deployment.
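
If robots.txt is produced by a build script rather than edited by hand, the steps above might look like the following Python sketch. The function name and its parameters are hypothetical; it assumes a single wildcard User-agent group and refuses relative sitemap URLs so the mistake is caught before deployment.

```python
from urllib.parse import urlsplit


def render_robots_txt(disallow: list[str], sitemaps: list[str]) -> str:
    """Build robots.txt text with one wildcard group and Sitemap lines."""
    for url in sitemaps:
        parts = urlsplit(url)
        if not (parts.scheme and parts.netloc):
            raise ValueError(f"Sitemap URL must be absolute: {url!r}")
    lines = ["User-agent: *"]
    # An empty Disallow value allows everything.
    lines += [f"Disallow: {path}" for path in disallow] or ["Disallow:"]
    lines.append("")  # blank line separates the group from the sitemap lines
    lines += [f"Sitemap: {url}" for url in sitemaps]
    return "\n".join(lines) + "\n"
```

Calling render_robots_txt(["/private/"], ["https://yourdomain.com/sitemap.xml"]) yields the same four-line file shown in step 3, ready to be written to the site root during the build.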

How can I test if my sitemap is declared in robots.txt?

curl -s https://yourdomain.com/robots.txt | grep -i '^sitemap:'

You should see one or more lines starting with Sitemap: followed by absolute URLs. If the output is empty, the directive is missing.
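
For a check that can run in CI, a Python version of the same test might look like the sketch below. It also covers the Content-Type expectation from the deployment steps. The function names are hypothetical, and fetch_and_check needs network access, so it is defined here but not called.

```python
import re
from urllib.request import urlopen

# Matches lines that begin with "sitemap:" in any letter case.
SITEMAP_LINE = re.compile(
    r"^[ \t]*sitemap[ \t]*:[ \t]*(\S+)", re.IGNORECASE | re.MULTILINE
)


def check_robots(content_type: str, body: str) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if not content_type.lower().startswith("text/plain"):
        problems.append(f"unexpected Content-Type: {content_type}")
    urls = SITEMAP_LINE.findall(body)
    if not urls:
        problems.append("no Sitemap: line found")
    for url in urls:
        if not url.lower().startswith(("http://", "https://")):
            problems.append(f"sitemap URL is not absolute: {url}")
    return problems


def fetch_and_check(robots_url: str) -> list[str]:
    """Fetch robots.txt over the network and run the checks above."""
    with urlopen(robots_url, timeout=10) as resp:
        content_type = resp.headers.get("Content-Type", "")
        body = resp.read().decode("utf-8", errors="replace")
    return check_robots(content_type, body)
```

Keeping the checks separate from the fetch means the logic can be exercised against saved robots.txt text without touching the network.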

Or just run a free scan and we'll check this for you alongside 30+ other agent-readiness signals.

Frequently asked questions

Do I still need to declare my sitemap in robots.txt if it's at /sitemap.xml?

Yes. While /sitemap.xml is a common convention, the robots.txt sitemap directive is the authoritative way to tell agents where to find your sitemap. AI agents and crawlers check robots.txt first—explicit declaration eliminates guesswork, reduces wasted requests, and ensures comprehensive discovery even if you later move or rename your sitemap file.

Can I declare multiple sitemaps in robots.txt for different content types?

Absolutely. The sitemaps.org protocol explicitly allows more than one Sitemap: directive per robots.txt file. Many e-commerce sites declare separate sitemaps for products, blog posts, and category pages. Each directive should be on its own line with an absolute URL. This helps agents prioritize which content to crawl first and improves indexing efficiency for large, diverse catalogs.

Does declaring a sitemap in robots.txt affect SaaS documentation discoverability?

Yes, significantly. AI agents building context for developer tools often start with robots.txt to map documentation structure. For SaaS platforms like Stripe, Twilio, or Vercel, an explicit sitemap declaration ensures API references, guides, and troubleshooting pages are discovered comprehensively. Undeclared sitemaps force agents to probe, potentially missing versioned docs or changelog sections critical for accurate code generation.

How does the robots.txt sitemap directive compare to submitting sitemaps via Google Search Console?

They serve different purposes. Search Console submission is specific to Google and requires manual configuration. The robots.txt directive is universal—any compliant crawler or agent reads it automatically. For AI agents like Perplexity, Claude, or ChatGPT web browsing, robots.txt is often the only discovery mechanism. Use both: Search Console for analytics, robots.txt for universal machine discoverability.

Will adding a sitemap directive to robots.txt help e-commerce product pages get cited by shopping agents?

Yes. Shopping agents like those powering AI-assisted commerce need fast, complete product catalog enumeration. An explicit sitemap in robots.txt gives them an immediate index of all SKUs, preventing partial crawls or stale inventory data. For competitive categories, this can be the difference between your product appearing in agent recommendations versus being overlooked entirely due to incomplete discovery.

Does Cloudflare or Vercel automatically handle sitemap declaration in robots.txt?

No. Both platforms serve the robots.txt from your deployment as-is. You must add the Sitemap: directive to your robots.txt file yourself (typically public/robots.txt for Next.js, or the site root for static sites). After deployment, purge the CDN cache to ensure the updated file is served immediately. Neither platform adds a Sitemap: directive for you.

Is the sitemap directive case-sensitive in robots.txt?

No, not in practice. RFC 9309 does not define the Sitemap: record itself, but it allows crawlers to be lenient with such records, and major crawlers match the keyword case-insensitively, so sitemap:, SITEMAP:, and Sitemap: are all accepted. However, the path in the URL value is case-sensitive per standard URL rules. Best practice: use title-case Sitemap: for readability and consistency with the examples in the sitemaps.org protocol and most major implementations.

Do news publishers need to declare sitemaps in robots.txt for AI citation engines?

Strongly recommended. Citation-based AI systems like Perplexity and Bing Chat prioritize sources that make content easy to discover and attribute. For news sites, declaring both article sitemaps and Google News sitemaps in robots.txt ensures breaking stories and evergreen analysis are indexed quickly. Undeclared sitemaps risk delayed discovery, reducing citation opportunities during critical news cycles when traffic and authority matter most.
