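The robots.txt file lives at the root of your site (https://example.com/robots.txt) and tells crawlers, such as search engine bots, which URLs they may fetch. Each host, including each subdomain, needs its own file. It follows the Robots Exclusion Protocol, long an informal convention until the IETF published it as RFC 9309 in 2022. Compliance is voluntary: well-behaved bots honor the file, but it does not restrict access.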
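Crawlers look for the file at /robots.txt on the same scheme, host, and port as the page they want to fetch, so its location can be derived from any page URL. Here is a minimal sketch using only Python's standard `urllib.parse`. The `robots_url` helper is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url):
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    Keeps the scheme and netloc (host plus optional port), replaces the path
    with /robots.txt, and drops any query string or fragment.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?id=7"))  # https://example.com/robots.txt
print(robots_url("https://shop.example.com:8443/cart"))  # https://shop.example.com:8443/robots.txt
```

Because the subdomain and port are part of the netloc, shop.example.com gets a different robots.txt than example.com.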
Basic Syntax
```text
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/

User-agent: Googlebot
Disallow: /staging/

Sitemap: https://example.com/sitemap.xml
```
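The example has two groups. A User-agent line that follows rules starts a new group, and each crawler uses only the group that names it. The Python sketch below shows which rules each bot actually receives. It makes some simplifying assumptions: user agents are matched exactly and case-insensitively against the product token, as in RFC 9309, and Sitemap lines are treated as belonging to no group. The helper names `parse_robots` and `rules_for` are hypothetical.

```python
# Hypothetical helpers: a simplified robots.txt group parser, not a full RFC 9309 implementation.

def parse_robots(text):
    """Split robots.txt text into (agents, rules) groups plus a list of sitemap URLs."""
    groups, sitemaps = [], []
    agents, rules = [], []
    last_was_agent = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if agents and not last_was_agent:
                # A User-agent line that follows rules closes the previous group.
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
            last_was_agent = True
        elif key in ("allow", "disallow"):
            if agents and value:  # ignore rules outside any group and empty paths
                rules.append((key, value))
            last_was_agent = False
        elif key == "sitemap":
            sitemaps.append(value)  # Sitemap lines belong to no group
    if agents:
        groups.append((agents, rules))
    return groups, sitemaps


def rules_for(groups, product_token):
    """Rules a crawler must obey: every group naming its token, else the '*' group(s)."""
    token = product_token.lower()
    chosen = [rules for agents, rules in groups if token in agents]
    if not chosen:
        chosen = [rules for agents, rules in groups if "*" in agents]
    return [rule for rules in chosen for rule in rules]


ROBOTS_TXT = """\
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /private/
User-agent: Googlebot
Disallow: /staging/
Sitemap: https://example.com/sitemap.xml
"""

groups, sitemaps = parse_robots(ROBOTS_TXT)
print(rules_for(groups, "Googlebot"))  # [('disallow', '/staging/')]
print(rules_for(groups, "Bingbot"))    # the three rules of the '*' group
print(sitemaps)                        # ['https://example.com/sitemap.xml']
```

Googlebot receives only the /staging/ rule, so it is free to crawl /admin/ and /private/. To block those paths for Googlebot as well, repeat the Disallow lines inside its own group.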
Key Directives
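- User-agent names the crawler a group of rules applies to; * matches any crawler. A crawler follows only the group that names it and falls back to the * group only when no group does. In the example above, Googlebot obeys Disallow: /staging/ but not the /admin/ and /private/ rules.
- Disallow gives a path prefix the crawler should not fetch. Paths are case-sensitive.
- Allow permits a sub-path of a disallowed path. When rules conflict, the most specific (longest) matching rule wins. For example, Allow: /private/press-kit/ overrides Disallow: /private/ for URLs under /private/press-kit/.
- Sitemap gives the full URL of a sitemap. It is independent of the User-agent groups and can appear anywhere in the file.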
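Conflicting Allow and Disallow rules are resolved by specificity, not by their order in the file. The sketch below assumes the behavior described in RFC 9309 and Google's documentation:

- The matching rule with the longest path wins.
- Allow wins a tie.
- `*` matches any run of characters, and a trailing `$` anchors the end.
- Paths match case-sensitively as prefixes.

`is_allowed` and the /private/press-kit/ path are hypothetical.

```python
import re


def _rule_regex(path):
    """Compile a rule path: '*' matches any characters, a trailing '$' anchors the end."""
    anchored = path.endswith("$")
    if anchored:
        path = path[:-1]
    body = ".*".join(re.escape(piece) for piece in path.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, url_path):
    """Apply the longest matching rule; Allow wins a tie; no match means allowed.

    rules: list of ("allow" | "disallow", path) pairs from the crawler's group.
    """
    best_len, best_allow = -1, True
    for kind, path in rules:
        if not path or not _rule_regex(path).match(url_path):  # match() is a prefix match
            continue
        allow = kind == "allow"
        if len(path) > best_len or (len(path) == best_len and allow):
            best_len, best_allow = len(path), allow
    return best_allow


star_group = [("allow", "/"), ("disallow", "/admin/"), ("disallow", "/private/")]
print(is_allowed(star_group, "/admin/users"))  # False: Disallow /admin/ is longer than Allow /
print(is_allowed(star_group, "/blog/post"))    # True: only Allow / matches

press_kit = [("disallow", "/private/"), ("allow", "/private/press-kit/")]
print(is_allowed(press_kit, "/private/press-kit/logo.png"))  # True
print(is_allowed(press_kit, "/private/notes.txt"))           # False
```

Python's standard `urllib.robotparser` evaluates rules in file order and returns the first match. It can therefore disagree with this result when a broad Allow: / is listed first, which is why the sketch does its own matching.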
Critical Distinction: Disallow vs Noindex
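- Disallow prevents crawling: a compliant bot will not fetch the page.
- Noindex prevents indexing: the bot fetches the page but does not add it to the index. It is set with a <meta name="robots" content="noindex"> tag or an X-Robots-Tag: noindex HTTP header.
- If you Disallow a page, Google cannot see a noindex directive on it. The URL can still appear in search results, typically without a description, if other pages link to it. To keep a page out of search results, leave it crawlable and serve noindex.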
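The two directives are read at different times. robots.txt is checked before a fetch, while noindex can only be read from the fetched response. The minimal Python sketch below shows the second step. The `has_noindex` helper is hypothetical. It checks only for the plain `noindex` token in an X-Robots-Tag header or a `<meta name="robots">` tag, and it ignores crawler-specific variants such as `googlebot: noindex`.

```python
from html.parser import HTMLParser


class _RobotsMetaFinder(HTMLParser):
    """Collect the content values of <meta name="robots"> tags (case-insensitive)."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attribute values may be None
        if tag == "meta" and (attributes.get("name") or "").lower() == "robots":
            self.directives.append((attributes.get("content") or "").lower())


def has_noindex(headers, html):
    """True if the response carries noindex in an X-Robots-Tag header or a robots meta tag."""
    values = [v.lower() for k, v in headers.items() if k.lower() == "x-robots-tag"]
    finder = _RobotsMetaFinder()
    finder.feed(html)
    for value in values + finder.directives:
        # Directives are comma-separated, e.g. "noindex, follow".
        if "noindex" in (token.strip() for token in value.split(",")):
            return True
    return False


page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
print(has_noindex({}, page))                                  # True
print(has_noindex({"X-Robots-Tag": "noindex"}, "<p>hi</p>"))  # True
print(has_noindex({}, "<p>hi</p>"))                           # False
```

A crawler blocked by Disallow never obtains the headers or HTML for the page, so a check like this never runs. That is exactly why the noindex goes unseen.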