Sitemap
An XML file that lists all important URLs on a website, helping search engines and AI crawlers discover and prioritize content for indexing.
A sitemap is your website’s table of contents for machines, ensuring that search engines and AI crawlers know exactly which pages exist, when they were last updated, and how important they are relative to one another.
What is a Sitemap?
A sitemap is a file, typically in XML format, that provides a structured list of URLs on a website along with metadata about each page. It serves as a roadmap for search engine crawlers and AI bots, enabling them to discover and prioritize content more efficiently than relying solely on link-following.
Types of Sitemaps
| Type | Format | Purpose | Primary Audience |
|---|---|---|---|
| XML Sitemap | .xml | Machine-readable URL listing | Search engines, AI crawlers |
| HTML Sitemap | .html | Human-readable page directory | Website visitors |
| Image Sitemap | .xml with image tags | Lists images for indexing | Image search engines |
| Video Sitemap | .xml with video tags | Lists video content | Video search platforms |
| News Sitemap | .xml with news tags | Lists recent news articles | Google News |
| Sitemap Index | .xml | Points to multiple sitemaps | Crawlers on large sites |
XML Sitemap Structure
A standard XML sitemap follows a well-defined structure:
```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://genrank.co/</loc>
    <lastmod>2026-02-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://genrank.co/glossary/sitemap</loc>
    <lastmod>2026-02-05</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://genrank.co/blog/aeo-guide</loc>
    <lastmod>2026-01-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>
```
URL Tag Properties
Each <url> element supports four properties:
| Property | Required | Description |
|---|---|---|
| loc | Yes | The full URL of the page |
| lastmod | Recommended | Date of last modification (YYYY-MM-DD format) |
| changefreq | Optional | Expected change frequency (daily, weekly, monthly, etc.) |
| priority | Optional | Relative importance within your site (0.0 to 1.0) |
Important note: changefreq and priority are hints, not directives. Search engines may ignore them entirely. Google has stated publicly that it does not use changefreq or priority. The lastmod date is far more useful, but only if it accurately reflects when the page content was last meaningfully updated.
Creating and Managing Sitemaps
Automatic Generation
Most modern CMS platforms and static site generators create sitemaps automatically:
- WordPress - Built-in since version 5.5, or via plugins like Yoast SEO
- Astro - via the @astrojs/sitemap integration
- Next.js - Built-in sitemap generation in the App Router
- Shopify - Automatically generated for all stores
- Webflow - Auto-generated with each publish
Sitemap Best Practices
Include only indexable pages. Every URL in your sitemap should be a page you want indexed. Do not include pages blocked by robots.txt, marked with noindex, or returning non-200 status codes.
Keep lastmod accurate. Only update the lastmod date when the page content has actually changed in a meaningful way. Artificially updating dates erodes crawler trust.
Stay within limits. A single sitemap file can contain a maximum of 50,000 URLs and must not exceed 50MB uncompressed. For larger sites, use a sitemap index file.
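The splitting step for large sites can be sketched as: divide the full URL list into protocol-sized chunks, each destined for its own sitemap file referenced from an index. The 50,000 figure is the protocol limit; the base URL and the sitemap-N.xml naming scheme below are illustrative assumptions:

```python
MAX_URLS_PER_SITEMAP = 50_000  # sitemaps.org protocol limit per file

def chunk_urls(urls, limit=MAX_URLS_PER_SITEMAP):
    """Split a flat URL list into sitemap-sized chunks."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]

def index_entries(chunks, base_url="https://example.com"):
    """Produce one sitemap filename per chunk, for the sitemap index.
    The sitemap-N.xml naming scheme is illustrative, not mandated."""
    return [f"{base_url}/sitemap-{i + 1}.xml" for i in range(len(chunks))]
```

A site with 120,000 URLs would thus produce three sitemap files plus one index referencing them.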
Use absolute URLs. All <loc> values must be fully qualified URLs including the protocol (https://).
Submit to search engines. After creating your sitemap, submit it through Google Search Console and Bing Webmaster Tools. Also reference it in your robots.txt file.
Referencing in Robots.txt
```
User-agent: *
Allow: /
Sitemap: https://genrank.co/sitemap.xml
```
Adding the Sitemap: directive to robots.txt ensures that any crawler reading your robots.txt file also discovers your sitemap, regardless of whether you have submitted it through a webmaster tools interface.
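This discovery path is straightforward to mirror in code: a crawler scans robots.txt for Sitemap: lines, which are case-insensitive and may appear anywhere in the file. A small sketch of that extraction step:

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Extract Sitemap: directive values from robots.txt content.
    The directive is case-insensitive and independent of User-agent groups."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop inline comments
        if line.lower().startswith("sitemap:"):
            urls.append(line.split(":", 1)[1].strip())
    return urls
```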
Sitemap Index Files
For websites with more than 50,000 URLs or multiple content types, a sitemap index file organizes multiple sitemaps under a single reference point.
```xml
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://genrank.co/sitemap-pages.xml</loc>
    <lastmod>2026-02-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://genrank.co/sitemap-blog.xml</loc>
    <lastmod>2026-02-05</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://genrank.co/sitemap-glossary.xml</loc>
    <lastmod>2026-02-05</lastmod>
  </sitemap>
</sitemapindex>
```
This approach makes management easier and allows different sections of your site to update independently.
Common Sitemap Mistakes
Including non-canonical URLs. If a page has a canonical tag pointing to a different URL, only the canonical URL should appear in the sitemap.
Listing blocked pages. URLs that are disallowed in robots.txt or marked noindex should not be in the sitemap. This sends conflicting signals.
Inaccurate lastmod dates. Never bulk-update every page to today's date. This destroys the signal value of lastmod and may cause crawlers to distrust your dates entirely.
Missing the sitemap entirely. Smaller sites sometimes skip sitemaps assuming internal linking is sufficient. While internal linking is important, a sitemap provides an explicit, comprehensive URL list that removes ambiguity.
Forgetting to update after changes. Adding new pages, removing old ones, or restructuring your site requires corresponding sitemap updates. Automated generation handles this, but manual sitemaps can fall out of sync.
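The first three mistakes above are mechanical enough to check automatically. A hedged sketch, assuming you have already crawled each sitemap URL and recorded its status, robots meta, and canonical target (the PageRecord structure is hypothetical, not part of any sitemap standard):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PageRecord:
    """Hypothetical crawl record for one sitemap URL."""
    url: str
    status_code: int
    noindex: bool                      # page carries a noindex robots meta tag
    canonical: Optional[str] = None    # canonical target, if it differs

def sitemap_issues(records: list[PageRecord]) -> list[str]:
    """Flag sitemap entries that send conflicting signals: non-200
    status codes, noindex pages, and non-canonical URLs."""
    issues = []
    for r in records:
        if r.status_code != 200:
            issues.append(f"{r.url}: returns {r.status_code}, remove from sitemap")
        if r.noindex:
            issues.append(f"{r.url}: marked noindex, remove from sitemap")
        if r.canonical and r.canonical != r.url:
            issues.append(f"{r.url}: canonicalizes to {r.canonical}, list that URL instead")
    return issues
```

Running a check like this on each deploy catches conflicting signals before crawlers see them.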
Why It Matters for AEO
Sitemaps play a direct role in Answer Engine Optimization by ensuring that AI crawlers and retrieval systems can discover and prioritize your most valuable content.
AI crawler discovery. AI crawlers, like traditional search crawlers, use sitemaps to discover pages they might otherwise miss through link-following alone. A comprehensive, well-maintained sitemap ensures your AEO-optimized content does not go undiscovered by AI systems.
Content prioritization signals. The lastmod date in your sitemap tells AI crawlers which content is freshest. Since AI systems increasingly value current information, accurate lastmod dates help ensure your recently updated content is re-crawled and re-indexed promptly.
Comprehensive coverage. For sites with large glossaries, resource libraries, or blog archives, a sitemap guarantees that deep pages with high informational value are surfaced to AI crawlers. These are often the exact pages most likely to be cited in AI-generated answers.
Indexation efficiency. AI crawlers operate under their own crawl budget constraints. A well-structured sitemap helps these crawlers spend their budget on your most important pages rather than wasting requests on low-value URLs. This efficiency translates directly into better AI indexation coverage and, ultimately, more opportunities for your content to be cited in AI answers.
Related Terms
Crawlability - The ease with which search engines and AI systems can discover, access, and navigate through a website's pages to index content for search results and data retrieval.
Indexation - The process by which search engines and AI systems discover, analyze, and store web pages in their databases, making them available for retrieval in search results and AI answers.
Internal Linking - The practice of connecting pages within your own website through hyperlinks, creating a network that helps both users and AI systems navigate content, understand relationships, and discover information.