SEO · Updated February 5, 2026

Sitemap

An XML file that lists all important URLs on a website, helping search engines and AI crawlers discover and prioritize content for indexing.

A sitemap is your website’s table of contents for machines. It tells search engines and AI crawlers which pages exist, when they were last updated, and, as an optional hint, how important they are relative to one another.

What is a Sitemap?

A sitemap is a file, typically in XML format, that provides a structured list of URLs on a website along with metadata about each page. It serves as a roadmap for search engine crawlers and AI bots, enabling them to discover and prioritize content more efficiently than relying solely on link-following.

Types of Sitemaps

| Type | Format | Purpose | Primary Audience |
|---|---|---|---|
| XML Sitemap | .xml | Machine-readable URL listing | Search engines, AI crawlers |
| HTML Sitemap | .html | Human-readable page directory | Website visitors |
| Image Sitemap | .xml with image tags | Lists images for indexing | Image search engines |
| Video Sitemap | .xml with video tags | Lists video content | Video search platforms |
| News Sitemap | .xml with news tags | Lists recent news articles | Google News |
| Sitemap Index | .xml | Points to multiple sitemaps | Crawlers on large sites |

XML Sitemap Structure

A standard XML sitemap follows a well-defined structure:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://genrank.co/</loc>
    <lastmod>2026-02-01</lastmod>
    <changefreq>weekly</changefreq>
    <priority>1.0</priority>
  </url>
  <url>
    <loc>https://genrank.co/glossary/sitemap</loc>
    <lastmod>2026-02-05</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://genrank.co/blog/aeo-guide</loc>
    <lastmod>2026-01-20</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.7</priority>
  </url>
</urlset>

URL Tag Properties

Each <url> element supports four properties:

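| Property | Required | Description |
|---|---|---|
| loc | Yes | The full, absolute URL of the page |
| lastmod | Recommended | Date of last modification in W3C Datetime format (YYYY-MM-DD, optionally with a time) |
| changefreq | Optional | Expected change frequency: always, hourly, daily, weekly, monthly, yearly, or never |
| priority | Optional | Relative importance within your site, from 0.0 to 1.0 (default 0.5) |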

Important note: changefreq and priority are hints, not directives. Search engines may ignore them entirely. Google has stated publicly that it does not use changefreq or priority. The lastmod date is far more useful, but only if it accurately reflects when the page content was last meaningfully updated.

Creating and Managing Sitemaps

Automatic Generation

Most modern CMS platforms and static site generators create sitemaps automatically:

  • WordPress - Built-in since version 5.5, or via plugins like Yoast SEO
  • Astro - @astrojs/sitemap integration
  • Next.js - Built-in sitemap generation in App Router
  • Shopify - Automatically generated for all stores
  • Webflow - Auto-generated with each publish
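If you maintain a custom site without one of these tools, the following is a minimal sketch of roughly what such generators emit, using only Python's standard-library xml.etree.ElementTree. The Page class and build_sitemap function are hypothetical names for illustration. The sketch writes only the required <loc> and the optional <lastmod>, and deliberately skips changefreq and priority.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


@dataclass
class Page:
    """Hypothetical page record: an absolute URL plus an optional lastmod."""
    loc: str
    lastmod: Optional[str] = None  # W3C Datetime, e.g. "2026-02-05"


def build_sitemap(pages: list[Page]) -> str:
    """Serialize pages into a sitemaps.org <urlset> document."""
    # Passing xmlns as a plain attribute yields <urlset xmlns="..."> on output.
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = page.loc
        if page.lastmod:
            ET.SubElement(url, "lastmod").text = page.lastmod
    # ElementTree escapes &, < and > in text nodes, as the protocol requires for <loc>.
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


if __name__ == "__main__":
    print(build_sitemap([
        Page("https://genrank.co/", "2026-02-01"),
        Page("https://genrank.co/glossary/sitemap", "2026-02-05"),
    ]))
```

In practice the page list would come from your CMS or build step. The generator's only job is to serialize it faithfully.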

Sitemap Best Practices

Include only indexable pages. Every URL in your sitemap should be a page you want indexed. Do not include pages blocked by robots.txt, marked with noindex, or returning non-200 status codes.
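One way to enforce this rule is a filter that runs over crawl data before URLs reach the sitemap. The sketch below assumes a hypothetical CrawledPage record with status, noindex, robots-blocked, and canonical fields; how you collect those fields (a crawler, your CMS database) is outside its scope. It also applies the canonical rule described under common mistakes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CrawledPage:
    """Hypothetical record from a site crawl; field names are illustrative."""
    url: str
    status: int
    noindex: bool = False
    robots_blocked: bool = False
    canonical: Optional[str] = None  # None means the page is self-canonical


def is_sitemap_eligible(page: CrawledPage) -> bool:
    """Return True only for indexable, canonical pages that answer 200."""
    if page.status != 200:
        return False  # redirects, 404s and server errors stay out
    if page.noindex or page.robots_blocked:
        return False  # listing these would send conflicting signals
    # A page that canonicalizes elsewhere is represented by its canonical URL instead.
    return page.canonical is None or page.canonical == page.url


def eligible_urls(pages: list[CrawledPage]) -> list[str]:
    """Keep crawl order, dropping anything that fails the eligibility check."""
    return [p.url for p in pages if is_sitemap_eligible(p)]
```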

Keep lastmod accurate. Only update the lastmod date when the page content has actually changed in a meaningful way. Artificially updating dates erodes crawler trust.
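A common way to tie lastmod to real changes is to fingerprint the page's main content and bump the date only when the fingerprint changes. The sketch below is one possible approach, not a standard. It assumes the caller has already extracted the meaningful content (excluding navigation, footers, and other template parts), and it normalizes whitespace so reformatting alone does not count as an update. The function names are hypothetical.

```python
import hashlib
from datetime import date
from typing import Optional


def content_fingerprint(main_content: str) -> str:
    """SHA-256 of the content with runs of whitespace collapsed to single spaces."""
    normalized = " ".join(main_content.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_lastmod(
    main_content: str,
    stored_hash: Optional[str],
    stored_lastmod: Optional[str],
    today: date,
) -> tuple[str, str]:
    """Return (lastmod, hash), keeping the old lastmod if the content is unchanged."""
    new_hash = content_fingerprint(main_content)
    if new_hash == stored_hash and stored_lastmod is not None:
        return stored_lastmod, new_hash
    # New page or genuinely changed content: record today's date.
    return today.isoformat(), new_hash
```

Storing the hash alongside each URL lets every build decide per page, rather than stamping the whole site with the deploy date.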

Stay within limits. A single sitemap file can contain a maximum of 50,000 URLs and must not exceed 50MB uncompressed. For larger sites, use a sitemap index file.
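The limits translate directly into a chunking step. The following minimal sketch splits a URL list into sitemap-sized slices and checks a serialized file against both limits. The 50MB figure is taken as 52,428,800 bytes, as the sitemaps.org protocol specifies, and the helper names are hypothetical.

```python
from typing import Iterator

MAX_URLS_PER_SITEMAP = 50_000
MAX_BYTES_UNCOMPRESSED = 50 * 1024 * 1024  # 52,428,800 bytes


def chunk_urls(urls: list[str], size: int = MAX_URLS_PER_SITEMAP) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs, one slice per sitemap file."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def within_limits(sitemap_xml: str, url_count: int) -> bool:
    """Check one serialized sitemap against the URL-count and uncompressed-size limits."""
    size_in_bytes = len(sitemap_xml.encode("utf-8"))
    return url_count <= MAX_URLS_PER_SITEMAP and size_in_bytes <= MAX_BYTES_UNCOMPRESSED
```

Very long URLs can hit the size limit before the count limit, so checking both after serialization is safer than relying on the count alone.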

Use absolute URLs. All <loc> values must be fully qualified URLs, including the protocol (for example, https://) and the hostname.
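A quick pre-publish check can catch relative or scheme-less entries. This is a minimal sketch using the standard-library urllib.parse.urlsplit. It treats a URL as fully qualified only when it has an http or https scheme and a host; the helper names are hypothetical.

```python
from urllib.parse import urlsplit


def is_absolute_http_url(loc: str) -> bool:
    """True if loc has an http(s) scheme and a non-empty host."""
    parts = urlsplit(loc)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def invalid_locs(locs: list[str]) -> list[str]:
    """Return the entries that would fail the absolute-URL rule, in input order."""
    return [loc for loc in locs if not is_absolute_http_url(loc)]
```

Relative paths ("/glossary/sitemap"), bare hostnames ("genrank.co/blog"), and protocol-relative URLs ("//genrank.co/x") all fail this check because urlsplit reports an empty scheme for them.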

Submit to search engines. After creating your sitemap, submit it through Google Search Console and Bing Webmaster Tools. Also reference it in your robots.txt file.

Referencing in Robots.txt

User-agent: *
Allow: /

Sitemap: https://genrank.co/sitemap.xml

Adding the Sitemap: directive to robots.txt ensures that any crawler reading your robots.txt file also discovers your sitemap, regardless of whether you have submitted it through a webmaster tools interface.
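To illustrate the crawler side, Python's standard-library urllib.robotparser (Python 3.8 and later) collects every Sitemap: line and exposes the list through RobotFileParser.site_maps(). This is just one parser, shown only to demonstrate the mechanism. The robots.txt content matches the example above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /

Sitemap: https://genrank.co/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The Sitemap directive is independent of User-agent groups, so it is read regardless of position.
print(parser.site_maps())  # ['https://genrank.co/sitemap.xml']
```

site_maps() returns None when the file contains no Sitemap: lines, which makes a missing directive easy to detect in an audit script.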

Sitemap Index Files

For websites with more than 50,000 URLs or multiple content types, a sitemap index file organizes multiple sitemaps under a single reference point.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://genrank.co/sitemap-pages.xml</loc>
    <lastmod>2026-02-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://genrank.co/sitemap-blog.xml</loc>
    <lastmod>2026-02-05</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://genrank.co/sitemap-glossary.xml</loc>
    <lastmod>2026-02-05</lastmod>
  </sitemap>
</sitemapindex>

This approach makes management easier and allows different sections of your site to update independently.
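An index like the one above can be generated from the child sitemaps themselves. The sketch below assumes a hypothetical mapping from each child sitemap URL to its lastmod, taken as the most recent lastmod of any page inside it. Comparing dates as plain strings works only if every value uses the same YYYY-MM-DD form, which is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def child_lastmod(page_lastmods: list[str]) -> str:
    """A child sitemap changed when its most recently modified page did.
    Same-format YYYY-MM-DD strings sort chronologically as plain text."""
    return max(page_lastmods)


def build_sitemap_index(children: dict[str, str]) -> str:
    """Serialize {child sitemap URL: lastmod} into a <sitemapindex> document."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc, lastmod in children.items():
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = loc
        ET.SubElement(entry, "lastmod").text = lastmod
    body = ET.tostring(index, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body
```

Because each child carries its own lastmod, a crawler can skip re-fetching sections whose date has not moved.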

Common Sitemap Mistakes

Including non-canonical URLs. If a page has a canonical tag pointing to a different URL, only the canonical URL should appear in the sitemap.

Listing blocked pages. URLs that are disallowed in robots.txt or marked noindex should not be in the sitemap. This sends conflicting signals.

Inaccurate lastmod dates. Never bulk-set every page’s lastmod to today’s date. This destroys the signal value of lastmod and may cause crawlers to distrust your dates entirely.
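A simple audit can flag sitemaps that look bulk-stamped. The sketch below is a rough heuristic, not a rule any search engine publishes. It flags a sitemap when one lastmod value covers most of its URLs. The 0.9 threshold and the 20-URL minimum are arbitrary illustrative choices, and the function name is hypothetical.

```python
from collections import Counter


def looks_bulk_updated(lastmods: list[str], threshold: float = 0.9, min_urls: int = 20) -> bool:
    """Flag a sitemap where a single lastmod value accounts for most entries."""
    if len(lastmods) < min_urls:
        return False  # too few URLs for the ratio to mean much
    _, top_count = Counter(lastmods).most_common(1)[0]
    return top_count / len(lastmods) >= threshold
```

A legitimate site-wide content change can also trip this check, so treat a positive result as a prompt to investigate, not as proof of misuse.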

Missing the sitemap entirely. Smaller sites sometimes skip sitemaps assuming internal linking is sufficient. While internal linking is important, a sitemap provides an explicit, comprehensive URL list that removes ambiguity.

Forgetting to update after changes. Adding new pages, removing old ones, or restructuring your site requires corresponding sitemap updates. Automated generation handles this, but manual sitemaps can fall out of sync.
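For manually maintained sitemaps, a periodic drift check can catch these gaps. This minimal sketch parses a <urlset> with the standard-library xml.etree.ElementTree and compares its <loc> values against a set of live URLs. Where that live set comes from (a crawl, your CMS) is an assumption left to the caller, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_locs(sitemap_xml: str) -> set[str]:
    """Extract every <loc> under <url> from a sitemaps.org <urlset> document."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS) if loc.text}


def sitemap_drift(sitemap_xml: str, live_urls: set[str]) -> tuple[set[str], set[str]]:
    """Return (live URLs missing from the sitemap, sitemap URLs no longer live)."""
    listed = sitemap_locs(sitemap_xml)
    return live_urls - listed, listed - live_urls
```

An empty result on both sides means the sitemap and the site agree. Anything else points to pages to add or remove.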

Why It Matters for AEO

Sitemaps support Answer Engine Optimization by helping AI crawlers and retrieval systems discover and prioritize your most valuable content.

AI crawler discovery. Like traditional search crawlers, AI crawlers can use sitemaps to discover pages they might otherwise miss through link-following alone. A comprehensive, well-maintained sitemap reduces the chance that your AEO-optimized content goes undiscovered by AI systems.

Content prioritization signals. The lastmod date in your sitemap signals which content has changed most recently. Because many AI systems favor current information, accurate lastmod dates help crawlers that honor them re-crawl and re-index your recently updated content promptly.

Comprehensive coverage. For sites with large glossaries, resource libraries, or blog archives, a sitemap helps ensure that deep pages with high informational value are surfaced to AI crawlers. These are often the pages most likely to be cited in AI-generated answers.

Indexation efficiency. AI crawlers operate under their own crawl budget constraints. A well-structured sitemap helps them spend that budget on your most important pages rather than on low-value URLs. This efficiency can translate into better AI indexation coverage and, ultimately, more opportunities for your content to be cited in AI answers.

Related Terms