What is an XML Sitemap?
An XML sitemap is a file that lists the important URLs on your website in a format search engines can easily read. Think of it as a roadmap that guides search engine crawlers to every page you want indexed. While search engines can discover pages through links, a sitemap helps them find even pages with few internal links. It is a hint rather than a guarantee: listing a URL does not force a search engine to crawl or index it.
XML sitemaps follow a standardized protocol defined at sitemaps.org and look like this:
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
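To make the format concrete, here is a minimal sketch of reading such a file with Python's standard library. The function name `parse_sitemap` and the `SAMPLE` string are illustrative assumptions, not part of the protocol; only the namespace URI comes from sitemaps.org.

```python
import xml.etree.ElementTree as ET

# Namespace URI defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[dict]:
    """Return one dict per <url> entry; optional fields that are absent become None."""
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        entries.append({
            "loc": url.findtext("sm:loc", default=None, namespaces=NS),
            "lastmod": url.findtext("sm:lastmod", default=None, namespaces=NS),
        })
    return entries


# Hypothetical sample document, mirroring the example above.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-01-15</lastmod>
  </url>
</urlset>"""

print(parse_sitemap(SAMPLE))
```

Every element lives in the sitemap namespace, so lookups need the namespace mapping. Searching for a bare `url` tag would find nothing.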
Why Sitemaps Matter for SEO
XML sitemaps are especially valuable in these situations:
- Large websites — Sites with thousands of pages benefit enormously, as crawlers might miss deep pages without a sitemap.
- New websites — New sites have few external links, making it harder for crawlers to discover all pages.
- Sites with poor internal linking — If some pages are orphaned (no internal links point to them), a sitemap gives crawlers a way to find them.
- Sites that change frequently — News sites, e-commerce stores, and blogs benefit from sitemaps with lastmod dates that tell crawlers which pages have been updated.
- Sites with rich media — Image and video sitemaps help search engines discover media content they might otherwise miss.
What to Include in Your Sitemap
Your sitemap should include:
- All indexable, canonical pages (200 status code, no noindex tag)
- The canonical version of each URL only (not duplicates)
- Pages you actually want to rank in search results
Your sitemap should NOT include:
- Pages blocked by robots.txt
- Pages with noindex meta tags
- Redirect URLs (301, 302)
- Error pages (404, 500)
- Non-canonical duplicate URLs
- Admin, login, or private pages
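The include and exclude rules above can be sketched as a single filter. The `Page` record and its field names are hypothetical: they assume you already have crawl data (status code, canonical URL, robots and noindex flags) for each page.

```python
from dataclasses import dataclass


@dataclass
class Page:
    """Hypothetical crawl record for one URL."""
    url: str
    status: int                      # HTTP status code returned for the URL
    canonical: str                   # URL declared as canonical for this page
    noindex: bool = False            # page carries a noindex directive
    blocked_by_robots: bool = False  # URL is disallowed in robots.txt
    private: bool = False            # admin, login, or other private page


def belongs_in_sitemap(page: Page) -> bool:
    """True only for indexable, canonical, public pages that return 200."""
    return (
        page.status == 200
        and not page.noindex
        and not page.blocked_by_robots
        and not page.private
        and page.canonical == page.url
    )


pages = [
    Page("https://example.com/", 200, "https://example.com/"),
    Page("https://example.com/old", 301, "https://example.com/new"),
    Page("https://example.com/?ref=x", 200, "https://example.com/"),
    Page("https://example.com/login", 200, "https://example.com/login", private=True),
]
print([p.url for p in pages if belongs_in_sitemap(p)])
```

Here only the home page passes: the others are a redirect, a non-canonical duplicate, and a private page.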
Creating Your XML Sitemap
For Static Sites
You can create a sitemap manually for small sites, but this quickly becomes impractical. Online sitemap generators can crawl your site and generate one automatically.
For CMS Platforms
Most CMS platforms have built-in or plugin-based sitemap generation:
- WordPress — Core WordPress generates a basic sitemap at /wp-sitemap.xml. Plugins like Yoast SEO or Rank Math offer more control.
- Shopify — Automatically generates a sitemap at /sitemap.xml.
- Next.js — Use the built-in sitemap file convention in the App Router (app/sitemap.ts) or packages like next-sitemap.
For Custom Sites
Generate sitemaps programmatically by querying your database for all published pages and outputting them in the XML format. Automate this process to update the sitemap whenever content changes.
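A minimal sketch of this approach, using only Python's standard library. The `pages` list stands in for the result of your own database query, and `build_sitemap` is an illustrative name rather than an established API.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages: list[tuple[str, date]]) -> bytes:
    """Serialize (url, last_modified) pairs as a UTF-8 sitemap document."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        # ElementTree escapes XML special characters such as & in text.
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical rows, as if returned by a query for published pages.
pages = [
    ("https://example.com/", date(2026, 1, 15)),
    ("https://example.com/search?q=shoes&page=2", date(2026, 1, 10)),
]
print(build_sitemap(pages).decode("utf-8"))
```

Building the document through an XML library rather than string concatenation means URLs containing `&` or `<` are escaped for you. To keep the file current, you could call a function like this from a publish hook or a scheduled job.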
Sitemap Best Practices
- Keep each sitemap under 50,000 URLs and 50 MB uncompressed — For larger sites, split the URLs across several sitemap files and list those files in a sitemap index.
- Use absolute URLs — Always include the full URL with protocol (https://).
- Include lastmod dates — Use the W3C Datetime format (for example, 2026-01-15), and only change the value when a page is actually modified. Don't update timestamps artificially.
- Reference your sitemap in robots.txt — Add the line Sitemap: https://yourdomain.com/sitemap.xml.
- Submit to search engines — Submit your sitemap through Google Search Console and Bing Webmaster Tools.
- Keep it updated — Generate your sitemap dynamically or update it whenever you add, remove, or modify pages.
- Use UTF-8 encoding — Save the file as UTF-8, percent-encode non-ASCII characters in URLs, and escape XML special characters (for example, write & as &amp;).
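The URL limit and the sitemap index from the first practice above can be sketched as follows. The helper names and the `sitemap-N.xml` file naming are assumptions for illustration, and the sketch checks only the URL count, not the file size limit.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000  # per-file URL limit from the sitemap protocol


def chunk_urls(urls: list, size: int = MAX_URLS) -> list[list]:
    """Split a URL list into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls: list[str]) -> bytes:
    """Serialize a sitemap index that points at the individual sitemap files."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# Hypothetical site with 120,000 URLs.
urls = [f"https://example.com/p/{i}" for i in range(120_000)]
chunks = chunk_urls(urls)
index_xml = build_sitemap_index(
    [f"https://example.com/sitemap-{n}.xml" for n in range(1, len(chunks) + 1)]
)
print(len(chunks))  # 3
```

Each chunk would then be written out as its own sitemap file, and only the index URL needs to be referenced in robots.txt or submitted to search engines.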
Specialized Sitemaps
- Image sitemaps — Help Google discover images, especially useful for image-heavy sites.
- Video sitemaps — Provide metadata about video content on your pages.
- News sitemaps — Help Google News discover recent articles. They should list only articles published in the last two days (48 hours).
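As an illustration of the image sitemap idea, the sketch below attaches image URLs to page entries using Google's image sitemap namespace. The function name and the sample data are hypothetical. Note that `register_namespace` changes global ElementTree state for the running process.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Serialize the sitemap namespace as the default and the image one as "image:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages: dict[str, list[str]]) -> bytes:
    """Map each page URL to the image URLs that appear on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Hypothetical gallery page with two images.
image_xml = build_image_sitemap({
    "https://example.com/gallery": [
        "https://example.com/img/one.jpg",
        "https://example.com/img/two.jpg",
    ],
})
print(image_xml.decode("utf-8"))
```

Video and news sitemaps follow the same pattern with their own namespaces and required child elements, which are not shown here.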
Common Sitemap Mistakes
- Including non-canonical or redirected URLs
- Listing URLs blocked by robots.txt
- Setting all priorities to 1.0 (identical values carry no signal, and Google ignores priority and changefreq in any case)
- Forgetting to update the sitemap when pages are added or removed
- Not submitting the sitemap to search engines
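Several of these mistakes can be caught automatically. The sketch below is one possible audit, assuming you already have the sitemap entries as dicts, a mapping of URL to HTTP status from your own crawl, and the text of your robots.txt. The function name and data shapes are illustrative.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser


def audit_sitemap(entries: list[dict], statuses: dict[str, int],
                  robots_txt: str) -> list[tuple[str, str]]:
    """Return (url, problem) pairs for common sitemap mistakes."""
    robots = RobotFileParser()
    robots.parse(robots_txt.splitlines())
    problems = []
    for entry in entries:
        loc = entry["loc"]
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append((loc, "not an absolute URL"))
            continue
        status = statuses.get(loc)
        if status is not None and status != 200:
            # Covers redirects (301, 302) and error pages (404, 500).
            problems.append((loc, f"returns HTTP {status}"))
        if not robots.can_fetch("*", loc):
            problems.append((loc, "blocked by robots.txt"))
    priorities = [entry.get("priority") for entry in entries]
    if len(entries) > 1 and all(p == "1.0" for p in priorities):
        problems.append(("*", "every priority is 1.0"))
    return problems


# Hypothetical inputs.
entries = [
    {"loc": "https://example.com/", "priority": "1.0"},
    {"loc": "https://example.com/old", "priority": "1.0"},
    {"loc": "https://example.com/admin/login", "priority": "1.0"},
    {"loc": "/relative", "priority": "1.0"},
]
statuses = {"https://example.com/": 200, "https://example.com/old": 301,
            "https://example.com/admin/login": 200}
robots_txt = "User-agent: *\nDisallow: /admin/"

for url, problem in audit_sitemap(entries, statuses, robots_txt):
    print(url, "->", problem)
```

Canonical mismatches and stale entries need page content and a list of your live pages, so they are left out of this sketch.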
How AI SEO Powered by CGMIMM Helps
AI SEO powered by CGMIMM's Site Crawler automatically checks your XML sitemap for errors, missing pages, and inconsistencies. It identifies pages that are in your sitemap but return errors, pages that should be in your sitemap but aren't, and mismatches between your sitemap and your actual site structure. The platform integrates with Google Search Console to monitor indexing status and alert you to issues.