XML Sitemaps: What They Are and How to Create Them

Understand how XML sitemaps help search engines discover your content and learn how to create, optimize, and submit them properly.

What is an XML Sitemap?

An XML sitemap is a file that lists the important URLs on your website in a format search engines can easily read. Think of it as a roadmap that points search engine crawlers to the pages you want indexed. Search engines can discover pages by following links, but a sitemap also helps them find pages that have few internal links. Listing a URL is a hint to crawlers, not a guarantee that the page will be crawled or indexed.

XML sitemaps follow a standardized protocol defined at sitemaps.org and look like this:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>
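
To make the structure concrete, here is a minimal sketch that reads the example above with Python's standard library. The names read_sitemap and SAMPLE are illustrative only and are not part of any sitemap tooling. Note that every element lives in the sitemaps.org namespace, so lookups have to be namespace-aware.

```python
import xml.etree.ElementTree as ET

# Namespace declared on <urlset>; all child elements inherit it.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page</loc>
    <lastmod>2026-01-15</lastmod>
    <changefreq>weekly</changefreq>
    <priority>0.8</priority>
  </url>
</urlset>"""


def read_sitemap(xml_text):
    """Return a list of (loc, lastmod) tuples from a <urlset> document."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    entries = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        entries.append((loc, lastmod))
    return entries


print(read_sitemap(SAMPLE))
# → [('https://example.com/page', '2026-01-15')]
```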

Why Sitemaps Matter for SEO

XML sitemaps are especially valuable in these situations:

  • Large websites — Sites with thousands of pages benefit enormously, as crawlers might miss deep pages without a sitemap.
  • New websites — New sites have few external links, making it harder for crawlers to discover all pages.
  • Sites with poor internal linking — If some pages are orphaned (no internal links point to them), a sitemap gives crawlers another way to find them.
  • Sites that change frequently — News sites, e-commerce stores, and blogs benefit from sitemaps with lastmod dates that tell crawlers which pages have been updated.
  • Sites with rich media — Image and video sitemaps help search engines discover media content they might otherwise miss.

What to Include in Your Sitemap

Your sitemap should include:

  • All indexable, canonical pages (200 status code, no noindex tag)
  • The canonical version of each URL only (not duplicates)
  • Pages you actually want to rank in search results

Your sitemap should NOT include:

  • Pages blocked by robots.txt
  • Pages with noindex meta tags
  • Redirect URLs (301, 302)
  • Error pages (404, 500)
  • Non-canonical duplicate URLs
  • Admin, login, or private pages
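
The include and exclude rules above can be expressed as a single filter. This is a sketch under stated assumptions: the Page fields are hypothetical, and a real crawler or CMS would expose the same information under different names.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # Hypothetical record describing one crawled or published page.
    url: str
    status: int              # HTTP status code returned for the URL
    noindex: bool            # True if the page carries a noindex directive
    canonical: str           # canonical URL the page declares
    blocked_by_robots: bool  # True if robots.txt disallows the URL
    private: bool = False    # admin, login, or other private pages


def belongs_in_sitemap(page):
    """Apply the rules above: 200 status, indexable, canonical, public."""
    return (
        page.status == 200                 # excludes redirects and error pages
        and not page.noindex
        and not page.blocked_by_robots
        and not page.private
        and page.canonical == page.url     # excludes non-canonical duplicates
    )


pages = [
    Page("https://example.com/", 200, False, "https://example.com/", False),
    Page("https://example.com/old", 301, False, "https://example.com/new", False),
    Page("https://example.com/?ref=ad", 200, False, "https://example.com/", False),
    Page("https://example.com/login", 200, False, "https://example.com/login", False, private=True),
]
sitemap_urls = [p.url for p in pages if belongs_in_sitemap(p)]
print(sitemap_urls)
# → ['https://example.com/']
```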

Creating Your XML Sitemap

For Static Sites

You can create a sitemap manually for small sites, but this quickly becomes impractical. Online sitemap generators can crawl your site and generate one automatically.

For CMS Platforms

Most CMS platforms and web frameworks have built-in or plugin-based sitemap generation:

  • WordPress — Core WordPress generates a basic sitemap at /wp-sitemap.xml. Plugins like Yoast SEO or Rank Math offer more control.
  • Shopify — Automatically generates a sitemap at /sitemap.xml.
  • Next.js — A framework rather than a CMS. Use the App Router's built-in sitemap file convention (app/sitemap.ts) or a package like next-sitemap.

For Custom Sites

Generate sitemaps programmatically by querying your database for all published pages and outputting them in the XML format. Automate this process to update the sitemap whenever content changes.
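
As a rough illustration of this approach, the sketch below builds a sitemap from a list of published pages using only Python's standard library. The function name build_sitemap and the hard-coded page list are assumptions standing in for your own database query. Building the XML with a library rather than string concatenation means special characters such as & in URLs are escaped automatically.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build sitemap XML (as bytes) from (absolute_url, last_modified) pairs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, modified in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # "&" is escaped to "&amp;" on output
        ET.SubElement(url, "lastmod").text = modified.isoformat()
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


# Stand-in for "SELECT url, updated_at FROM pages WHERE published" or similar.
published_pages = [
    ("https://example.com/", date(2026, 1, 15)),
    ("https://example.com/search?type=shoes&size=42", date(2026, 1, 10)),
]

xml_bytes = build_sitemap(published_pages)
print(xml_bytes.decode("utf-8"))
```

Calling build_sitemap from whatever hook runs when content is published or updated, and writing the result to the location you serve as /sitemap.xml, keeps the file in step with the site.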

Sitemap Best Practices

  1. Keep each sitemap to at most 50,000 URLs and 50 MB uncompressed — For larger sites, split the URLs across several sitemap files and list those files in a sitemap index.
  2. Use absolute URLs — Always include the full URL with protocol (https://).
  3. Include lastmod dates — Use the W3C Datetime format (for example 2026-01-15) and change the value only when the page content is actually modified. Don't update timestamps artificially.
  4. Reference your sitemap in robots.txt — Add the line Sitemap: https://yourdomain.com/sitemap.xml.
  5. Submit to search engines — Submit your sitemap through Google Search Console and Bing Webmaster Tools.
  6. Keep it updated — Generate your sitemap dynamically or update it whenever you add, remove, or modify pages.
  7. Use UTF-8 encoding — Save the file as UTF-8, percent-encode non-ASCII characters in URLs, and escape XML special characters (for example, write & as &amp;).
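
The first practice, splitting a large site across several files and tying them together with a sitemap index, can be sketched as follows. The file naming scheme (sitemap-1.xml, sitemap-2.xml, and so on) and the function names are assumptions chosen for illustration; the protocol does not prescribe them.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit from the sitemap protocol


def chunk(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(base_url, file_count):
    """Build a <sitemapindex> that points at sitemap-1.xml .. sitemap-N.xml."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for number in range(1, file_count + 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap-{number}.xml"
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


# 120,000 placeholder URLs split into files of at most 50,000 each.
all_urls = [f"https://example.com/product/{i}" for i in range(120_000)]
chunks = list(chunk(all_urls))
print([len(c) for c in chunks])
# → [50000, 50000, 20000]

index_xml = build_sitemap_index("https://example.com", len(chunks))
```

With this layout, the robots.txt Sitemap line and the Search Console submission would point at the index file rather than at each individual sitemap.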

Specialized Sitemaps

  • Image sitemaps — Help Google discover images, especially useful for image-heavy sites.
  • Video sitemaps — Provide metadata about video content on your pages.
  • News sitemaps — Recommended for publishers who want faster discovery in Google News (they are not a requirement for inclusion). List only articles published in the last two days.
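
Specialized sitemaps extend the standard format with an extra XML namespace. As an illustration, the sketch below generates an image sitemap using Google's image extension namespace. The function name build_image_sitemap and the sample URLs are assumptions; video and news sitemaps follow the same pattern with their own namespaces and tags.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Register prefixes so the output uses xmlns="..." and xmlns:image="...".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages maps a page URL to the list of image URLs shown on that page."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # Each image gets its own <image:image><image:loc> pair.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


image_sitemap = build_image_sitemap({
    "https://example.com/gallery": [
        "https://example.com/img/a.jpg",
        "https://example.com/img/b.jpg",
    ],
})
print(image_sitemap.decode("utf-8"))
```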

Common Sitemap Mistakes

  • Including non-canonical or redirected URLs
  • Listing URLs blocked by robots.txt
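  • Setting all priorities to 1.0 (priority is relative to other pages on your own site, so identical values carry no information; Google also states that it ignores priority and changefreq)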
  • Forgetting to update the sitemap when pages are added or removed
  • Not submitting the sitemap to search engines
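
One of these mistakes, listing URLs that robots.txt blocks, is easy to check automatically. The sketch below uses Python's standard urllib.robotparser on an inline robots.txt so it runs without network access. The function name blocked_urls and the sample rules are assumptions, and the standard-library parser may not interpret every rule exactly as a given search engine does, so treat the result as a first pass.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content; in practice you would load your site's real file.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://example.com/sitemap.xml
"""


def blocked_urls(robots_txt, sitemap_urls, user_agent="*"):
    """Return the sitemap URLs that robots.txt disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in sitemap_urls if not parser.can_fetch(user_agent, url)]


sitemap_urls = [
    "https://example.com/",
    "https://example.com/blog/xml-sitemaps",
    "https://example.com/admin/login",
]
print(blocked_urls(ROBOTS_TXT, sitemap_urls))
# → ['https://example.com/admin/login']
```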

How AI SEO Powered by CGMIMM Helps

AI SEO powered by CGMIMM's Site Crawler automatically checks your XML sitemap for errors, missing pages, and inconsistencies. It identifies pages that are in your sitemap but return errors, pages that should be in your sitemap but aren't, and mismatches between your sitemap and your actual site structure. The platform integrates with Google Search Console to monitor indexing status and alert you to issues.

