Thursday 29 April 2010

XML sitemaps

Search engines such as Google can make use of XML sitemaps to discover content. Such sitemaps are useful if a site:

  • Has dynamic content
  • Has pages that aren't easily discovered during the crawl process (e.g. rich media)
  • Is new and has few links to it
  • Has a large archive of content pages that are not well linked to each other, or are not linked at all

The XML sitemap protocol is an open standard documented at http://www.sitemaps.org.

According to the specification:

  • You can provide multiple sitemap files
  • Each sitemap file can contain at most 50,000 URLs
  • Each sitemap file must be no larger than 10 MB when uncompressed
  • Compression with gzip is allowed
  • Multiple sitemap files should be listed in a sitemap index file
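
As a rough illustration of the limits above, here is a minimal Python sketch that splits a list of URLs into sitemap files and lists them in a sitemap index. The helper names (chunk, build_sitemap, build_index), the example.com URLs and the sitemapN.xml.gz file names are made up for this example; only the namespace and the limits come from the protocol. It assumes Python 3.8 or later.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000              # per-file URL limit
MAX_BYTES = 10 * 1024 * 1024   # per-file size limit, checked before compression


def chunk(urls, size=MAX_URLS):
    """Yield successive lists of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def _document(root_tag, child_tag, locations):
    """Serialize <root_tag><child_tag><loc>...</loc></child_tag>...</root_tag>."""
    root = ET.Element(root_tag, xmlns=SITEMAP_NS)
    for location in locations:
        # ElementTree escapes characters such as & in the URL text.
        ET.SubElement(ET.SubElement(root, child_tag), "loc").text = location
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def build_sitemap(urls):
    """Return one sitemap file as bytes, refusing to exceed the limits."""
    data = _document("urlset", "url", urls)
    if len(urls) > MAX_URLS or len(data) > MAX_BYTES:
        raise ValueError("sitemap exceeds the protocol limits")
    return data


def build_index(sitemap_urls):
    """Return a sitemap index file (bytes) listing the given sitemap URLs."""
    return _document("sitemapindex", "sitemap", sitemap_urls)


# Hypothetical site with 120,000 pages: three sitemap files plus one index.
pages = [f"http://example.com/page{i}" for i in range(120_000)]
parts = [build_sitemap(group) for group in chunk(pages)]
compressed = [gzip.compress(part) for part in parts]
index = build_index(
    [f"http://example.com/sitemap{n}.xml.gz" for n in range(1, len(parts) + 1)]
)
```

Each entry in compressed would be written to its own file, and the index is the single file submitted to the search engine.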

The location of a sitemap file determines which URLs it can contain.

“A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but can not include URLs starting with http://example.com/images/.” – see http://www.sitemaps.org/protocol.php#location
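
The location rule quoted above amounts to a prefix check, which can be sketched in Python as follows. The function name allowed_in_sitemap is invented for this example, and the check covers only the directory rule in the quote, not any other part of the protocol.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url, page_url):
    """Return True if page_url sits at or below the sitemap's directory."""
    parts = urlsplit(sitemap_url)
    # Drop the file name to get the directory, e.g. /catalog/sitemap.xml -> /catalog/
    directory = parts.path.rsplit("/", 1)[0] + "/"
    prefix = f"{parts.scheme}://{parts.netloc}{directory}"
    return page_url.startswith(prefix)


sitemap = "http://example.com/catalog/sitemap.xml"
in_catalog = allowed_in_sitemap(sitemap, "http://example.com/catalog/show?item=1")
in_images = allowed_in_sitemap(sitemap, "http://example.com/images/logo.png")
```

With the sitemap placed under /catalog/, the first URL passes the check and the second does not. A sitemap placed at the root of the site would accept both.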

See http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=156184 for Google specifics.
