The XML Sitemap file: use and implementation

What is the XML sitemap used for? How to set it up effectively?

Written by Celina
Updated over a year ago

What is the XML sitemap for?

An XML sitemap is a file that lists URLs. It helps you steer how a website is crawled: on a large website, the sitemap tells search engine robots which pages matter most and should be crawled first.

So the goal is not to list every URL on the website, but the URLs, or main categories of URLs, that you want indexed as a priority.

Is the sitemap essential?

No. 😊

If a website has only a few hundred pages, Googlebot should manage fine through natural navigation (following links). You must still check carefully that no page is more than 4 clicks deep. Google will always favor natural navigation, so don't count on the sitemap to make up for a defective tree structure.

From 1,000 pages, a sitemap becomes useful, and from 10,000 pages it becomes necessary.

How to create a Sitemap?

  • Manually, by creating an XML file. This method is generally not recommended, unless you really know what you are doing! It’s advisable to use at least an XML editor to create this file.

  • Through custom development for the website: this is the most powerful method and can adapt to all of the website's specifics, but it also costs the most in resources. Watch the maximum number of URLs allowed per sitemap file: you sometimes need to split the sitemap into several files.

  • By using an automatic sitemap generation tool (easily found on Google). This method is tempting: very little work for a professional result! But be careful, it is hard to maintain, because the generator must be rerun every time something changes on the website.

  • Through your CMS (website management tool): most CMSs have plugins that create sitemaps, and everything is regenerated automatically when an update is needed.
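
To give an idea of what the custom-development route can involve, here is a minimal Python sketch that builds a sitemap with the standard library only. The function name `build_sitemap` and the shape of the `pages` dictionaries are assumptions made for this illustration, not part of any particular CMS or tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages):
    """Return sitemap XML as UTF-8 bytes for an iterable of page dicts.

    Each dict needs a "loc" key; "lastmod", "changefreq" and "priority"
    are optional (hypothetical input format, chosen for this sketch).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for page in pages:
        url = ET.SubElement(urlset, "url")
        for tag in ("loc", "lastmod", "changefreq", "priority"):
            if page.get(tag) is not None:
                # ElementTree escapes characters such as "&" on output.
                ET.SubElement(url, tag).text = str(page[tag])
    # xml_declaration requires Python 3.8 or later.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)

if __name__ == "__main__":
    xml_bytes = build_sitemap([
        {"loc": "https://my-domain.com/", "changefreq": "daily", "priority": "1.0"},
        {"loc": "https://my-domain.com/page-1/", "lastmod": "2017-09-02"},
    ])
    print(xml_bytes.decode("utf-8"))
```

Regenerating the file from the website's own data each time content changes avoids the maintenance problem of one-off generator tools.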

Rules to follow:

  • The XML file must be saved in UTF-8.

  • A sitemap can list at most 50,000 URLs, and the XML file must not exceed 50 MB (52,428,800 bytes) uncompressed.

  • All URLs listed in an XML sitemap file must come from the same host, for example my-domain.com.
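
The limits above can be checked before a sitemap is published. The following Python sketch shows one possible way to do it; the helper names (`split_into_sitemaps`, `urls_on_other_hosts`, `fits_size_limit`) are hypothetical and only illustrate the three rules.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000          # maximum number of URLs per sitemap file
MAX_BYTES = 52_428_800     # 50 MB

def split_into_sitemaps(urls, max_urls=MAX_URLS):
    """Split a list of URLs into chunks that each fit in one sitemap file."""
    return [urls[i:i + max_urls] for i in range(0, len(urls), max_urls)]

def urls_on_other_hosts(urls, host):
    """Return the URLs whose host differs from the expected one."""
    return [u for u in urls if urlsplit(u).hostname != host]

def fits_size_limit(xml_bytes, max_bytes=MAX_BYTES):
    """True if the encoded sitemap stays within the size limit."""
    return len(xml_bytes) <= max_bytes

if __name__ == "__main__":
    urls = [f"https://my-domain.com/p/{i}" for i in range(120_001)]
    print([len(chunk) for chunk in split_into_sitemaps(urls)])
```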

The XML Sitemap’s structure

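The sitemap.xml file looks like this (the <loc> URLs below are placeholders on the example host my-domain.com):

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://my-domain.com/</loc>
    <changefreq>daily</changefreq>
    <lastmod>2017-09-02T09:04:58+00:00</lastmod>
    <priority>1.00</priority>
  </url>
  <url>
    <loc>https://my-domain.com/page-1/</loc>
    <changefreq>daily</changefreq>
    <lastmod>2015-09-02T09:05:58+00:00</lastmod>
    <priority>0.80</priority>
  </url>
  <url>
    <loc>https://my-domain.com/page-2/</loc>
    <changefreq>daily</changefreq>
    <lastmod>2015-09-02T09:24:53+00:00</lastmod>
    <priority>0.80</priority>
  </url>
</urlset>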

File settings:

  • URL (the loc tag): required

  • Update frequency (the changefreq tag): optional

  • Last modification (the lastmod tag): optional

  • Priority (the priority tag): optional (0 being the lowest, and 1 being the highest)

Rules to follow:

  • Start with an opening <urlset> tag, and end with a closing </urlset> tag

  • Specify the namespace (protocol standard) in the <urlset> tag

  • Include for each URL a <url> entry as parent XML tag

  • Include a child <loc> entry for each parent <url> tag

-> If you want to know more, go to the "Technical information" part of this article (at the bottom of the page).
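
The four structural rules can be verified automatically. Below is a minimal Python sketch, assuming the sitemap is available as a string or as bytes; `structure_errors` is a hypothetical helper name, and the checks cover only the rules listed above, not the full protocol.

```python
import xml.etree.ElementTree as ET

NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def structure_errors(xml_text):
    """Return a list of problems; an empty list means the basic rules hold."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    errors = []
    # Rules 1 and 2: the root must be <urlset> in the sitemap namespace.
    if root.tag != f"{{{NS}}}urlset":
        errors.append("root element must be <urlset> in the sitemap namespace")
    for i, child in enumerate(root, start=1):
        # Rule 3: every entry is a <url> element.
        if child.tag != f"{{{NS}}}url":
            errors.append(f"entry {i}: expected <url>")
            continue
        # Rule 4: every <url> has a non-empty <loc> child.
        loc = child.find(f"{{{NS}}}loc")
        if loc is None or not (loc.text or "").strip():
            errors.append(f"entry {i}: missing <loc>")
    return errors
```

An unclosed or mistyped tag, such as a closing tag that does not match its opening tag, is reported as not well-formed XML before any other rule is checked.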

How to notify Google of a sitemap?

  1. Connect to the Google Webmaster Tools account (since renamed Google Search Console), and go to "Exploration / Sitemaps".

  2. Click on the "Add / test a sitemap" button.

  3. Enter the URL of the sitemap (or of the "Sitemap index"), remembering to leave out the domain name.

  4. Click on the "Send" button.

  5. You can then return to this menu regularly to check whether Google has indexed all the URLs submitted via the sitemap.

Our advice to optimize its use

Create separate sitemaps for each type of page (categories, products...).

For large websites that need a sitemap file, you can create 2 kinds of sitemap:

  • 1 listing the most recently created pages (to try to speed up indexing)

  • 1 per type of page (to measure the indexing rate by type of page, for example product sheets, categories, editorial articles, etc.)

Create sitemaps by language and/or country

If you've got a multilingual website, it's a good idea to split the sitemap (or sitemaps) into one per language. If you already have several sitemaps (by page type), split each of them again by language.

If your website targets several countries, separate the sitemaps by country in the same way.

In both cases, the idea is to make it easier to study the rate of indexed pages by page type, language and country.
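
One way to organise this split is sketched below in Python. It groups pages by type and language, then lists the resulting files in a sitemap index (a `<sitemapindex>` file, defined by the sitemap protocol, that points to several sitemaps). The file naming scheme and the `type` / `lang` keys are assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def group_for_sitemaps(pages):
    """Map a sitemap file name to the URLs it should list."""
    groups = defaultdict(list)
    for page in pages:
        # Hypothetical naming scheme: one file per page type and language.
        name = f"sitemap-{page['type']}-{page['lang']}.xml"
        groups[name].append(page["loc"])
    return dict(groups)

def build_sitemap_index(base_url, file_names):
    """Return a sitemap index (as a string) pointing to each sitemap file."""
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for name in sorted(file_names):
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = base_url + name
    return ET.tostring(index, encoding="unicode")

if __name__ == "__main__":
    pages = [
        {"loc": "https://my-domain.com/en/shoes/", "type": "category", "lang": "en"},
        {"loc": "https://my-domain.com/fr/chaussures/", "type": "category", "lang": "fr"},
        {"loc": "https://my-domain.com/en/shoes/red-sneaker", "type": "product", "lang": "en"},
    ]
    groups = group_for_sitemaps(pages)
    print(build_sitemap_index("https://my-domain.com/", groups))
```

With one file per type and language, the indexing rate reported for each sitemap can be read directly for that segment of the website.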

Technical information:

  • XML sitemap: the <urlset> tag

The <urlset> tag is required. It wraps the whole sitemap file and references the protocol standard (namespace) used.

  • XML sitemap: the <url> tag

The <url> tag is also required. It represents the parent tag for each referenced URL.

  • XML sitemap: the <loc> tag

The <loc> tag is the last of the three required tags. It contains the page's URL. The URL must begin with the protocol (http:// or https://) and must be less than 2,048 characters long.
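
As an illustration, the constraints on <loc> can be checked with a few lines of Python. The helper names are hypothetical. The sketch treats 2,048 as an exclusive upper bound, following the sitemap protocol's wording, and also shows the entity escaping that the protocol requires for characters such as "&" when the XML is written by hand.

```python
from urllib.parse import urlsplit
from xml.sax.saxutils import escape

MAX_LOC_LENGTH = 2048  # the URL must be shorter than this

def loc_problems(url):
    """Return a list of reasons why the URL is not a valid <loc> value."""
    problems = []
    if urlsplit(url).scheme not in ("http", "https"):
        problems.append("must start with http:// or https://")
    if len(url) >= MAX_LOC_LENGTH:
        problems.append("must be less than 2,048 characters long")
    return problems

def escaped_loc(url):
    """Escape &, <, >, ' and " so the URL can be placed inside <loc>."""
    return escape(url, {"'": "&apos;", '"': "&quot;"})

if __name__ == "__main__":
    print(loc_problems("ftp://my-domain.com/file"))
    print(escaped_loc("https://my-domain.com/?a=1&b=2"))
```

XML libraries usually perform this escaping themselves, so `escaped_loc` is only needed when the file is assembled as plain text.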

  • XML sitemap: the <lastmod> tag

The <lastmod> tag is optional. It gives the date on which the file / page was last modified. This date must be in W3C Datetime format. For the sake of simplicity, the date-only form YYYY-MM-DD is generally used.
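
Both forms are easy to produce from a timestamp. The short Python sketch below uses hypothetical helper names and assumes the modification time is available as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone

def lastmod_full(dt):
    """Full W3C Datetime form with time and UTC offset (aware datetime expected)."""
    return dt.replace(microsecond=0).isoformat()

def lastmod_date(dt):
    """Simplified date-only form: YYYY-MM-DD."""
    return dt.date().isoformat()

if __name__ == "__main__":
    modified = datetime(2017, 9, 2, 9, 4, 58, tzinfo=timezone.utc)
    print(lastmod_full(modified))  # 2017-09-02T09:04:58+00:00
    print(lastmod_date(modified))  # 2017-09-02
```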

  • XML sitemap: the <changefreq> tag

The <changefreq> tag is also optional. It indicates how often the page is likely to change. This value gives search engines general information and is treated as a hint, not a command: crawlers can take it into account, but they don't necessarily apply it strictly.

Accepted values are: "always", "hourly", "daily", "weekly", "monthly", "yearly" and "never".

The value "always" should be used to describe documents that change with each access. The value "never" should be used to describe URLs considered to be archived.

  • XML sitemap: the <priority> tag

The <priority> tag is the last of the three optional tags. It represents a page's priority compared to other pages on the website. The accepted values range from 0.0 to 1.0. By default (without the <priority> tag), the page's priority is equal to 0.5.

This value only tells search engines which pages you consider most important for crawlers to visit.
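
The accepted values for the two hint tags, <changefreq> and <priority>, can be enforced when a sitemap is generated. The following Python sketch uses hypothetical helper names and simply mirrors the rules described above.

```python
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}
DEFAULT_PRIORITY = 0.5  # priority assumed when the <priority> tag is absent

def valid_changefreq(value):
    """True if the value is one of the seven accepted keywords."""
    return value in CHANGEFREQ_VALUES

def effective_priority(raw=None):
    """Return the priority as a float, falling back to the default of 0.5."""
    if raw is None:
        return DEFAULT_PRIORITY
    value = float(raw)
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"priority must be between 0.0 and 1.0, got {raw!r}")
    return value

if __name__ == "__main__":
    print(valid_changefreq("daily"), valid_changefreq("sometimes"))
    print(effective_priority(), effective_priority("0.80"))
```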

