Most of the time, an XML sitemap is set up with a plugin or extension and instantly forgotten about, on the assumption that it is running smoothly on autopilot.
This is not always the case. Most sites have an XML sitemap that points crawlers towards pages that are canonicalised to another URL, return 404 errors or are set to NOINDEX. This can hurt crawlability and indexation and waste crawl budget.
An un-optimised XML sitemap can lead to issues with crawlability and indexation.
What is an XML Sitemap?
An XML sitemap is essentially a list of URLs that the search engines use to easily find and crawl every page on a website. It is a website’s version of a book’s index.
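For reference, a bare-bones sitemap following the standard sitemaps.org protocol looks like this (the URL and date are placeholders, not taken from any real site):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/sample-page/</loc>
    <lastmod>2018-01-01</lastmod>
  </url>
</urlset>
```

Each <url> entry tells the search engines about one page; larger sites split these entries across several child sitemaps referenced from a sitemap index file.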
How to conduct an XML Sitemap Audit
Checking the XML sitemap is a relatively quick process.
Collect the URLs
Let’s take the Matthew Woodward sitemap as an example. Simply navigate to the sitemap page. In this case it is located at sitemap_index.xml because he is using the Yoast SEO plugin, but on other sites it is often just located at sitemap.xml.
Sitemap page: https://www.matthewwoodward.co.uk/sitemap_index.xml
As we can see, there are two active sitemaps, post-sitemap.xml and page-sitemap.xml.
Open both in a new tab and you should see the list of URLs for the crawler.
You can either use a scraper browser plugin to extract the URL list or just highlight the table and copy and paste the data into Excel.
To tidy up the data, remove the ‘Image’ and ‘Last Mod’ columns and delete row 1, which contains the table headers. You should be left with a list of all the URLs contained in the XML sitemap.
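If you would rather not copy and paste by hand, a short script can pull the same list straight from the sitemap index. This is only a sketch: it assumes the index sits at /sitemap_index.xml (as on matthewwoodward.co.uk) and that the child sitemaps use the standard sitemaps.org namespace.

```python
# Minimal sketch: collect every URL listed in a Yoast-style sitemap index.
import urllib.request
import xml.etree.ElementTree as ET

# Standard sitemaps.org namespace used by both the index and child sitemaps
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def fetch_xml(url):
    """Download a sitemap and return its parsed XML root element."""
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-audit-script"})
    with urllib.request.urlopen(req) as response:
        return ET.fromstring(response.read())

# The index lists the child sitemaps (post-sitemap.xml, page-sitemap.xml, ...)
index = fetch_xml("https://www.matthewwoodward.co.uk/sitemap_index.xml")
child_sitemaps = [loc.text for loc in index.findall(".//sm:loc", NS)]

# Each child sitemap lists the actual page URLs
urls = []
for sitemap_url in child_sitemaps:
    child = fetch_xml(sitemap_url)
    urls.extend(loc.text for loc in child.findall(".//sm:loc", NS))

# Paste-ready list for Screaming Frog's List mode
print("\n".join(urls))
```

The output is the same flat list of URLs you would otherwise build in Excel, ready to paste into the next step.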
Crawl The URLs With Screaming Frog
Once you have an Excel file with the complete list of URLs, copy the list and open up Screaming Frog.
You will need to change the mode in Screaming Frog from the default Spider mode to List mode.
Then select the paste option to input the URL list saved on your clipboard.
Once the spider has finished the crawl, you will need to check the following columns:
- Status, for redirects and 404 errors
- Meta Robots 1, for NOINDEX tags
Here, we can see there are 10 URLs that return a 301 status code and one URL that returns a 404 page not found error. No pages in this example returned a NOINDEX tag.
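Screaming Frog does the heavy lifting here, but if you want to spot-check a handful of sitemap URLs without it, a rough sketch like the one below reports each URL’s raw status code and whether a simple meta robots NOINDEX pattern appears in the HTML. The script, user agent string and regex are illustrative assumptions of mine, not anything Screaming Frog itself uses.

```python
# Rough spot-check: status code and meta robots NOINDEX for each sitemap URL.
import re
import urllib.error
import urllib.request

class NoRedirect(urllib.request.HTTPRedirectHandler):
    # Returning None means 3xx responses are raised as HTTPError
    # instead of being silently followed, so 301s stay visible.
    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None

opener = urllib.request.build_opener(NoRedirect)

def check(url):
    """Return (status_code, noindex_flag) for a single sitemap URL."""
    req = urllib.request.Request(url, headers={"User-Agent": "sitemap-audit-script"})
    try:
        with opener.open(req) as resp:
            status = resp.status
            html = resp.read().decode("utf-8", errors="ignore")
    except urllib.error.HTTPError as err:
        # 301, 302, 404 and other non-2xx responses land here;
        # no point checking meta robots on an error page.
        return err.code, False
    # Crude check for <meta name="robots" ... noindex ...> in the source
    noindex = bool(re.search(r'<meta[^>]+name=["\']robots["\'][^>]*noindex', html, re.I))
    return status, noindex

# Replace this placeholder with the list pulled from the sitemap earlier
urls = ["https://www.matthewwoodward.co.uk/"]

for url in urls:
    status, noindex = check(url)
    if status != 200 or noindex:
        print(f"{status}  noindex={noindex}  {url}")
```

Any line the script prints is a URL that arguably should not be in the sitemap: a redirect, an error page or a NOINDEXed page.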