The importance of onsite issues should never be underestimated. The introduction of Panda in February 2011 made Google's intent clear: website owners should keep their house in order.
The recent news that Panda is no longer a stand-alone update but now an integrated part of the overall algorithm shows that onsite issues are still vitally important.
I am still shocked by the number of websites that get this aspect of SEO wrong. Fine-tuning your website provides so many easy wins; have a look at my previous post, where I showed how a couple of quick onsite edits increased organic traffic by nearly 1000%.
If your site is under-optimised, over-optimised, or riddled with duplicate or thin content, you are throwing traffic out of the window.
If you worry that this might be you, do not fear: the process for fixing the issue is straightforward.
Step 1: Collect Your Indexed Pages
The simple test is to run a site search; for a full, in-depth audit of your site, you will want to run a complete duplication check.
Run a site search on Google to pull out all the indexed pages.
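For example, assuming your domain is example.com (a placeholder), the basic query and a couple of variations to surface the usual suspects look like this:

```
site:example.com
site:example.com inurl:login
site:example.com inurl:sort
```

The result count Google reports at the top of the page gives you a rough idea of how many pages are currently indexed.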
Copy the results into a Google Doc, then skim through it until you find something that does not belong. We are looking for:
- Thin category pages
- Login pages
- URL variations
WARNING: the main reason I decided to write this post was a colleague of mine. One of our clients had a duplication issue and a few too many indexed pages. My colleague's reasoning was that if he blocked the pages via Robots.txt, the duplication would be fixed. DO NOT DO THIS.
If you block the pages via Robots.txt, you stop the crawler from accessing them, and if the crawler cannot access the pages, it never sees the instruction to remove them from the index.
Step 2: NOINDEX
To remove the pages from the index, you will need to add the NOINDEX tag to the <head> section of all the URLs that you identified in the Google Doc.
The NOINDEX tag has the following format:
<META NAME="ROBOTS" CONTENT="NOINDEX">
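As a minimal sketch, assuming a thin category page you want removed, the tag sits inside that page's <head> alongside the existing meta tags:

```html
<head>
  <title>Placeholder title for a thin category page</title>
  <!-- Tells crawlers not to include this URL in the index -->
  <META NAME="ROBOTS" CONTENT="NOINDEX">
</head>
```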
Note: I don't use the NOFOLLOW tag, as I want the crawler to keep on crawling the pages on my sites. I only use the NOFOLLOW tag in exceptional circumstances.
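If your list runs to hundreds of URLs, it is worth sanity-checking that the tag really is in place before moving on. The sketch below is my own addition, not part of the original process: it assumes the requests library is installed and that urls.txt holds one URL per line, and it only does a crude string check rather than proper HTML parsing.

```python
import requests

# urls.txt: one URL per line, the same list collected in Step 1 (hypothetical filename)
with open("urls.txt") as f:
    urls = [line.strip() for line in f if line.strip()]

for url in urls:
    # Fetch the page and do a rough check for a robots meta tag containing "noindex"
    html = requests.get(url, timeout=10).text.lower()
    if "noindex" in html:
        print(f"OK       {url}")
    else:
        print(f"MISSING  {url}")
```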
Step 3: Google’s Search Console
Once the NOINDEX tag has been added to the pages, the next step is to hurry the process along a little.
The first thing to do is submit a new sitemap in Google Search Console (formerly Google Webmaster Tools). Submitting a new sitemap increases the likelihood that Google will re-crawl the website and see the NOINDEX tags.
On the left-hand navigation, click ‘Crawl’ then ‘Sitemaps’ and either resubmit or submit a sitemap.
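If you need to create or regenerate the sitemap yourself, the format is a small XML file. A minimal sketch with placeholder URLs is shown below; list only the pages you actually want indexed, not the ones you have just marked NOINDEX:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
  <url>
    <loc>https://www.example.com/a-page-you-want-indexed/</loc>
  </url>
</urlset>
```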
The second action to take is to manually request the removal of the URLs within the Search Console. Please do not be lazy and just remove the URLs without adding the NOINDEX tag. The pages will drop out of the index temporarily, but if you do not deal with them (NOINDEX, 301, etc.), they will get indexed again. As Google states on the page:
Temporarily remove URLs that you own from search results. To remove content permanently, you must remove or update the source page.
To do this, select your site in the Search Console, and on the left-hand side navigation, select ‘Google Index’ and then ‘Remove URLs’.
Once you are there, click ‘Temporary Hide’, paste in your URLs one by one, and on the following page request their removal from the index.
There are a number of different options available to you on the removal request screen. These are:
- Temporarily hide the page from the search results and remove from cache
- Remove the page from cache only
- Temporarily hide directory
For the purpose of this exercise, we want to “Temporarily hide the page from the search results and remove from cache”.
Step 4: Robots.txt
Now that we have stopped the pages re-entering the index (the NOINDEX tag) and removed them from the index (Google Search Console), we can add an extra layer of protection and stop the crawlers accessing certain sections of the site altogether.
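For example, if the unwanted URLs sit under a /login/ directory or carry a ?sort= parameter (both hypothetical paths, used purely for illustration), the robots.txt rules might look like this:

```
User-agent: *
Disallow: /login/
Disallow: /*?sort=
```

Remember the warning above: only add these rules once the pages have actually dropped out of the index, otherwise the crawler never gets to see the NOINDEX tag.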
Step 5: Keeping it tidy
I tend to run a quick site search every month on the sites I work with. I know roughly how many pages should be indexed, so if anything looks out of place, I investigate further. A site search only takes 30 seconds, and the benefits vastly outweigh the cost.