Staging or development sites are usually hosted on a separate domain or a subdomain of the primary site. It is far more common than you might think to find these environments crawlable and indexed by Google. Here is a quick and easy guide to finding out whether these pages are causing you sitewide offsite duplication.
Why is this an issue?
Google is usually smart enough to work out which is the primary site and rank it over a staging environment, but do you want to take that chance? You do not want to send users to a half-finished site that might provide a poor user experience or journey.
You will also need to deal with any devaluation from an onsite content perspective caused by having two exact copies of your site indexed.
How to find indexed staging sites
For this guide, I wanted real-world examples.
So, I headed to Google and did a quick search for “web development London”. Web design and development companies usually host their staging environments on a subdomain of their agency site.
Taking a couple of sites at random, I quickly found a whole load of staging environments.
Example 1: bondmedia.co.uk
Using the search string “site:bondmedia.co.uk -inurl:www” returns all the indexed pages for that domain that do not have www in the URL.
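A few variations on the same operator can help narrow things down (using a hypothetical example.com domain here, not one of the sites above):

```
site:example.com -inurl:www        returns indexed pages outside the www subdomain
site:staging.example.com           checks a known staging subdomain directly
site:example.com inurl:staging     catches "staging" in subdomains or paths
```

The `site:`, `inurl:` and `-` (exclusion) operators can be combined freely, so you can keep adding exclusions until only the suspect subdomains remain.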
Here we can already see lots of examples of sites that should not be open to Google.
A site about a Buddy Holly show
A diamond website
Note the domain of each of these pages: they are all subdomains of the agency’s primary domain, not the clients’ own sites.
Example 2: devstars.com
Using exactly the same process, but this time swapping out the domain, shows near-identical results.
How to get rid of an indexed staging site
The initial reaction of the vast majority of product teams I have ever met is to jump in and block the offending pages via robots.txt.
While blocking is sensible for a staging site that has not yet been indexed, applying it to pages that are already indexed creates ghost pages that are stuck in Google’s index.
Google can’t access the pages again to see that it shouldn’t index them so they just remain. Here is an example.
Notice the “No information is available for this page” text under the search result, and how one result isn’t pulling through the title tag. This happens when an indexed page is blocked via the robots file.
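For reference, the block in question is usually a blanket disallow in the staging subdomain’s robots.txt, something like the sketch below (hypothetical staging.example.com domain):

```
# robots.txt served at staging.example.com/robots.txt
User-agent: *
Disallow: /
```

This stops Google from crawling the pages, but it does not remove URLs that are already in the index, which is exactly how the ghost pages above come about.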
The best way to sort it out is to set up a Google Search Console (GSC) property for the staging domain, request removal of all the pages, and then put the block in place.
If a block is already in place, then Google’s URL removal tool will be your new best friend.
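A related check worth doing before the robots.txt block goes back in: Google can only see a `noindex` directive (in a robots meta tag or an `X-Robots-Tag` header) if it is still allowed to crawl the page, so it is worth confirming the staging pages actually serve one. Here is a minimal sketch of such a check; the `has_noindex` helper and the sample markup are illustrative, not part of the guide above:

```python
import re

def has_noindex(headers: dict, html: str) -> bool:
    """Return True if a fetched response carries a noindex directive,
    either in an X-Robots-Tag header or a robots meta tag."""
    # Header check: "X-Robots-Tag: noindex" (case-insensitive)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # Meta tag check: <meta name="robots" content="noindex, ...">
    # (simplified: assumes the name attribute comes before content)
    meta = re.search(
        r'<meta[^>]+name=["\']robots["\'][^>]+content=["\']([^"\']*)["\']',
        html, re.IGNORECASE)
    return bool(meta and "noindex" in meta.group(1).lower())

# A staging page that sets the meta tag is reported correctly:
page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
print(has_noindex({}, page))                          # True
print(has_noindex({"X-Robots-Tag": "noindex"}, ""))   # True
print(has_noindex({}, "<html></html>"))               # False
```

Run against each staging URL’s response, this makes it easy to spot pages that would stay indexed even after Google recrawls them.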