What to do if Google indexed your development site

In my last post, I talked about how to keep Google from indexing your development or staging site. But if it’s already happened, here’s how to fix it.

Let’s say, for example, your development site is http://dev.nerdpress.net, and your correct site is https://www.nerdpress.net. If Google finds and indexes your dev site, the search results could end up looking something like this:

[Screenshot: the dev site showing up in search results]

That’s not great, but it’ll be even worse once you remove the dev site, and the link goes to a 404-not-found error page. (And then Google would eventually catch up, and remove the link altogether.)

Heads up: The following fix requires that you basically kill your dev site. If you still want to have a dev or staging site, pick a different subdomain and put it there. Then follow the instructions here to make sure this doesn’t happen again.

First, add both domains to Google Search Console.

Go to Google Search Console (formerly Webmaster Tools) and add and verify both properties (if they’re not already added). From the main page of Search Console, hit the red “Add a Property” button and follow the steps.

For the dev site, I recommend using the “Domain name provider” verification method (it may be under the “Alternate Methods” tab). The other verification methods may stop working after you set up the redirects in the next step. Follow the instructions that Google provides for your DNS (Domain Name System) provider.

Your DNS settings are probably managed by your web hosting company. However, they could also be managed by your domain registrar (such as GoDaddy or Namecheap). If you’re using Cloudflare, you’ll need to change the settings there.

Second, tell Google the content is gone, and redirect visitors from the dev site to the real site.

Update, February 2020: My new recommendation is to set up redirects for real visitors, and also set up your dev site to respond to Google (and other search engines) with a 410 “Gone” response. This tells Google that you’ve deliberately removed the content, which is a strong signal that they should drop the URL from the index. Because a 410 response is deliberate and likely to be permanent, Google tends to update the index quickly. This will help get things cleared up in a matter of weeks, instead of months or years!

So by setting up 410 responses for Google, and 301 redirects for real visitors who click on links to the dev site, you get the best of both worlds: Google removes the content faster, and in the meantime people still find what they’re looking for, because they’re redirected to your live site.

To set this up, you’ll need to modify your .htaccess file. This is a simple text file that sits in the top folder of the site on your server (usually something like /public_html/). (Note: if your server is running Nginx instead of Apache, it’s best to ask your host to set this up for you, since Nginx doesn’t use .htaccess files.)

It actually just takes a few lines, and they should go at the very top of the .htaccess file:

# Issue a 410 "Gone" for Googlebot and other crawlers
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} Googlebot|Baidu|Bingbot|DuckDuckBot|Slurp|Yandex [NC]
RewriteRule (.*) - [R=410,L]

# Redirect other visitors to the live site
# (RewriteEngine is already on from above; dots are escaped so they match literally)
RewriteCond %{HTTP_HOST} ^dev\.yourdomain\.com [NC]
RewriteRule ^(.*)$ https://www.yourdomain.com/$1 [NC,R=301,L]

The first section checks the user agent, and if it’s Googlebot (or one of the others), it’ll issue the 410 response and that’s that.

If the request gets to the second section (by virtue of not coming from a bot), we then check to see if the request is for the dev site. If it is, we redirect the visitor to the same path on the correct domain.

(In case you’re wondering about the bits in [brackets]: NC makes the match case-insensitive (“No Case”), R=301 says it’s a redirect with a status of 301, and L means it’s the Last rule, so don’t process anything after this.)
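To make the order of the two checks concrete, here is a rough Python model of the decision those rules make. It is only a sketch for reasoning about the behavior, not something to install on the server: the function name expected_response and the constants are made up for illustration, the domains are the same placeholders used in the rules, and the host check is simplified to a case-insensitive prefix match.

```python
import re

# Hypothetical stand-ins for the placeholder domains used in the rules above.
DEV_HOST = "dev.yourdomain.com"
LIVE_ORIGIN = "https://www.yourdomain.com"

# Same crawler names as the RewriteCond pattern; IGNORECASE plays the role of [NC].
CRAWLER_PATTERN = re.compile(
    r"Googlebot|Baidu|Bingbot|DuckDuckBot|Slurp|Yandex", re.IGNORECASE
)


def expected_response(user_agent, host, path):
    """Return the (status, location) the two rule blocks should produce.

    location is None unless the request is redirected. A status of None
    means neither block matched and the request is served normally.
    """
    # Block 1: a crawler user agent gets 410 Gone, and processing stops there.
    if CRAWLER_PATTERN.search(user_agent):
        return 410, None
    # Block 2: anyone else on the dev host is sent to the same path on the live site.
    if host.lower().startswith(DEV_HOST):
        return 301, LIVE_ORIGIN + path
    return None, None
```

For example, a request for /some-post/ on the dev host from an ordinary browser maps to a 301 pointing at https://www.yourdomain.com/some-post/, while the same request with a Googlebot user agent maps to a 410 with no redirect. Note that the first check does not look at the host at all, which mirrors the rules as written.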

Of course, you’ll need to change yourdomain.com to your actual domain to get this to work.

Once it’s in place, test that it redirects everything correctly (including posts and pages, not just the homepage).
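If you’d rather not click through pages by hand, a small script can do the checking. The following is a minimal sketch using only Python’s standard library, under a few assumptions: the dev site answers over plain HTTP (swap in http.client.HTTPSConnection if yours uses HTTPS), the host, origin, and user-agent strings are placeholders you’d replace, and the helper names fetch, verdict, and check are made up for this example. Because the rules only look at the user-agent string, sending a Googlebot-style string is enough to trigger the 410.

```python
import http.client

# Hypothetical values: replace with your own dev host and live origin.
DEV_HOST = "dev.yourdomain.com"
LIVE_ORIGIN = "https://www.yourdomain.com"

GOOGLEBOT_UA = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def fetch(host, path, user_agent):
    """Request one URL over plain HTTP; http.client never follows redirects."""
    conn = http.client.HTTPConnection(host, timeout=10)
    try:
        conn.request("GET", path, headers={"User-Agent": user_agent})
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def verdict(status, location, want_status, want_location=None):
    """Compare an observed response with the expected one."""
    ok = status == want_status and (
        want_location is None or location == want_location
    )
    return "OK" if ok else f"FAIL (got {status} {location})"


def check(path):
    """Print the result for one path, first as a crawler, then as a browser."""
    status, location = fetch(DEV_HOST, path, GOOGLEBOT_UA)
    print(path, "as Googlebot:", verdict(status, location, 410))
    status, location = fetch(DEV_HOST, path, BROWSER_UA)
    print(path, "as browser:", verdict(status, location, 301, LIVE_ORIGIN + path))
```

To use it, call check("/") and then check() again with the paths of a few real posts and pages from your dev site. Each line should end in OK; a FAIL line shows the status and Location header the server actually returned.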

You can use FTP to download, edit, and re-upload the .htaccess file. Or, if your account has cPanel, you can use the File Manager there. Be sure to change the setting to “show hidden files.”

If you’re scared to edit your .htaccess file, I recommend asking your host to take care of this for you; they should be happy to help.

Third, submit a change of address to Google.

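In the top right corner of Search Console, select your dev site. Then click the gear icon and select “Change of Address.” If Search Console reports that it can’t fetch the dev site, temporarily disable the 410 rules, submit the change of address, and then re-enable them.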

[Screenshot: initiating a change of address]

That will walk you through a few simple steps:

[Screenshot: the Change of Address steps]

Finally, patience.

You’re going to need to be patient. It can take weeks or months to untangle this mess. Google has to crawl the site, follow the redirects, learn that the final URL is actually the correct page, and then update the index accordingly.

To keep tabs on how the fix is going, you can search Google for site:dev.yourdomain.com and note the number of results returned. Over time, it should go down and eventually reach zero.

Comments

  1. The easiest solution to this, of course, is to not use a dev site at all, and to simply push updates directly to your live site LIKE A BOSS!

    Okay, just kidding. 😉

  2. My test website already got indexed, and the website has been removed from the server as well. In this case I won’t be able to redirect it. What is the best solution to overcome this issue?

    1. Hi Phan –
      I wouldn’t necessarily say that “http” is “incorrect,” but I do agree it’s better to use https with sample code. I just updated the post. 🙂
      Thanks!

  3. Hi Andrew
    I developed a website at, for example, “domain.com”. I made a mistake in the robots.txt that was meant to prevent Google from indexing my domain (I developed the website on “domain.com”, not “dev.domain.com”). After one month I realized Google had indexed 94 URLs of my domain. These URLs have test names and test content, and I don’t want to keep them. What should I do?
    I need to remove all of the site’s URLs and content and have a fresh start with “domain.com” from scratch.

    Thanks in advance

    1. If Google has indexed the dev site, then odds are good that it’s competing with your production site in the search results. I’ve also seen cases where Google has decided that the dev site is the canonical domain, and has essentially booted the entire production site/URLs out of the index. So if you suddenly password protected the dev URLs, you’d block legit visitor traffic from searches too.

      HTTP auth passwords are a great way to prevent this problem in the first place – I didn’t include that in my other post on this, though, since it’s more complicated for most people to set up.

  4. Hello, is this still the best process in 2020? We have a development website that has been indexed for a few months, and our new website went live recently, creating lots of duplicate pages and content out there.
    I’ve seen some posts that say do a 301, while others say allow for a 404 error and let Google find the correct pages on their own.

    1. Hi Nick,
      I’m glad you asked — I’ve actually been working on a better solution here, but haven’t updated the post just yet since I haven’t tested it fully.
      The better way to go would be to issue a 410 (“Content Gone”) response, but only to Googlebot. And for any other visitors, still do the 301 redirect.

      That’s the best of both worlds. When Google sees a 410, they’ll remove a URL from the index faster than with a 404, since 404s are often mistakes, while a 410 sends a very clear “we’ve deliberately removed this!” message. And in the meantime, any real humans who click the link in the search results will actually get to the content they’re looking for, so it’s better for users, too.

      To implement this, you’ll need to add code in the .htaccess file to check for Googlebot (before your 301 redirect code) and then issue the 410. It’ll be something along these lines (this isn’t all the code you need, but hopefully it gets you going in the right direction):

      RewriteEngine On
      RewriteCond %{HTTP_USER_AGENT} Googlebot [NC]
      RewriteRule (.*) - [R=410,L]

      Hope that helps!

    2. Hey Nick –
      I just updated the post with a slightly better version of the code (including a few other popular search engine crawlers for the 410 status code). 🙂
      Hope that helps!

  5. Hey Andrew, thanks for this helpful tutorial. We ran into this issue and followed your steps exactly to correct it. However, after putting the 410 status code in along with the 301, I’m running into an error when trying to submit the change of address.

    Search console keeps saying they can’t fetch the staging site, I’m assuming because of the 410 status code, therefore, it is not allowing me to submit the change of address.

    Any ideas here?

    1. Hi Paula,
      Hmm… I’m guessing you’ll need to disable the 410 response temporarily, so that you can submit the change of address successfully. Once that’s done in GSC, re-enable the 410 code and I think you should be good.
      Please let us know if/how that works, and I’ll adjust the instructions accordingly!

  6. Replying to my earlier comment, but I don’t see it here. After removing the 410 we were able to submit the change of address in GSC with no problem.

    I did get the message from GSC that the change of address has “started” which made me wonder if I should leave off that 410 for now, so they have time to crawl each page and see that each one has changed address and see the 301 directives to each page on the live site. What are your thoughts? Thanks for taking the time to help!

    1. I just re-read the details of the “change of address” tool, here:
      https://support.google.com/webmasters/answer/9370220

      It does sound like if you remove the 301s (for Google) then it might stop updating the address, but they’re also not factoring in 410 responses, either.

      I suppose the question now is: Are the staging URLs that are indexed in Google actually generating significant traffic and click-throughs? If so, you may actually want to keep the 301s and wait for the Change of Address tool to do its thing.

      If they’re not actually generating any significant traffic, you’re probably better off just re-enabling the 410 responses for Google, to get the URLs dropped out of the search results as fast as possible.

  7. Hey Andrew, our developer forgot to no-index our dev site, and some pages got indexed and are now ranking. I’m on WordPress, so I went in and checked the box to discourage search engines from indexing. A few ranking pages are still out there. Does all of the information still apply in February 2021? Can I just redirect my dev link to the proper live page, or should I follow all steps?

    1. Hi Garrett – Yep, this all still applies. Redirecting the dev site to the live site is a bare minimum (and if you do that, discouraging engines from indexing actually doesn’t really apply, since the redirect will happen first anyway).
      Really, it’s best to do the other steps to help speed the process.
      Good luck!
