SA Developer .NET

Securing a web environment

Last post 08-04-2008, 8:35 by Orikuuido. 6 replies.
  •  07-28-2008, 12:58 13725

    Securing a web environment

    Hi all,

    My parent company's branch in the U.K. has the following problem:

    We have 2 environments in the web hosting environment:

    1. Staging / QA

    2. Live / Production.

    Now, the problem is that we sometimes have to deploy changes to the UK staging environment that contain sensitive information (company acquisitions, mergers, etc.) and that have to go through an approval phase (spelling checks, formatting, etc.). That is fine, since only we (the developers) and they (approvers, CEOs, etc.) have the addresses for the staging environment (IPs, URLs, etc.). But then our old friend GoogleBot comes along and indexes the pages, so if you search Google for keywords (such as the name of the company) you get references to the staging site, and the whole secret goes flying out of the window.

    Is there any way I can block GoogleBot from indexing the page (using IIS to restrict the IP), or is there a better solution than this? 

    Regards,

    The H......................


    The Question is the Answer, and the Answer is the Question!
  •  07-28-2008, 13:20 13726 in reply to 13725

    Re: Securing a web environment

    Unless I'm mistaken, you should add a file called "robots.txt" listing the file names you don't want included. Google it and you should find some help on it.
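
To make the robots.txt suggestion concrete, here is a minimal sketch in Python (not the thread's own platform) that uses the standard library's `urllib.robotparser` to show how a well-behaved crawler would read a "block everything" file. The file contents, the `is_allowed` helper, and the staging URL are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a staging site: ask every crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honours robots.txt may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # staging.example.com is a placeholder, not an address from the thread.
    url = "http://staging.example.com/news/merger.html"
    print(is_allowed(ROBOTS_TXT, "Googlebot", url))
```

This only models crawlers that choose to follow the rules; nothing in the file enforces them.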

    "I would love to change the world, but they won't give me the source code"
  •  07-28-2008, 13:34 13728 in reply to 13726

    Re: Securing a web environment

    In the pages that have the sensitive information, I think you could put this meta tag in your code

    <meta type="robots" content="none" />

    As to whether it works or not, I don't know, but like Heat_Rash mentioned, Go0o0o0o0ooGle it :)


    I'm going to regret forging this...
  •  07-28-2008, 13:48 13731 in reply to 13728

    Re: Securing a web environment

    Hi Orikuuido,

    You were close, the code is actually:

    <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

    This asks all well-behaved robots not to index a specific page or follow its links. According to Google Webmaster Support, this is the better way of doing it. The other way is to do as Heat_Rash said and use a robots.txt file to block it.

    I think I will use the <META> tag option for all the stuff that should be eyes only.
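
Since a small typo in the tag (such as `type` instead of `name`) silently disables it, it may be worth checking pages before they go to staging. The following is a rough sketch, assuming Python's standard `html.parser`; the `RobotsMetaFinder` class and `robots_directives` function are hypothetical names, not part of any tool mentioned in this thread.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots directives found in the given HTML."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


if __name__ == "__main__":
    print(robots_directives('<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">'))
    print(robots_directives('<meta type="robots" content="none" />'))
```

The second call finds nothing, because the tag has no `name="robots"` attribute for the sketch (or a crawler) to recognise.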


    The Question is the Answer, and the Answer is the Question!
  •  07-29-2008, 9:26 13743 in reply to 13726

    Re: Securing a web environment

    Heat_Rash:
    Unless I'm mistaken you should add a file called "robots.txt" with the files names you don't want included... Google it you should find some help on it...

    Heat is correct in saying you can use robots.txt to block friendly web crawlers such as Googlebot from indexing sensitive areas of your website, because the friendly web crawlers obey the rules. (http://www.robotstxt.org/)

    Malicious web crawlers, however, disregard the rules and will crawl every inch of your website, harvesting emails, contact details and hidden gems. I would suggest using website authentication to prevent malicious users and crawlers from obtaining sensitive info from your website.
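
As an illustration of the authentication idea only, here is a minimal sketch of HTTP Basic authentication written as a Python WSGI wrapper. It is not how an IIS site would be configured (IIS has its own authentication settings); the `require_basic_auth` helper, the `staging_app` function, and the credentials are all placeholders assumed for the example.

```python
import base64
import hmac


def require_basic_auth(app, username, password, realm="Staging"):
    """Wrap a WSGI app so every request must carry the expected credentials."""
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    expected = "Basic " + token

    def guarded(environ, start_response):
        supplied = environ.get("HTTP_AUTHORIZATION", "")
        # compare_digest avoids leaking information through timing differences.
        if hmac.compare_digest(supplied.encode(), expected.encode()):
            return app(environ, start_response)
        start_response("401 Unauthorized", [
            ("WWW-Authenticate", f'Basic realm="{realm}"'),
            ("Content-Type", "text/plain"),
        ])
        return [b"Authentication required\n"]

    return guarded


def staging_app(environ, start_response):
    """Stand-in for the real staging site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"staging content\n"]


# Placeholder credentials; a real deployment would not hard-code these.
protected_app = require_basic_auth(staging_app, "approver", "change-me")
```

Unlike robots.txt or a meta tag, this refuses every client without credentials, crawler or not. Basic authentication sends the password in an easily decoded form, so it should be combined with HTTPS.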


    SA Developer .Net Online Community Support
  •  08-04-2008, 8:33 13867 in reply to 13743

    Re: Securing a web environment

    Hi Guys,

    In LAMP circles .htaccess is a favourite. I don't know the equivalent for MS servers, but I would go for that option.


    I'm going to regret forging this...
  •  08-04-2008, 8:35 13868 in reply to 13867

    Re: Securing a web environment

    There are two important considerations when using /robots.txt:

    • robots can ignore your /robots.txt. Especially malware robots that scan the web for security vulnerabilities, and email address harvesters used by spammers will pay no attention.
    • the /robots.txt file is a publicly available file. Anyone can see what sections of your server you don't want robots to use.

    So don't try to use /robots.txt to hide information.
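
To show why the second point matters, here is a small sketch in Python of what anyone can do with a downloaded robots.txt: list the paths the site owner wanted hidden. The `disallowed_paths` function and the sample paths are invented for illustration.

```python
def disallowed_paths(robots_txt: str) -> list:
    """Return every path named in a Disallow line, in file order."""
    paths = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = line.split(":", 1)
        value = value.strip()
        if key.strip().lower() == "disallow" and value:
            paths.append(value)
    return paths


# Hypothetical file: the Disallow lines advertise exactly what is sensitive.
SAMPLE = """\
User-agent: *
Disallow: /press/acquisition-draft/
Disallow: /internal/
"""

if __name__ == "__main__":
    for path in disallowed_paths(SAMPLE):
        print(path)
```

A blanket `Disallow: /` gives away less than a list of specific paths, but it still does nothing to stop a robot that ignores the file.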

    Scary?!


    I'm going to regret forging this...