Robots.txt File – No Website Should Be Without One

by Jerry West
Updated April 15, 2005

The Robots Exclusion Protocol -- Robots.txt File

When a search engine spider or robot visits a web site, it first checks for the presence of a robots.txt file. If the file is found, the spider or robot reads its contents for records such as the following, which tells every robot to stay out of the entire site:

User-agent: *
Disallow: /

The Robots Exclusion Protocol is a method that allows website administrators to indicate which parts of their site should NOT be visited by a search engine robot.
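
As a rough illustration of how a well-behaved robot might apply the two-line record shown earlier, the sketch below feeds it to Python's standard urllib.robotparser module. The robot name "ExampleBot", the example.com URL, and the helper is_allowed are made up for the example; real search engine spiders use their own parsers.

```python
from urllib.robotparser import RobotFileParser

# The two-line record from the article: every robot, whole site off limits.
DENY_ALL = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are placeholders, not real crawler names.
    print(is_allowed(DENY_ALL, "ExampleBot", "http://www.example.com/index.html"))
```

With the deny-all record the check comes back False for any page on the site. An empty robots.txt file, by contrast, leaves everything allowed.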

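There can be only one robots.txt file per host, and it must sit in the root directory, because robots never look for it anywhere else. If your users have their own directories on the site, you must either merge all of their rules into the one robots.txt file or instruct your users to use the Robots Meta Tag.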
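
To make the "one file, at the root" rule concrete, here is a minimal sketch of how a robot could work out which robots.txt URL governs a given page. The function name robots_url and the example.com addresses are assumptions for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    # A page deep inside a user's directory still maps to the root file.
    print(robots_url("http://www.example.com/~alice/photos/index.html"))
```

However deep the page sits, the result is always the file at the root of that host, which is why a robots.txt placed inside a user's own directory is never read.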

The file name is case sensitive: name the file robots.txt, using all lowercase letters.

What To Put Into the robots.txt file

The "robots.txt" file usually contains a record looking like this:

User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /images/

In the above example, three directories are excluded. You need a separate "Disallow" line for each directory.
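
The effect of the three "Disallow" lines can be checked with a short sketch, again using Python's standard urllib.robotparser module. The sample paths, the robot name "ExampleBot", and the helper blocked_paths are invented for the example.

```python
from urllib.robotparser import RobotFileParser

# The record from the article, one Disallow line per directory.
RECORD = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /temp/
Disallow: /images/
"""


def blocked_paths(robots_txt, user_agent, paths):
    """Return the subset of paths that robots_txt closes to user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(user_agent, p)]


if __name__ == "__main__":
    # Hypothetical paths on the site; only those under the three
    # excluded directories should be reported as blocked.
    sample = [
        "/index.html",
        "/cgi-bin/search.pl",
        "/temp/draft.html",
        "/images/logo.gif",
        "/about/",
    ]
    for path in blocked_paths(RECORD, "ExampleBot", sample):
        print("blocked:", path)
```

Only the three paths under /cgi-bin/, /temp/ and /images/ are reported as blocked; /index.html and /about/ remain open to robots.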

A good source is: The Robots Text Pages.

If you wish to check the syntax of your robots.txt file, visit:

The Robot.txt Syntax Checker
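
For a quick local check before turning to an online tool, a very small syntax check can be sketched in a few lines. This is not the checker linked above, only an illustration under stated assumptions: it knows just the two fields used in this article (User-agent and Disallow), treats a blank line as the end of a record, and the function name check_robots_txt is made up.

```python
# Only the two fields discussed in the article; other fields that some
# robots understand would be reported as unknown by this sketch.
KNOWN_FIELDS = {"user-agent", "disallow"}


def check_robots_txt(text):
    """Return a list of problems found in text; an empty list means none."""
    problems = []
    in_record = False  # True once a User-agent line has opened the record
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            in_record = False  # a blank line ends the current record
            continue
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if not line:
            continue  # comment-only line
        field, sep, value = line.partition(":")
        field = field.strip().lower()
        if not sep:
            problems.append(f"line {number}: missing ':' separator")
        elif field not in KNOWN_FIELDS:
            problems.append(f"line {number}: unknown field '{field}'")
        elif field == "user-agent":
            if not value.strip():
                problems.append(f"line {number}: User-agent has no value")
            in_record = True
        elif not in_record:
            problems.append(f"line {number}: Disallow before any User-agent")
    return problems


if __name__ == "__main__":
    broken = "Disallow: /temp/\nUser-agent *\nDissalow: /x/\n"
    for problem in check_robots_txt(broken):
        print(problem)
```

The deliberately broken sample trips all three kinds of check: a Disallow line with no User-agent before it, a missing colon, and a misspelled field name.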


Robots.txt File Facts

  • if it is present, the major search engines will obey it
  • without a robots.txt file, Google will not index your site as deeply
  • you cannot exclude "bad" robots using a robots.txt file, because bad robots ignore the file
  • exclude your images folder to keep the search engines (like Yahoo! and Google) from grabbing your images for their image directory

© 2000 - 2005,
Jerry West is the Director of Internet Marketing for WebMarketingNow. He has been consulting on the web since 1996 and has helped hundreds of companies gain the upper hand over their competition. Visit Web Marketing Now for the latest in marketing tips that are tested and proven.

The above article can be reproduced on your site or e-zine as long as the signature file is included.
