Robots Exclusion Protocol
When a compliant Web Robot visits a site, it first requests the "/robots.txt" URL on that site. If this resource exists, the robot parses its contents for directives that instruct it not to visit certain parts of the site; whatever rules make sense for the site can be created there. Such control matters because people sometimes find they have been indexed by an indexing robot, or that a resource discovery robot has visited a part of a site that for some reason should not be visited by robots.
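For illustration, a minimal "/robots.txt" might look like the following; the User-agent and Disallow directives are the core of the protocol, while the paths themselves are hypothetical examples:

    # Keep all robots out of the dynamic and temporary areas.
    # (These paths are illustrative, not part of the protocol.)
    User-agent: *
    Disallow: /cgi-bin/
    Disallow: /tmp/

A User-agent line names the robot the record applies to ("*" means all robots), and each Disallow line gives a URL path prefix that the robot should not request.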
In recognition of this problem, many Web Robots offer facilities for Web site administrators and content providers to limit what the robot does. This is achieved through two mechanisms:
1. The Robots Exclusion Protocol: A Web site administrator can indicate which parts of the site should not be visited by a robot, by providing a specially formatted file.
2. The Robots META tag: A Web author can indicate whether a page may be indexed, or analyzed for links, through the use of a special HTML META tag (see the example after this list).
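As a sketch of the second mechanism, the tag below asks indexing robots neither to index the page nor to follow the links it contains; the "noindex" and "nofollow" values are the standard ones:

    <!-- Placed in the HEAD section of the HTML document -->
    <meta name="robots" content="noindex, nofollow">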
There can be only one "/robots.txt" on a site, and it must live at the top level of the server. Specifically, "robots.txt" files placed in user directories are useless, because a robot will never look for them there. To use the Robots Exclusion Protocol, one therefore has to liaise with the server administrator and have the rules added to the site's "robots.txt", following the Web Server Administrator's Guide to the Robots Exclusion Protocol. If the administrator is unwilling to install or modify "robots.txt" rules, and the aim is to prevent indexing by indexing robots like WebCrawler and Lycos, one can instead add a Robots META tag to each page that should not be indexed. Note that not all indexing robots implement this functionality.
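On the robot side, compliance can be sketched with Python's standard-library robots.txt parser. This is a minimal illustration rather than a full crawler, and the site URL and user-agent name ("ExampleBot") are assumptions for the example:

    from urllib.robotparser import RobotFileParser

    # Fetch and parse the site's "/robots.txt"; a missing file is
    # treated as allowing everything. The URL and user-agent name
    # below are hypothetical.
    rp = RobotFileParser()
    rp.set_url("http://www.example.com/robots.txt")
    rp.read()

    url = "http://www.example.com/tmp/page.html"
    if rp.can_fetch("ExampleBot", url):
        print("robots.txt permits fetching", url)
    else:
        print("robots.txt disallows fetching", url)

A compliant robot consults the parsed rules in this way before requesting any URL on the site.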