The ROBOTS name value of the META element was proposed and codified by participants of the Distributed Indexing/Searching Workshop that the World Wide Web Consortium sponsored in May 1996.
The "robots meta tag", as it's come to be known, was originally created for use by end-users who don't have access to their web server's robots.txt file. Since its creation, the tag has been unilaterally extended to provide instructions (to robots) that robots.txt cannot.
Providing robot instructions using META tags requires the simplest form of META tags:
<meta name="ROBOTS" content="value">
Where value is replaced by one or more instructions (keywords) which provide directions for spidering robots. Multiple instructions are separated by commas. The original robots tag specification only provided six instructions for spiders (ALL, FOLLOW, INDEX, NOINDEX, NOFOLLOW, and NONE), but a few search engines have "extended" the specification to include new instruction values. Instructions known to Websnob as of April 2004 (in order of usefulness) are:
Technically, the values of the content attribute in a meta tag are case sensitive, so that "NOINDEX", "NoIndex", and "noindex" are separate instructions, but no robot has been verified as making this distinction. Websnob recommends using the all-caps version (like the original specification did) just to be pedantic.
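As a rough illustration of the syntax described above, the following Python sketch pulls the comma-separated instructions out of a page's ROBOTS meta element. The class and function names (RobotsMetaParser, robots_instructions, may_index_and_follow) are hypothetical, and two choices are assumptions rather than part of any specification: instructions are folded to upper case, mirroring the observation that no robot has been verified as distinguishing case, and a page with no restrictive instruction is treated as indexable and followable.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the instructions from every <meta name="ROBOTS"> element."""

    def __init__(self):
        super().__init__()
        self.instructions = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        # Assumption: compare the name case-insensitively.
        if (attr.get("name") or "").strip().upper() != "ROBOTS":
            return
        # Multiple instructions are separated by commas.
        for word in (attr.get("content") or "").split(","):
            word = word.strip().upper()  # assumption: fold case
            if word:
                self.instructions.add(word)


def robots_instructions(html):
    """Return the set of upper-cased instructions found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.instructions


def may_index_and_follow(instructions):
    """Interpret the original six values; NONE means NOINDEX plus NOFOLLOW."""
    may_index = not ({"NOINDEX", "NONE"} & instructions)
    may_follow = not ({"NOFOLLOW", "NONE"} & instructions)
    return may_index, may_follow


page = '<html><head><meta name="ROBOTS" content="NoIndex, follow"></head></html>'
found = robots_instructions(page)
print(sorted(found), may_index_and_follow(found))
```

A real robot may well parse pages differently; this only shows how the comma-separated content value maps onto index and follow decisions.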
The META element is a valid element in every version of HTML since HTML 2.0.
Support for robot exclusion through meta tagging isn't quite universal, but most major (and some minor) search agents respect the instructions provided by the meta ROBOTS tag. The search services and software companies listed here have publicly stated their support for the tag standard, as of April 2004.
Alkaline's page about Alkaline Robots, HTML and Meta Tags says that it honors the original robot meta tag specification, and mentions NOINDEX and NOFOLLOW explicitly.
(Alkaline also recognizes a name="alkaline" tag for pages that need to provide Alkaline with specific instructions. This tag has a different set of labels than the ROBOTS tag.)
ChristCrawler's Information Page states that it honors NOINDEX, NOFOLLOW, and NOARCHIVE. The NOARCHIVE value will prevent ChristCENTRAL from providing cached copies of pages to its users. (Webmasters wishing to selectively block ChristCrawler may instead use NOINDEX, NOFOLLOW, and NOARCHIVE as content values for a name=ChristCrawler meta element.)
FAST's Web Crawler FAQ reveals that FAST-Webcrawler honors NOINDEX and NOFOLLOW.
The Fluid Dynamics Search Engine (a commercial search engine script) mentions its support of the robots tag on its features page.
Google Information for Webmasters includes an explanation of their META tag usage. Googlebot honors NOFOLLOW, NOINDEX, and NOARCHIVE when used with name=ROBOTS or name=GOOGLEBOT.
Note: Some webmasters have claimed that using the NOARCHIVE value led to their pages dropping in rank in Google search results. Google has not confirmed this.
ht://Dig FAQ includes a question about the ROBOTS tag, explaining that htdig honors NOINDEX and NOFOLLOW.
Strangely enough, the Internet Archive's removal information doesn't mention the robots meta tag, but its partner site, Alexa, does. Alexa's "For Webmasters" page states that ia_archiver honors NOINDEX, NOFOLLOW, and NOARCHIVE.
Linkwalker's technical specifications claim that the Linkwalker robot "Obeys all robot protocols".
mozDex: robot states that mozDex's robot understands the original six robots values.
Objects Search's "About Robot" page says their agent honors the original six values of the robots tag.
Psbot (an image-searching bot) honors NOINDEX and NOFOLLOW.
Canadian Content - Spider explicitly says that RoboCrawl understands NOINDEX, but doesn't mention any of the other values.
AltaVista's "Avoiding the Index" tutorial states that Scooter honors NOINDEX, NOFOLLOW, and NOIMAGEINDEX. The third value (only honored by Scooter) prevents the images on a tagged page from being included in AltaVista's Image Search. AltaVista also created (and used) NOIMAGECLICK, but AltaVista's help section no longer lists it as a supported label.
Inktomi still provides search results to Hotbot and some other sites. Inktomi's Spam Removal Guidelines FAQ recommends the robots meta tag to webmasters.
Szukacz's robot information says that the szukacz robot honors NOINDEX and NOFOLLOW.
Verticrawl's page about the capacities of the Verticrawl crawler claims the Verticrawl robot honors "index/follow, etc."
VWbot's home page (which doesn't appear to have been updated since 1996) briefly mentions (in the last paragraph), "VWbot obeys the Robot Exclusion Protocol, together with the more recent META tag robot control".
The Page Exclusion section of Webinator's manual says Thunderstone's robot (which they offer for download, in addition to using for their own Thunderstone Web Site Catalog) honors NOINDEX, NOFOLLOW, and NONE.
Whizbang's crawler information page says their bot will "honor any robots exclusion directives that might be placed in <META> tags of individual pages". I take that to mean it recognizes the six labels from the original specification.
The Yuntis Web Robot Help page explains that the experimental Yuntis robot understands all the labels from the original specification, as well as ARCHIVE, NOARCHIVE, SERVE, and NOSERVE.
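Several of the robots listed above (Alkaline, ChristCrawler, Googlebot) also read a meta element named after the robot itself. The Python sketch below shows one way a crawler might choose between such a robot-specific tag and the generic ROBOTS tag. The names MetaCollector and instructions_for are hypothetical, and the precedence rule (a robot-specific tag, when present, replaces the ROBOTS tag entirely) is purely an assumption for illustration; none of the pages cited here is quoted as saying how the two tags combine.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Groups the instructions of every named meta element by upper-cased name."""

    def __init__(self):
        super().__init__()
        self.by_name = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").strip().upper()
        if not name:
            return
        words = {w.strip().upper() for w in (attr.get("content") or "").split(",")}
        words.discard("")  # ignore empty entries such as a trailing comma
        self.by_name.setdefault(name, set()).update(words)


def instructions_for(html, agent):
    """Return the instructions that apply to the robot called `agent`.

    Assumption: a tag named after the robot overrides the generic ROBOTS tag.
    """
    collector = MetaCollector()
    collector.feed(html)
    specific = collector.by_name.get(agent.upper())
    if specific is not None:
        return specific
    return collector.by_name.get("ROBOTS", set())


page = (
    '<head>'
    '<meta name="ROBOTS" content="NOARCHIVE">'
    '<meta name="GOOGLEBOT" content="NOINDEX">'
    '</head>'
)
print(instructions_for(page, "Googlebot"))
print(instructions_for(page, "ia_archiver"))
```

With this sample page, the robot-specific tag applies only to the robot it names, while every other robot falls back to the generic ROBOTS instructions.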