The `ROBOTS` `META` Tag

History and Purpose

The ROBOTS name value of the META element was proposed and codified by participants of the Distributed Indexing/Searching Workshop that the World Wide Web Consortium sponsored in May 1996.

The "robots meta tag", as it's come to be known, was originally created for end users who don't have access to their web server's robots.txt file. Since its creation, the tag has been unilaterally extended to provide instructions (to robots) that robots.txt cannot.

Syntax

Providing robot instructions using META tags requires the simplest form of META tags:

<meta name="ROBOTS" content="value">

Where value is replaced by one or more instructions (keywords) which provide directions for spidering robots. Multiple instructions are separated by commas. The original robots tag specification only provided six instructions for spiders (ALL, FOLLOW, INDEX, NOINDEX, NOFOLLOW, and NONE), but a few search engines have "extended" the specification to include new instruction values. Instructions known to Websnob as of April 2004 (in order of usefulness) are:

NOINDEX: Instructs a search engine not to include a page in its index (search results).
NOFOLLOW: Instructs a search engine not to follow any of the links in a page.
NOARCHIVE: Instructs a search engine not to provide archived copies of a page to its users.
NOIMAGEINDEX: Is similar to NOINDEX, but instructs a search engine not to index images found on the page. (This instruction is currently honored by AltaVista's "scooter" robot, Ditto, and MSN Search.)
NOMEDIAINDEX: The image-searching engine Ditto is widely reported to accept NOMEDIAINDEX as a synonym for NOIMAGEINDEX, although I can't find any mention of that on Ditto's site. MSN Search also uses this value.
NONE: Instructs a search engine not to do anything with a page. In theory, this is the same as instructing "NOINDEX,NOFOLLOW,NOARCHIVE,NOIMAGEINDEX,NOSERVE", but some search engines may only consider it equivalent to "NOINDEX,NOFOLLOW". NONE cannot be combined with any other instruction.
FOLLOW: Lets a search engine know that it's OK to follow links on a page. Since this is the default for most engines, FOLLOW isn't really necessary, although some webmasters include it when using NOINDEX or NOARCHIVE, just to be explicit about everything.
INDEX: Tells a robot that it's permitted to include the page in its search results. Much like FOLLOW, this instruction is probably redundant.
ALL: Only exists because the originators of the robots meta tag specification felt NONE needed a logical opposite. ALL (which can't be combined with any other instruction) tells a search robot to do whatever the hell it wants with a page. Nobody needs to use this instruction; it's just telling the robot to do what it was going to do anyway.
NOSERVE: This label is used by Yuntis as a synonym for NOARCHIVE. (My guess is it's a case of independent invention: Somebody at Yuntis didn't know about NOARCHIVE, and thought up NOSERVE.) Yuntis also supports NOARCHIVE, so just ignore NOSERVE. I'd bet there are no pages anywhere using NOSERVE.
SERVE: Used to tell Yuntis that it's permitted to serve archived copies of pages to users. Not really necessary, since that's the default.
ARCHIVE: The logical opposite of NOARCHIVE, but only explicitly mentioned by one of the four archiving engines, Yuntis.
NOIMAGECLICK: AltaVista used to use this label for its image search robot. In its original form, AltaVista Image Search produced search results consisting of thumbnail images which linked directly to the full-size image on the image's home server. Using NOIMAGECLICK in the META ROBOTS element of the page that inlined (contained) the image would instruct AltaVista to link to the page instead. AltaVista has since made linking to the page the default behavior, so NOIMAGECLICK is redundant. (It's not even mentioned in AltaVista's help section anymore.)
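
As a rough illustration of how the instructions above combine, here is a sketch of a page head that asks robots to keep the page out of their index and their archives while still following its links. The title text is a placeholder, and whether a particular robot honors every value in the list is an assumption; check each engine's own documentation.

```html
<head>
<title>Example page</title>
<!-- Keep this page out of search results and cached copies,
     but allow robots to follow the links it contains. -->
<meta name="ROBOTS" content="NOINDEX,FOLLOW,NOARCHIVE">
</head>
```

FOLLOW is spelled out here only to be explicit; since following links is the default for most engines, leaving it out should have the same effect.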

Technically, the values of the content attribute in a meta tag are case sensitive, so that "NOINDEX", "NoIndex", and "noindex" are separate instructions, but no robot has been verified as making this distinction. Websnob recommends using the all-caps version (like the original specification did) just to be pedantic.

The META element is a valid element in every version of HTML since HTML 2.0.

Agents Known to Honor the `ROBOTS` Tag

Support for robot exclusion through meta tagging isn't quite universal, but most major (and some minor) search agents respect the instructions provided by the meta ROBOTS tag. The search services and software companies listed here have publicly stated their support for the tag standard, as of April 2004.

Alkaline

Alkaline's page about Alkaline Robots, HTML and Meta Tags says that it honors the original robot meta tag specification, and mentions NOINDEX and NOFOLLOW explicitly.

Alkaline also recognizes a name="alkaline" tag for pages that need to provide Alkaline with specific instructions. (This tag has a different set of labels than the ROBOTS tag.)

ChristCrawler (ChristCENTRAL.com)

ChristCrawler's Information Page states that it honors NOINDEX, NOFOLLOW, and NOARCHIVE. The NOARCHIVE value will prevent ChristCENTRAL from providing cached copies of pages to its users. (Webmasters wishing to selectively block ChristCrawler may instead use NOINDEX, NOFOLLOW, and NOARCHIVE as content values for a name=ChristCrawler meta element.)

FAST-Webcrawler (Alltheweb.com)

FAST's Web Crawler FAQ reveals that FAST-Webcrawler honors NOINDEX and NOFOLLOW.

fdse

The Fluid Dynamics Search Engine (a commercial search engine script) mentions its support of the robots tag on its features page.

Googlebot (Google.com)

Google Information for Webmasters includes an explanation of their META tag usage. Googlebot honors NOFOLLOW, NOINDEX, and NOARCHIVE when used with name=ROBOTS or name=GOOGLEBOT.
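
A minimal sketch of the agent-specific form follows. It addresses Googlebot alone and asks it not to offer a cached copy of the page. The title is a placeholder, and the sketch assumes that other robots simply ignore a meta name they don't recognize.

```html
<head>
<title>Example page</title>
<!-- Addressed to Googlebot only: do not provide a cached copy of this page. -->
<meta name="GOOGLEBOT" content="NOARCHIVE">
</head>
```

Using name="ROBOTS" with the same content value would address the instruction to every robot that honors NOARCHIVE, rather than to Googlebot alone.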

Note: Some webmasters have claimed that using the NOARCHIVE value led to their pages dropping in rank in Google search results. Google has not confirmed this.

htdig

ht://Dig FAQ includes a question about the ROBOTS tag, explaining that htdig honors NOINDEX and NOFOLLOW.

ia_archiver (webarchive.org)

Strangely enough, The Internet Archive's removal information doesn't mention the robots meta tag, but its partner site, Alexa, does. Alexa's "For Webmasters" page states that ia_archiver honors NOINDEX, NOFOLLOW, and NOARCHIVE.

Linkwalker (twentyfourseven.com)

Linkwalker's technical specifications claim that the Linkwalker robot "Obeys all robot protocols".

mozDex (mozdex.org)

The "mozDex: robot" page states that mozDex's robot understands the original six robots values.

ObjectsSearch

Objects Search's "About Robot" page says their agent honors the original six values of the robots tag.

Psbot

Psbot (an image-searching bot) honors NOINDEX and NOFOLLOW.

RoboCrawl (canadiancontent.net)

The "Canadian Content - Spider" page explicitly says that RoboCrawl understands NOINDEX, but doesn't mention any of the other values.

Scooter (AltaVista.com)

AltaVista's "Avoiding the Index" tutorial states that Scooter honors NOINDEX, NOFOLLOW, and NOIMAGEINDEX. The third value (only honored by Scooter) prevents the images on a tagged page from being included in AltaVista's Image Search. AltaVista created (and used) NOIMAGECLICK, but AltaVista's help section no longer lists it as a supported label.

Slurp (Inktomi.com)

Inktomi still provides search results to Hotbot and some other sites. Inktomi's Spam Removal Guidelines FAQ recommends the robots meta tag to webmasters.

szukacz (Szukacz.pl)

Szukacz's robot information says that the szukacz robot honors NOINDEX and NOFOLLOW.

Verticrawl

Verticrawl's page about the capacities of the Verticrawl crawler claims the Verticrawl robot honors "index/follow, etc."

VWbot (Vancouver-webpages.com)

VWbot's home page (which doesn't appear to have been updated since 1996) briefly mentions (in the last paragraph), "VWbot obeys the Robot Exclusion Protocol, together with the more recent META tag robot control".

Webinator (Thunderstone.com)

The Page Exclusion section of Webinator's manual says Thunderstone's robot (which they offer for download, in addition to using for their own Thunderstone Web Site Catalog) honors NOINDEX, NOFOLLOW, and NONE.

WhizBang

WhizBang's crawler information page says their bot will "honor any robots exclusion directives that might be placed in <META> tags of individual pages". I take that to mean it recognizes the six labels from the original specification.

Yuntis

The Yuntis Web Robot Help page explains that the experimental Yuntis robot understands all the labels from the original specification, as well as ARCHIVE, NOARCHIVE, SERVE, and NOSERVE.
