Proprietary Engine Tags

The following search engines, web directories, and indexing programs all use proprietary tags not supported by other sites or software. In fact, some of these tags are no longer supported by the engines that promoted them, but are still listed here in case you're wondering about a META tag you noticed in some obscure corner of the Web.

Aesop.com

Aesop.com is a second-rate search engine operated by the search-engine optimization company of the same name. It uses one custom tag for auto-categorization and rank boosting: Any page with name="Aesop" receives a category icon next to its listing, and preferred placement ahead of sites without the tag.

The Aesop tag has six potential values: information, interactive, links, multimedia, personal, and sales. A web page may only have one value in its Aesop tag.
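
As a minimal sketch, a page that mostly exists to sell something would presumably carry a tag like the one below. Placing the value in the content attribute is an assumption based on how META tags normally work; the value itself is one of the six listed above.

```html
<!-- Aesop category tag: exactly one of the six values per page -->
<meta name="Aesop" content="sales">
```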

For more information about this search engine, read Websnob's review of Aesop.

Reference: The Aesop Meta Tag

AlkalineBOT

The Alkaline Search Engine Server is a commercial search engine sold by Vestris, Inc. Alkaline's robot, AlkalineBOT, recognizes one custom tag, name="Alkaline", with four potential values.

skip tells AlkalineBOT not to index the page in question. (The same effect as using noindex with the robots tag.)

skiplinks tells AlkalineBOT not to index the links on the page (much like name="robots" content="nofollow" works).

skipmeta tells AlkalineBOT not to index the other META headers of the page.

skiptext tells AlkalineBOT not to index the "free text" of the page. This instruction effectively tells AlkalineBOT to index only the META headers.
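
A hedged example of one of these values in use, assuming the value goes in the content attribute. Whether several values can be combined in a single tag isn't stated here, so this sketch sticks to one.

```html
<!-- Index this page's text and META headers, but not its links -->
<meta name="Alkaline" content="skiplinks">
```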

Reference: Alkaline Robots, HTML and Meta Tags

BlogChalking.com

BlogChalking isn't actually a search engine; it's a site that encourages weblog authors to include a standardized mini-description of themselves on their home pages in order to facilitate easy indexing of weblogs. They use a combination of graphics, user-readable text, and META tagging.

Blogchalking.com's META value is name="blogchalk", and includes the weblog author's location, languages spoken, first name, sex, and age category. For example, the blogchalk for my personal weblog would be:

<meta name="blogchalk" content="United States, Michigan, Trenton, English, Michael, Male, 31-35">

Frankly, I think that looks more like the classificatory data for a dating service than a search engine, but I'm not a hardcore weblogger. There is at least one small search engine using blogchalk data.

Reference: none (The only way to get Blogchalking.com to show you a sample tag is to fill out their sample form.)

Blogfeed

Blogfeed isn't a search engine; it's a Perl script that helps bloggers create RSS feeds by parsing the HTML pages of blogs. In addition to reading the standard Description, Blogfeed requires users to add three META tags:

name="rss:link" is the URL of the blog. (In theory, Blogfeed could get the same information from a BASE element -- bloggers have a bad habit of creating META tags when they could just parse the HTML.)

name="rss:language" identifies the language of the blog using ISO abbreviations. (This value could also be placed in another HTML element -- the HTML element itself has a lang attribute that uses the same abbreviations.)

name="rss:image" is the full URI for a graphic file to be associated with the blog's RSS feed.
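
Put together, the head of a Blogfeed-ready page might look something like this sketch. The example.com URLs, the title, and the description text are placeholders, and "en" assumes the ISO 639 two-letter code is the abbreviation Blogfeed expects.

```html
<head>
  <title>Example Weblog</title>
  <!-- Standard tag that Blogfeed also reads -->
  <meta name="description" content="Notes on search engines and metadata.">
  <!-- The three Blogfeed-specific tags -->
  <meta name="rss:link" content="http://www.example.com/blog/">
  <meta name="rss:language" content="en">
  <meta name="rss:image" content="http://www.example.com/blog/feed-icon.gif">
</head>
```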

References: Blogfeed home page

ChristCENTRAL.com

ChristCENTRAL claims to be developing a Google-like index of Christian websites. I'm not sure if this engine is vaporware, or just a joke that's not funny.

ChristCENTRAL's web robot, ChristCrawler, utilizes one custom META tag. name="ChristCrawler" takes the values noarchive, nofollow, and noindex, treating them exactly the same as if they were used in the robots META tag. Using the ChristCrawler tag allows web authors to give ChristCrawler one set of instructions, and all the other engines a different set.
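
For illustration, here is how the two tags could sit side by side. This is a sketch that assumes ChristCrawler accepts comma-separated values the way the robots tag does.

```html
<!-- Every robot: index and follow, but don't keep a cached copy -->
<meta name="robots" content="noarchive">
<!-- ChristCrawler only: stay out entirely -->
<meta name="ChristCrawler" content="noindex, nofollow">
```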

Reference: ChristCrawler Description and Information

Dipsie.com

A vaporware search engine claiming to index more of the web than any other engine, Dipsie uses one proprietary tag: name="dipsie.bot". The content value is an integer from 1 to 30, indicating how many days the page author wants Dipsie's webcrawler to wait between visits. (If the dipsie.bot tag isn't used, Dipsie claims it will default to visiting once every seven days.)
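
A page that only changes every couple of weeks could, in principle, say so like this (14 is just an illustrative value inside the 1 to 30 range):

```html
<!-- Ask Dipsie's crawler to wait 14 days between visits -->
<meta name="dipsie.bot" content="14">
```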

Reference: Dipsie: dipsie.bot - The Dipsie Web Crawler

Feedster.com

Feedster is a search engine that indexes syndicated RSS content feeds. They have one proprietary META tag, name="feedsteridentity". Its content is a 12-character alphanumeric identifier (essentially, a customer identification number) that allows Feedster members to add information to their site's listings in Feedster.

Reference: Feature #7: Identity, Identity, Identity

fdse

fdse is the robot for the Fluid Dynamics Search Engine, a commercial product written in Perl by Zoltan Milosevic.

fdse supports fdse-description, fdse-keywords, fdse-PICS, fdse-Refresh, and fdse-Robots as synonyms for the traditional, non-prefixed versions of the same tags. If given both versions of a tag, fdse prefers the fdse- versions, allowing web authors to give fdse separate instructions.
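
A sketch of that precedence rule in practice. The description text is invented for the example, and the comments describe the behavior as documented above rather than anything tested.

```html
<!-- Read by engines in general -->
<meta name="description" content="A general description for every engine.">
<meta name="robots" content="index, follow">

<!-- Preferred by fdse when both versions are present -->
<meta name="fdse-description" content="A description meant only for FDSE results.">
<meta name="fdse-Robots" content="noindex">
```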

Reference: Support for proprietary FDSE-Keywords, FDSE-Description, FDSE-Robots META headers

Fireball.de

The German search engine Fireball is currently powered by AltaVista, but used to support a wide range of META tags, including German translations of well-known English tags, and parts of the Dublin Core.

beschreibung, dc.description, adoccom, and abstract were treated as synonyms for Description.

schlusselwort, mathdmv.keywords, subject, dc.subject, and gegenstand were used as synonyms for Keywords.

autor, owner, author-corporate, author-personal, mathdmv.author, dc.creator, dc.contributor, and contributor were treated as synonyms for Author.

audience and zielgruppe were treated as synonyms for site-index.pl's Distribution.

eigentmer and urheberrechte are synonyms for Copyright.

herausgeber and dc.publisher were used as synonyms for Publisher.

pagetype, page-type, objecttype, object-type, and resourcetype were used to identify the function of the page: Anleitung, Anzeige, Bericht, Bild, Buch, Download, Email-Archiv, FAQ, Forschungsbericht, Foto, HTML-Formular, Karte, Katalog, Kleinanzeige, Link-Liste, Plan, Private Homepage, Produktinfo, Reportage, Software, Sound, Statistik, Verzeichnis, and Video.

Fireball's other unique tag, page-topic, had the synonyms pagetopic, thema and seitenthema. page-topic classified a web page into one of 25 categories: Bauen, Bildung, Branche, Dienstleistung, Erotik, Forschung, Gesellschaft, Kultur, Medien, Medizin, Politik, Produkt, Recht, Reise, Religion, Sexualität, Spiel, Sport, Technik, Tourismus, Umwelt, Verwaltung, Wirtschaft, Wissenschaft, and Wohnen. Multiple topics could be separated by spaces.
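
As a rough sketch, a travel catalog page might have combined the two classification tags like this. The values come from the lists above; pairing them on one page is only an illustration.

```html
<!-- Function of the page: one value from the page-type vocabulary -->
<meta name="page-type" content="Katalog">
<!-- Subject of the page: two topics, separated by a space -->
<meta name="page-topic" content="Reise Tourismus">
```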

References: Documentation to: Metadata indexing and searching in large search services and Dreamwarrior - GILs Generator.

Geocities.com

For a period in the late 1990s (I'm not sure of the exact date), Geocities allowed its members to self-classify their pages using name="mytopic" and the categories of its "GeoAvenues" directory. The content of the tag was a colon-separated hierarchy, such as Society:Religion:Buddhism. Sites could be listed at the second or third level of the hierarchy.

During the period that mytopic was used, the top-level categories of GeoAvenues were: Arts & Literature, Autos, Business & Money, Campus Life, Computers & Technology, Entertainment, Family, Health, Home & Living, People & Chat, Society, Sports & Recreation, Travel, and Women. (Geocities made an HTML error in implementing the mytopic tag: They didn't tell users to character-encode the ampersand when it appeared in a topic name. That's bad HTML.)
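
The two tags below are alternatives, not a pair: a page would have carried one or the other. The first uses the hierarchy quoted above. The second shows the ampersand character-encoded the way it should have been; "Poetry" is a made-up second-level category used only for illustration.

```html
<!-- Third-level listing -->
<meta name="mytopic" content="Society:Religion:Buddhism">

<!-- Top-level name containing an ampersand, written as &amp; -->
<meta name="mytopic" content="Arts &amp; Literature:Poetry">
```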

Geocities itself no longer uses this tag, but it hangs around on old Geocities pages (and even a few pages that began on Geocities, then moved elsewhere).

References: none available

Geotags.com

The Geotags search engine organizes web resources by the geographic regions those resources apply to. It uses four proprietary META tags.

name="geo.location" has been deprecated in favor of geo.position.

name="geo.region" is an optional tag that indicates the nation and state/province of the geographic location, using ISO region abbreviations.

name="geo.placename" is an optional tag containing a human-readable, unqualified name for the geographic location.

name="geo.position" indicates the latitude and longitude of the geographic location associated with the resource. This tag is mandatory for inclusion in Geotags.
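
A sketch of the three current tags for a page about Trenton, Michigan. The coordinates are approximate, and the semicolon separator with latitude first is an assumption about the content format, which isn't spelled out above.

```html
<!-- Optional: ISO country code plus state/province code -->
<meta name="geo.region" content="US-MI">
<!-- Optional: human-readable place name -->
<meta name="geo.placename" content="Trenton">
<!-- Mandatory: latitude and longitude (assumed format: lat;long) -->
<meta name="geo.position" content="42.14;-83.18">
```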

Geotags claims to be using ht://dig as its indexing software. I don't know if it recognizes the htdig META tags.

Most of Geotags's META tags are also used by Syndic8.

Reference: Geo Tag Elements and GeoSearch Add Tags

GeoURL.org

GeoURL is a "location-to-URL reverse directory" which allows users to find websites by clicking on maps. Really.

To be listed in GeoURL, a site must use a meta tag stating the site's longitude and latitude. GeoURL currently reads its own name="ICBM", as well as Geotags's geo.location tags, which use the same range of content values. GeoURL also recognizes the dc.title value from The Dublin Core Metadata Scheme -- if dc.title exists on a page, GeoURL uses it instead of the HTML TITLE.
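
A minimal sketch of a GeoURL-ready page head. The coordinates are approximate, the title is a placeholder, and the comma-separated "latitude, longitude" order is an assumption about the content format.

```html
<!-- Coordinates for GeoURL (assumed format: latitude, longitude) -->
<meta name="ICBM" content="42.14, -83.18">
<!-- Used by GeoURL in place of the HTML TITLE when present -->
<meta name="dc.title" content="Example Weblog">
```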

The blog-search engine Blizg also reads the name="ICBM" tag, but not the geo.location tag.

Reference: GeoURL ICBM Address Server.

Gigablast.com

A pale imitation of Google, Gigablast is a one-man operation that started utilizing some custom (and U.S.-centric) META values in September 2003. Their current META set includes:

name="author" is the unqualified name of the page's author. (Some other engines use author as well.)

name="classification" is a comma-separated list of keywords. (In fact, I'm not sure how Gigablast differentiates keywords and classification, reinforcing my belief that classification is a useless meta tag.)

name="city" is a comma-separated list of unqualified municipality names (which may include abbreviations and alternate names of the same city).

name="country" is the unqualified name of a nation. (The use of unqualified text strings, instead of pre-qualified abbreviations, is going to be a problem for Gigablast. People don't always agree about the names of countries. If you don't believe me, call the nearest Chinese consulate and ask them their opinion of Taiwan.)

name="language" is an unqualified string identifying the language of the web page.

name="state" is the unqualified name of the (U.S.) state the website is best associated with.

name="zipcode" is a comma-separated list of five-digit United States ZIP codes.
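
Taken together, a Gigablast-oriented head section might look like this sketch. The values reuse the blogchalk example from earlier on this page, and the ZIP code is only illustrative.

```html
<meta name="author" content="Michael">
<!-- city and zipcode accept comma-separated lists; one value each here -->
<meta name="city" content="Trenton">
<meta name="state" content="Michigan">
<meta name="zipcode" content="48183">
<!-- Unqualified strings, not ISO codes -->
<meta name="country" content="United States">
<meta name="language" content="English">
```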

For more information about Gigablast, read Websnob's review of Gigablast.

References: Gigablast Demo Page, Search Engine Forums: Metadata : A comeback?.

Google.com

The most successful search engine today, Google utilizes one custom META tag. name="Googlebot" is used to supplement (or replace, if necessary) the traditional ROBOTS tag, allowing web authors to single out Google's robot for special instructions. Googlebot accepts four values:

noindex, nofollow, and noarchive are treated exactly the same as they are when appearing in the robots tag.

nosnippet is unique to Google, and changes how Google displays a description for a page in search results. Normally, Google will "snip" and display one or two lines (which match the words the Google user searched for) from the page, creating what some webmasters derisively call "the ransom note results". Using nosnippet instructs Google to omit the snippet. In most cases, this means the page in question will have no description at all, but if the page in question has an Open Directory description, that description is still displayed.
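
For example, a page could leave other robots alone while asking Google to skip both the cached copy and the snippet. This sketch assumes values can be comma-separated, as they are in the robots tag.

```html
<!-- Instructions for robots in general -->
<meta name="robots" content="index, follow">
<!-- Instructions for Google's robot only -->
<meta name="Googlebot" content="noarchive, nosnippet">
```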

References: Google: Remove Content from Google's Index.

htdig

ht://Dig is a volunteer-supported indexing engine distributed under the GNU license. It is one of the oldest continually-developed web-indexing programs, and can read several proprietary META tags.

name="htdig-keywords" provides additional keywords for htdig to index. (htdig also recognizes the standard Keywords tag).

name="htdig-noindex" tells htdig not to index the page. (htdig also recognizes the standard robots tag).

The remaining three custom tags are part of ht://dig's notification system. name="htdig-email" contains one or more e-mail addresses, separated by commas. name="htdig-notification-date" is a date in YYYY-MM-DD format. name="htdig-email-subject" is the subject line ht://dig should use when e-mailing the addresses listed in htdig-email. (ht://dig's notification system is used to remind participating authors when to update time-sensitive documents. When htdig-notification-date is reached, ht://dig will automatically send a reminder (with the title htdig-email-subject) to the address(es) listed in htdig-email.)
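
A sketch of the three notification tags working together. The addresses, date, and subject line are placeholders.

```html
<!-- Who gets the reminder -->
<meta name="htdig-email" content="editor@example.com, webmaster@example.com">
<!-- When the reminder is due, in YYYY-MM-DD format -->
<meta name="htdig-notification-date" content="2004-01-15">
<!-- Subject line of the reminder e-mail -->
<meta name="htdig-email-subject" content="Time to review the holiday schedule page">
```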

htdig can also be configured to use the Description and Date tags. The format of Date is user-configurable, and will be displayed in search results as a "last modified" date.

Reference: ht://dig: Recognized META information in HTML documents

html-to-rfc.pl

Part of the "Cheap HTML Parser" utilities released by Jim Davis in 1994, this Perl script converts an HTML document to an ASCII document formatted according to the IETF instructions for RFC writers. While it's not really an indexing script, html-to-rfc.pl is noteworthy as an early user of META tags, and as the probable first implementation of the Author and Date tags. html-to-rfc.pl used four proprietary tags for formatting its ASCII output.

name="author" is the author's name, surname first.

name="date" is the month and year of publication. Unlike most implementations of the date tag, the month is expressed as the full English name.

name="status" indicates the document's status as an Internet Draft or RFC. This is an IETF-controlled vocabulary.

name="title" is used to provide a title for the ASCII version of the document. (Why it wouldn't be the same as the HTML version is a mystery to me.)

Reference: Cheap HTML Parser in Perl

Inktomi Enterprise Search

Inktomi Enterprise Search is the customizable search engine (formerly known as Ultraseek) that Inktomi sells to various business and educational clients. It can be configured to rank search results according to the last modification date of the document, but that date can be overridden with a name="date" tag, with content being a date in YYYY-MM-DD format.

I don't know if "the real Inktomi" will acknowledge a date tag, but at least one of its client sites, Hotbot, does offer the option of filtering results by date.
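
For instance, a document that should be ranked as if it were last modified on 15 September 2003 (an arbitrary example date) would carry:

```html
<!-- Overrides the file's last-modification date for ranking purposes -->
<meta name="date" content="2003-09-15">
```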

Reference: Custom Inktomi: Ranking by date

MapleSquare.com

Maple Square was a search engine for Canadian content. In addition to the standard Description and Keywords tags, Maple Square used a name="Location" tag. The content for Location was "Country, Province, City", using the two-letter ISO abbreviations for countries and the Canadian postal abbreviations for provinces.
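
Following that pattern, a site based in Vancouver would presumably have used something like:

```html
<!-- Country (ISO), province (Canadian postal abbreviation), city -->
<meta name="Location" content="CA, BC, Vancouver">
```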

Reference: Maple Square - How to Include Your Site

MOMspider

Roy Fielding's MOMspider is a web robot designed in 1993/1994 to help maintain large-scale websites that have multiple authors. Not only is MOMspider the first web robot to use META headers, it's the reason META headers were created by Fielding.

In its default configuration, MOMspider looked for http-equiv="expires" and two custom tag values: http-equiv="Owner" was an unqualified name of the page's author and http-equiv="Reply-To" was the owner's e-mail address. http-equiv="keywords" is mentioned in an example of user-configurable headers, but it is unknown if anybody ever used this header with MOMspider.
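
A sketch of the three default headers on one page. The name and address are placeholders, and the HTTP-style date in Expires is an assumption about the format MOMspider expected.

```html
<!-- Who is responsible for the page -->
<meta http-equiv="Owner" content="Jane Author">
<meta http-equiv="Reply-To" content="jane@example.com">
<!-- When the page should be considered out of date -->
<meta http-equiv="Expires" content="Tue, 04 Dec 2001 21:29:02 GMT">
```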

Reference: MOMspider -- Making Document Metainformation Visible

NetInsert

NetInsert is a Sweden-based web directory founded in 1998. In addition to the typical Description and Keywords tags, NetInsert uses several proprietary values.

name="netinsert" contains a string of numbers identifying the category the page should be listed in. Websites cannot be listed in NetInsert without this tag.

name="news" may contain up to 128 characters of news about the site. When NetInsert detects this tag, it will add a small news icon to the site's listing.

name="expire" (not to be confused with http-equiv="Expires") tells NetInsert when to remove a page from the directory.

name="e-mail" is the e-mail address NetInsert should cache and contact if it has trouble accessing the web page.

name="revision" is used to identify revisions of a web document. NetInsert's instructions don't specify the format for the content of this tag, but it's allowed 64 characters, so it's probably just an unqualified text descriptor.

name="revisit" is apparently inspired by SearchBC's revisit value, but only accepts intervals in days, not weeks or months. NetInsert also accepts the semi-mythical Revisit-After value as a synonym for revisit.

For more information about NetInsert, read Websnob's review of NetInsert.

Reference: NetInsert - Meta Tags

SearchBC

SearchBC is a Vancouver-based search engine that only indexes sites in the province of British Columbia. In addition to recognizing a large range of common META tags, SearchBC's robot has two proprietary tags.

name="VW96.ObjectType" expanded on a suggestion by Dublin Core developers, and defined the purpose of the web document. Legal values are: Homepage, FAQ, RFC, Document, World, RealWorld, Index, Magazine, Mall, Dictionary, Archive, SearchEngine, Hypercatalog, Keybank, Manual, Book, Database, Journal, Catalog, Linecard, and HOWTO.

name="revisit" instructed SearchBC how often to reindex a web page. The content value was an integer followed by one of the keywords days, weeks, or months. SearchBC does not obey this tag anymore.
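
A short sketch of both tags on a FAQ page that wanted to be reindexed every two weeks (back when SearchBC still honored the request):

```html
<!-- Purpose of the document, from the VW96 vocabulary -->
<meta name="VW96.ObjectType" content="FAQ">
<!-- Integer followed by days, weeks, or months -->
<meta name="revisit" content="2 weeks">
```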

References: VW.96 schema description and META Tag Builder.

SiftGroups

SiftGroups, according to its manufacturer, is an "outsourced online community and vertical portal suite of technology and services". That sounds like marketdroid-speak to me, but what do I know about community-building?

Anyway, SiftGroups's search engine (which is based on Inktomi Enterprise Search) recognizes name="date", name="class", and name="subject". The date tag is used just as Inktomi uses it, while class and subject use operator-defined vocabularies to classify pages by use and topic.

Reference: Sitegroups User Guide: Inktomi

site-index.pl

Little remembered today, Robert Thau's site-index.pl played a pivotal role in the spread of META tags when it was released in 1994. Thau's script (intended for use only by webserver administrators) parsed META headers of local pages to create a site.idx file that could be submitted to ALIWEB.

Although no one uses site-index.pl or ALIWEB in 2002, Thau's script left an undeniable legacy: It was the script that Description and Keywords were invented for. The popularity of site-index.pl in the mid-1990s encouraged web authors to use those META tags, which in turn encouraged AltaVista to use them when it debuted in 1995.

Most of site-index.pl's (and ALIWEB's) metadata labels were based on an expired Internet Draft by the Internet Anonymous FTP Archives (IAFA) working group. In addition to the now-ubiquitous Description and Keywords tags, site-index.pl supported five other values that weren't adopted by any search engines.

name="iafa-description" and name="iafa-keywords" were used in the original version of site-index.pl in place of Description and Keywords. After the creation of the more generic terms, site-index.pl continued to support these values as synonyms of the more popular tags.

name="Distribution" was used by site-index.pl to determine whether or not a page could be submitted to Aliweb. Distribution originally had two content values: global pages could be included in Aliweb or any other index. local pages could only be submitted to "local" indexes belonging to the organization that published the page. (Some sources cite a third value, IU for "Internal Use", which I gather to mean that a page shouldn't be included in any search engines. IU is not implemented in any version of site-index.pl I've seen; it may have been added by an imitator like item-index.pl.)

name="Resource-Type" had two values. Document was used for most "normal" (non-interactive, non-dynamic) web pages, while Service was used for search engines, feedback forms, and other pages that provide more than static text. (iafa-type was also accepted as a synonym for Resource-Type.)
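
As an illustration, a page meant only for an organization's own index might have been marked up along these lines. The description and keyword text are invented; the attribute names follow the tags described above.

```html
<meta name="Description" content="Departmental FAQ about campus network access.">
<meta name="Keywords" content="network, FAQ, campus">
<!-- Keep this page out of ALIWEB; local indexes only -->
<meta name="Distribution" content="local">
<!-- A static page rather than a search form or other service -->
<meta name="Resource-Type" content="Document">
```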

References: Thau's first announcement of site-index.pl and Thau's announcement of a revised version. (The second announcement introduces Description and Keywords.)

Suntek

Suntek Computer Systems Ltd is a Hong Kong-based company specializing in bilingual (English and Chinese) search software. Their engine and robot are used by several governments, universities, and companies in China.

Suntek's software META support is open-ended (it will read any tag the engine operator tells it to), but it's mentioned here for one reason: It only accepts name="date" tags whose content uses a specific format.

Reference: Suntek's Metatag Support

Syndic8.com

Syndic8 is a searchable directory of syndicated content on the Web. It uses HTML META tags to associate metadata with a web page's syndicated content (which may actually be syndicated in a non-HTML format like RSS).

name="dmoz.id" is used to identify the Open Directory Project category a page is listed in. (The ODP itself doesn't have anything to do with this tag value.) The content for this tag is the category's file path on dmoz.org.

name="geo.country" identifies the country a feed originates from. Content is the two-letter ISO abbreviation for the country.

name="geo.placename" is an unqualified, human-readable name identifying the geographic origin of the content feed.

name="geo.position" identifies the geographic origin of a content feed using longitude and latitude. (Geotags uses this too.)

name="tgn.id" and name="tgn.name" use the J. Paul Getty Museum's Thesaurus of Geographic Names to identify the geographic focus of a content feed. The content values will be an integer and human-readable name, respectively.

Reference: Syndic8.com: All About Feed Metadata