<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">
<HTML LANG="en-US">
<head profile="http://dublincore.org/documents/dcq-html/">
<TITLE>META Tag Snob: Proprietary Engine META Tags</TITLE>
<base href="http://www.bauser.com/websnob/meta/proprietary.html">
<META NAME="description" CONTENT="">
<meta name="keywords" content="meta tag, robots, search engines">  
<meta name="DCTERMS.created" scheme="DCTERMS.W3CDTF" content="2002-05-25">
<!--#exec cgi="../cgi/head.pl"-->
<script src="pop.js" type="text/javascript"></script>
</HEAD><BODY>

<p id="breadcrumbs"><a href="/websnob/" rel="Start">Websnob</a> &gt;
<a href="/websnob/meta/">META Tags</a> &gt;
<strong>Proprietary tags</strong></p>

<p class="advert"><!--#exec cgi="../adverts/ad_1.pl"--></p>

<h1>Proprietary Engine Tags</h1>

<p>The following search engines, web directories, and indexing programs all
use proprietary tags not supported by other sites or software. In fact,
some of these tags are no longer supported by the engines that promoted
them, but are still listed here in case you're wondering about a
<samp>META</samp> tag you noticed in some obscure corner of the Web.</p>

<h2 id="Aesop">Aesop.com</h2>

<p><a href="http://www.aesop.com/">Aesop.com</a> is a second-rate search
engine operated by the search-engine optimization company of the same name.
It uses one custom tag for auto-categorization and rank boosting: Any page
with <samp>name</samp>="<var>Aesop</var>" receives a category icon next to
its listing, and preferred placement ahead of sites without the tag.</p>

<p>The <var>Aesop</var> tag has six potential values:
<var>information</var>, <var>interactive</var>, <var>links</var>,
<var>multimedia</var>, <var>personal</var>, and <var>sales</var>. A web
page may only have one value in its <var>Aesop</var> tag.</p>
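<p>As a rough sketch, a page that wanted the "information" icon would
presumably carry a tag like the one below. The <samp>content</samp> value
comes from the list above; the exact capitalization Aesop expects is my
assumption.</p>

<p><code>
&lt;meta name=&quot;Aesop&quot; content=&quot;information&quot;&gt;
</code></p>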

<p>For more information about this search engine, read <a
href="/websnob/engines/Aesop.html">Websnob's review of Aesop</a>.</p>

<p>Reference: <a href="http://www.aesop.com/metatag.htm">The Aesop Meta
Tag</a></p>

<h2 id="Alkaline">AlkalineBOT</h2>

<p>The <a href="http://alkaline.vestris.com/">Alkaline Search Engine
Server</a> is a commercial search engine sold by Vestris, Inc. Alkaline's
robot, AlkalineBOT, recognizes one custom tag,
<samp>name</samp>="<var>Alkaline</var>", with four potential values.</p>

<p><var>skip</var> tells AlkalineBOT not to index the page in question.
(This has the same effect as using <var>noindex</var> with the <a
href="robots.html">robots tag</a>.)</p>

<p><var>skiplinks</var> tells AlkalineBOT not to index the links on the
page (much as <samp>name</samp>="<var>robots</var>"
<samp>content</samp>="<var>nofollow</var>" does).</p>

<p><var>skipmeta</var> tells AlkalineBOT not to index the other
<samp>META</samp> headers of the page.</p>

<p><var>skiptext</var> tells AlkalineBOT not to index the "free text" of
the page. This instruction effectively tells AlkalineBOT to index
<em>only</em> the <samp>META</samp> headers.</p>
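<p>For illustration, a page that should stay in the index while keeping its
links out of it might use the tag below. This is my own sketch built from
the value names above, not a sample copied from Vestris's documentation.</p>

<p><code>
&lt;meta name=&quot;Alkaline&quot; content=&quot;skiplinks&quot;&gt;
</code></p>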
  
<p>Reference: <a href=
"http://alkaline.vestris.com/docs/alkaline/adv-rmeta.html#ADV-RMETA-ROBOTS"
>Alkaline Robots, HTML and Meta Tags</a></p> 

<h2 id="BlogChalking">BlogChalking.com</h2>

<p><a href="http://www.blogchalking.com/">BlogChalking</a> isn't actually a
search engine; it's a site that encourages weblog authors to include a
standardized mini-description of themselves on their home pages in order to
facilitate easy indexing of weblogs. They use a combination of graphics,
user-readable text, and <samp>META</samp> tagging.</p>

<p>Blogchalking.com's <samp>META</samp> value is
<samp>name</samp>="<var>blogchalk</var>", and includes the weblog author's
location, languages spoken, first name, sex, and age category. For example,
the <var>blogchalk</var> for my personal weblog would be:</p>

<p><code>
&lt;meta name=&quot;blogchalk&quot; content=&quot;United States, Michigan,
Trenton, English, Michael, Male, 31-35&quot;&gt;
</code></p>

<p>Frankly, I think that looks more like the classificatory data for a
dating service than a search engine, but I'm not a hardcore weblogger.
There is at least <a href="http://bstpierre.org/bc/chalksearch.php">one
small search engine</a> using <var>blogchalk</var> data.</p>

<p>Reference: none (The only way to get Blogchalking.com to show you a
sample tag is to fill out their sample form.)</p>

<h2 id="Blogfeed">Blogfeed</h2>

<p><a href="http://www.kalsey.com/tools/blogfeed/">Blogfeed</a> isn't a
search engine; it's a Perl script that helps bloggers create <acronym
title="Really Simple Syndication">RSS</acronym> feeds by parsing the
<acronym title="HyperText Markup Language">HTML</acronym> pages of blogs.
In addition to reading the standard <var>Description</var>, Blogfeed
requires users to add three <samp>META</samp> tags:</p>

<p><samp>name</samp>="<var>rss:link</var>" is the <acronym title="Uniform
Resource Locator">URL</acronym> of the blog. (In theory, Blogfeed could get
the same information from a <samp>BASE</samp> element -- bloggers have a
bad habit of creating <samp>META</samp> tags when they could just parse the
HTML.)</p>

<p><samp>name</samp>="<var>rss:language</var>" identifies the language of
the blog using <acronym title="International Organization for
Standardization">ISO</acronym> abbreviations. (This value could also be placed
in another HTML element -- the <samp>HTML</samp> element itself has a
<samp>lang</samp> attribute that uses the same abbreviations.)</p>

<p><samp>name</samp>="<var>rss:image</var>" is the full URI for a graphic
file to be associated with the blog's RSS feed.</p>
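<p>Put together, the three tags might look something like the sketch below.
The www.example.com addresses and the <var>en-us</var> language code are
placeholders of my own, not values from Blogfeed's documentation.</p>

<p><code>
&lt;meta name=&quot;rss:link&quot; content=&quot;http://www.example.com/blog/&quot;&gt;<br>
&lt;meta name=&quot;rss:language&quot; content=&quot;en-us&quot;&gt;<br>
&lt;meta name=&quot;rss:image&quot; content=&quot;http://www.example.com/blog/logo.gif&quot;&gt;
</code></p>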

<p>References: <a href="http://www.kalsey.com/tools/blogfeed/">Blogfeed
home page</a></p>

<h2 id="CanadianContent">CanadianContent.net</h2>

<p>A Canadian (obviously) search engine which has no proprietary tags of
its own, but <em>does</em> claim to honor the <var>revisit</var> value
created by <a href="#SearchBC">SearchBC</a>.</p>

<p>References: <a
href="http://www.canadiancontent.net/corp/spider.html">Canadian Content -
Spider</a></p>



<h2 id="ChristCENTRAL">ChristCENTRAL.com</h2>

<p><a href="http://www.christcentral.com/">ChristCENTRAL</a> claims to be
developing a Google-like index of Christian websites. I'm not sure if this
engine is vaporware, or just a joke that's not funny.</p>

<p>ChristCENTRAL's web robot, ChristCrawler, utilizes one custom
<samp>META</samp> tag. <samp>name</samp>="<var>ChristCrawler</var>" takes
the values <var>noarchive</var>, <var>nofollow</var>, and
<var>noindex</var>, treating them exactly the same as if they were used in
the <a href="robots.html">robots META tag</a>. Using the
<var>ChristCrawler</var> tag allows web authors to give ChristCrawler one
set of instructions, and all the other engines a different set.</p>
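<p>For example, an author who wanted every robot to skip archiving, but
wanted ChristCrawler to stay away entirely, might write something like the
following. The combination is my own illustration, and I'm assuming the
values can be comma-separated just as they are in the
<var>robots</var> tag.</p>

<p><code>
&lt;meta name=&quot;robots&quot; content=&quot;noarchive&quot;&gt;<br>
&lt;meta name=&quot;ChristCrawler&quot; content=&quot;noindex, nofollow&quot;&gt;
</code></p>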

<p>Reference: <a
href="http://www.christcrawler.com/index.cfm#noindextags">ChristCrawler
Description and Information</a></p>

<h2 id="Dipsie">Dipsie.com</h2>

<p>A <a href="http://www.dipsie.com/">vaporware search engine</a> claiming
to index more of the web than any other engine, Dipsie uses one proprietary
tag: <samp>name</samp>="<var>dipsie.bot</var>". The <samp>content</samp>
value is an integer from 1 to 30, indicating how many days the page author
wants Dipsie's webcrawler to wait between visits. (If the
<var>dipsie.bot</var> tag isn't used, Dipsie claims it will default to
visiting once every seven days.)</p>
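<p>A page that only changes every couple of weeks might therefore ask for a
14-day interval. The number is an arbitrary pick from the permitted 1 to 30
range, and the tag is a sketch of my own rather than Dipsie's sample.</p>

<p><code>
&lt;meta name=&quot;dipsie.bot&quot; content=&quot;14&quot;&gt;
</code></p>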

<p>Reference: <a href="http://www.dipsie.com/bot/">Dipsie: dipsie.bot - The
Dipsie Web Crawler</a></p>


<h2 id="Feedster">Feedster.com</h2>

<p><a href="http://www.feedster.com/">Feedster</a> is a search engine that
indexes syndicated <acronym title="Really Simple Syndication">RSS</acronym>
content feeds. They have one proprietary <samp>META</samp> tag,
<samp>name</samp>="<var>feedsteridentity</var>". Its content is a
12-character alphanumeric identifier (essentially, a customer identification
number) that allows Feedster members to add information to their site's
listings in Feedster.</p>

<p>Reference: <a href=
"http://www.feedster.com/blog/archives/142_Feature_6_Identity_Identity_Identity.html"
 >Feature #7: Identity, Identity, Identity</a></p>


<h2 id="fdse">fdse</h2> 

<p>fdse is the robot for the <a
href="http://www.xav.com/scripts/search/">Fluid Dynamics Search Engine</a>,
a commercial product written in Perl by Zoltan Milosevic.</p>

<p>fdse supports <var>fdse-description</var>, <var>fdse-keywords</var>,
<var>fdse-PICS</var>, <var>fdse-Refresh</var>, and <var>fdse-Robots</var>
as synonyms for the traditional, non-prefixed versions of the same tags. If
given both versions of a tag, fdse prefers the <var>fdse-</var> versions,
allowing web authors to give fdse separate instructions.</p>
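<p>In practice, that means a page can carry both versions of a tag, as in
this sketch. The description text is invented for the example; if fdse
behaves as documented, it would use the second tag and everybody else would
use the first.</p>

<p><code>
&lt;meta name=&quot;description&quot; content=&quot;A guide to proprietary META tags.&quot;&gt;<br>
&lt;meta name=&quot;fdse-description&quot; content=&quot;Proprietary META tags, as seen by our site search.&quot;&gt;
</code></p>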

<p>Reference: <a
href="http://www.xav.com/scripts/search/help/1013.html">Support for
proprietary FDSE-Keywords, FDSE-Description, FDSE-Robots META
headers</a></p>

<h2 id="Fireball">Fireball.de</h2>

<p>The German search engine <a href="http://www.fireball.de">Fireball</a>
is currently powered by AltaVista, but used to support a wide range of
<samp>META</samp> tags, including German translations of
well-known English tags, and parts of the Dublin Core.</p>

<p><var lang="de">beschreibung</var>, <var>dc.description</var>, <var>adoccom</var>,
and <var>abstract</var> were treated as synonyms for <var><a
href="engines.html#Description">Description</a></var>.</p>
 
<p><var lang="de">schlusselwort</var>, <var>mathdmv.keywords</var>,
<var>subject</var>, <var>dc.subject</var>, and <var lang="de">gegenstand</var>
were used as synonyms for <var><a
href="engines.html#keywords">Keywords</a></var>.</p>

<p><var lang="de">autor</var>, <var>owner</var>, <var>author-corporate</var>,
<var>author-personal</var>, <var>mathdmv.author</var>,
<var>dc.creator</var>, <var>dc.contributor</var>, and <var>contributor</var>
were treated as synonyms for <var><a
href="engines.html#author">Author</a></var>.</p>

<p><var>audience</var> and <var lang="de">zielgruppe</var> were synonyms for
site-index.pl's <var>Distribution</var>.</p>

<p><var lang="de">eigentmer</var> and <var lang="de">urheberrechte</var> were
synonyms for <var>Copyright</var>.</p>

<p><var lang="de">herausgeber</var> and <var>dc.publisher</var> were used
as synonyms for <var>Publisher</var>.</p>

<p><var>pagetype</var>, <var>page-type</var>, <var>objecttype</var>,
<var>object-type</var>, and <var>resourcetype</var> were used to identify
the <em>function</em> of the page: <var lang="de">Anleitung</var>, <var
lang="de">Anzeige</var>, <var lang="de">Bericht</var>, <var
lang="de">Bild</var>, <var lang="de">Buch</var>, <var
lang="de">Download</var>, <var lang="de">Email-Archiv</var>, <var
lang="de">FAQ</var>, <var lang="de">Forschungsbericht</var>, <var
lang="de">Foto</var>, <var lang="de">HTML-Formular</var>, <var
lang="de">Karte</var>, <var lang="de">Katalog</var>, <var
lang="de">Kleinanzeige</var>, <var lang="de">Link-Liste</var>, <var
lang="de">Plan</var>, <var>Private Homepage</var>, <var
lang="de">Produktinfo</var>, <var lang="de">Reportage</var>, <var
lang="de">Software</var>, <var lang="de">Sound</var>, <var
lang="de">Statistik</var>, <var lang="de">Verzeichnis</var>, and <var
lang="de">Video</var>.</p>

<p>Fireball's other unique tag, <var>page-topic</var>, had the synonyms
<var>pagetopic</var>, <var lang="de">thema</var> and <var
lang="de">seitenthema</var>. <var>page-topic</var> classified a web page
into one of 25 categories: <var lang="de">Bauen</var>, <var
lang="de">Bildung</var>, <var lang="de">Branche</var>, <var
lang="de">Dienstleistung</var>, <var lang="de">Erotik</var>, <var
lang="de">Forschung</var>, <var lang="de">Gesellschaft</var>, <var
lang="de">Kultur</var>, <var lang="de">Medien</var>, <var
lang="de">Medizin</var>, <var lang="de">Politik</var>, <var
lang="de">Produkt</var>, <var lang="de">Recht</var>, <var
lang="de">Reise</var>, <var lang="de">Religion</var>, <var
lang="de">Sexualit&auml;t</var>, <var lang="de">Spiel</var>, <var
lang="de">Sport</var>, <var lang="de">Technik</var>, <var
lang="de">Tourismus</var>, <var lang="de">Umwelt</var>, <var
lang="de">Verwaltung</var>, <var lang="de">Wirtschaft</var>, <var
lang="de">Wissenschaft</var>, and <var lang="de">Wohnen</var>. Multiple
topics could be separated by spaces.</p>
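<p>A travel site, for instance, could presumably have claimed two of those
categories at once. The pairing below is my own example, built from the
category list and the space-separated format described above.</p>

<p><code>
&lt;meta name=&quot;page-topic&quot; content=&quot;Reise Tourismus&quot;&gt;
</code></p>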

<p>References: <a
href="http://www.lub.lu.se/tk/metadata/MDsearch-docu.html">Documentation
to: Metadata indexing and searching in large search services</a> and <a
href="http://www.dmedia.net/dreamwarrior/gilsgenerator.html">Dreamwarrior -
GILs Generator</a>.</p>
 

<h2 id="Geocities">Geocities.com</h2>

<p>For a period in the late 1990s (I'm not sure of the exact date), <a
href="http://www.geocities.com/">Geocities</a> allowed its members to
self-classify their pages using <samp>name</samp>="<var>mytopic</var>" and the
categories of its "GeoAvenues" directory. The <samp>content</samp> of the
tag was a colon-separated hierarchy, such as
<var>Society:Religion:Buddhism</var>. Sites could be listed at the second
or third level of the hierarchy.</p>
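<p>Written out in full, a third-level listing using the hierarchy quoted
above would have looked roughly like this:</p>

<p><code>
&lt;meta name=&quot;mytopic&quot; content=&quot;Society:Religion:Buddhism&quot;&gt;
</code></p>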
 

<p>During the period that <var>mytopic</var> was used, the top-level
categories of GeoAvenues were: <var>Arts &amp; Literature</var>,
<var>Autos</var>, <var>Business &amp; Money</var>, <var>Campus Life</var>,
<var>Computers &amp; Technology</var>, <var>Entertainment</var>,
<var>Family</var>, <var>Health</var>, <var>Home &amp; Living</var>,
<var>People &amp; Chat</var>, <var>Society</var>, <var>Sports &amp;
Recreation</var>, <var>Travel</var>, and <var>Women</var>. (Geocities made
an HTML error in implementing the <var>mytopic</var> tag: They didn't tell
users to character-encode the ampersand when it appeared in a topic name.
That's bad HTML.)</p>
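<p>For the record, a correctly encoded tag would have written the ampersand
as an entity, as in the sketch below. Only the top-level category is shown
here; a real listing would have continued with second- and third-level
names, which I haven't tried to guess.</p>

<p><code>
&lt;meta name=&quot;mytopic&quot; content=&quot;Arts &amp;amp; Literature&quot;&gt;
</code></p>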

<p>Geocities itself no longer uses this tag, but it hangs around on old
Geocities pages (and even a few pages that began on Geocities, then moved
elsewhere).</p>

<p>References: none available</p>

<h2 id="Geotags">Geotags.com</h2>

<p>The <a href="http://www.geotags.com/">Geotags</a> search engine 
organizes web resources by the geographic regions those resources apply to.
It uses four proprietary <samp>META</samp> tags.</p>

<p><samp>name</samp>="<var>geo.location</var>" has been deprecated in favor
of <var>geo.position</var>.</p>

<p><samp>name</samp>="<var>geo.region</var>" is an optional tag that
indicates the nation and state/province of the geographic location, using
ISO region abbreviations.</p> 

<p><samp>name</samp>="<var>geo.placename</var>" is an optional tag
containing a human-readable, unqualified name for the geographic
location.</p>

<p><samp>name</samp>="<var>geo.position</var>" indicates the latitude and
longitude of the geographic location associated with the resource. This tag
is mandatory for inclusion in Geotags.</p>
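<p>A complete set of Geotags headers might look something like the sketch
below. The coordinates are a rough approximation of Trenton, Michigan, and
the latitude-first, semicolon-separated layout is my assumption about the
format, not something quoted from Geotags; check their reference before
copying it.</p>

<p><code>
&lt;meta name=&quot;geo.region&quot; content=&quot;US-MI&quot;&gt;<br>
&lt;meta name=&quot;geo.placename&quot; content=&quot;Trenton&quot;&gt;<br>
&lt;meta name=&quot;geo.position&quot; content=&quot;42.14;-83.18&quot;&gt;
</code></p>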

<p>Geotags claims to be using <a href="#htdig">ht://dig</a> as its indexing
software. I don't know if it recognizes the htdig <samp>META</samp>
tags.</p>

<p>Most of Geotags's <samp>META</samp> tags are also used by <a
href="#Syndic8">Syndic8</a>.</p>

<p>Reference: <a href="http://geotags.com/geo/geotags2.html">Geo Tag
Elements</a> and <a
href="http://geotags.com/geobot/add-tags.html">GeoSearch Add Tags</a></p>

<h2 id="GeoURL">GeoURL.org</h2>

<p><a href="http://geourl.org/">GeoURL</a> is a "location-to-URL reverse
directory" which allows users to
find websites by clicking on maps. Really.</p>

<p>To be listed in GeoURL, a site must use a <samp>meta</samp> tag stating
the site's longitude and latitude. GeoURL currently reads its own
<samp>name</samp>="<var>ICBM</var>", as well as Geotags's
<var>geo.location</var> tags, which use the same range of
<samp>content</samp> values. GeoURL also recognizes the <var>dc.title</var>
value from <a href="schemas.html#DC">The Dublin Core Metadata Scheme</a> --
if <var>dc.title</var> exists on a page, GeoURL uses it instead of the HTML
<samp>TITLE</samp>.</p>
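<p>A minimal GeoURL-ready header might look like the following sketch. The
coordinates are approximate, the title text is invented, and the
comma-separated "latitude, longitude" layout is my assumption about what
GeoURL expects.</p>

<p><code>
&lt;meta name=&quot;ICBM&quot; content=&quot;42.14, -83.18&quot;&gt;<br>
&lt;meta name=&quot;DC.title&quot; content=&quot;An Example Weblog&quot;&gt;
</code></p>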

<p>The blog-search engine <a href="http://blizg.com/">Blizg</a> also reads
the <samp>name</samp>="<var>ICBM</var>" tag, but not the
<var>geo.location</var> tag.</p>

<p>Reference: <a href="http://geourl.org/add.html">GeoURL ICBM Address
Server</a>.</p>

<h2 id="Gigablast">Gigablast.com</h2>

<p>A pale imitation of Google, Gigablast is a one-man operation that
started utilizing some custom (and <abbr title="United
States">U.S.</abbr>-centric) <samp>META</samp> values in September 2003.
Their current <samp>META</samp> set includes:</p>

<p><samp>name</samp>="<var>author</var>" is the unqualified name of the
page's author. (<a href="engines.html#author">Some other engines use
<var>author</var> as well.</a>)</p>

<p><samp>name</samp>="<var>classification</var>" is a comma-separated list
of keywords. (In fact, I'm not sure how Gigablast differentiates
<var>keywords</var> and <var>classification</var>, reinforcing my belief
that <a href="useless.html#classification"><var>classification</var> is a
useless <samp>meta</samp> tag</a>.)</p>

<p><samp>name</samp>="<var>city</var>" is a comma-separated list of
unqualified municipality names (which may include abbreviations and
alternate names of the same city).</p>

<p><samp>name</samp>="<var>country</var>" is the unqualified name of a
nation. (The use of unqualified text strings, instead of pre-qualified
abbreviations, is going to be a problem for Gigablast. <em>People don't
always agree about the names of countries.</em> If you don't believe me,
call the nearest Chinese consulate and ask them their opinion of
Taiwan.)</p>

<p><samp>name</samp>="<var>language</var>" is an unqualified string
identifying the language of the web page.</p>

<p><samp>name</samp>="<var>state</var>" is the unqualified name of the
(<abbr title="United States">U.S.</abbr>) state the website is best
associated with.</p>

<p><samp>name</samp>="<var>zipcode</var>" is a comma-separated list of
five-digit United States <acronym title="Zone Improvement
Plan">ZIP</acronym> codes.</p>
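<p>Taken together, a page using Gigablast's geographic tags might carry
something like the set below. All of the values are illustrative ones I've
made up to match the descriptions above; none are taken from Gigablast's
demo page.</p>

<p><code>
&lt;meta name=&quot;city&quot; content=&quot;Trenton&quot;&gt;<br>
&lt;meta name=&quot;state&quot; content=&quot;Michigan&quot;&gt;<br>
&lt;meta name=&quot;country&quot; content=&quot;United States&quot;&gt;<br>
&lt;meta name=&quot;zipcode&quot; content=&quot;48183&quot;&gt;<br>
&lt;meta name=&quot;language&quot; content=&quot;English&quot;&gt;
</code></p>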

<p>For more information about Gigablast, read <a
href="/websnob/engines/Gigablast.html">Websnob's review of
Gigablast</a>.</p>

<p>References: <a href="http://www.gigablast.com/tagsdemo.html">Gigablast
Demo Page</a>, <a
href="http://www.webmasterworld.com/forum16/1099.htm?highlight=gigablast"
>Search Engine Forums: Metadata : A comeback?</a>.</p>


<h2 id="Google">Google.com</h2>

<p>The most successful search engine today, <a
href="http://www.google.com/">Google</a> utilizes one custom
<samp>META</samp> tag. <samp>name</samp>="<var>Googlebot</var>" is used to
supplement (or replace, if necessary) the traditional <a
href="robots.html">ROBOTS tag</a>, allowing web authors to single out
Google's robot for special instructions. <var>Googlebot</var> accepts four
values:</p>

<p><var>noindex</var>, <var>nofollow</var>, and <var>noarchive</var> are
treated exactly the same as they are when appearing in the
<samp>robots</samp> tag.</p>

<p><var>nosnippet</var> is unique to Google, and changes how Google
displays a description for a page in search results. Normally, Google will
"snip" and display one or two lines (which match the words the Google user
searched for) from the page, creating what some webmasters derisively call
"the ransom note results". Using <var>nosnippet</var> instructs Google to
omit the snippet. In most cases, this means the page in question will have
<em>no description at all</em>, but if the page in question has an Open
Directory description, that description is still displayed.</p>
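<p>As a sketch, a page that wanted neither a cached copy nor a snippet in
Google, while leaving other engines alone, might use the tag below. I'm
assuming multiple values can be comma-separated, as they are in the
<var>robots</var> tag.</p>

<p><code>
&lt;meta name=&quot;Googlebot&quot; content=&quot;noarchive, nosnippet&quot;&gt;
</code></p>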

<p>References: <a
href="http://www.google.com/remove.html#exclude_pages">Google: Remove
Content from Google's Index</a>.</p> 
 
<h2 id="htdig">htdig</h2>

<p><a href="http://www.htdig.org/">ht://Dig</a> is a volunteer-supported
indexing engine distributed under the GNU license. It is one of the oldest
continually-developed web-indexing programs, and can read several
proprietary <samp>META</samp> tags.</p>
 
<p><samp>name</samp>="<var>htdig-keywords</var>" provides additional
keywords for htdig to index. (htdig also recognizes the standard
<samp><a href="engines.html#keywords">Keywords</a></samp> tag).</p>

<p><samp>name</samp>="<var>htdig-noindex</var>" tells htdig not to index
the page. (htdig also recognizes the standard <samp>robots</samp> tag).</p> 

<p>The remaining three custom tags are part of ht://dig's notification
system. <samp>name</samp>="<var>htdig-email</var>" contains one or more
e-mail addresses, separated by commas.
<samp>name</samp>="<var>htdig-notification-date</var>" is a date in
YYYY-MM-DD format. <samp>name</samp>="<var>htdig-email-subject</var>" is
the subject line ht://dig should use when e-mailing the addresses listed in
<var>htdig-email</var>. (ht://dig's notification system is used to remind
participating authors when to update time-sensitive documents. When
<var>htdig-notification-date</var> is reached, ht://dig will automatically
send a reminder, with the title <var>htdig-email-subject</var>, to the
address(es) listed in <var>htdig-email</var>.)</p>
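<p>The three notification tags are meant to be used together, roughly as in
this sketch. The addresses, date, and subject line are all invented for the
example.</p>

<p><code>
&lt;meta name=&quot;htdig-email&quot; content=&quot;author@example.com, editor@example.com&quot;&gt;<br>
&lt;meta name=&quot;htdig-notification-date&quot; content=&quot;2003-01-15&quot;&gt;<br>
&lt;meta name=&quot;htdig-email-subject&quot; content=&quot;Time to update the holiday schedule&quot;&gt;
</code></p>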

<p>htdig can also be configured to use the <var><a
href="engines.html#Description">Description</a></var> and <var>Date</var>
tags. The format of <var>Date</var> is user-configurable, and will be
displayed in search results as a "last modified" date.</p>   

<p>Reference: <a href="http://www.htdig.org/meta.html">ht://dig: Recognized
META information in HTML documents </a></p>

<h2 id="html-to-rfc.pl">html-to-rfc.pl</h2>

<p>Part of the "Cheap HTML Parser" utilities released by Jim Davis in 1994,
this Perl script converts an HTML document to an ASCII document formatted
according to the <a href="http://www.faqs.org/rfcs/rfc1543.html"><acronym
title="Internet Engineering Task Force">IETF</acronym> instructions for
<acronym title="Request For Comments">RFC</acronym> writers</a>. While it's
not really an indexing script, html-to-rfc.pl is noteworthy as an early
user of <samp>META</samp> tags, and as the probable first implementation of
the <var>Author</var> and <var>Date</var> tags. html-to-rfc.pl used four
proprietary tags for formatting its ASCII output.</p>

<p><samp>name</samp>="<var>author</var>" is the author's name, surname
first.</p>

<p><samp>name</samp>="<var>date</var>" is the month and year of
publication. Unlike most implementations of the <var>date</var> tag, the
month is expressed as the full English name.</p>

<p><samp>name</samp>="<var>status</var>" indicates the document's status as
an Internet Draft or RFC. This is an IETF-controlled vocabulary.</p> 

<p><samp>name</samp>="<var>title</var>" is used to provide a title for the
ASCII version of the document. (Why it wouldn't be the same as the HTML
version is a mystery to me.)</p>

<p>Reference: <a href=
"http://ftp.ics.uci.edu/pub/websoft/libwww-perl/contrib/jdavis/html-parser.html"
>Cheap HTML Parser in Perl</a></p>

<h2 id="InktomiEnterprise">Inktomi Enterprise Search</h2>

<p><a href="http://www.inktomi.com/products/search/">Inktomi Enterprise
Search</a> is the customizable search engine (formerly known as Ultraseek)
that Inktomi sells to various business and educational clients. It can be
configured to rank search results according to the last modification date
of the document, but that date can be overridden with a
<samp>name</samp>="<var>date</var>" tag, with <samp>content</samp> being a
date in YYYY-MM-DD format.</p>
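<p>For example, a document that should be ranked as if it were last modified
on 25 May 2002 (an arbitrary date chosen for illustration) would carry:</p>

<p><code>
&lt;meta name=&quot;date&quot; content=&quot;2002-05-25&quot;&gt;
</code></p>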

<p>I don't know if "the real Inktomi" will acknowledge a <var>date</var>
tag, but at least one of its client sites, <a
href="http://www.hotbot.com/">Hotbot</a>, does offer the option of filtering
results by date.</p>

<p>Reference: <a href="http://www.custominktomi.com/meta-date.html">Custom
Inktomi: Ranking by date</a></p>

<h2 id="MapleSquare">MapleSquare.com</h2>

<p>Maple Square was a search engine for Canadian content. In addition to
the standard <var><a href="engines.html#description">Description</a></var>
and <var><a href="engines.html#keywords">Keywords</a></var> tags, Maple
Square used a <samp>name</samp>="<var>Location</var>" tag. The
<samp>content</samp> for <var>Location</var> was "Country, Province,
City", using the two-letter ISO abbreviations for countries and the
Canadian postal abbreviations for provinces.</p>
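<p>Going by that description, a Vancouver site would have used something
like the tag below. This is a reconstruction of my own, not a sample
preserved from Maple Square.</p>

<p><code>
&lt;meta name=&quot;Location&quot; content=&quot;CA, BC, Vancouver&quot;&gt;
</code></p>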

<p>Reference: <a href=
"http://web.archive.org/web/19980111004253/http://www.maplesquare.com/addsite.asp"
>Maple Square - How to Include Your Site</a></p>
  

<h2 id="MOMspider">MOMspider</h2> 

<p>Roy Fielding's <a
href="http://www.ics.uci.edu/pub/websoft/MOMspider/">MOMspider</a> is a web
robot designed in 1993/1994 to help maintain large-scale websites that have
multiple authors. Not only is MOMspider the first web robot to use
<samp>META</samp> headers, it's the reason <samp>META</samp> headers were
created by Fielding.</p>

<p>In its default configuration, MOMspider looked for
<samp>http-equiv</samp>="<var>expires</var>" and two custom tag values:
<samp>http-equiv</samp>="<var>Owner</var>" was an unqualified name of the
page's author and <samp>http-equiv</samp>="<var>Reply-To</var>" was the
owner's e-mail address. <samp>http-equiv</samp>="<var>keywords</var>" is
mentioned in an example of user-configurable headers, but it is unknown if
anybody ever used this header with MOMspider.</p>
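<p>A page maintained with MOMspider's default configuration might therefore
have carried headers along these lines. The name, address, and date are
invented, and I'm assuming <var>expires</var> took an HTTP-style date,
since <samp>http-equiv</samp> values are meant to mirror HTTP headers.</p>

<p><code>
&lt;meta http-equiv=&quot;Owner&quot; content=&quot;Jane Author&quot;&gt;<br>
&lt;meta http-equiv=&quot;Reply-To&quot; content=&quot;jane@example.com&quot;&gt;<br>
&lt;meta http-equiv=&quot;Expires&quot; content=&quot;Tue, 31 Dec 2002 23:59:59 GMT&quot;&gt;
</code></p>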

<p>Reference: <a href=
"http://web.archive.org/web/20010822235521/http://www.ics.uci.edu/pub/websoft/MOMspider/docs/metainfo.html"
>MOMspider -- Making Document Metainformation Visible</a></p>
 

<h2 id="NetInsert">NetInsert</h2>

<p><a href="http://www.netinsert.com/">NetInsert</a> is a Sweden-based web
directory founded in 1998. In addition to the typical
<var>Description</var> and <var>Keywords</var> tags, NetInsert uses several
proprietary values.</p>

<p><samp>name</samp>="<var>netinsert</var>" contains a string of numbers
identifying the category the page should be listed in. Websites cannot be
listed in NetInsert without this tag.</p>

<p><samp>name</samp>="<var>news</var>" may contain up to 128 characters of news
about the site. When NetInsert detects this tag, it will add a small news
icon to the site's listing.</p>

<p><samp>name</samp>="<var>expire</var>" (not to be confused with
<samp>http-equiv</samp>="<var>Expires</var>") tells NetInsert when to
remove a page from the directory.</p>

<p><samp>name</samp>="<var>e-mail</var>" is the e-mail address NetInsert
should cache and contact if it has trouble accessing the web page.</p>

<p><samp>name</samp>="<var>revision</var>" is used to identify revisions of
a web document. NetInsert's instructions don't specify the format for the
<samp>content</samp> of this tag, but it's allowed 64 characters, so it's
probably just an unqualified text descriptor.</p>

<p><samp>name</samp>="<var>revisit</var>" is apparently inspired by <a
href="#SearchBC">SearchBC</a>'s <var>revisit</var> value, but only accepts
intervals in days, not weeks or months. NetInsert also accepts the
semi-mythical <var><a href="useless.html#revisit-after"
>Revisit-After</a></var> value as a synonym for <var>revisit</var>.</p>

<p>For more information about NetInsert, read <a
href="/websnob/engines/NetInsert.html">Websnob's review of
NetInsert</a>.</p>

<p>Reference: <a href="http://www.netinsert.com/en/metatag.html">NetInsert
- Meta Tags</a></p>


<h2 id="SearchBC">SearchBC</h2>

<p><a href="http://vancouver-webpages.com/VWbot/searchBC.html">SearchBC</a>
is a Vancouver-based search engine that only indexes sites in the province of
British Columbia. In addition to recognizing a large range of common
<samp>META</samp> tags, SearchBC's robot has two proprietary tags.</p>

<p><samp>name</samp>="<var>VW96.ObjectType</var>" expanded on a suggestion
by Dublin Core developers, and defined the purpose of the web document.
Legal values are: <var>Homepage</var>, <var>FAQ</var>, <var>RFC</var>,
<var>Document</var>, <var>World</var>, <var>RealWorld</var>,
<var>Index</var>, <var>Magazine</var>, <var>Mall</var>,
<var>Dictionary</var>, <var>Archive</var>, <var>SearchEngine</var>,
<var>Hypercatalog</var>, <var>Keybank</var>, <var>Manual</var>,
<var>Book</var>, <var>Database</var>, <var>Journal</var>,
<var>Catalog</var>, <var>Linecard</var>, and <var>HOWTO</var>.</p>

<p><samp>name</samp>="<var>revisit</var>" instructed SearchBC how often to
reindex a web page. The <samp>content</samp> value was an integer followed
by one of the keywords <var>days</var>, <var>weeks</var>, or
<var>months</var>. SearchBC does not obey this tag anymore.</p>
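<p>Back when both tags were in use, a frequently asked questions page that
wanted a fortnightly visit might have looked like this sketch. The
<var>FAQ</var> value and the integer-plus-keyword format come from the
descriptions above; the interval itself is arbitrary.</p>

<p><code>
&lt;meta name=&quot;VW96.ObjectType&quot; content=&quot;FAQ&quot;&gt;<br>
&lt;meta name=&quot;revisit&quot; content=&quot;2 weeks&quot;&gt;
</code></p>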

<p>References: <a
href="http://vancouver-webpages.com/META/VW96-schema.html">VW.96 schema
description</a> and <a
href="http://vancouver-webpages.com/META/about-mk-metas.rich.html">META Tag
Builder</a>.</p>

<h2 id="SiftGroups">SiftGroups</h2>

<p><a href="http://www.startset.com/">SiftGroups</a>, according to its
manufacturer, is an "outsourced online community and vertical portal suite
of technology and services". That sounds like marketdroid-speak to me, but
what do I know about community-building?</p>

<p>Anyway, SiftGroups's search engine (which is based on <a
href="#InktomiEnterprise">Inktomi Enterprise Search</a>) recognizes
<samp>name</samp>="<var>date</var>", <samp>name</samp>="<var>class</var>",
and <samp>name</samp>="<var>subject</var>". The <var>date</var> tag is used
just as Inktomi uses it, while <var>class</var> and <var>subject</var> use
operator-defined vocabularies to classify pages by use and topic.</p>

<p>Reference: <a href=
"http://www.startset.com/documentation/user_documentation/inktomi.html#categorisation"
>Sitegroups User Guide: Inktomi</a></p>
 
<h2 id="site-index.pl">site-index.pl</h2>

<p>Little remembered today, Robert Thau's site-index.pl played a pivotal role
in the spread of <samp>META</samp> tags when it was released in 1994.
Thau's script (intended for use only by webserver administrators)
parsed <samp>META</samp> headers of local pages to create a site.idx file that
could be submitted to <a href="history.html#Aliweb">ALIWEB</a>.</p>

<p>Although no one uses site-index.pl or ALIWEB in 2002, Thau's script left
an undeniable legacy: It was the script that <var><a
href="engines.html#description">Description</a></var> and <var><a
href="engines.html#keywords">Keywords</a></var> were invented for. The
popularity of site-index.pl in the mid-1990s encouraged web authors to use
those <samp>META</samp> tags, which in turn encouraged AltaVista to use
them when it debuted in 1995.</p> 

<p>Most of site-index.pl's (and ALIWEB's) metadata labels were based on an
expired Internet Draft by the Internet Anonymous FTP Archives (IAFA)
working group. In addition to the now-ubiquitous <var>Description</var> and
<var>Keywords</var> tags, site-index.pl supported five other values that
weren't adopted by any search engines. </p>

<p><samp>name</samp>="<var>iafa-description</var>" and
<samp>name</samp>="<var>iafa-keywords</var>" were used in the original
version of site-index.pl in place of <var>Description</var> and
<var>Keywords</var>. After the creation of the more generic terms,
site-index.pl continued to support these values as synonyms of the more
popular tags.</p>

<p><samp>name</samp>="<var>Distribution</var>" was used by site-index.pl to
determine whether or not a page could be submitted to ALIWEB.
<var>Distribution</var> originally had two <samp>content</samp> values:
<var>global</var> pages could be included in ALIWEB or any other index.
<var>local</var> pages could only be submitted to "local" indexes belonging
to the organization that published the page. (Some sources cite a third
value, <var>IU</var> for "Internal Use", which I gather to mean that a page
shouldn't be included in <em>any</em> search engines. <var>IU</var> is not
implemented in any version of site-index.pl I've seen; it may have been
added by an imitator like item-index.pl.)</p>
 
<p><samp>name</samp>="<var>Resource-Type</var>" had two values.
<var>Document</var> was used for most "normal" (non-interactive,
non-dynamic) web pages, while <var>Service</var> was used for search
engines, feedback forms, and other pages that provide more than static
text. (<var>iafa-type</var> was also accepted as a synonym for
<var>Resource-Type</var>.)</p>
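<p>An ordinary static page that its author was happy to see in ALIWEB would
presumably have been marked up along these lines. This is a sketch
assembled from the value names above, not an excerpt from Thau's
announcements.</p>

<p><code>
&lt;meta name=&quot;Distribution&quot; content=&quot;global&quot;&gt;<br>
&lt;meta name=&quot;Resource-Type&quot; content=&quot;Document&quot;&gt;
</code></p>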

<p>References: <a
href="http://www.webhistory.org/www.lists/www-talk.1994q1/0980.html">Thau's
first announcement of site-index.pl</a> and <a
href="http://www.webhistory.org/www.lists/www-talk.1994q2/0006.html">Thau's
announcement of a revised version</a>. (The second announcement introduces
<var>Description</var> and <var>Keywords</var>.)</p>

<h2 id="Suntek">Suntek</h2>

<p><a href="http://www.suntek.com.hk/">Suntek Computer Systems Ltd</a> is a
Hong Kong-based company specializing in bilingual (English and Chinese)
search software. Their engine and robot are used by several governments,
universities, and companies in China.</p>

<p>Suntek's <samp>META</samp> support is open-ended (it will read
any tag the engine operator tells it to), but it's mentioned here for one
reason: It only accepts <samp>name</samp>="<var>date</var>" tags whose
<samp>content</samp> uses a specific format.</p>

<p>Reference: <a
href="http://www.suntek.com.hk/articles/meta.html">Suntek's Metatag
Support</a></p> 

<h2 id="Syndic8">Syndic8.com</h2>

<p><a href="http://www.syndic8.com/">Syndic8</a> is a searchable directory
of syndicated content on the Web. It uses HTML <samp>META</samp> tags to
associate metadata with a web page's syndicated content (which may actually
be syndicated in a non-HTML format like <a
href="http://backend.userland.com/stories/rss091">RSS</a>).</p>

<p><samp>name</samp>="<var>dmoz.id</var>" is used to identify the <a
href="http://dmoz.org/">Open Directory Project</a> category a page is
listed in. (The <acronym title="Open Directory Project">ODP</acronym>
itself doesn't have anything to do with this tag value.) The
<samp>content</samp> for this tag is the category's file path on
dmoz.org.</p>

<p><samp>name</samp>="<var>geo.country</var>" identifies the country a feed
originates from. <samp>Content</samp> is the two-letter ISO abbreviation for the country.</p>

<p><samp>name</samp>="<var>geo.placename</var>" is an unqualified,
human-readable name identifying the geographic origin of the content
feed.</p>

<p><samp>name</samp>="<var>geo.position</var>" identifies the geographic
origin of a content feed using longitude and latitude. (<a
href="#Geotags">Geotags</a> uses this too.)</p>

<p><samp>name</samp>="<var>tgn.id</var>" and
<samp>name</samp>="<var>tgn.name</var>" use <a
href="http://www.getty.edu/research/tools/vocabulary/tgn/">The J. Paul
Getty Museum's Thesaurus of Geographic Names</a> to identify the geographic
focus of a content feed. The <samp>content</samp> values will be an integer
and human-readable name, respectively.</p>
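<p>As a sketch, a feed originating in Trenton, Michigan might describe
itself with the two simpler geographic tags as shown below. The place name
is an illustrative value of my own; I've left out <var>dmoz.id</var> and
the <var>tgn</var> tags rather than guess at real identifiers.</p>

<p><code>
&lt;meta name=&quot;geo.country&quot; content=&quot;US&quot;&gt;<br>
&lt;meta name=&quot;geo.placename&quot; content=&quot;Trenton, Michigan&quot;&gt;
</code></p>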
 
<p>Reference: <a href="http://syndic8.com/help_metadata.php">Syndic8.com:
All About Feed Metadata</a></p>

<p class="advert"><!--#exec cgi="../adverts/ad_2.pl"--></p>
<!--#exec cgi="../cgi/menu.pl"-->
<!--#exec cgi="../cgi/2002"-->
</BODY></HTML>

