Search Engine Review: Teoma.com

A beta version of the Teoma search engine was unveiled in May 2001 by the company of the same name. Founded by researchers from Rutgers University, both Teoma the company and Teoma the search engine were "built to be sold", and immediately began looking for a more established company to sell out to. Four months after unveiling their beta site, Teoma was purchased by Ask Jeeves. Jeeves finally took Teoma out of beta in April 2002, incorporating Teoma's technology in the Ask Jeeves site, shuttering their previous acquisition, DirectHit.com, and relaunching Teoma.com in the media.

Teoma's "hook" is what it calls "Subject-Specific Popularity", which rests on the notion that quality pages about a subject will always link to other quality pages about the same subject. Pages that are linked from many similar pages move up in the Teoma index, while sites that link to many similar pages get sifted out and identified as "resource sites" for a subject. In practice, the first function of Subject-Specific Popularity works a lot like Google's PageRank.
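To make the idea concrete, here is a minimal sketch of link counting restricted to a single subject. It is not Teoma's actual algorithm, which hasn't been published in this detail. The function name, the example hostnames, and the assumption that we already know which pages are "about the subject" are all hypothetical.

```python
from collections import defaultdict

def subject_specific_scores(links, topic_pages):
    """Count links that stay inside one subject.

    links: iterable of (source, target) pairs.
    topic_pages: set of pages assumed to be about the subject.
    Returns (authority, resource): in-topic inbound and outbound link counts.
    """
    inbound = defaultdict(set)
    outbound = defaultdict(set)
    for source, target in links:
        # Only links between two on-topic pages count toward either score.
        if source in topic_pages and target in topic_pages and source != target:
            inbound[target].add(source)
            outbound[source].add(target)
    authority = {page: len(inbound[page]) for page in topic_pages}
    resource = {page: len(outbound[page]) for page in topic_pages}
    return authority, resource

# Hypothetical link graph: one link list page, three content pages,
# and one off-topic page whose link is ignored.
links = [
    ("hub.example", "a.example"),
    ("hub.example", "b.example"),
    ("hub.example", "c.example"),
    ("b.example", "a.example"),
    ("c.example", "a.example"),
    ("offtopic.example", "b.example"),
]
topic = {"hub.example", "a.example", "b.example", "c.example"}
authority, resource = subject_specific_scores(links, topic)
```

In this toy graph, a.example collects the most in-topic links and would move up the index, while hub.example links out to the most in-topic pages and would be flagged as a "resource site". A link from an off-topic page counts for nothing, which is the main difference from a global popularity count.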

Teoma is the latest search engine to be promoted as the Fourth Coming, especially by that sect of search engine optimizers who resent being marginalized by Google's dominance of the industry. The popular press, however, has been mixed, and some searchers (including this websnob) remain unimpressed by Teoma.

The Webmasters' Side

I normally cover the searchers' side of a search engine first, but Teoma has a lot of webmaster issues that I feel affect the end-user as well, so we're going to discuss those issues first. To be blunt, Teoma's got a lot of problems that are going to keep it out of Google's league.

Teoma has no free submission

That's right. If you want to submit a site to Teoma, you have to pay. (Paid sites also get spidered more often than free sites. A lot more often; see the next complaint.) Some non-paid sites do enter the database through random crawling by Teoma's robot, but free entries are relatively rare.

Teoma doesn't spider often enough

Although Teoma has a robot looking for new web pages, it doesn't send the robot out very often. As a result, Teoma's search results for sites that haven't paid for spidering appear to be five months (or more) out of date. Think I'm kidding? Take a look at these search results for "Google", as captured on 5 May 2002:

Happy Holidays from Google Web Images Groups Directory ? Advanced
Search ? Preferences ? Language Tools Advertise with Us -
Add...

The phrase "Happy Holidays from Google" in that sample is the alternate text from Google's logo in December 2001. That's right: It's Cinco de Mayo, but Teoma hasn't spidered the Number 2 site on the Web since Christmas. Do those look like fresh results to you?

Teoma doesn't support the robot exclusion protocols

I'm not completely sure of this one, but so far the evidence is that Teoma's spider (when it actually ventures out onto the Web) doesn't honor the usual robot exclusion protocols. Teoma doesn't mention them anywhere on its site, and the latest information about Teoma at robotstext.org reports that the Teoma robot ignores robots.txt.

While there's no law requiring that a web spider follow these protocols, it's highly unusual for a major site (or someone who wants to be a major site) to ignore them completely. It suggests that Teoma either doesn't care about good citizenship, or isn't planning to do much actual spidering. The latter, of course, would just be another sign that Teoma intends to concentrate on sites that pay to get in the database.
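For comparison, this is roughly what honoring robots.txt looks like from the crawler's side. It is a minimal sketch using Python's standard urllib.robotparser module. The robot name "ExampleBot", the sample rules, and the example URLs are assumptions for illustration, not anything Teoma publishes.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that keeps all robots out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

can_index_home = allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/index.html")
can_index_private = allowed(ROBOTS_TXT, "ExampleBot", "http://www.example.com/private/notes.html")
```

A well-behaved spider would run a check like this before every fetch and skip any URL that comes back False. The check is cheap, which is why ignoring it looks like a choice rather than an oversight.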

Teoma has problems with XHTML

Here's another search result from Teoma, also captured on Cinco de Mayo. It shows what happens when Teoma indexes an XHTML page:

Don't feel bad. He doesn't know who you are, either. ... ?xml
version="1.0"? michael @ bauser .com Third Person Michael Bauser is a web
provocateur...

The phrase between question marks isn't part of the page's human-readable text; it's the page's XML declaration. Search engines should not be indexing those.
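Skipping the declaration isn't hard. The sketch below uses Python's standard html.parser module, which reports processing instructions such as the XML declaration separately from character data, so they never reach the extracted text. The class name, function name, and sample markup are hypothetical, and a real indexer would do far more than this.

```python
from html.parser import HTMLParser

class VisibleText(HTMLParser):
    """Collect character data only, ignoring tags and processing instructions."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

def visible_text(markup):
    """Return the text content of markup as a single space-joined string."""
    parser = VisibleText()
    parser.feed(markup)
    parser.close()
    return " ".join(parser.chunks)

# Hypothetical XHTML page that starts with an XML declaration.
sample = '<?xml version="1.0"?>\n<html><body><p>Third Person</p></body></html>'
indexed = visible_text(sample)
```

Here the extracted text is just "Third Person"; the declaration is handed to the parser's processing-instruction hook, which does nothing by default.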

The Users' Side

Teoma's home page follows the trend towards plain-and-simple search pages, offering a lone search box with one search option (phrase searching). Teoma's search results pages are more complex.

At the top of the results are "Sponsored Results", currently taken from Overture. The same Sponsored Results repeat on each page of search results. The descriptions of Sponsored Results are written by the sponsors (advertisers).

Teoma's main results follow underneath the Sponsored Results. The site descriptions combine the description from sites' meta tags with the leading text of the pages themselves.

The right margin of Teoma's results pages contains the features that are Teoma's "hooks". The "Refine" menu suggests additional searches related to your original search. Refined searches are based on analysis of the pages that appear in your original search: essentially, Teoma looks at all the pages it found for your original search, identifies phrases that appear more often than average, and suggests those phrases as ways to narrow the search. This can be a hit-or-miss proposition. In my tests, nonsensical phrases like "Product People" and "Free, Get" often show up.
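The "more often than average" idea can be sketched as a comparison of phrase frequencies in the result pages against a background sample. This is a guess at the general technique, not Teoma's implementation. The two-word phrase length, the smoothing, the function names, and the sample documents are all assumptions.

```python
from collections import Counter

def bigrams(text):
    """Split text into lowercase two-word phrases."""
    words = text.lower().split()
    return list(zip(words, words[1:]))

def suggest_refinements(result_docs, background_docs, top_n=3):
    """Rank phrases by how much more common they are in results than in background."""
    result_counts = Counter(bg for doc in result_docs for bg in bigrams(doc))
    background_counts = Counter(bg for doc in background_docs for bg in bigrams(doc))
    result_total = sum(result_counts.values()) or 1
    background_total = sum(background_counts.values()) or 1
    scores = {}
    for phrase, count in result_counts.items():
        result_rate = count / result_total
        # Add-one smoothing so phrases unseen in the background don't divide by zero.
        background_rate = (background_counts[phrase] + 1) / (background_total + 1)
        scores[phrase] = result_rate / background_rate
    # Highest ratio first; ties broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda p: (-scores[p], p))
    return [" ".join(phrase) for phrase in ranked[:top_n]]

# Hypothetical result pages and background sample.
results = ["jaguar cars for sale", "classic jaguar cars", "jaguar cars club"]
background = ["cars for sale", "for sale by owner", "club for sale"]
top_phrase = suggest_refinements(results, background, top_n=1)
```

With these samples the top suggestion is "jaguar cars". A purely statistical approach like this has no idea whether a phrase means anything, which would explain how word pairs that merely sit next to each other in page boilerplate end up in the menu.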

Teoma's "Resources" listings point to pages that contain links to many of the pages listed in your search result. Teoma considers such pages to be potentially useful resources on the topic you're researching. So far, the "Resources" results seem more reliable than the "Refine" results.

Teoma's index is too small

Normally, I don't pick on new sites for having small indexes, but Teoma's run by a crew that won't shut up about challenging Google. By most estimates, Teoma's total database is about 200 million pages. That's ten percent of the size of Google's database. Even AltaVista has a larger database than Teoma, and they're the search engine we all consider a lumbering dinosaur.

Why is Teoma's database so small? Because there's no free submission and it never spiders anything, that's why! (Now you know why I listed webmaster issues first.) As long as Teoma concentrates all its growth on paid-for listings, it's going to have a smaller, less representative database than free-ranging sites like Google and AltaVista.

Conclusions

As of May 2002, Teoma.com is nowhere near being the engine that will unseat Google. Its index is too small and stale, and its ability to expand with the web is limited. Teoma.com has made a lousy first impression.

First impressions are important, especially if you're waiting for that Fourth Age of Search Engines to start. The engines of the first three ages (Yahoo, AltaVista, and Google, in that order) weren't necessarily the first or largest sites of their ages. They were the search sites that most impressed the Web's early adopters, The Geeks. The Geeks are the ones who recommend search engines to the rest of us, and search engines that don't get word of mouth from the Geeks have a much steeper mountain to climb if they want to reach the top.

Teoma is not impressing the geeks. At best, they've said it looks promising. At worst, they've already dismissed it as a wannabe. Even forums that aren't devoted to search engines have taken pot shots at Teoma; witness the dismissals from Dotcom Scoop, icann.Blog, and Slashdot. Teoma is rapidly accumulating bad karma with the most influential members of the Web audience.

Search sites that don't impress the geeks of the Web have two choices: Live off a partnership with a major-leaguer that can drive them traffic (as Looksmart lives off its partnership with MSN) or die. Unfortunately for Teoma, their only major partner is Ask Jeeves, who've already proven they can do more harm to their own search engines (look at Direct Hit) than they can to the competition's.

Unless Teoma finds a way to radically improve its relevancy, or manages to partner with some better sites, it's always going to be an also-ran with delusions of grandeur. I can't recommend investing time or money in Teoma, whether you're a searcher or a webmaster.