Tonight, Robot Snob looks over the shoulder of Gigabot, the indexing robot for one of my least favorite search engines, Gigablast.
208.254.87.133 - - [11/Apr/2002:07:29:55 -0400] "GET /robots.txt HTTP/1.0" 200 116 "-" "-"
208.254.87.133 - - [11/Apr/2002:07:29:56 -0400] "GET /news.groups.reviews/ HTTP/1.0" 200 12188 "-" "-"
208.254.87.133 - - [22/Apr/2002:21:58:23 -0400] "GET / HTTP/1.0" 200 3593 "-" "-"
Observation: In its first incarnation, Gigabot didn't identify itself with a User-Agent header (HTTP_USER_AGENT), which is always inconsiderate. (I only know these three requests are from Gigabot because somebody outed Gigabot's IP address on WebmasterWorld.) It did ask for robots.txt.
Gigablast's choice of /news.groups.reviews/ as the first URI is odd, since that's not the root URI, and I didn't submit it. Gigablast was probably grabbing URIs from the Open Directory Project. Eleven days later it came back for the URI I did submit.
216.243.113.1 - - [21/May/2002:18:29:36 -0400] "GET /robots.txt HTTP/1.0" 200 159 "-" "Gigabot/1.0"
216.243.113.1 - - [21/May/2002:18:29:38 -0400] "GET /beer/ HTTP/1.0" 200 19617 "-" "Gigabot/1.0"
Observation: A month later, Gigabot has an agent identifier and a new IP address. It grabs robots.txt again, plus one other file.
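For anyone who wants to spot anonymous robots in their own logs, here is a minimal sketch that splits combined-format log text into entries and flags requests that arrived without a User-Agent header. It assumes the standard Apache combined log layout seen in the excerpts above; the names LOG_PATTERN, parse_entries, and is_anonymous are made up for this illustration.

```python
import re

# Apache "combined" log format: host, identity, user, [time], "request",
# status, bytes, "referer", "user-agent".
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_entries(text):
    """Return one dict per log entry found in text."""
    return [match.groupdict() for match in LOG_PATTERN.finditer(text)]


def is_anonymous(entry):
    """True when the client sent no User-Agent header (logged as "-")."""
    return entry["agent"] in ("", "-")


# Two entries copied from the log excerpts above.
sample = (
    '208.254.87.133 - - [11/Apr/2002:07:29:55 -0400] '
    '"GET /robots.txt HTTP/1.0" 200 116 "-" "-"\n'
    '216.243.113.1 - - [21/May/2002:18:29:36 -0400] '
    '"GET /robots.txt HTTP/1.0" 200 159 "-" "Gigabot/1.0"\n'
)

for entry in parse_entries(sample):
    label = "anonymous" if is_anonymous(entry) else entry["agent"]
    print(entry["host"], entry["request"], "->", label)
```

Because the pattern is applied with finditer, it also works on log text where several entries have been run together on one line.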
216.243.113.1 - - [22/May/2002:14:50:47 -0400] "GET /robots.txt HTTP/1.0" 200 159 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:50:48 -0400] "GET /michael/statistics.html HTTP/1.0" 200 6696 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:52:32 -0400] "GET /michael/words.html HTTP/1.0" 200 4145 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:52:57 -0400] "GET /beer/glossary.html HTTP/1.0" 200 21160 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:54:00 -0400] "GET /websnob/rules.html HTTP/1.0" 200 6629 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:54:16 -0400] "GET /beer/coolers.html HTTP/1.0" 200 14963 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:54:52 -0400] "GET /alt.security.keydist/FAQ.html HTTP/1.0" 200 10280 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:55:10 -0400] "GET /michael/rockstars.html HTTP/1.0" 200 7987 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:55:45 -0400] "GET /roleplaying/Ghostbusters/index.html HTTP/1.0" 200 13804 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:56:01 -0400] "GET /websnob/robots/ HTTP/1.0" 200 3308 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:56:14 -0400] "GET /alt.security.keydist/newsgroups.html HTTP/1.0" 200 7852 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:56:27 -0400] "GET /alt.security.keydist/yarn.html HTTP/1.0" 200 4235 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:56:28 -0400] "GET /websnob/html4/relationships.html HTTP/1.0" 200 53471 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:56:29 -0400] "GET /roleplaying/ADnD/cantrips.html HTTP/1.0" 200 19124 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:56:58 -0400] "GET /beer/Japan.html HTTP/1.0" 200 9664 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:05 -0400] "GET /websnob/html4/link.html HTTP/1.0" 200 9483 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:08 -0400] "GET /roleplaying/reaper.html HTTP/1.0" 200 9834 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:10 -0400] "GET /alt.security.keydist/FAQ.txt HTTP/1.0" 200 7172 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:11 -0400] "GET /beer/Colorado.html HTTP/1.0" 200 11751 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:15 -0400] "GET /roleplaying/freighters.html HTTP/1.0" 200 7042 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:23 -0400] "GET /beer/Holland.html HTTP/1.0" 200 7725 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:31 -0400] "GET /roleplaying/Spelljammer/index.html HTTP/1.0" 200 8497 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:38 -0400] "GET /roleplaying/Spelljammer/ HTTP/1.0" 200 8497 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:42 -0400] "GET /beer/Pennsylvania.html HTTP/1.0" 200 10309 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:50 -0400] "GET /beer/Michigan.html HTTP/1.0" 200 16211 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:53 -0400] "GET /websnob/robots/UniverseBot.html HTTP/1.0" 200 31597 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:57:56 -0400] "GET /roleplaying/StarFrontiers/index.html HTTP/1.0" 200 8671 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:58:17 -0400] "GET /beer/Stroh.html HTTP/1.0" 200 18304 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:58:20 -0400] "GET /websnob/robots/NetResearchServer.html HTTP/1.0" 200 13094 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:58:41 -0400] "GET /roleplaying/StarFrontiers/interstellar.html HTTP/1.0" 200 30235 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:58:53 -0400] "GET /websnob/engines/Gigablast.html HTTP/1.0" 200 6690 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:03 -0400] "GET /alt.security.keydist/revisions.html HTTP/1.0" 200 5006 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:04 -0400] "GET /beer/Wisconsin.html HTTP/1.0" 200 18349 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:16 -0400] "GET /websnob/engines/Teoma.html HTTP/1.0" 200 11917 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:23 -0400] "GET /websnob/engines/ HTTP/1.0" 200 6186 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:30 -0400] "GET /alt.security.keydist/statistics.html HTTP/1.0" 200 8080 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:34 -0400] "GET /websnob/engines/NetInsert.html HTTP/1.0" 200 8852 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:14:59:45 -0400] "GET /michael/privacy.html HTTP/1.0" 200 4908 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:00:07 -0400] "GET /roleplaying/revisions.html HTTP/1.0" 200 7376 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:00:11 -0400] "GET /websnob/finger/index.html HTTP/1.0" 200 9755 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:00:25 -0400] "GET /roleplaying/statistics.html HTTP/1.0" 200 17718 "-" "Gigabot/1.0"
Observation: The next day, Gigabot finally gets to spidering in earnest, grabbing lots of HTML files and one text file.
216.243.113.1 - - [22/May/2002:15:00:26 -0400] "GET /alt.security.keydist/javascript:window.external.AddFavorite(document.location,document.title) HTTP/1.0" 404 358 "-" "Gigabot/1.0"
Observation: Yes, I know it's incredibly lazy of me to put an unprotected javascript URI into an anchor element, but it's my web page, and I can be lazy if I want. Besides, it gives me an opportunity to catch bad robots like Gigabot, which is misinterpreting the javascript URI as a relative filename. Apparently, Gigabot isn't programmed to recognize (and skip) URI schemes it can't fetch.
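To make the mistake concrete, here is a rough sketch of the difference between gluing every href onto the current directory and resolving it as a URI reference first. This is a guess at the kind of bug involved, not Gigabot's actual code. The function names, the FETCHABLE_SCHEMES set, and the www.example.com host are all placeholders invented for the illustration.

```python
from urllib.parse import urljoin, urlsplit

# Schemes this hypothetical crawler knows how to fetch.
FETCHABLE_SCHEMES = {"http", "https"}


def naive_resolve(base, href):
    """Buggy approach: treat every href as a filename in the base directory."""
    return base.rsplit("/", 1)[0] + "/" + href


def resolve(base, href):
    """Resolve href against base; return None if the scheme is not fetchable."""
    absolute = urljoin(base, href)
    if urlsplit(absolute).scheme not in FETCHABLE_SCHEMES:
        return None
    return absolute


base = "http://www.example.com/michael/"  # placeholder host
for href in ("words.html", "tel:+1-602-502-8848",
             "javascript:window.external.AddFavorite(document.location,document.title)"):
    print(href)
    print("  naive:  ", naive_resolve(base, href))
    print("  checked:", resolve(base, href))
```

The naive version produces paths like /michael/tel:+1-602-502-8848, which is the shape of the 404s in these logs. The checked version returns None for the tel and javascript links, so a robot built this way would simply skip them.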
216.243.113.1 - - [22/May/2002:15:00:43 -0400] "GET /roleplaying/newsgroups.html HTTP/1.0" 200 12527 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:00:50 -0400] "GET /alt.security.keydist/ HTTP/1.0" 200 6860 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:01:19 -0400] "GET /michael/hotlist.html HTTP/1.0" 200 5197 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:01:32 -0400] "GET /michael/guestbook.html HTTP/1.0" 200 27055 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:01:34 -0400] "GET /websnob/html4/stealth_redirection.html HTTP/1.0" 200 11772 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:01:47 -0400] "GET /michael/tel:+1-602-502-8848 HTTP/1.0" 404 292 "-" "Gigabot/1.0"
Observation: Gigabot bungles a mystery URI prefix again, this time misinterpreting an RFC 2806 telephone URI. Yes, that's right: Gigabot tried to call my cell phone.
216.243.113.1 - - [22/May/2002:15:01:51 -0400] "GET /roleplaying/linking.html HTTP/1.0" 200 5126 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:02:02 -0400] "GET /beer/Washington.html HTTP/1.0" 200 9244 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:02:10 -0400] "GET /websnob/CSS/scrollbar.html HTTP/1.0" 200 12197 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:02:20 -0400] "GET /beer/Germany.html HTTP/1.0" 200 11126 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:02:21 -0400] "GET /roleplaying/javascript:window.external.AddFavorite(document.location,document.title) HTTP/1.0" 404 349 "-" "Gigabot/1.0"
Observation: It tried to access another javascript URI.
216.243.113.1 - - [22/May/2002:15:02:26 -0400] "GET /websnob/CSS/borders.html HTTP/1.0" 200 5733 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:02:33 -0400] "GET /websnob/freeware.html HTTP/1.0" 200 4333 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:02:52 -0400] "GET /beer/UK.html HTTP/1.0" 200 8799 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:03:02 -0400] "GET /michael/ HTTP/1.0" 200 17959 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:03:09 -0400] "GET /websnob/domains/reason.html HTTP/1.0" 200 6622 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:04:13 -0400] "GET /websnob/CSS/Netscape4.html HTTP/1.0" 200 6053 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:04:23 -0400] "GET /beer/Ireland.html HTTP/1.0" 200 10647 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:04:23 -0400] "GET /websnob/ HTTP/1.0" 200 9232 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:05:32 -0400] "GET /websnob/logs.html HTTP/1.0" 200 9684 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:06:02 -0400] "GET /websnob/cliches.html HTTP/1.0" 200 9365 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:06:37 -0400] "GET /websnob/traffic.html HTTP/1.0" 200 5591 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:06:54 -0400] "GET /websnob/html3/BANNER.html HTTP/1.0" 200 4838 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:07:42 -0400] "GET /websnob/keydist.html HTTP/1.0" 200 6848 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:08:36 -0400] "GET /websnob/html3/BQ.html HTTP/1.0" 200 5740 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:09:19 -0400] "GET /websnob/html3/NOTE.html HTTP/1.0" 200 4524 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:12:13 -0400] "GET /websnob/html3/FN.html HTTP/1.0" 200 7152 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:12:27 -0400] "GET /beer/Anheuser-Busch.html HTTP/1.0" 200 18334 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:12:44 -0400] "GET /beer/Massachusetts.html HTTP/1.0" 200 8288 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:13:38 -0400] "GET /websnob/javascript:window.external.AddFavorite(document.location,document.title) HTTP/1.0" 404 345 "-" "Gigabot/1.0"
Observation: It tried to access another javascript URI.
216.243.113.1 - - [22/May/2002:15:13:41 -0400] "GET /websnob/newsgroups.html HTTP/1.0" 200 41566 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:15:14 -0400] "GET /websnob/revisions.html HTTP/1.0" 200 8807 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:15:23 -0400] "GET /websnob/statistics.html HTTP/1.0" 200 5446 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:18:57 -0400] "GET /beer/Vermont.html HTTP/1.0" 200 6617 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:19:26 -0400] "GET /beer/North_Carolina.html HTTP/1.0" 200 6433 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:19:46 -0400] "GET /beer/Canada.html HTTP/1.0" 200 14991 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:21:38 -0400] "GET /beer/Texas.html HTTP/1.0" 200 11516 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:23:57 -0400] "GET /beer/Oregon.html HTTP/1.0" 200 10234 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:24:06 -0400] "GET /beer/Coors.html HTTP/1.0" 200 11571 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:25:43 -0400] "GET /beer/Czech_Republic.html HTTP/1.0" 200 9331 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:29:11 -0400] "GET /beer/Austria.html HTTP/1.0" 200 6391 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:33:31 -0400] "GET /beer/Illinois.html HTTP/1.0" 200 6834 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:44:13 -0400] "GET /beer/Miller.html HTTP/1.0" 200 12333 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:47:36 -0400] "GET /beer/New_York.html HTTP/1.0" 200 13917 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:53:07 -0400] "GET /beer/Finland.html HTTP/1.0" 200 6606 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:15:59:40 -0400] "GET /beer/Ohio.html HTTP/1.0" 200 10716 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:16:52:24 -0400] "GET /beer/California.html HTTP/1.0" 200 13163 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:17:32:46 -0400] "GET /beer/help.html HTTP/1.0" 200 8667 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:17:35:17 -0400] "GET /beer/guestbook.html HTTP/1.0" 200 28856 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:17:36:39 -0400] "GET /beer/javascript:window.external.AddFavorite(document.location,document.title) HTTP/1.0" 404 342 "-" "Gigabot/1.0"
Observation: It tried to access another javascript URI.
216.243.113.1 - - [22/May/2002:17:37:04 -0400] "GET /beer/statistics.html HTTP/1.0" 200 19104 "-" "Gigabot/1.0"
216.243.113.1 - - [22/May/2002:17:37:18 -0400] "GET /beer/revisions.html HTTP/1.0" 200 10715 "-" "Gigabot/1.0"
Observation: That ended the first full visit from Gigabot. It spidered the majority of bauser.com in 2 hours, 46 minutes, and 31 seconds (from the robots.txt request at 14:50:47 to the last request at 17:37:18).
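The length of a visit can be worked out from the log timestamps themselves. The sketch below takes the first and last timestamps of the 22 May visit as they appear in the entries above and subtracts them. The helper name elapsed is invented here, and the %b month parsing assumes an English locale.

```python
from datetime import datetime

# Timestamp layout used inside the [...] field of the access log.
LOG_TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def elapsed(first, last):
    """Return the time between two access-log timestamps as a timedelta."""
    start = datetime.strptime(first, LOG_TIME_FORMAT)
    end = datetime.strptime(last, LOG_TIME_FORMAT)
    return end - start


# First request (robots.txt) and last request (/beer/revisions.html) of the visit.
visit = elapsed("22/May/2002:14:50:47 -0400", "22/May/2002:17:37:18 -0400")
print(visit)
```

Parsing the UTC offset with %z keeps the subtraction correct even if two timestamps were logged with different offsets.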
216.243.113.1 - - [05/Jun/2002:08:56:19 -0400] "GET /robots.txt HTTP/1.0" 200 36 "-" "Gigabot/1.0"
216.243.113.1 - - [05/Jun/2002:08:56:22 -0400] "GET /news.groups.reviews/ HTTP/1.0" 200 12509 "-" "Gigabot/1.0"
Observation: Again, it starts with the one URI that isn't connected to the rest of bauser.com. I don't know where it's getting that URI, but it's not getting it from me.
Anyway, Gigabot didn't spider anything prohibited by robots.txt, but it did annoy me by requesting those javascript and tel URIs. It's not enough of a pest to block, even if I think it's attached to an awful engine.
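Checking whether a robot stayed inside the lines can be done mechanically by replaying its requested paths against robots.txt. The sketch below uses Python's standard urllib.robotparser. The contents of the real robots.txt aren't shown on this page, so the rules here are purely hypothetical, as is the /cgi-bin/counter path; only /beer/ and /michael/words.html come from the logs above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not the site's actual robots.txt.
HYPOTHETICAL_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def check_paths(robots_txt, agent, paths):
    """Map each path to True if the agent may fetch it under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {path: parser.can_fetch(agent, path) for path in paths}


requested = ["/beer/", "/michael/words.html", "/cgi-bin/counter"]
for path, ok in check_paths(HYPOTHETICAL_ROBOTS_TXT, "Gigabot/1.0", requested).items():
    print(path, "allowed" if ok else "PROHIBITED")
```

Any logged path that comes back prohibited would be evidence of a robot ignoring robots.txt; under these made-up rules, only the made-up /cgi-bin/counter path would be flagged.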