Robots Identification


URI:

http://herbert.gandraxa.com/herbert/rid.asp


Article

Scope

This article serves to identify visitors that spider the web automatically, that is, robots.

Author

Herbert Glarner

Published

2007-Nov-23 11:00 — Setup of trap
2007-Dec-14 10:30 — Last update

Your Visit

It is rather unlikely that you are a human visitor, because no person I know would click on the spot you had to click to reach this page. It is possible, however, that a search engine gave you this link. In that case you will most likely not find what you were looking for, at least not on this page.

For a robot, though, finding the link is no problem, and once a robot has found it, it will usually follow it. This makes the page ideal for its stated purpose, namely to identify those automatic travellers. I want to know them because I want to ban some of them from my pages, both to reduce overall Internet traffic and to cut useless traffic on my server. That said, I want to make clear that I do not want to ban all robots from my site, but I certainly will attempt to block e-mail harvesters and the like.

I will first attempt a blockade via the Robots Exclusion Standard (RES), and if that is to no avail, I will resort to more drastic measures.
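As a rough illustration of how a cooperating robot would interpret such a blockade, the sketch below feeds a robots.txt to Python's standard `urllib.robotparser` and asks whether two agents may fetch this page. The agent name `EmailHarvester` and the robots.txt contents are hypothetical and chosen only for the example; they are not this site's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the agent name "EmailHarvester" and these rules
# are illustrative assumptions, not the site's real file.
ROBOTS_TXT = """\
User-agent: EmailHarvester
Disallow: /

User-agent: *
Disallow:
"""

TRAP_URL = "http://herbert.gandraxa.com/herbert/rid.asp"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text the way a well-behaved robot would."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The listed agent is excluded from the whole site ...
harvester_allowed = parser.can_fetch("EmailHarvester", TRAP_URL)
# ... while any agent not listed falls under the permissive "*" entry.
googlebot_allowed = parser.can_fetch("Googlebot", TRAP_URL)

print("EmailHarvester allowed:", harvester_allowed)
print("Googlebot allowed:", googlebot_allowed)
```

Compliance with the RES is voluntary: the file only tells a robot what it is asked not to fetch, which is why it can be no more than the first line of defence.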

So, if you happen to be a human or a useful robot: thanks for your visit, and have a nice day. The others can't read anyway, so there is no point in telling them anything.

Report

If you are human and are still on this page, you may want to know which robots are involved. For you I maintain this table:

[Note: Updates are made manually, so the three columns Visits, First visit and Last visit reflect the state at the last update. Expect updates to be infrequent and irregular.]

IPs | Organization | User Agent string | Visits [1] | First visit [1] | Last visit | RES [2] | Follows RES | Blocked IPs
83.138.172.72 | Rackspace Managed Hosting, San Antonio, TX | Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322) | 1 | 14.12.2007 09:58:46 | 14.12.2007 09:58:46 | No | n/a |
65.55.165.40 | Microsoft Corp, Redmond, WA | Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.2; .NET CLR 1.1.4322) | 1 | 12.12.2007 20:59:55 | 12.12.2007 20:59:55 | No | n/a |
24.73.96.230 | Road Runner HoldCo LLC, Herndon, VA | Mozilla/4.0 (compatible ; MSIE 6.0; Windows NT 5.1) | 1 | 12.12.2007 18:35:27 | 12.12.2007 18:35:27 | No | n/a |
65.55.212.26 | Microsoft Corp, Redmond, WA | msnbot-media/1.0 (+http://search.msn.com/msnbot.htm) | 1 | 07.12.2007 03:31:12 | 07.12.2007 03:31:12 | No | n/a |
64.208.172.181 | Global Crossing, Phoenix, AZ | ia_archiver | 1 | 04.12.2007 03:47:20 | 04.12.2007 03:47:20 | No | n/a |
65.54.165.35-65.55.208.27 | Microsoft Corp, Redmond, WA | msnbot/1.0 (+http://search.msn.com/msnbot.htm) | 9 | 25.11.2007 07:24:44 | 12.12.2007 20:58:11 | No | n/a |
74.6.26.119 | Inktomi Corporation, Sunnyvale, CA | Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp) | 18 | 23.11.2007 15:15:03 | 13.12.2007 17:12:03 | No | n/a |
66.249.65.208 | Google Inc., Mountain View, CA | Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) | 4 | 23.11.2007 12:50:34 | 08.12.2007 19:26:38 | No | n/a |
66.249.65.208 | Google Inc., Mountain View, CA | Mediapartners-Google | 2 | 23.11.2007 11:03:50 | 01.12.2007 07:18:01 | No | n/a |

[1] Since 23.11.2007 11:00

[2] A robot that has not been added to the RES (Robots Exclusion Standard) is welcome or at least tolerated.
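The Visits, First visit and Last visit columns of the table above could be derived from raw hits on this page along the following lines. This is only a sketch: the function and class names are made up for the example, grouping by User Agent string alone is an assumption (the table also distinguishes IPs), and the sample hits are simply the timestamps listed above for Mediapartners-Google and ia_archiver.

```python
from dataclasses import dataclass
from datetime import datetime

# Timestamp layout used in the table, e.g. "23.11.2007 11:03:50".
TIMESTAMP_FORMAT = "%d.%m.%Y %H:%M:%S"


@dataclass
class VisitSummary:
    visits: int
    first_visit: datetime
    last_visit: datetime


def summarize(hits):
    """Aggregate (user_agent, timestamp_string) pairs per user agent.

    Grouping by user agent only is an assumption made for this sketch.
    """
    summary = {}
    for user_agent, stamp in hits:
        when = datetime.strptime(stamp, TIMESTAMP_FORMAT)
        entry = summary.get(user_agent)
        if entry is None:
            # First hit seen for this agent.
            summary[user_agent] = VisitSummary(1, when, when)
        else:
            entry.visits += 1
            entry.first_visit = min(entry.first_visit, when)
            entry.last_visit = max(entry.last_visit, when)
    return summary


# Sample hits taken from the table; their order does not matter.
SAMPLE_HITS = [
    ("Mediapartners-Google", "01.12.2007 07:18:01"),
    ("ia_archiver", "04.12.2007 03:47:20"),
    ("Mediapartners-Google", "23.11.2007 11:03:50"),
]

summary = summarize(SAMPLE_HITS)
for agent, row in summary.items():
    print(agent, row.visits,
          row.first_visit.strftime(TIMESTAMP_FORMAT),
          row.last_visit.strftime(TIMESTAMP_FORMAT))
```

Parsing the timestamps into `datetime` values rather than comparing the strings matters here, because day-first dates such as "01.12.2007" and "23.11.2007" do not sort chronologically as text.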