# http://www.robotstxt.org/
#
# http://concept.temple.edu/email.shtml
#
# Too many spiders crawling my webserver
#
# "robots.txt is not intended for access control, so don't try to use it as such."
# Use IP restriction and/or user/pass authentication
#
# The original standard has no "Allow" field
#

# Slurp, inktomisearch, etc. are Yahoo!
User-agent: Slurp
Crawl-delay: 5

# Cuil.com, Cuill.com -- Your crawler stinks; you cobble together requests based on different sections of my website.
# Cuil.com, Cuill.com -- Why not *crawl* what's there, rather than request files without a Referer?
# Cuil.com, Cuill.com -- Once you straighten that out, I'd be happy to let you crawl again.
User-agent: twiceler
Disallow: /

# ExaLead.com -- Your crawler stinks: Stop requesting all lowercase filenames!
# ExaLead.com -- Once you straighten that out, I'd be happy to let you crawl again.
User-agent: Exabot
Disallow: /

User-agent: *
Disallow: /cgi-bin/
Disallow: /css/
Disallow: /img/
Disallow: /IT/
Disallow: /Condo/
# These directories no longer exist.
Disallow: /joda/
Disallow: /yoda/
# These files no longer exist.
Disallow: /system_rss.xml
Disallow: /email_rss.xml
Disallow: /Susie_Sorrento.html
Disallow: /sysadmin/227million--mega_dic.zip
# Relocating
Disallow: /joda.cis.temple.edu/
Disallow: /yoda.cis.temple.edu/