# See http://www.robotstxt.org/wc/norobots.html for documentation on how to
# use the robots.txt file
#
# Mainly to reduce server load from bots, we block pages which are actions
# and searches. We also block /feed/, as RSS readers (rightly, I think)
# don't seem to check robots.txt.
#
# Note: Can delay Bing's crawler with:
# Crawl-delay: 1
# http://www.bing.com/community/blogs/webmaster/archive/2009/08/10/crawl-delay-and-the-bing-crawler-msnbot.aspx
#
# This file uses the non-standard extension characters * and $, which are
# supported by Google and Yahoo!
# http://code.google.com/web/controlcrawlindex/docs/robots_txt.html
# http://help.yahoo.com/l/us/yahoo/search/webcrawler/slurp-02.html

User-agent: *
Disallow: */annotate/
Disallow: */new/
Disallow: */search/
Disallow: */similar/
Disallow: */track/
Disallow: */upload/
Disallow: */user/contact/
Disallow: */feed/
Disallow: */profile/
Disallow: */signin
Disallow: */request/*/response/
Disallow: */body/*/view_email$

# The following rule was added in Jan 2012 to stop robots crawling pages
# generated in error (see
# https://github.com/mysociety/alaveteli/issues/311). It can be removed
# later in 2012, once the error pages have been dropped from the index.
Disallow: *.json.j*
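
# Example of how the extension characters behave under Google/Yahoo!
# semantics (the paths below are illustrative, not real routes on this
# site): * matches any run of characters and $ anchors the match at the
# end of the URL. So:
#   */body/*/view_email$   blocks /body/123/view_email
#                          but not /body/123/view_email.html
#   *.json.j*              blocks any path containing ".json.j",
#                          e.g. /request/foo.json.json
# Crawlers that don't support these extensions may ignore such rules.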