Because you perform responsible web scraping.
machina-policy is a way to parse and query robots.txt files so your web-scraping bot can play nicely with the other kids.
machina-policy has been converted from darcs to git and is now available on GitHub.
All dependencies are available via Quicklisp.
See the readme for more information.
machina-policy is not intended to be a complete web-scraping solution (cl-web-crawler may get you closer to that goal). It is intended to make handling robots.txt files easy, not to make that handling transparent.
Currently, machina-policy is very much geared towards single-domain usage. For instance, #'URI-ALLOWED-P does not check that the hostname of the given URI actually falls under the jurisdiction of the given POLICY. If you crawl multiple domains, it is your responsibility to keep the policies for those domains separate.
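One way to keep per-domain policies separate is to key them by hostname and refuse any URI whose host has no policy loaded. The sketch below illustrates that idea only. It is written in Python against the standard library's urllib.robotparser, not against machina-policy's own API, and the PolicyRegistry class, its method names, and the example hosts are all hypothetical.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser


class PolicyRegistry:
    """Keeps one parsed robots.txt policy per hostname (hypothetical helper)."""

    def __init__(self, user_agent):
        self.user_agent = user_agent
        self._policies = {}  # lowercase hostname -> RobotFileParser

    def add(self, host, robots_txt):
        """Parse robots.txt text and store it under the host it came from."""
        parser = RobotFileParser()
        parser.parse(robots_txt.splitlines())
        self._policies[host.lower()] = parser

    def allowed(self, url):
        """Check a URL only against the policy registered for its own host."""
        host = urlsplit(url).hostname or ""
        parser = self._policies.get(host)
        if parser is None:
            # No policy loaded for this host: refuse rather than
            # borrow another host's rules.
            return False
        return parser.can_fetch(self.user_agent, url)


registry = PolicyRegistry("examplebot")
registry.add("a.example", "User-agent: *\nDisallow: /private/\n")
registry.add("b.example", "User-agent: *\nDisallow: /\n")

for url in ("http://a.example/public/page",
            "http://a.example/private/page",
            "http://b.example/public/page",
            "http://c.example/"):
    print(url, registry.allowed(url))
```

Refusing unknown hosts is a conservative choice made for this sketch. A crawler could instead fetch and register the missing host's robots.txt before deciding.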
My e-mail address is email@example.com. Questions, comments, patches, beratings, and bug reports are all welcome.