robotstxt: A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

Provides functions to download and parse 'robots.txt' files. Ultimately the package makes it easy to check if bots (spiders, scrapers, ...) are allowed to access specific resources on a domain.
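
As a rough illustration of the kind of permission check described above, the sketch below uses Python's standard-library urllib.robotparser rather than this package's own R functions, so it should not be read as the package's API. The robots.txt text, the bot names, and the helper is_path_allowed() are all invented for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, made up for this illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/

User-agent: BadBot
Disallow: /
"""


def is_path_allowed(robots_txt: str, bot: str, path: str) -> bool:
    """Return True if `bot` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no download is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(bot, path)


if __name__ == "__main__":
    for bot, path in [
        ("GoodBot", "/public/page.html"),
        ("GoodBot", "/private/data.html"),
        ("BadBot", "/public/page.html"),
    ]:
        print(bot, path, is_path_allowed(ROBOTS_TXT, bot, path))
```

Parsing from a string keeps the example offline; checking a live site would additionally require fetching that site's robots.txt first.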

Author: Peter Meissner [aut, cre], Oliver Keys [ctb], Rich Fitz John [ctb]
Date of publication: 2016-12-05 18:28:48
Maintainer: Peter Meissner <retep.meissner@gmail.com>
License: MIT + file LICENSE
Version: 0.3.2
URL: https://github.com/ropenscilabs/robotstxt

Files in this package

robotstxt
robotstxt/inst
robotstxt/inst/robotstxts
robotstxt/inst/robotstxts/robots_new_york_times.txt
robotstxt/inst/robotstxts/disallow_all_for_BadBot.txt
robotstxt/inst/robotstxts/robots_bundestag.txt
robotstxt/inst/robotstxts/robots_pmeissner.txt
robotstxt/inst/robotstxts/robots_wikipedia.txt
robotstxt/inst/robotstxts/robots_yahoo.txt
robotstxt/inst/robotstxts/disallow_some_for_all.txt
robotstxt/inst/robotstxts/disallow_two_at_once.txt
robotstxt/inst/robotstxts/selfhtml_Example.txt
robotstxt/inst/robotstxts/robots_google.txt
robotstxt/inst/robotstxts/host.txt
robotstxt/inst/robotstxts/allow_single_bot.txt
robotstxt/inst/robotstxts/crawl_delay.txt
robotstxt/inst/robotstxts/empty.txt
robotstxt/inst/robotstxts/disallow_all_for_all.txt
robotstxt/inst/robotstxts/testing_comments.txt
robotstxt/inst/robotstxts/robots_spiegel.txt
robotstxt/inst/robotstxts/robots_amazon.txt
robotstxt/inst/doc
robotstxt/inst/doc/using_robotstxt.html
robotstxt/inst/doc/using_robotstxt.R
robotstxt/inst/doc/using_robotstxt.Rmd
robotstxt/tests
robotstxt/tests/testthat.R
robotstxt/tests/testthat
robotstxt/tests/testthat/test_parser.R
robotstxt/tests/testthat/test_permissions.R
robotstxt/tests/testthat/test_robotstxt.R
robotstxt/NAMESPACE
robotstxt/NEWS
robotstxt/R
robotstxt/R/parse_robotstxt.R
robotstxt/R/tools.R
robotstxt/R/robotstxt.R
robotstxt/R/permissions.R
robotstxt/vignettes
robotstxt/vignettes/using_robotstxt.Rmd
robotstxt/README.md
robotstxt/MD5
robotstxt/build
robotstxt/build/vignette.rds
robotstxt/DESCRIPTION
robotstxt/man
robotstxt/man/sanitize_permissions.Rd
robotstxt/man/print.robotstxt_text.Rd
robotstxt/man/rt_get_comments.Rd
robotstxt/man/paths_allowed.Rd
robotstxt/man/parse_robotstxt.Rd
robotstxt/man/guess_domain.Rd
robotstxt/man/rt_list_rtxt.Rd
robotstxt/man/rt_get_rtxt.Rd
robotstxt/man/remove_domain.Rd
robotstxt/man/path_allowed.Rd
robotstxt/man/get_robotstxt.Rd
robotstxt/man/rt_get_fields_worker.Rd
robotstxt/man/sanitize_permission_values.Rd
robotstxt/man/sanitize_path.Rd
robotstxt/man/robotstxt.Rd
robotstxt/man/print.robotstxt.Rd
robotstxt/man/rt_cache.Rd
robotstxt/man/named_list.Rd
robotstxt/man/rt_get_useragent.Rd
robotstxt/man/rt_get_fields.Rd
robotstxt/LICENSE
