Files in petermeissner/robotstxt
A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

.Rbuildignore
.Rprofile
.github/.gitignore
.github/CONTRIBUTING.md
.github/workflows/R-CMD-check.yaml
.github/workflows/rhub.yaml
.github/workflows/test-coverage.yaml
.gitignore
CRAN-SUBMISSION
DESCRIPTION
LICENSE
NAMESPACE
NEWS.md
R/as_list.R
R/fix_url.R
R/get_robotstxt.R
R/get_robotstxt_http_get.R
R/get_robotstxts.R
R/guess_domain.R
R/http_domain_changed.R
R/http_subdomain_changed.R
R/http_was_redirected.R
R/is_suspect_robotstxt.R
R/is_valid_robotstxt.R
R/list_merge.R
R/null_to_default.R
R/parse_robotstxt.R
R/parse_url.R
R/paths_allowed.R
R/paths_allowed_worker_spiderbar.R
R/pipe.R
R/print_robotstxt.R
R/print_robotstxt_text.R
R/remove_domain.R
R/request_handler_handler.R
R/robotstxt.R
R/rt_cache.R
R/rt_get_comments.R
R/rt_get_fields.R
R/rt_get_fields_worker.R
R/rt_get_useragent.R
R/rt_request_handler.R
R/rt_request_handler_defaults.R
R/sanitize_path.R
R/tools.R
README.Rmd
README.md
_pkgdown.yml
benchmarks/spiderbar_and_futures.r
cran-comments.md
data-raw/logo/robotstxt-logo.jpeg
data-raw/logo/robotstxt.jpeg
data-raw/logo/robotstxt.png
inst/http_requests/http_404.rds
inst/http_requests/http_client_error.rds
inst/http_requests/http_domain_change.rds
inst/http_requests/http_html_content.rds
inst/http_requests/http_ok_1.rds
inst/http_requests/http_ok_2.rds
inst/http_requests/http_ok_3.rds
inst/http_requests/http_ok_4.rds
inst/http_requests/http_redirect_www.rds
inst/http_requests/http_server_error.rds
inst/robotstxts/allow_single_bot.txt
inst/robotstxts/crawl_delay.txt
inst/robotstxts/disallow_all_for_BadBot.txt
inst/robotstxts/disallow_all_for_all.txt
inst/robotstxts/disallow_some_for_all.txt
inst/robotstxts/disallow_two_at_once.txt
inst/robotstxts/empty.txt
inst/robotstxts/host.txt
inst/robotstxts/rbloggers.txt
inst/robotstxts/robots_amazon.txt
inst/robotstxts/robots_bundestag.txt
inst/robotstxts/robots_cdc.txt
inst/robotstxts/robots_cdc2.txt
inst/robotstxts/robots_commented_token.txt
inst/robotstxts/robots_facebook.txt
inst/robotstxts/robots_facebook_unsupported.txt
inst/robotstxts/robots_google.txt
inst/robotstxts/robots_new_york_times.txt
inst/robotstxts/robots_pmeissner.txt
inst/robotstxts/robots_spiegel.txt
inst/robotstxts/robots_wikipedia.txt
inst/robotstxts/robots_wikipedia_20170706.txt
inst/robotstxts/robots_yahoo.txt
inst/robotstxts/selfhtml_Example.txt
inst/robotstxts/testing_comments.txt
inst/urls.txt
man/as.list.robotstxt_text.Rd
man/figures/logo.jpeg
man/fix_url.Rd
man/get_robotstxt.Rd
man/get_robotstxt_http_get.Rd
man/get_robotstxts.Rd
man/guess_domain.Rd
man/http_domain_changed.Rd
man/http_subdomain_changed.Rd
man/http_was_redirected.Rd
man/is_suspect_robotstxt.Rd
man/is_valid_robotstxt.Rd
man/list_merge.Rd
man/named_list.Rd
man/null_to_default.Rd
man/parse_robotstxt.Rd
man/parse_url.Rd
man/paths_allowed.Rd
man/paths_allowed_worker_spiderbar.Rd
man/pipe.Rd
man/print.robotstxt.Rd
man/print.robotstxt_text.Rd
man/remove_domain.Rd
man/request_handler_handler.Rd
man/robotstxt.Rd
man/rt_cache.Rd
man/rt_get_comments.Rd
man/rt_get_fields.Rd
man/rt_get_fields_worker.Rd
man/rt_get_rtxt.Rd
man/rt_get_useragent.Rd
man/rt_list_rtxt.Rd
man/rt_request_handler.Rd
man/sanitize_path.Rd
robotstxt.Rproj
tests/testthat.R
tests/testthat/_snaps/http_event_handling.md
tests/testthat/_snaps/paths_allowed.md
tests/testthat/test_attribute_handling.R
tests/testthat/test_get_robotstxt.R
tests/testthat/test_http_event_handling.R
tests/testthat/test_issue50.R
tests/testthat/test_parser.R
tests/testthat/test_path_examples_from_rfc.R
tests/testthat/test_paths_allowed.R
tests/testthat/test_robotstxt.R
tests/testthat/test_tools.R
vignettes/style.css
vignettes/using_robotstxt.Rmd
petermeissner/robotstxt documentation built on Nov. 17, 2024, 9:50 p.m.