Man pages for ropenscilabs/robotstxt
A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

as.list.robotstxt_text: Convert robotstxt_text to a list
fix_url: Add http protocol if missing from a URL
get_robotstxt: Download a robots.txt file (see the usage sketches after this list)
get_robotstxt_http_get: Storage for HTTP request response objects
get_robotstxts: Download multiple robots.txt files
guess_domain: Guess a domain from a path
http_domain_changed: Check if the HTTP domain changed
http_subdomain_changed: Check if the HTTP subdomain changed
http_was_redirected: Check if an HTTP redirect occurred
is_suspect_robotstxt: Check whether a fetched file is suspect, i.e. likely not a real robots.txt file
is_valid_robotstxt: Validate that a file is a valid, parsable robots.txt file
list_merge: Merge a number of named lists in sequential order
named_list: Create a named list
null_to_default: Return a default value if NULL
parse_robotstxt: Parse a robots.txt file
parse_url: Parse a URL
paths_allowed: Check if a bot has permission to access page(s) (see the usage sketches after this list)
paths_allowed_worker_spiderbar: Worker for paths_allowed() that checks page permissions via the spiderbar backend
pipe: Re-export of the magrittr pipe operator
print.robotstxt: Print a robotstxt object
print.robotstxt_text: Print a robotstxt object's text
remove_domain: Remove the domain from a path
request_handler_handler: Handle robotstxt handlers
robotstxt: Generate a representation of a robots.txt file
rt_cache: Get the robotstxt cache
rt_get_comments: Extract comments from robots.txt
rt_get_fields: Extract permissions from robots.txt
rt_get_fields_worker: Extract robots.txt fields
rt_get_rtxt: Load robots.txt files saved along with the package
rt_get_useragent: Extract HTTP user agents from robots.txt
rt_list_rtxt: List robots.txt files saved along with the package
rt_request_handler: Handle a robotstxt object retrieved from an HTTP request
sanitize_path: Make paths uniform
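
For context, the two main user-facing entry points in this index are robotstxt() and paths_allowed(). The following is a minimal usage sketch; the domain and paths are illustrative examples, not taken from the package documentation.

    library(robotstxt)

    # build a robotstxt object for a domain and query it for permissions
    rt <- robotstxt(domain = "wikipedia.org")
    rt$check(paths = c("/", "api/"), bot = "*")

    # or check permissions directly in a single call
    paths_allowed(paths = c("/", "api/"), domain = "wikipedia.org", bot = "*")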
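
The lower-level helpers can also be combined to download and inspect a robots.txt file by hand. A minimal sketch, again with an illustrative domain:

    library(robotstxt)

    # download the raw robots.txt text for a domain
    rtxt <- get_robotstxt(domain = "wikipedia.org")

    # parse it into its components (user agents, permissions, sitemaps, ...)
    parsed <- parse_robotstxt(rtxt)
    parsed$permissions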