Documented in parse_url

#' parse_url
#'
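#' @param url character vector of URLs to parse into their components
#'
#' @return data.frame with one row per URL and the columns protocol, domain,
#'   path; protocol is NA when the URL has no scheme, path is "" when the URL
#'   has no path
#'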
#' @keywords internal
#'
#' @examples
#'
#' \dontrun{
#' url <-
#' c(
#'   "google.com",
#'   "google.com/",
#'   "www.google.com",
#'   "http://google.com",
#'   "https://google.com",
#'   "sub.domain.whatever.de",
#'   "s-u-b.dom-ain.what-ever.de"
#' )
#'
#' parse_url(url)
#' }
#'
parse_url <- function(url){
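  # capture groups: (1) optional protocol such as "https://",
  # (2) domain, i.e. everything up to the first "/", (3) optional path from "/" on
  match <-
    stringr::str_match(
      string  = url,
      pattern = "(^\\w+://)?([^/]+)?(/.*)?"
    )

  # drop the first column (the full match), keep only the three capture groups
  match <- match[, -1, drop = FALSE]

  df        <- as.data.frame(match, stringsAsFactors = FALSE)
  names(df) <- c("protocol", "domain", "path")

  # a missing path becomes an empty string; a missing protocol stays NA
  df$path[ is.na(df$path) ] <- ""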

  # return
  df
}
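
To make the role of the three capture groups concrete, the following is a minimal sketch that applies the same pattern directly with stringr::str_match(). It assumes the stringr package is installed; the example URLs and the variable names `pattern`, `urls` and `parts` are illustrative and not part of the package.

```r
# Sketch only: the pattern is copied from parse_url(), the URLs are made up.
pattern <- "(^\\w+://)?([^/]+)?(/.*)?"

urls <- c(
  "https://google.com/search?q=r",  # protocol, domain and path
  "www.google.com",                 # domain only
  "google.com/"                     # domain and a bare "/" path
)

# str_match() returns one row per input: the full match in column 1,
# then one column per capture group; drop column 1 as parse_url() does
parts <- stringr::str_match(urls, pattern)[, -1, drop = FALSE]
colnames(parts) <- c("protocol", "domain", "path")

# optional groups that did not take part in the match come back as NA
print(parts)
```

Here the second URL has neither a protocol nor a path, so both of those cells are NA at this stage; parse_url() then replaces only the missing path with an empty string.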


robotstxt documentation built on Sept. 12, 2024, 7:36 a.m.


robotstxt
A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

R/parse_url.R
In robotstxt: A 'robots.txt' Parser and 'Webbot'/'Spider'/'Crawler' Permissions Checker

Defines functions parse_url
