R/RcppExports.R

Defines functions url_compose strip_credentials get_credentials url_decode url_encode param_get param_set param_remove url_parse get_component_ set_component_ set_component_r set_component_f rm_component_ puny_encode puny_decode reverse_strings finalise_suffixes tld_extract_ host_extract_

Documented in get_credentials param_get param_remove param_set puny_decode puny_encode strip_credentials url_compose url_decode url_encode url_parse

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#'@title Recompose Parsed URLs
#'
#'@description Sometimes you want to take a vector of URLs, parse them, perform
#'some operations and then rebuild them. \code{url_compose} takes a data.frame produced
#'by \code{\link{url_parse}} and rebuilds it into a vector of full URLs (or, at least,
#'URLs as complete as those originally passed to \code{url_parse}).
#'
#'This is currently a `beta` feature; please do report bugs if you find them.
#'
#'@param parsed_urls a data.frame sourced from \code{\link{url_parse}}
#'
#'@seealso \code{\link{scheme}} and other accessors, which you may want to
#'run URLs through before composing them to modify individual values.
#'
#'@examples
#'#Parse a URL and compose it
#'url <- "http://en.wikipedia.org"
#'url_compose(url_parse(url))
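#'
#'# Sketch of the parse-modify-recompose workflow: change the scheme column of
#'# the parsed data.frame (an illustrative modification), then rebuild the URL
#'parsed <- url_parse("http://en.wikipedia.org/wiki/Article")
#'parsed$scheme <- "https"
#'url_compose(parsed)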
#'
#'@export
url_compose <- function(parsed_urls) {
    .Call(`_urltools_url_compose`, parsed_urls)
}

#'@title Get or remove user authentication credentials
#'@description Authentication credentials appear before the domain
#'name and look like \emph{user:password}. Sometimes you want them removed,
#'or retrieved; \code{strip_credentials} and \code{get_credentials} do
#'precisely that.
#'
#'@aliases creds
#'@rdname creds
#'
#'@param urls a URL, or vector of URLs
#'
#'@examples
#'# Remove credentials
#'strip_credentials("http://foo:bar@97.77.104.22:3128")
#'
#'# Get credentials
#'get_credentials("http://foo:bar@97.77.104.22:3128")
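#'
#'# Both functions are vectorised; a URL with no credentials is expected to
#'# come back from strip_credentials unchanged (example.com is a placeholder)
#'strip_credentials(c("http://foo:bar@example.com", "http://example.com"))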
#'@export
strip_credentials <- function(urls) {
    .Call(`_urltools_strip_credentials`, urls)
}

#'@rdname creds
#'@export
get_credentials <- function(urls) {
    .Call(`_urltools_get_credentials`, urls)
}

#'@title Encode or decode a URI
#'@description Encodes or decodes URIs/URLs.
#'
#'@param urls a vector of URLs to decode or encode.
#'
#'@details
#'URL encoding and decoding is an essential prerequisite to proper web interaction
#'and data analysis around things like server-side logs. The
#'\href{http://tools.ietf.org/html/rfc3986}{relevant IETF RFC} mandates the percent-encoding
#'of characters outside the unreserved set, including non-ASCII characters and reserved
#'characters (such as slashes) that appear as data rather than as delimiters.
#'
#'Base R provides \code{\link{URLdecode}} and \code{\link{URLencode}}, which handle
#'URL encoding - in theory. In practice, they have a set of substantial problems
#'that the urltools implementation solves:
#'
#'\itemize{
#' \item{No vectorisation: }{Both base R functions operate on single URLs, not vectors of URLs.
#'       This means that, when confronted with a vector of URLs that need encoding or
#'       decoding, your only option is to loop from within R. This can be incredibly
#'       computationally costly with large datasets. url_encode and url_decode are
#'       implemented in C++ and entirely vectorised, allowing for a substantial
#'       performance improvement.}
#' \item{No scheme recognition: }{encoding the slashes in, say, http://, is a good way
#'       of making sure your URL no longer works. Because of this, the only thing
#'       you can encode in URLencode (unless you refuse to encode reserved characters)
#'       is a partial URL, lacking the initial scheme, which requires additional operations
#'       to set up and increases the complexity of encoding or decoding. url_encode
#'       detects the protocol and silently splits it off, leaving it unencoded to ensure
#'       that the resulting URL is valid.}
#' \item{ASCII NULs: }{Server side data can get very messy and sometimes include out-of-range
#'       characters. Unfortunately, URLdecode's response to these characters is to convert
#'       them to NULs, which R can't handle, at which point your URLdecode call breaks.
#'       \code{url_decode} simply ignores them.}
#'}
#'
#'@return a character vector containing the encoded (or decoded) versions of "urls".
#'
#'@seealso \code{\link{puny_decode}} and \code{\link{puny_encode}}, for punycode decoding
#'and encoding.
#'
#'@examples
#'
#'url_decode("https://en.wikipedia.org/wiki/File:Vice_City_Public_Radio_%28logo%29.jpg")
#'url_encode("https://en.wikipedia.org/wiki/File:Vice_City_Public_Radio_(logo).jpg")
#'
#'\dontrun{
#'#A demonstrator of the contrasting behaviours around out-of-range characters
#'URLdecode("%gIL")
#'url_decode("%gIL")
#'}
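#'
#'# Both functions are vectorised, so a whole vector is handled in one call
#'# (the example.com URLs are placeholders)
#'encoded <- c("https://example.com/a%20b", "https://example.com/c%28d%29")
#'url_decode(encoded)
#'
#'# For comparison, base R needs an explicit R-level loop over the vector
#'vapply(encoded, utils::URLdecode, character(1), USE.NAMES = FALSE)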
#'@rdname encoder
#'@export
url_decode <- function(urls) {
    .Call(`_urltools_url_decode`, urls)
}

#'@rdname encoder
#'@export
url_encode <- function(urls) {
    .Call(`_urltools_url_encode`, urls)
}

#'@title Get the values of a URL's parameters
#'@description URLs can have parameters, taking the form of \code{name=value}, chained together
#'with \code{&} symbols. \code{param_get}, when provided with a vector of URLs and a vector
#'of parameter names, will generate a data.frame consisting of the values of each parameter
#'for each URL.
#'
#'@param urls a vector of URLs
#'
#'@param parameter_names a vector of parameter names. If \code{NULL} (default), will extract
#'all parameters that are present.
#'
#'@return a data.frame containing one column for each provided parameter name. Values that
#'cannot be found within a particular URL are represented by \code{NA}.
#'
#'@examples
#'#A very simple example
#'url <- "https://google.com:80/foo.php?this_parameter=selfreferencing&hiphop=awesome"
#'parameter_values <- param_get(url, c("this_parameter","hiphop"))
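#'
#'# Several URLs at once (placeholder URLs): "y" is missing from the second
#'# URL, so its value in that row should be NA
#'param_get(c("https://example.com/?x=1&y=2", "https://example.com/?x=3"), c("x", "y"))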
#'
#'@seealso \code{\link{url_parse}} for decomposing URLs into their constituent parts and
#'\code{\link{param_set}} for inserting or modifying key/value pairs within a query string.
#'
#'@aliases param_get url_parameter
#'@rdname param_get
#'@export
param_get <- function(urls, parameter_names = NULL) {
    .Call(`_urltools_param_get`, urls, parameter_names)
}

#'@title Set the value associated with a parameter in a URL's query.
#'@description URLs often have queries associated with them, particularly URLs for
#'APIs, that look like \code{?key=value&key=value&key=value}. \code{param_set}
#'allows you to modify key/value pairs within query strings, or even add new ones
#'if they don't exist within the URL.
#'
#'@param urls a vector of URLs. These should be decoded (with \code{url_decode})
#'but do not have to have been otherwise manipulated.
#'
#'@param key a string representing the key to modify the value of (or insert wholesale
#'if it doesn't exist within the URL).
#'
#'@param value a value to associate with the key. This can be a single string,
#'or a vector the same length as \code{urls}.
#'
#'@return the original vector of URLs, but with modified/inserted key-value pairs. If the
#'URL is \code{NA}, the returned value will be \code{NA}; if the key or value is \code{NA},
#'no insertion will be made.
#'
#'@examples
#'# Set a URL parameter where there's already a key for that
#'param_set("https://en.wikipedia.org/api.php?action=query", "action", "pageinfo")
#'
#'# Set a URL parameter where there isn't.
#'param_set("https://en.wikipedia.org/api.php?list=props", "action", "pageinfo")
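#'
#'# One value per URL (placeholder URLs); the second URL has no query string
#'# yet, so the "format" key is expected to be inserted
#'param_set(c("https://example.com/api.php?format=xml", "https://example.com/api.php"),
#'          "format", c("json", "jsonfm"))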
#'
#'@seealso \code{\link{param_get}} to retrieve the values associated with multiple keys in
#'a vector of URLs, and \code{\link{param_remove}} to strip key/value pairs from a URL entirely.
#'
#'@export
param_set <- function(urls, key, value) {
    .Call(`_urltools_param_set`, urls, key, value)
}

#'@title Remove key-value pairs from query strings
#'@description URLs often have queries associated with them, particularly URLs for
#'APIs, that look like \code{?key=value&key=value&key=value}. \code{param_remove}
#'allows you to remove key/value pairs while leaving the rest of the URL intact.
#'
#'@param urls a vector of URLs. These should be decoded with \code{url_decode} but don't
#'have to have been otherwise processed.
#'
#'@param keys a vector of parameter keys to remove.
#'
#'@return the original URLs but with the key/value pairs specified by \code{keys} removed.
#'If the original URL is \code{NA}, \code{NA} will be returned; if a specified key is \code{NA},
#'nothing will be done with it.
#'
#'@seealso \code{\link{param_set}} to modify values associated with keys, or \code{\link{param_get}}
#'to retrieve those values.
#'
#'@examples
#'# Remove multiple parameters from a URL
#'param_remove(urls = "https://en.wikipedia.org/wiki/api.php?action=list&type=query&format=json",
#'             keys = c("action","format"))
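#'
#'# Vectorised over URLs; an NA URL is passed through as NA, per the return
#'# value described above (example.com is a placeholder)
#'param_remove(c("https://example.com/?a=1&b=2", NA), "a")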
#'@export
param_remove <- function(urls, keys) {
    .Call(`_urltools_param_remove`, urls, keys)
}

#'@title Split URLs into their component parts
#'@description \code{url_parse} takes a vector of URLs and splits each one into its component
#'parts, as recognised by RFC 3986.
#'
#'@param urls a vector of URLs
#'
#'@details It's useful to be able to take a URL and split it out into its component parts - 
#'for the purpose of hostname extraction, for example, or analysing API calls. This functionality
#'is not provided in base R, although it is provided in \code{\link[httr]{parse_url}}; that
#'implementation is entirely in R, uses regular expressions, and is not vectorised. It's
#'perfectly suitable for the intended purpose (decomposition in the context of automated
#'HTTP requests from R), but not for large-scale analysis.
#'
#'Note that user authentication/identification information is not extracted;
#'this can be found with \code{\link{get_credentials}}.
#'
#'@return a data.frame consisting of the columns scheme, domain, port, path, query
#'and fragment. See the \href{http://tools.ietf.org/html/rfc3986}{relevant IETF RFC} for
#'definitions. If an element cannot be identified, it is represented by an empty string.
#'
#'@examples
#'url_parse("https://en.wikipedia.org/wiki/Article")
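#'
#'# Vectorised: one row per input URL, and a single component can be pulled
#'# out as an ordinary column (the second URL is a placeholder)
#'parsed <- url_parse(c("https://en.wikipedia.org/wiki/Article",
#'                      "http://example.com:8080/path?q=1#top"))
#'parsed$domain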
#'
#'@seealso \code{\link{param_get}} for extracting values associated with particular keys in a URL's
#'query string, and \code{\link{url_compose}}, which is \code{url_parse} in reverse.
#'
#'@export
url_parse <- function(urls) {
    .Call(`_urltools_url_parse`, urls)
}

get_component_ <- function(urls, component) {
    .Call(`_urltools_get_component_`, urls, component)
}

set_component_ <- function(urls, component, new_value) {
    .Call(`_urltools_set_component_`, urls, component, new_value)
}

set_component_r <- function(urls, component, new_value, comparator) {
    .Call(`_urltools_set_component_r`, urls, component, new_value, comparator)
}

set_component_f <- function(urls, component, new_value, comparator) {
    .Call(`_urltools_set_component_f`, urls, component, new_value, comparator)
}

rm_component_ <- function(urls, component) {
    .Call(`_urltools_rm_component_`, urls, component)
}

#'@title Encode or Decode Internationalised Domains
#'@description \code{puny_encode} and \code{puny_decode} implement
#'the encoding standard for internationalised (non-ASCII) domains and
#'subdomains. You can use them to encode UTF-8 domain names, or decode
#'encoded names (which start "xn--"), or both.
#'
#'@param x a vector of URLs. These should be URL decoded using \code{\link{url_decode}}.
#'
#'@return a character vector containing encoded or decoded versions of the entries in \code{x}.
#'Invalid URLs (ones that are \code{NA}, or ones that do not successfully map to an actual
#'decoded or encoded version) will be returned as \code{NA}.
#'
#'@examples
#'# Encode a URL
#'puny_encode("https://www.bücher.com/foo")
#'
#'# Decode the result, back to the original
#'puny_decode("https://www.xn--bcher-kva.com/foo")
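#'
#'# Round trip: decoding the encoded form should recover the original URL
#'puny_decode(puny_encode("https://www.bücher.com/foo"))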
#'
#'@seealso \code{\link{url_decode}} and \code{\link{url_encode}} for percent-encoding.
#'
#'@rdname puny
#'@export
puny_encode <- function(x) {
    .Call(`_urltools_puny_encode`, x)
}

#'@rdname puny
#'@export
puny_decode <- function(x) {
    .Call(`_urltools_puny_decode`, x)
}

reverse_strings <- function(strings) {
    .Call(`_urltools_reverse_strings`, strings)
}

finalise_suffixes <- function(full_domains, suffixes, wildcard, is_suffix) {
    .Call(`_urltools_finalise_suffixes`, full_domains, suffixes, wildcard, is_suffix)
}

tld_extract_ <- function(domains) {
    .Call(`_urltools_tld_extract_`, domains)
}

host_extract_ <- function(domains) {
    .Call(`_urltools_host_extract_`, domains)
}

urltools documentation built on May 1, 2019, 6:49 p.m.