In jayjacobs/tldextract: Extract top level domain, domain, and subdomain from host name

After working with https://github.com/john-kurkowski/tldextract in Python, I wanted the same functionality in R. The list of top-level domains can be downloaded automatically from https://publicsuffix.org/list/effective_tld_names.dat, and a cached copy of the data is stored in the package.
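
To make the extraction concrete, here is a minimal sketch of the longest-suffix matching idea in base R. The helper split_host() and the short suffixes vector are hypothetical illustrations, not part of tldextract. The sketch also ignores the Public Suffix List's wildcard (*.) and exception (!) rules, which a full implementation has to handle.

```r
# Hypothetical helper for illustration only; not part of the tldextract package.
# Splits one host name into subdomain, domain and tld by finding the longest
# suffix (made of whole labels) that appears in `suffixes`.
# Assumption: plain rules only; PSL wildcard ("*.") and exception ("!") rules are ignored.
split_host <- function(host, suffixes) {
  labels <- strsplit(tolower(host), ".", fixed = TRUE)[[1]]
  n <- length(labels)
  # i = 2 gives the longest candidate, so the first match is the longest one;
  # at least one label must remain in front of the suffix to act as the domain
  for (i in seq_len(max(n - 1, 0)) + 1) {
    candidate <- paste(labels[i:n], collapse = ".")
    if (candidate %in% suffixes) {
      return(c(subdomain = paste(labels[seq_len(i - 2)], collapse = "."),
               domain    = labels[i - 1],
               tld       = candidate))
    }
  }
  # no known suffix matched
  c(subdomain = NA_character_, domain = NA_character_, tld = NA_character_)
}

# Hypothetical, tiny stand-in for the real suffix list
suffixes <- c("com", "ar", "com.ar", "uk", "co.uk", "ca")
split_host("www.google.com.ar", suffixes)         # subdomain "www", domain "google", tld "com.ar"
split_host("mail.space-hoppers.co.uk", suffixes)  # subdomain "mail", domain "space-hoppers", tld "co.uk"
```

Trying the longest candidate first is what lets "com.ar" win over "ar" for www.google.com.ar; a shortest-first search would wrongly report "com" as the domain.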

Installation

To install this package, use the devtools package:

devtools::install_github("jayjacobs/tldextract")

Usage

library(tldextract)
# use the cached lookup data, simple call
tldextract("www.google.com")

# it can take multiple domains at the same time
tldextract(c("www.google.com", "www.google.com.ar", "googlemaps.ca", "tbn0.google.cn"))

The top-level domain list is cached in the package and can be viewed:

# view the cached TLD list in the tldnames data set
data(tldnames)
head(tldnames)

If the cached version is out of date and the package hasn't been updated, the latest list can be downloaded manually and then passed to the tldextract() function through its tldnames argument.

# get most recent TLD listings
tld <- getTLD() # optionally pass in a different URL than the default
manyhosts <- c("pages.parts.marionautomotive.com", "www.embroiderypassion.com", 
               "fsbusiness.co.uk", "www.vmm.adv.br", "ttfc.cn", "carole.co.il",
               "visiontravail.qc.ca", "mail.space-hoppers.co.uk", "chilton.k12.pa.us")
tldextract(manyhosts, tldnames=tld)
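
The file at the default URL is plain text in the Public Suffix List format: one rule per line, with comment lines starting with // and blank lines in between, and each line read only up to its first whitespace. As a rough sketch of what reading that format involves, the hypothetical parse_psl() below keeps only the rule lines. It is not the package's getTLD(), and it leaves wildcard (*.) and exception (!) rules unprocessed.

```r
# Hypothetical helper for illustration; not part of tldextract.
# Keeps the rule lines of a Public Suffix List file and drops comments and blanks.
parse_psl <- function(lines) {
  # the PSL format reads each line only up to the first whitespace
  rules <- sub("[[:space:]].*$", "", trimws(lines))
  rules[nzchar(rules) & !startsWith(rules, "//")]
}

# A few lines in the PSL format, standing in for the downloaded file.
# With network access, the real file could be read with
# readLines("https://publicsuffix.org/list/effective_tld_names.dat", encoding = "UTF-8")
psl_text <- c("// ===BEGIN ICANN DOMAINS===", "", "com", "co.uk", "*.ck", "!www.ck")
parse_psl(psl_text)  # "com" "co.uk" "*.ck" "!www.ck"
```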

jayjacobs/tldextract documentation built on Jan. 7, 2020, 12:25 a.m.
