After working with https://github.com/john-kurkowski/tldextract in Python, I wanted the same functionality within R. The list of top-level domains can be loaded automatically from https://publicsuffix.org/list/effective_tld_names.dat, and a cached copy of the data is stored in the package.
To install this package, use the devtools package:
```r
devtools::install_github("jayjacobs/tldextract")
```
```r
library(tldextract)

# use the cached lookup data, simple call
tldextract("www.google.com")

# it can take multiple domains at the same time
tldextract(c("www.google.com", "www.google.com.ar", "googlemaps.ca", "tbn0.google.cn"))
```
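A minimal sketch of working with the output, assuming `tldextract()` returns a data frame with `subdomain`, `domain`, and `tld` columns (an assumption; verify with `names()` on your install):

```r
# assumes tldextract() returns a data frame with subdomain, domain,
# and tld columns -- check names(tldextract("www.google.com")) locally
parts <- tldextract(c("www.google.com", "mail.space-hoppers.co.uk"))

# rebuild the registered domain from its parts
paste(parts$domain, parts$tld, sep = ".")
```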
The specification for the top-level domains is cached in the package and is viewable.
```r
# view and update the TLD domains list in the tldnames data
data(tldnames)
head(tldnames)
```
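If the cached object is a plain character vector of public-suffix rules (an assumption about its structure), it can be searched with the usual tools:

```r
# a sketch assuming tldnames is a character vector of suffix rules
data(tldnames)

# count the rules, then pull out every rule ending in "uk"
length(tldnames)
tldnames[grepl("(^|\\.)uk$", tldnames)]
```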
If the cached version is out of date and the package hasn't been updated, the data can be loaded manually and then passed into the `tldextract` function.
```r
# get most recent TLD listings
tld <- getTLD()   # optionally pass in a different URL than the default

manyhosts <- c("pages.parts.marionautomotive.com", "www.embroiderypassion.com",
               "fsbusiness.co.uk", "www.vmm.adv.br", "ttfc.cn", "carole.co.il",
               "visiontravail.qc.ca", "mail.space-hoppers.co.uk", "chilton.k12.pa.us")

tldextract(manyhosts, tldnames = tld)
```
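Since `getTLD()` hits the network each time, one option is to cache the refreshed copy locally between sessions. A small sketch using base R serialization (the filename is arbitrary):

```r
# cache the freshly fetched list so later sessions can skip the download
# ("tldnames.rds" is an arbitrary local path)
saveRDS(tld, "tldnames.rds")

# in a later session, reload the copy and pass it in as before
tld <- readRDS("tldnames.rds")
tldextract("fsbusiness.co.uk", tldnames = tld)
```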