High-performance Unicode and Punycode encoding/decoding for internationalized domain names (IDNs) in R.
The punycoder package addresses critical gaps in R’s URL processing
capabilities by providing reliable, fast conversion between Unicode and
ASCII representations of domain names. It follows RFC 3492 standards and
is designed for robust handling of internationalized domain names in web
scraping, data analysis, and URL processing workflows.
punycoder has a small dependency footprint:
- R (>= 3.5.0), Rcpp
- libidn2 (detected at compile time)
- pkg-config (used by configure to detect libidn2)
- testthat, knitr, rmarkdown

Install the released version of punycoder from CRAN with:
install.packages("punycoder")
Or install the development version from GitHub with:
# install.packages("remotes")
remotes::install_github("bart-turczynski/punycoder")
punycoder works without extra system libraries. If libidn2 is
available at build time, the package enables a native backend
automatically; otherwise it uses the built-in C++ fallback backend.
To install the recommended optional dependency:
# macOS (Homebrew)
brew install libidn2 pkg-config
# Debian/Ubuntu
sudo apt-get install libidn2-0-dev pkg-config
# Fedora
sudo dnf install libidn2-devel pkgconf-pkg-config
# Arch Linux
sudo pacman -S libidn2 pkgconf

Verify the library is visible before installing punycoder from source:
system("pkg-config --modversion libidn2")
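If you prefer a check that returns a value you can branch on, the following base R sketch may help. The helper name `has_libidn2_sketch()` is hypothetical and not part of punycoder; it only asks pkg-config whether it can see libidn2, which is an approximation of what the configure step detects.

```r
# Hypothetical helper (not part of punycoder): TRUE if pkg-config is on the
# PATH and reports that libidn2 is installed, FALSE otherwise.
has_libidn2_sketch <- function() {
  # Sys.which() returns "" when the executable cannot be found.
  if (!nzchar(Sys.which("pkg-config"))) {
    return(FALSE)
  }
  # `pkg-config --exists <module>` prints nothing and signals the result
  # through its exit status (0 means the module was found).
  status <- suppressWarnings(
    system2("pkg-config", c("--exists", "libidn2"),
            stdout = FALSE, stderr = FALSE)
  )
  identical(as.integer(status), 0L)
}

if (has_libidn2_sketch()) {
  message("libidn2 is visible to pkg-config")
} else {
  message("libidn2 not found; the built-in fallback backend will be used")
}
```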
Then install/reinstall punycoder:
remotes::install_github("bart-turczynski/punycoder")
library(punycoder)
# Basic encoding
puny_encode("café.com")
#> [1] "xn--caf-dma.com"
# Check if domain is punycode
is_punycode("xn--example")
#> [1] TRUE
# Validate domains
validate_domain("test.com")
#> Punycoder Domain Validation Results
#> ==================================
#>
#> Domain: test.com
#> Valid: TRUE
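To make the `café.com` result above less opaque, here is a minimal base R sketch of the RFC 3492 encoding procedure for a single label. It is illustrative only and is not punycoder's implementation; the function name `punycode_label_sketch()` is hypothetical, and the sketch skips everything a real IDNA pipeline adds (case mapping, normalization, length checks, overflow handling).

```r
# Illustrative RFC 3492 encoder for one label (not punycoder's code).
punycode_label_sketch <- function(label) {
  # Bootstring parameters for Punycode, as given in RFC 3492.
  base <- 36L; tmin <- 1L; tmax <- 26L; skew <- 38L; damp <- 700L

  cp <- utf8ToInt(enc2utf8(label))
  if (all(cp < 128L)) return(label)  # pure ASCII labels are left unchanged

  # Map a digit value 0..35 to "a".."z", "0".."9".
  digit <- function(d) substr("abcdefghijklmnopqrstuvwxyz0123456789", d + 1L, d + 1L)

  # Bias adaptation function from RFC 3492.
  adapt <- function(delta, numpoints, first) {
    delta <- if (first) delta %/% damp else delta %/% 2L
    delta <- delta + delta %/% numpoints
    k <- 0L
    while (delta > ((base - tmin) * tmax) %/% 2L) {
      delta <- delta %/% (base - tmin)
      k <- k + base
    }
    k + ((base - tmin + 1L) * delta) %/% (delta + skew)
  }

  # Copy the basic (ASCII) code points first, followed by a delimiter.
  basic <- cp[cp < 128L]
  out <- intToUtf8(basic, multiple = TRUE)
  b <- length(basic)
  h <- b
  if (b > 0L) out <- c(out, "-")

  n <- 128L; delta <- 0L; bias <- 72L
  while (h < length(cp)) {
    m <- min(cp[cp >= n])            # next code point to insert
    delta <- delta + (m - n) * (h + 1L)
    n <- m
    for (ch in cp) {
      if (ch < n) delta <- delta + 1L
      if (ch == n) {
        # Emit delta as a generalized variable-length integer.
        q <- delta
        k <- base
        repeat {
          thr <- if (k <= bias) tmin else if (k >= bias + tmax) tmax else k - bias
          if (q < thr) break
          out <- c(out, digit(thr + (q - thr) %% (base - thr)))
          q <- (q - thr) %/% (base - thr)
          k <- k + base
        }
        out <- c(out, digit(q))
        bias <- adapt(delta, h + 1L, h == b)
        delta <- 0L
        h <- h + 1L
      }
    }
    delta <- delta + 1L
    n <- n + 1L
  }
  paste0("xn--", paste(out, collapse = ""))
}

punycode_label_sketch("caf\u00e9")   # "xn--caf-dma"
punycode_label_sketch("b\u00fccher") # "xn--bcher-kva"
```

A full domain is handled label by label (split on `.`), which is why only the `café` part of `café.com` changes in the output above.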
punycoder uses libidn2 when available, with a built-in fallback backend.

Process international websites with Unicode domain names:
international_urls <- c(
"https://café.paris.fr/menu",
"https://москва.рф/news",
"https://北京.中国/info"
)
# Convert for HTTP requests
ascii_urls <- url_encode(international_urls)
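Converting a URL for HTTP use means changing only the host and leaving the scheme, path, and query alone. The base R sketch below shows one way that split could look. It is an assumption about the general technique, not a description of how `url_encode()` works internally; `convert_host_sketch()` and its `encode_host` argument are hypothetical names, and the pattern ignores userinfo and ports.

```r
# Hypothetical helper (not part of punycoder): apply `encode_host` to the
# host portion of each URL and reassemble the rest unchanged.
convert_host_sketch <- function(urls, encode_host = identity) {
  # Group 1: scheme plus "://", group 2: host, group 3: path/query/fragment.
  pattern <- "^([A-Za-z][A-Za-z0-9+.-]*://)([^/?#]*)(.*)$"
  parts <- regmatches(urls, regexec(pattern, urls))
  vapply(seq_along(urls), function(i) {
    p <- parts[[i]]
    if (length(p) == 0L) return(urls[i])  # no scheme found: leave untouched
    paste0(p[2], encode_host(p[3]), p[4])
  }, character(1))
}

# toupper() stands in for a real host encoder so the effect is easy to see.
convert_host_sketch("https://example.com/a?b=1", toupper)
# "https://EXAMPLE.COM/a?b=1"
```

In practice you would pass a real encoder, for example `puny_encode`, as `encode_host`, or simply call `url_encode()` as shown above.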
Clean and standardize URL datasets:
# Identify international domains
is_idn(c("café.com", "example.com", "москва.рф"))
# Validate domain names
validate_domain(c("valid.com", "invalid..domain"))
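For intuition about what these checks look for, the base R sketch below flags non-ASCII input and tests basic DNS label structure (no empty labels, labels of at most 63 bytes, names of at most 253 bytes). Both function names are hypothetical, and the rules shown are general DNS limits; the checks that `is_idn()` and `validate_domain()` actually perform may differ.

```r
# Hypothetical helper: TRUE where a string contains any non-ASCII code point.
has_non_ascii_sketch <- function(x) {
  vapply(x, function(s) any(utf8ToInt(enc2utf8(s)) > 127L),
         logical(1), USE.NAMES = FALSE)
}

# Hypothetical helper: structural check on the ASCII form of a domain name.
domain_structure_ok_sketch <- function(domains) {
  vapply(domains, function(d) {
    if (is.na(d) || !nzchar(d) || nchar(d, type = "bytes") > 253L) {
      return(FALSE)
    }
    # Two consecutive dots produce an empty label, which is rejected below.
    labels <- strsplit(d, ".", fixed = TRUE)[[1]]
    n <- nchar(labels, type = "bytes")
    length(labels) > 0L && all(n >= 1L & n <= 63L)
  }, logical(1), USE.NAMES = FALSE)
}

has_non_ascii_sketch(c("caf\u00e9.com", "example.com"))
# TRUE FALSE
domain_structure_ok_sketch(c("valid.com", "invalid..domain"))
# TRUE FALSE
```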
punycoder currently provides:
- puny_encode(), puny_decode()
- url_encode(), url_decode(), parse_url()
- is_punycode(), is_idn(), validate_domain()
- backend selection (libidn2 when present, built-in fallback otherwise)

punycoder builds on Rcpp and, when available, libidn2. It is inspired by
urltools and is designed to provide a robust fix for punycode
encode/decode issues that may arise in urltools workflows.

We welcome contributions. See CONTRIBUTING.md for the current development workflow.
License: MIT