The punycoder package provides fast Unicode and Punycode encoding and decoding for internationalized domain names (IDNs). It fills gaps in R's URL-processing tools by offering reliable conversion between the Unicode and ASCII representations of domain names.
Internationalized domain names contain non-ASCII characters, as in café.com or москва.рф. They must be converted to an ASCII form before they can be used in DNS and many other network protocols and systems. Existing R packages have limitations:
punycoder provides:
library(punycoder)
# Encode Unicode domains to ASCII
puny_encode("café.com")   # Returns: "xn--caf-dma.com"
puny_encode("москва.рф")  # Returns: "xn--80adxhks.xn--p1ai"
# Decode ASCII domains back to Unicode
puny_decode("xn--caf-dma.com")  # Returns: "café.com"
# Vectorized operations
domains <- c("café.com", "москва.рф", "北京.中国")
encoded <- puny_encode(domains)
print(encoded)
# Encode URLs with Unicode domains
url_encode("https://café.example.com/menu")
# Decode URLs back to Unicode
url_decode("https://xn--caf-dma.example.com/menu")
# Parse URLs with IDN handling
url_parts <- parse_url("https://café.example.com:8080/path?q=test#section")
print(url_parts)
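In the url_encode() and url_decode() examples above, only the host differs between the Unicode and ASCII forms. The scheme and the path stay the same. The first step of such a conversion is to isolate the host from the rest of the URL. The base-R sketch below shows one way to do that. It is not how punycoder parses URLs. The helper name extract_host() is hypothetical, and bracketed IPv6 hosts are deliberately not handled.

```r
# Hypothetical helper (not part of punycoder): return the host component of
# an absolute URL of the form "scheme://user@host:port/path?query#fragment".
# Simplification: bracketed IPv6 hosts such as "[::1]" are not handled.
extract_host <- function(url) {
  # group 1: optional "userinfo@"; group 2: the host, up to ":", "/", "?" or "#"
  pattern <- "^[A-Za-z][A-Za-z0-9+.-]*://([^/?#@]*@)?([^/?#:]+).*$"
  ifelse(grepl(pattern, url, perl = TRUE),
         sub(pattern, "\\2", url, perl = TRUE),
         NA_character_)
}

extract_host("https://xn--caf-dma.example.com:8080/path?q=test#section")
extract_host(c("https://user@example.org/x", "not a url"))
```

Inputs that do not look like absolute URLs yield NA rather than an error.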
# Check whether a domain is already in Punycode form
is_punycode("xn--caf-dma.com")  # TRUE
is_punycode("café.com")         # FALSE
# Check whether a domain contains Unicode characters
is_idn("café.com")     # TRUE
is_idn("example.com")  # FALSE
# Comprehensive domain validation
result <- validate_domain(c("café.com", "invalid..domain", "valid.org"))
print(result)
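The name invalid..domain is invalid because it contains an empty label between the two dots. The base-R sketch below roughly illustrates the kind of structural rules involved. These are not punycoder's actual checks. The sketch applies the classic DNS hostname rules from RFC 1035 and RFC 1123 to an ASCII (already encoded) name:

- each label is 1 to 63 characters long;
- labels contain only letters, digits, and hyphens;
- no label starts or ends with a hyphen;
- the whole name is at most 253 characters.

The function name check_ascii_domain() is hypothetical.

```r
# Hypothetical helper (not part of punycoder): check classic DNS hostname
# rules on ASCII (already Punycode-encoded) names. Simplified sketch.
check_ascii_domain <- function(domain) {
  vapply(domain, function(d) {
    if (is.na(d)) return(FALSE)
    d <- sub("\\.$", "", d)  # accept one trailing root dot
    if (!nzchar(d) || endsWith(d, ".") || nchar(d, type = "bytes") > 253L) return(FALSE)
    labels <- strsplit(d, ".", fixed = TRUE)[[1]]
    len <- nchar(labels, type = "bytes")
    # Each label: 1-63 characters, letters/digits/hyphens, no hyphen at either end
    all(len >= 1L & len <= 63L &
          grepl("^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$", labels))
  }, logical(1), USE.NAMES = FALSE)
}

check_ascii_domain(c("valid.org", "invalid..domain", "", "xn--caf-dma.com", "-bad.com"))
# TRUE FALSE FALSE TRUE FALSE
```

Unicode names should be encoded first, because the letter-digit-hyphen rule only applies to the ASCII form.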
# Example: processing international URLs for web scraping
international_urls <- c(
  "https://café.paris.fr/menu",
  "https://москва.рф/news",
  "https://北京.中国/info"
)
# Convert to ASCII for HTTP requests
ascii_urls <- url_encode(international_urls)
print(ascii_urls)
# ... process the data ...
# Convert back to Unicode for display
display_urls <- url_decode(ascii_urls)
print(display_urls)
# Example: processing a large dataset
sample_domains <- c(
  rep("example.com", 1000),
  rep("café.com", 1000),
  rep("test.org", 1000)
)
# Efficient vectorized encoding
system.time({
  encoded_domains <- puny_encode(sample_domains)
})
# Check results: only the café.com entries should be in Punycode form
table(is_punycode(encoded_domains))
The package reports invalid input with informative error messages:
# Strict validation (default): an empty string causes an error
try({
  puny_encode(c("valid.com", ""))
})
# Non-strict mode returns NA for invalid input
result <- puny_encode(c("valid.com", ""), strict = FALSE)
print(result)
# Validation provides detailed error information
validation <- validate_domain(c("valid.com", "invalid..domain", ""))
print(validation)
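With strict = FALSE, invalid entries become NA. Sometimes you also want to record why each entry failed. A generic base-R pattern for that is to wrap a per-element call in tryCatch().

The sketch below uses two hypothetical functions, na_on_error() and strict_stub(). strict_stub() is a stand-in for a strict encoder so that the example runs without the package. The sketch makes no assumption about how punycoder itself reports errors for vectors.

```r
# Hypothetical helper (not part of punycoder): apply `f` to each element,
# returning NA where `f` fails and keeping the error messages as an attribute.
na_on_error <- function(x, f) {
  errors <- rep(NA_character_, length(x))
  out <- vapply(seq_along(x), function(i) {
    tryCatch(f(x[[i]]), error = function(e) {
      errors[i] <<- conditionMessage(e)  # remember why element i failed
      NA_character_
    })
  }, character(1))
  attr(out, "errors") <- errors
  out
}

# Hypothetical stand-in for a strict encoder that rejects empty strings
strict_stub <- function(d) {
  if (!nzchar(d)) stop("empty domain")
  toupper(d)
}

res <- na_on_error(c("valid.com", ""), strict_stub)
as.vector(res)        # "VALID.COM" NA
attr(res, "errors")   # NA "empty domain"
```

Calling f once per element is slower than a single vectorized call. This pattern is therefore best kept for the cases where you need the per-entry reasons.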
The package is designed for high-performance processing:
# Benchmark with a larger dataset (10,000 domains)
large_domains <- rep(c("example.com", "café.com"), 5000)
system.time({
  encoded <- puny_encode(large_domains)
})
# Target throughput: 10,000+ domains per second (timings depend on hardware)
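The benchmark vectors above are highly repetitive: they hold 10,000 entries but only two distinct values. This vignette does not say whether punycoder deduplicates internally. One option you can apply yourself is to encode each distinct value once and map the results back with match().

In the sketch below, the helper encode_unique() is hypothetical. toupper() stands in for an encoder such as puny_encode(), so the code runs without the package.

```r
# Hypothetical helper (not part of punycoder): run a vectorized encoder on the
# distinct values only, then expand the results back to the input's length.
encode_unique <- function(x, encoder) {
  ux <- unique(x)
  encoder(ux)[match(x, ux)]
}

# toupper() stands in for an encoder such as puny_encode()
x <- rep(c("example.com", "test.org"), 5000)
res <- encode_unique(x, toupper)
identical(res, toupper(x))  # TRUE
```

The gain grows with the share of repeated values. For inputs that are mostly unique, the extra unique() and match() steps add a small overhead instead.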
You can configure package behavior using R options:
# Disable strict validation globally
options(punycoder.strict = FALSE)
# Check the current setting
getOption("punycoder.strict")
# Set encoding preference
options(punycoder.encoding = "UTF-8")
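Calling options() changes a setting for the rest of the R session. To change it only temporarily, the standard base-R idiom is to save the value that options() returns and restore it with on.exit().

The helper with_lenient_validation() is hypothetical. The default of TRUE passed to getOption() is an assumption, based on strict validation being the documented default.

```r
# Read the option with a fallback for when it has not been set.
# Assumption: TRUE mirrors punycoder's strict-by-default behaviour.
strict_now <- getOption("punycoder.strict", default = TRUE)

# Hypothetical helper: evaluate `expr` with strict validation switched off,
# then restore the previous option value even if `expr` throws an error.
with_lenient_validation <- function(expr) {
  old <- options(punycoder.strict = FALSE)  # returns the previous value(s)
  on.exit(options(old), add = TRUE)
  expr  # lazily evaluated here, after the option has been changed
}

# Usage (requires punycoder):
# with_lenient_validation(puny_encode(c("valid.com", "")))
```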
punycoder is designed to integrate well with other R packages:
# With data.table
library(data.table)
dt <- data.table(
  original = c("café.com", "москва.рф"),
  encoded = puny_encode(c("café.com", "москва.рф"))
)
# With dplyr (the native pipe |> requires R >= 4.1.0)
library(dplyr)
urls_df <- data.frame(
  unicode_url = c("https://café.com", "https://москва.рф")
) |>
  mutate(
    ascii_url = url_encode(unicode_url),
    is_international = is_idn(unicode_url)
  )
help(package = "punycoder")
The package uses a C++ backend (via Rcpp) for performance and implements Punycode as specified in RFC 3492. When libidn2 is available at build time, punycoder uses it behind the same R-level API; otherwise it falls back to its built-in implementation.
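To make the RFC 3492 reference concrete, below is a minimal base-R sketch of the Punycode encoding step for a single label. It comes with a hypothetical to_ascii_domain() wrapper that encodes only the labels containing non-ASCII characters and adds the xn-- prefix.

This is an illustration only. It is not punycoder's C++ or libidn2 code path. It skips Unicode normalization, case mapping, IDNA2008 validity checks, and the overflow checks the RFC requires. Use puny_encode() for real work.

```r
# Minimal Punycode (RFC 3492) encoder for a single label, in base R.
# Simplified sketch: no normalization, no case mapping, no overflow checks.
punycode_encode_label <- function(label) {
  base <- 36; tmin <- 1; tmax <- 26; skew <- 38; damp <- 700
  digit_chars <- c(letters, 0:9)  # digit values 0-25 -> "a"-"z", 26-35 -> "0"-"9"

  # Bias adaptation function (RFC 3492, section 6.1)
  adapt <- function(delta, numpoints, firsttime) {
    delta <- if (firsttime) delta %/% damp else delta %/% 2
    delta <- delta + delta %/% numpoints
    k <- 0
    while (delta > ((base - tmin) * tmax) %/% 2) {
      delta <- delta %/% (base - tmin)
      k <- k + base
    }
    k + ((base - tmin + 1) * delta) %/% (delta + skew)
  }

  cps <- utf8ToInt(enc2utf8(label))
  basic <- cps[cps < 128]
  out <- intToUtf8(basic)             # basic (ASCII) code points are copied first
  b <- length(basic)
  h <- b
  if (b > 0) out <- paste0(out, "-")  # delimiter after the basic code points

  n <- 128; delta <- 0; bias <- 72    # initial_n, initial delta, initial_bias
  while (h < length(cps)) {
    m <- min(cps[cps >= n])           # smallest code point not yet handled
    delta <- delta + (m - n) * (h + 1)
    n <- m
    for (cp in cps) {
      if (cp < n) delta <- delta + 1
      if (cp == n) {
        # Emit delta as a generalized variable-length integer
        q <- delta
        k <- base
        repeat {
          thr <- if (k <= bias) tmin else if (k >= bias + tmax) tmax else k - bias
          if (q < thr) break
          out <- paste0(out, digit_chars[thr + (q - thr) %% (base - thr) + 1])
          q <- (q - thr) %/% (base - thr)
          k <- k + base
        }
        out <- paste0(out, digit_chars[q + 1])
        bias <- adapt(delta, h + 1, h == b)
        delta <- 0
        h <- h + 1
      }
    }
    delta <- delta + 1
    n <- n + 1
  }
  out
}

# Hypothetical wrapper: encode only labels that contain non-ASCII characters
to_ascii_domain <- function(domain) {
  labels <- strsplit(domain, ".", fixed = TRUE)[[1]]
  encoded <- vapply(labels, function(l) {
    if (all(utf8ToInt(enc2utf8(l)) < 128)) l
    else paste0("xn--", punycode_encode_label(l))
  }, character(1), USE.NAMES = FALSE)
  paste(encoded, collapse = ".")
}
```

With a precomposed é, to_ascii_domain("café.com") gives "xn--caf-dma.com", matching the puny_encode() example at the start of this vignette. Labels that are already ASCII, such as com, pass through unchanged.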