Quantify 'URL' Diversity and Apply Popular Biodiversity Indices to a 'URL' Collection
Methods are provided to compute the 'WSDL Diversity Index' <http://ws-dl.blogspot.com/2018/05/2018-05-04-exploration-of-url-diversity.html> and to apply selected biodiversity indices to a corpus (collection) of 'URLs'.
All credit goes to Alexander Nwala for the algorithm research and original Python implementation.
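As a rough illustration of what applying a biodiversity index to a URL collection means, the sketch below treats each hostname as a "species" and computes the Shannon index, Pielou's evenness, and the Gini-Simpson index in base R. The toy `urls` vector and the regex-based hostname extraction are assumptions made for this example; this is not the package's implementation and it does not reproduce the WSDL Diversity Index itself.

```r
# Hypothetical toy corpus (not the corpus.txt bundled with the package)
urls <- c(
  "https://example.com/a",
  "https://example.com/b",
  "https://example.org/",
  "https://example.net/x"
)

# Crude hostname extraction with a regex; a real URL parser is more robust
hosts <- sub("^[a-zA-Z][a-zA-Z0-9+.-]*://([^/:?#]+).*$", "\\1", urls)

# Relative abundance of each hostname ("species")
counts <- table(hosts)
p <- as.numeric(counts) / sum(counts)

# Shannon index: H = -sum(p * log(p))
shannon <- -sum(p * log(p))

# Pielou's evenness: H divided by its maximum, log(number of species)
# (undefined when only one hostname is present)
evenness <- shannon / log(length(p))

# Gini-Simpson index: 1 - sum(p^2)
simpson <- 1 - sum(p^2)

print(c(shannon = shannon, evenness = evenness, simpson = simpson))
```

Higher values mean the collection is spread more evenly across more hostnames; a collection drawn from a single hostname scores 0 on both the Shannon and Gini-Simpson indices.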
The following functions are implemented:
uri_diversity: Quantify URL diversity
url_diversity: (an alias for ^^ b/c I regularly forget it's rightfully uri)
clean_index_factors: Clean up diversity and evenness names
body_anchor_urls: Extract all body anchor hypertext references
body_img_urls: Extract all body image URLs
safeGET: Safer version of 'httr::GET()'
safePOST: Safer version of 'httr::POST()'

devtools::install_github("hrbrmstr/urldiversity")
options(width=120)
library(urldiversity)

# current version
packageVersion("urldiversity")

collection <- readLines(system.file("extdat", "corpus.txt", package = "urldiversity"))

print(collection)

x <- uri_diversity(collection)

dplyr::glimpse(x)

x
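The function list describes safeGET and safePOST only as "safer" versions of the 'httr' verbs. One common way to make a call safer is to capture errors and return them as values instead of stopping the script. The base R sketch below shows that general pattern; `make_safe` and `safe_log` are hypothetical names for this example, and the return shape is an assumption, not a description of how the package implements its helpers.

```r
# Hypothetical helper, not part of urldiversity: wrap any function so that
# it returns list(result, error) instead of raising an error.
make_safe <- function(f) {
  function(...) {
    tryCatch(
      list(result = f(...), error = NULL),
      error = function(e) list(result = NULL, error = e)
    )
  }
}

# Demonstrated with log() so the example runs without network access
safe_log <- make_safe(log)

ok <- safe_log(100, base = 10)    # succeeds: result is set, error is NULL
bad <- safe_log("not a number")   # fails: result is NULL, error is captured

print(ok$result)
print(conditionMessage(bad$error))
```

Wrapping a request function this way lets a loop over many URLs continue past individual failures and inspect the captured errors afterwards.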
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.