knitr::opts_chunk$set(collapse=TRUE, fig.retina=2, message=FALSE, warning=FALSE)
options(width=120)
Identify the Crux of an Article
Methods are provided to retrieve HTML content and return extracted metadata and summarised plain text. Further methods are provided to classify URLs with or without making network calls. Based on https://github.com/chimbori/crux.
The following functions are implemented:
classify_url: Classify a URL with or without making network calls
is_ad_image: Classify a URL with or without making network calls
is_likely_archive: Classify a URL with or without making network calls
is_likely_article: Classify a URL with or without making network calls
is_likely_audio: Classify a URL with or without making network calls
is_likely_binary_doc: Classify a URL with or without making network calls
is_likely_executable: Classify a URL with or without making network calls
is_likely_image: Classify a URL with or without making network calls
is_likely_video: Classify a URL with or without making network calls
is_web_scheme: Classify a URL with or without making network calls
summarise_url: Summarise the contents at a URL to essential bits
install.packages(c("cruxjars", "crux"), repos = "https://cinc.rud.is/")
library(crux)
# current version
packageVersion("crux")
str( summarise_url("http://time.com/5541738/joe-biden-backtracks-pence-praise-criticism/"), 1 )
str( classify_url("https://www.washingtonpost.com/powerpost/house-democrats-explode-in-recriminations-as-liberals-lash-out-at-moderates/2019/02/28/c3d163fe-3b87-11e9-a06c-3ec8ed509d15_story.html") )
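To illustrate what classifying a URL without a network call can look like, here is a minimal base R sketch that inspects only the scheme and the file extension. It is not how crux or the underlying Crux library is implemented: the helper name guess_url_kind(), the extension lists, and the returned labels are all assumptions made for this example.

```r
# Hypothetical helper, NOT part of crux: guess a URL's kind from its text alone.
# The extension lists and labels below are illustrative assumptions.
guess_url_kind <- function(url) {
  # Only http and https count as web schemes in this sketch.
  if (!grepl("^https?://", url, ignore.case = TRUE)) {
    return("not_web")
  }

  # Drop the query string and fragment, then the scheme and host.
  path <- sub("[?#].*$", "", url)
  path <- sub("^https?://[^/]*", "", path, ignore.case = TRUE)

  # Lower-cased extension of the last path segment ("" when there is none).
  ext <- tolower(tools::file_ext(basename(path)))

  kinds <- list(
    image      = c("png", "jpg", "jpeg", "gif", "webp", "svg"),
    audio      = c("mp3", "ogg", "wav", "flac", "m4a"),
    video      = c("mp4", "webm", "mkv", "mov", "avi"),
    archive    = c("zip", "gz", "tar", "bz2", "7z", "rar"),
    executable = c("exe", "msi", "dmg", "apk"),
    binary_doc = c("pdf", "doc", "docx", "xls", "xlsx", "ppt", "pptx")
  )

  for (kind in names(kinds)) {
    if (ext %in% kinds[[kind]]) {
      return(kind)
    }
  }

  # No known binary extension: the URL may point at an article-like page.
  "maybe_article"
}

guess_url_kind("https://example.com/img/photo.JPG?size=large")
guess_url_kind("https://example.com/news/2019/02/28/some_story.html")
guess_url_kind("ftp://example.com/archive.zip")
```

A check like this is cheap and needs no connection, but an extension can only hint at what the server will actually return. Confirming the real content type requires a network call.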
cloc::cloc_pkg_md()
Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.