not_news: Classify News and Non-News Based on keywords in the URL
In rdomains: Get the Category of Content Hosted by a Domain

not_news

R Documentation

Classify News and Non-News Based on keywords in the URL

Description

Classifies URLs as news or non-news using a slightly amended version of the regular expression from “Exposure to ideologically diverse news and opinion on Facebook” by Bakshy, Messing, and Adamic (Science, 2015).

Usage

not_news(url_list = NULL)

Arguments

url_list

character vector of URLs

Details

Amendment: the keyword “sport” is used in place of “sports”, so a URL containing either form matches.
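
The effect of the shorter keyword can be checked with base R's grepl(). This is only a sketch of the substring behavior, not the package's full pattern; the second URL below is a hypothetical example.

```r
# Two illustrative URLs; the example.com one is made up for this check.
urls <- c("http://www.bbc.com/sport",
          "http://www.example.com/sports/scores")

# "sports" misses the singular form used by some sites.
match_sports <- grepl("sports", urls)

# "sport" is a substring of "sports", so it matches both forms.
match_sport <- grepl("sport", urls)

print(match_sports)
print(match_sport)
```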

Note that the regular expression is based on URL patterns observed in a small set of domains, so it may not generalize to other sites. See the paper for details.

Value

data.frame with 3 columns: url, not_news, news
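
The shape of this return value can be illustrated with a small stand-in written in base R. This is a minimal sketch, not the package's implementation: the function name classify_url_sketch, the two shortened keyword patterns, the case-insensitive matching, and the 0/1 coding of the indicator columns are all assumptions made for the example. The patterns actually used are in the package source and the paper.

```r
# Hypothetical, shortened keyword patterns for illustration only;
# the regular expressions used by not_news() are longer.
soft_pattern <- "sport|entertainment|fashion|travel"
hard_pattern <- "politi|elect|econ"

# Stand-in classifier (assumed name) that returns one row per URL.
classify_url_sketch <- function(url_list = NULL) {
  urls <- as.character(url_list)
  data.frame(
    url = urls,
    # 1 if the URL contains a non-news keyword, else 0 (assumed coding)
    not_news = as.integer(grepl(soft_pattern, urls, ignore.case = TRUE)),
    # 1 if the URL contains a news keyword, else 0 (assumed coding)
    news = as.integer(grepl(hard_pattern, urls, ignore.case = TRUE)),
    stringsAsFactors = FALSE
  )
}

res <- classify_url_sketch(c("http://www.bbc.com/sport",
                             "http://www.washingtontimes.com/news/politics/"))
print(res)
```

Because the two patterns are applied independently, a URL can in principle be flagged in both columns or in neither, which is why the sketch keeps them as separate indicators rather than a single label.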

References

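Bakshy, E., Messing, S., and Adamic, L. A. (2015). Exposure to ideologically diverse news and opinion on Facebook. Science. https://www.science.org/doi/10.1126/science.aaa1160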

Examples

## Not run: 
not_news("http://www.bbc.com/sport")
not_news(c("http://www.bbc.com/sport", "http://www.washingtontimes.com/news/politics/"))

## End(Not run)

rdomains documentation built on March 18, 2022, 7:28 p.m.