whe_similarity: Compare articles with a set of documents


Description

Compare articles with a set of documents, e.g. press releases.

Usage

whe_similarity(wh, docs, progress = interactive())

## S3 method for class 'data.frame'
whe_similarity(wh, docs, progress = interactive())

## S3 method for class 'character'
whe_similarity(wh, docs, progress = interactive())
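The usage above lists a generic plus methods for data.frame and character input, so the call is dispatched on the class of wh through R's S3 system. The following is a minimal sketch of that dispatch pattern only; sim_sketch and its method bodies are hypothetical placeholders, not the package source.

```r
# Hypothetical generic illustrating S3 dispatch on the class of `wh`;
# this is not the webhoserx implementation.
sim_sketch <- function(wh, docs, progress = interactive()) {
  UseMethod("sim_sketch")
}

# Method selected when `wh` is a data.frame (e.g. the output of wh_collect)
sim_sketch.data.frame <- function(wh, docs, progress = interactive()) {
  "data.frame method"
}

# Method selected when `wh` is a character vector of article texts
sim_sketch.character <- function(wh, docs, progress = interactive()) {
  "character method"
}

sim_sketch(data.frame(text = "a b"), "b c")  # dispatches to the data.frame method
sim_sketch("a b", "b c")                     # dispatches to the character method
```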

Arguments

wh

A highlighted object returned by wh_collect; see Examples.

docs

Documents to compare the articles with, e.g. the text of a press release.

progress

Whether to show a progress bar. Defaults to interactive().

Details

Similarity is measured with the Jaccard index (https://en.wikipedia.org/wiki/Jaccard_index).
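For two token sets A and B, the Jaccard index is |A ∩ B| / |A ∪ B|: 0 when the sets share no tokens, 1 when they are identical. The base R sketch below computes it for two single strings. How whe_similarity actually tokenizes text is not documented on this page, so lowercasing and splitting on non-alphanumeric characters is an assumption, and jaccard_sketch is a hypothetical helper.

```r
# Hypothetical helper: Jaccard index between two single strings, treating
# each as a set of lowercase word tokens (the tokenization rule is an assumption).
jaccard_sketch <- function(a, b) {
  tokenize <- function(x) {
    tokens <- unlist(strsplit(tolower(x), "[^[:alnum:]]+"))
    unique(tokens[tokens != ""])  # drop empty strings, keep each token once
  }
  ta <- tokenize(a)
  tb <- tokenize(b)
  u <- union(ta, tb)
  if (length(u) == 0) return(0)   # both inputs empty: define similarity as 0
  length(intersect(ta, tb)) / length(u)
}

# shared tokens: gender, gap (2); all tokens: 5
jaccard_sketch("Gender gap report", "The gender gap narrows")  # returns 0.4
```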

Value

If a data.frame is passed, a column named similarity.* is appended for each document, where * is the number of that document in docs. If a character vector is passed, the function returns a character vector.
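To make the data.frame case concrete, the sketch below appends one similarity.<n> column per element of docs, scoring each article with a Jaccard index on word tokens. It is a hypothetical illustration of the documented output shape, not the package code: append_similarity_sketch is an invented name, and both the `text` column holding article text and the tokenization rule are assumptions.

```r
# Hypothetical sketch of the documented data.frame output: one column
# named similarity.<n> per element of `docs`. Assumes article text is in
# a `text` column; not the webhoserx implementation.
append_similarity_sketch <- function(wh, docs) {
  # lowercase word tokens, each kept once (assumed tokenization)
  tokens <- function(x) {
    t <- unlist(strsplit(tolower(x), "[^[:alnum:]]+"))
    unique(t[t != ""])
  }
  # Jaccard index of two token sets; 0 if both are empty
  jaccard <- function(a, b) {
    u <- union(a, b)
    if (length(u) == 0) 0 else length(intersect(a, b)) / length(u)
  }
  for (i in seq_along(docs)) {
    doc_tokens <- tokens(docs[[i]])
    wh[[paste0("similarity.", i)]] <- vapply(
      as.character(wh$text),
      function(txt) jaccard(tokens(txt), doc_tokens),
      numeric(1),
      USE.NAMES = FALSE
    )
  }
  wh
}

articles <- data.frame(text = c("gender gap report", "economic forum"),
                       stringsAsFactors = FALSE)
# adds similarity.1 (vs "gender gap") and similarity.2 (vs "world economic forum")
append_similarity_sketch(articles, c("gender gap", "world economic forum"))
```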

Examples

## Not run: 
library(webhose)
token <- wh_token("xXX-x0X0xX0X-00X")

token %>%
  wh_news(q = '"World Economic Forum"') %>%  # use highlight!
  wh_collect() -> wef # collect results

library(rvest)

html <- read_html('http://reports.weforum.org/global-gender-gap-report-2017/press-release/')

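# scrape the Gender Gap Report press release
html %>%
  html_nodes(".content") %>%         # select the page's content container
  html_children() %>%                # take its child elements
  html_text() %>%                    # extract their text
  .[5:40] %>%                        # keep elements 5 to 40
  paste0(., collapse = "\n") -> pr   # join them into a single document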

wef %>%
  whe_similarity(pr) -> similarity

library(dplyr)
wef %>%
  mutate(nmentions = whe_mentions(text)) -> mentions

## End(Not run)

JohnCoene/webhoserx documentation built on June 15, 2019, 3:48 p.m.