pairwise_delta: Delta measure of pairs of documents

Description Usage Arguments See Also Examples

View source: R/pairwise_delta.R

Description

Compute the delta distances (from its two variants) of all pairs of documents in a tidy table.

Usage

1
2
3
pairwise_delta(tbl, item, feature, value, method = "burrows", ...)

pairwise_delta_(tbl, item, feature, value, method = "burrows", ...)

Arguments

tbl

Table

item

Item to compare; will end up in item1 and item2 columns

feature

Column describing the feature that links one item to others

value

Value

method

Distance measure to be used; see dist

...

Extra arguments passed on to squarely, such as diag and upper

See Also

squarely

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
library(janeaustenr)
library(dplyr)
library(tidytext)

# closest documents in terms of 1000 most frequent words
closest <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word) %>%
  top_n(1000, n) %>%
  pairwise_delta(book, word, n, method = "burrows") %>%
  arrange(delta)

closest

closest %>%
  filter(item1 == "Pride & Prejudice")

# to remove duplicates, use upper = FALSE
closest <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word) %>%
  top_n(1000, n) %>%
  pairwise_delta(book, word, n, method = "burrows", upper = FALSE) %>%
  arrange(delta)

# Can also use Argamon's Linear Delta
closest <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word) %>%
  top_n(1000, n) %>%
  pairwise_delta(book, word, n, method = "argamon", upper = FALSE) %>%
  arrange(delta)

widyr documentation built on April 14, 2020, 6:16 p.m.