lsh_compare: Compare candidates identified by LSH
In textreuse: Detect Text Reuse and Document Similarity


The `lsh_candidates` function only identifies potential matches; it cannot estimate the actual similarity of the documents. This function takes a data frame returned by `lsh_candidates` and applies a comparison function to each pair of candidate documents in a corpus, thereby calculating a document similarity score for each pair. Note that since your corpus will have minhash signatures rather than hashes for the tokens themselves, you will probably wish to use `tokenize` to calculate new hashes. This can be done for just the potentially similar documents. See the package vignettes for details.
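The comparison step described above can be illustrated with a minimal base R sketch. It is not the textreuse implementation: `docs`, `jaccard`, and `compare_candidates` are hypothetical names, and the corpus is stood in for by a plain named list of token vectors. It only shows the general idea of applying a comparison function to each candidate pair and filling in the `score` column.

```r
# Conceptual sketch only: NOT how textreuse implements lsh_compare().
# `docs` is a hypothetical stand-in for a corpus: a named list of token sets.
docs <- list(
  doc1 = c("the", "quick", "brown", "fox"),
  doc2 = c("the", "quick", "red", "fox"),
  doc3 = c("an", "unrelated", "sentence")
)

# Stand-in for the candidate pairs: two ID columns and an empty score column.
candidates <- data.frame(
  a = c("doc1", "doc1"),
  b = c("doc2", "doc3"),
  score = NA_real_,
  stringsAsFactors = FALSE
)

# A simple comparison function: Jaccard similarity of two token sets.
jaccard <- function(x, y) {
  length(intersect(x, y)) / length(union(x, y))
}

# Hypothetical helper: apply f to every candidate pair and record the score.
compare_candidates <- function(candidates, docs, f) {
  candidates$score <- mapply(
    function(a, b) f(docs[[a]], docs[[b]]),
    candidates$a, candidates$b,
    USE.NAMES = FALSE
  )
  candidates
}

result <- compare_candidates(candidates, docs, jaccard)
print(result)
```

Only the pairs listed in `candidates` are scored, which is the point of running LSH first: the comparison function is never applied to every possible pair in the corpus.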

lsh_compare(candidates, corpus, f, progress = interactive())

`candidates`	A data frame returned by `lsh_candidates`.
`corpus`	The same `TextReuseCorpus` corpus which was used to generate the candidates.
`f`	A comparison function such as `jaccard_similarity`.
`progress`	Display a progress bar while comparing documents.

A data frame with values calculated for score.

dir <- system.file("extdata/legal", package = "textreuse")
minhash <- minhash_generator(200, seed = 234)
corpus <- TextReuseCorpus(dir = dir,
                          tokenizer = tokenize_ngrams, n = 5,
                          minhash_func = minhash)
buckets <- lsh(corpus, bands = 50)
candidates <- lsh_candidates(buckets)
lsh_compare(candidates, corpus, jaccard_similarity)

# A tibble: 0 x 3
# ... with 3 variables: a <chr>, b <chr>, score <dbl>

textreuse documentation built on July 8, 2020, 6:40 p.m.
