lsh_candidates: Candidate pairs from LSH comparisons

Description

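Given a data frame of LSH buckets returned by lsh(), this function returns the candidate pairs: documents that were hashed to the same bucket in at least one band, and which are therefore worth comparing directly.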
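The following minimal base-R sketch illustrates what "candidates" means. It is not the textreuse implementation. It assumes that the bucket table has a doc column and a buckets column, and that each bucket value already encodes its band, so equal values from different bands never collide. The function name candidate_pairs_sketch() and the toy data are hypothetical.

```r
# Hypothetical sketch, not the textreuse source: derive candidate pairs
# from a bucket table, assuming columns named `doc` and `buckets`.
candidate_pairs_sketch <- function(buckets) {
  # Group document IDs by the bucket value they were hashed into
  docs_by_bucket <- split(buckets$doc, buckets$buckets)
  # Keep only buckets shared by two or more distinct documents
  shared <- Filter(function(d) length(unique(d)) > 1, docs_by_bucket)
  if (length(shared) == 0) {
    return(data.frame(a = character(0), b = character(0),
                      stringsAsFactors = FALSE))
  }
  # Enumerate every unordered pair (a < b) within each shared bucket
  pairs <- lapply(shared, function(d) t(combn(sort(unique(d)), 2)))
  pairs <- do.call(rbind, pairs)
  out <- data.frame(a = pairs[, 1], b = pairs[, 2],
                    stringsAsFactors = FALSE)
  # A pair may collide in several bands; report it only once
  out <- unique(out)
  out <- out[order(out$a, out$b), ]
  rownames(out) <- NULL
  out
}

# Toy bucket table: d1/d2 share bucket h1, and d1/d3 share bucket h3
toy <- data.frame(
  doc     = c("d1", "d2", "d3", "d1", "d3"),
  buckets = c("h1", "h1", "h2", "h3", "h3"),
  stringsAsFactors = FALSE
)
res <- candidate_pairs_sketch(toy)
res
```

Because d3 is the only document in bucket h2, that bucket produces no pair. The result therefore lists the two pairs (d1, d2) and (d1, d3).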

Usage

lsh_candidates(buckets)

Arguments

buckets

A data frame of buckets returned by lsh().

Value

A data frame of candidate pairs.

Examples

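library(textreuse)
# Directory of sample documents shipped with textreuse
dir <- system.file("extdata/legal", package = "textreuse")
# 200 minhash functions; the seed makes the results reproducible
minhash <- minhash_generator(200, seed = 234)
# Tokenize each document into 5-grams and compute its minhash signature
corpus <- TextReuseCorpus(dir = dir,
                          tokenizer = tokenize_ngrams, n = 5,
                          minhash_func = minhash)
# Split the 200 minhashes into 50 bands (4 per band) and hash each band
buckets <- lsh(corpus, bands = 50)
# Pairs of documents that share at least one bucket
lsh_candidates(buckets)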
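The standard banded-LSH analysis shows which pairs the example is likely to return. It splits 200 minhashes into b = 50 bands of r = 200 / 50 = 4 rows. A pair of documents with Jaccard similarity s then shares at least one bucket, and so becomes a candidate, with probability 1 - (1 - s^r)^b. The base-R sketch below evaluates that formula. The helper name candidate_probability() is hypothetical, and the formula ignores accidental hash collisions, so treat it as an approximation rather than a description of textreuse internals.

```r
# Probability that two documents with Jaccard similarity s share at least
# one bucket, given `bands` bands of `rows` minhashes each
# (standard banded-LSH analysis; ignores accidental hash collisions).
candidate_probability <- function(s, bands, rows) {
  1 - (1 - s^rows)^bands
}

n_hashes <- 200           # minhash_generator(200, ...) in the example
bands    <- 50            # lsh(corpus, bands = 50)
rows     <- n_hashes / bands  # 4 minhashes per band

similarity <- c(0.2, 0.5, 0.8)
p <- candidate_probability(similarity, bands, rows)
round(p, 3)  # approximately 0.08, 0.96 and 1.00

# Rule-of-thumb similarity at which the probability curve rises most steeply
threshold <- (1 / bands)^(1 / rows)
round(threshold, 3)  # roughly 0.38
```

Using more bands with fewer rows lowers this threshold. That yields more candidate pairs, including more false positives, for lsh_candidates() to report.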


ropensci/textreuse documentation built on Aug. 8, 2024, 9:17 a.m.