Builds an edge list linking documents whose shingle sets are similar. Uses the candidates list to limit which documents are compared, which makes comparing documents much faster.
build_edges(candidates, shingles, threshold = 0.8)
candidates | list of buckets with document ids from
shingles | list of documents and their shingles from
threshold | Jaccard similarity threshold
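To make the arguments concrete, here is a minimal sketch of how an edge list of this kind could be built. It is not the package's implementation. The names `jaccard` and `build_edges_sketch` are hypothetical, and several details are assumptions: `candidates` is taken to be a list of character vectors of document ids, `shingles` a named list of character vectors keyed by document id, a pair counts as similar when its Jaccard similarity (shared shingles divided by total distinct shingles) is at least `threshold`, and the result is a data frame with `from`, `to`, and `similarity` columns.

```r
# Hypothetical sketch; the real build_edges() may differ in input
# layout, return type, and how the threshold is applied.

# Jaccard similarity of two shingle sets: |A intersect B| / |A union B|.
jaccard <- function(a, b) {
  a <- unique(a)
  b <- unique(b)
  union_size <- length(union(a, b))
  if (union_size == 0) return(0)
  length(intersect(a, b)) / union_size
}

build_edges_sketch <- function(candidates, shingles, threshold = 0.8) {
  rows <- list()
  seen <- character(0)  # pairs already compared, to skip repeats across buckets
  for (bucket in candidates) {
    ids <- sort(unique(as.character(bucket)))
    if (length(ids) < 2) next          # nothing to compare in this bucket
    pairs <- utils::combn(ids, 2)      # 2 x n matrix of id pairs
    for (k in seq_len(ncol(pairs))) {
      a <- pairs[1, k]
      b <- pairs[2, k]
      key <- paste(a, b, sep = "\r")
      if (key %in% seen) next
      seen <- c(seen, key)
      sim <- jaccard(shingles[[a]], shingles[[b]])
      if (sim >= threshold) {          # assumption: threshold is inclusive
        rows[[length(rows) + 1]] <- data.frame(
          from = a, to = b, similarity = sim, stringsAsFactors = FALSE
        )
      }
    }
  }
  if (length(rows) == 0) {
    return(data.frame(from = character(0), to = character(0),
                      similarity = numeric(0), stringsAsFactors = FALSE))
  }
  do.call(rbind, rows)
}

# Usage with made-up data: d1 and d2 share 5 of 6 distinct shingles.
shingles <- list(
  d1 = c("a b", "b c", "c d", "d e", "e f"),
  d2 = c("a b", "b c", "c d", "d e", "e f", "f g"),
  d3 = c("x y", "y z")
)
candidates <- list(c("d1", "d2"), c("d1", "d2", "d3"), c("d3"))
edges <- build_edges_sketch(candidates, shingles, threshold = 0.8)
print(edges)
```

Only pairs that share a bucket are ever compared, so the number of Jaccard computations depends on bucket sizes rather than on the total number of document pairs.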