rehash: Recompute the hashes for a document or corpus

Description

Given a TextReuseTextDocument or a TextReuseCorpus, this function recomputes either the hashes or the minhashes using the function supplied in func. This requires that the tokens were retained by passing keep_tokens = TRUE when the document or corpus was created.
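
To see why the tokens must be retained, consider a toy model of a document in base R. This is a minimal sketch under stated assumptions and not the textreuse implementation: toy_doc, toy_rehash, and char_sum_hash are hypothetical names, and a plain list stands in for a TextReuseTextDocument. Hashes are derived from tokens, so once the tokens are discarded there is nothing left for a new function to hash.

```r
# Toy stand-in for a document: a plain list, NOT a TextReuseTextDocument.
toy_doc <- function(tokens, hash_func, keep_tokens = FALSE) {
  list(
    tokens = if (keep_tokens) tokens else NULL,
    hashes = hash_func(tokens)
  )
}

# Hypothetical hash function: sum of the character codes of each token.
char_sum_hash <- function(tokens) {
  vapply(tokens, function(tok) sum(utf8ToInt(tok)), integer(1), USE.NAMES = FALSE)
}

# Toy rehash: recompute the hashes from the retained tokens.
toy_rehash <- function(doc, func) {
  if (is.null(doc$tokens)) {
    stop("tokens were not retained; nothing to rehash")
  }
  doc$hashes <- func(doc$tokens)
  doc
}

# Tokens kept: rehashing works, and the result must be assigned back.
doc <- toy_doc(c("ab", "cd"), hash_func = nchar, keep_tokens = TRUE)
doc <- toy_rehash(doc, char_sum_hash)

# Tokens dropped: rehashing fails because there is nothing to hash.
dropped <- toy_doc(c("ab", "cd"), hash_func = nchar)
rehash_failed <- inherits(
  try(toy_rehash(dropped, char_sum_hash), silent = TRUE),
  "try-error"
)
```

As in the toy version, the real function returns the modified object instead of changing it in place, so its result has to be assigned.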

Usage

rehash(x, func, type = c("hashes", "minhashes"))

Arguments

x

A TextReuseTextDocument or TextReuseCorpus.

func

A function that either hashes the tokens or generates the minhash signature. See hash_string and minhash_generator.

type

Which values to recompute: "hashes" or "minhashes".
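
The default c("hashes", "minhashes") follows a common R idiom in which the function resolves the argument with match.arg(), so that omitting type selects the first choice. Whether rehash resolves type in exactly this way is an assumption here. The sketch below uses a hypothetical function, pick_type, to show the idiom in base R.

```r
# Hypothetical helper illustrating the match.arg() idiom; not part of textreuse.
pick_type <- function(type = c("hashes", "minhashes")) {
  match.arg(type)
}

default_type  <- pick_type()             # first choice when type is omitted
explicit_type <- pick_type("minhashes")  # an explicit, exact choice
partial_type  <- pick_type("min")        # match.arg() also accepts unique prefixes
```

Passing type explicitly when calling rehash avoids relying on the default.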

Value

The modified TextReuseTextDocument or TextReuseCorpus.

Examples

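library(textreuse)

# Build a corpus with one minhash function, keeping the tokens so rehash() can reuse them
dir <- system.file("extdata/legal", package = "textreuse")
minhash1 <- minhash_generator(seed = 1)
corpus <- TextReuseCorpus(dir = dir, minhash_func = minhash1, keep_tokens = TRUE)
head(minhashes(corpus[[1]]))

# Recompute the minhashes with a differently seeded generator
minhash2 <- minhash_generator(seed = 2)
corpus <- rehash(corpus, minhash2, type = "minhashes")
# Inspect the same document as before so the two outputs can be compared
head(minhashes(corpus[[1]]))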

ropensci/textreuse documentation built on May 19, 2020, 7:40 a.m.