Given a TextReuseTextDocument or a TextReuseCorpus, this function recomputes the tokens and hashes with the functions specified. Optionally, it can also recompute the minhash signatures.
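To make "tokens" and "hashes" concrete, here is a minimal base-R sketch of the two steps. The helpers `sketch_ngrams()` and `sketch_hash()` are hypothetical stand-ins written for this illustration; they are not the textreuse implementations, and the simple polynomial hash is only an assumption about what an integer token hash could look like.

```r
# Hypothetical stand-ins for illustration only; not the textreuse functions.

# Split a string into lowercase word n-grams (shingles).
sketch_ngrams <- function(text, n = 3) {
  words <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Map each token to an integer with a simple polynomial hash.
sketch_hash <- function(tokens) {
  vapply(tokens, function(tok) {
    h <- 0
    for (code in utf8ToInt(tok)) h <- (h * 31 + code) %% 2147483647
    as.integer(h)
  }, integer(1), USE.NAMES = FALSE)
}

text <- "Every action shall be prosecuted in the name of the real party"
toks <- sketch_ngrams(text, n = 3)   # one token per run of three words
hashes <- sketch_hash(toks)          # one integer per token
```

Swapping either helper for a different one changes the tokens or hashes without touching the text, which is the kind of recomputation this function performs on a document or corpus.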
x | A TextReuseTextDocument or TextReuseCorpus.
tokenizer | A function to split the text into tokens.
... | Arguments passed on to the tokenizer.
hash_func | A function to hash the tokens.
minhash_func | A function to create minhash signatures.
keep_tokens | Should the tokens be saved in the document that is returned or discarded?
keep_text | Should the text be saved in the document that is returned or discarded?
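The arguments above can be combined in a single call, roughly as sketched below. This assumes the textreuse package is installed, that `n` is accepted by `tokenize_ngrams()` and forwarded through `...`, and that `hash_string()`, `minhash_generator()`, `hashes()`, and `minhashes()` behave as in the package; the values `n = 5` and 100 minhashes are arbitrary choices for illustration.

```r
library(textreuse)

# Load the sample corpus shipped with the package, without tokenizing yet.
dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir, tokenizer = NULL)

# Assumption: a seeded generator makes the minhash signatures reproducible.
minhash <- minhash_generator(n = 100, seed = 1)

# Recompute tokens, hashes, and minhashes in one pass.
# `n = 5` is passed on to the tokenizer through `...`.
corpus <- tokenize(corpus, tokenize_ngrams, n = 5,
                   hash_func = hash_string,
                   minhash_func = minhash,
                   keep_tokens = TRUE,
                   keep_text = TRUE)

doc <- corpus[[1]]
head(tokens(doc))        # five-word n-grams
head(hashes(doc))        # one hash per token
length(minhashes(doc))   # length of the minhash signature
```

Setting `keep_tokens = TRUE` explicitly keeps the tokens available for inspection afterwards.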
The modified TextReuseTextDocument or TextReuseCorpus.
library(textreuse)
# Load the sample corpus without tokenizing, then tokenize it into n-grams.
dir <- system.file("extdata/legal", package = "textreuse")
corpus <- TextReuseCorpus(dir = dir, tokenizer = NULL)
corpus <- tokenize(corpus, tokenize_ngrams)
head(tokens(corpus[[1]]))
[1] "4 every action" "every action shall" "action shall be"
[4] "shall be prosecuted" "be prosecuted in" "prosecuted in the"