Description

Simple text preprocessor, mainly for example purposes.
Usage

prepare_documents(data, ...)

## S3 method for class 'data.frame'
prepare_documents(data, text, doc_id = NULL,
  min_freq = 1, lexicon = c("SMART", "snowball", "onix"), ...,
  return_doc_id = FALSE)

## S3 method for class 'character'
prepare_documents(data, doc_id = NULL,
  min_freq = 1, lexicon = c("SMART", "snowball", "onix"), ...,
  return_doc_id = FALSE)

## S3 method for class 'factor'
prepare_documents(data, doc_id = NULL, min_freq = 1,
  lexicon = c("SMART", "snowball", "onix"), ..., return_doc_id = FALSE)
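The Usage section defines one generic with methods for data.frame, character and factor input. The following is a minimal sketch, under stated assumptions, of how such S3 dispatch and the "bare column name or vector" behaviour of text are commonly wired up in base R (the same pattern base R's subset.data.frame uses). The names prep_docs and its methods are hypothetical, the default ids 1..n are an assumption, and this is not the package's own implementation.

```r
# Hypothetical sketch of S3 dispatch mirroring the Usage section.
# prep_docs() and its methods are illustrative names, not the package's code.

prep_docs <- function(data, ...) UseMethod("prep_docs")

prep_docs.data.frame <- function(data, text, doc_id = NULL, ...) {
  # Evaluate `text` inside `data`: a bare column name is looked up as a
  # column, while a literal vector simply evaluates to itself.
  docs <- eval(substitute(text), data, parent.frame())
  # In this sketch doc_id must be an ordinary vector (no bare column names).
  prep_docs(as.character(docs), doc_id = doc_id, ...)
}

prep_docs.factor <- function(data, doc_id = NULL, ...) {
  # Factors are converted to their labels, then handled as character
  prep_docs(as.character(data), doc_id = doc_id, ...)
}

prep_docs.character <- function(data, doc_id = NULL, ...) {
  # Assumption: when ids are omitted, each element is one document and
  # its position 1..n is used as its id.
  if (is.null(doc_id)) doc_id <- seq_along(data)
  stats::setNames(as.list(data), doc_id)
}
```

For example, with df <- data.frame(txt = c("one doc", "two doc")), prep_docs(df, txt) reads the txt column, prep_docs(df, c("p", "q")) uses the supplied vector directly, and a factor is converted to character before reaching the character method.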
Arguments

data
  A data.frame, character vector, or factor containing the documents.
...
  Any other parameters.
text
  A bare column name or a vector of documents (data.frame method only).
doc_id
  Ids of the documents. If omitted, they are created dynamically, assuming each element of the input is a separate document.
min_freq
  Minimum term frequency required to keep a term.
lexicon
  Name of the stop-word lexicon to use, taken from stop_words.
return_doc_id
  Whether to return the document ids (as a named list).
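The default lexicon = c("SMART", "snowball", "onix") follows the common R idiom for a choice argument, where the first element acts as the default. Assuming prepare_documents resolves it with base R's match.arg() (this page does not confirm it), the behaviour would look like the sketch below; pick_lexicon is a hypothetical helper.

```r
# Hypothetical helper: resolves a choice argument the way many R functions do.
# Assumes prepare_documents() uses match.arg(); this is not confirmed here.
pick_lexicon <- function(lexicon = c("SMART", "snowball", "onix")) {
  # With the full default vector, match.arg() returns the first choice;
  # otherwise it partially matches the supplied value against the choices.
  match.arg(lexicon)
}

pick_lexicon()          # "SMART"
pick_lexicon("snow")    # "snowball" (partial match)
pick_lexicon("onix")    # "onix"
```

Under this assumption, a name outside the three choices, such as "foo", raises an error rather than silently falling back to a default.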
Details

Tokenises each document, removes punctuation, stop words, and digits, and keeps only the terms that appear more than min_freq times across all documents.
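The steps in Details can be illustrated with a minimal base-R sketch. It is not the package's implementation: preprocess_sketch is a hypothetical name, a tiny hard-coded stop-word vector stands in for the stop_words lexicons, tokens are split on whitespace after lowercasing, "more than min_freq" is read as a strict inequality on the total count across documents, and each document is assumed to be returned as a vector of tokens.

```r
# Hypothetical sketch of the preprocessing steps described in Details.
# Assumption: a small hard-coded stop-word vector replaces stop_words.
preprocess_sketch <- function(docs, doc_id = NULL, min_freq = 1,
                              stopwords = c("the", "a", "and", "of", "is")) {
  # Assumed default ids: the position of each document
  if (is.null(doc_id)) doc_id <- seq_along(docs)

  tokens <- lapply(tolower(docs), function(d) {
    d <- gsub("[[:punct:]]", " ", d)      # drop punctuation
    d <- gsub("[[:digit:]]", " ", d)      # drop digits
    w <- strsplit(d, "[[:space:]]+")[[1]] # split on whitespace
    w[w != "" & !(w %in% stopwords)]      # drop empty strings and stop words
  })

  # Count each term over all documents, keep those above min_freq
  freq <- table(unlist(tokens))
  keep <- names(freq)[freq > min_freq]
  tokens <- lapply(tokens, function(w) w[w %in% keep])

  # Named list: one token vector per document, named by its id
  stats::setNames(tokens, doc_id)
}
```

With docs <- c("The cat sat on 2 mats.", "A cat, and a dog!", "Dog and cat.") and the default min_freq = 1, only "cat" (3 occurrences) and "dog" (2) survive, giving list(`1` = "cat", `2` = c("cat", "dog"), `3` = c("dog", "cat")).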
Value

A named list of documents, where the names are the document ids.