Compute a document-term matrix from a corpus.
corpus | A
sparsity | Value between 0 and 1: a term is dropped when the proportion of documents in which it does not occur exceeds this value. By default, all terms are kept.
dictionary | A vector of terms to which the matrix should be restricted. By default, all words with more than
remove_stopwords | Whether to remove stopwords appearing in a language-specific list.
tolower | Whether to convert all text to lower case.
remove_punctuation | Whether to remove all punctuation from text before tokenizing terms.
remove_numbers | Whether to remove all numbers from text before tokenizing terms.
min_length | The minimum number of characters a word must have to be retained.
Returns a DocumentTermMatrix object.
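# Assumption: import_corpus() and build_dtm() are provided by the R.temis package
library(R.temis)
# Sample Factiva export shipped with the tm.plugin.factiva package
file <- system.file("texts", "reut21578-factiva.xml", package="tm.plugin.factiva")
corpus <- import_corpus(file, "factiva", language="en")
build_dtm(corpus)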
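
To make the meaning of the preprocessing and sparsity arguments more concrete, the sketch below builds a document-term matrix with the tm package directly and then filters sparse terms. It is not the implementation of build_dtm: the toy texts, the 0.6 threshold, and the correspondence between the arguments above and tm's control options are assumptions made for illustration only.

# Illustrative sketch using tm directly; not the code behind build_dtm()
library(tm)

# Three made-up documents (hypothetical data)
texts <- c("The cat sat on the mat.",
           "The cat chased 2 mice!",
           "Dogs chase cats.")
toy_corpus <- VCorpus(VectorSource(texts))

# Assumed rough counterparts of tolower, remove_punctuation,
# remove_numbers, remove_stopwords and min_length
dtm <- DocumentTermMatrix(toy_corpus, control = list(
  tolower = TRUE,
  removePunctuation = TRUE,
  removeNumbers = TRUE,
  stopwords = TRUE,
  wordLengths = c(2, Inf)
))

# Drop terms that are absent from more than 60% of the documents
dtm_dense <- removeSparseTerms(dtm, 0.6)

# Compare the number of terms before and after filtering
nTerms(dtm)
nTerms(dtm_dense)

A lower threshold is stricter: only terms present in a larger share of documents survive, which shrinks the matrix at the cost of discarding rare vocabulary.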