View source: R/preprocessCorpus.R
toDocumentTermMatrix | R Documentation |
Preprocess existing corpus of type Corpus
according to default operations.
This helper function bundles all standard preprocessing steps into a single call, which
makes the package more convenient to use. The result is a document-term matrix.
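The individual steps are not listed on this page. As a rough illustration of what such a bundled pipeline typically involves, the sketch below builds a document-term matrix by hand using only the tm package. It is an assumption about the general shape of the workflow, not the package's actual implementation; the sample texts are made up.

```r
library(tm)

# Made-up sample texts, for illustration only
docs <- c("Profits increased in 2019.", "Costs are increasing, too!")
corpus <- VCorpus(VectorSource(docs))

# Typical cleaning steps, applied one at a time
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stripWhitespace)
# Stemming would go here, e.g. tm_map(corpus, stemDocument),
# which requires the SnowballC package to be installed

# Build the matrix with a minimum word length and unnormalized tf-idf weights
dtm_manual <- DocumentTermMatrix(
  corpus,
  control = list(
    wordLengths = c(3, Inf),
    weighting = function(x) weightTfIdf(x, normalize = FALSE)
  )
)
```

A single call to toDocumentTermMatrix is meant to replace this kind of hand-written sequence.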
toDocumentTermMatrix(
  x,
  language = "english",
  minWordLength = 3,
  sparsity = NULL,
  removeStopwords = TRUE,
  stemming = TRUE,
  weighting = function(x) tm::weightTfIdf(x, normalize = FALSE)
)
x | Corpus object of type Corpus to be preprocessed |
language | Language used for preprocessing (i.e. stop word removal and stemming). Default is "english". |
minWordLength | Minimum word length used as a cut-off, i.e. shorter words are removed. Default is 3. |
sparsity | A numeric value for the maximum allowed sparsity, strictly greater than zero and strictly less than one. Default is NULL. |
removeStopwords | Flag indicating whether to remove stop words (default: TRUE) |
stemming | Flag indicating whether to perform stemming (default: TRUE) |
weighting | Function used for weighting of words; the default is the tf-idf scheme, tm::weightTfIdf with normalize = FALSE. |
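The arguments above can be combined to move away from the defaults. The following is a minimal sketch, assuming the function is loaded from the SentimentAnalysis package (the package name is not stated on this page) and that tm is installed; the sample texts and the chosen values are illustrative only.

```r
library(tm)
library(SentimentAnalysis)  # assumed home of toDocumentTermMatrix

# Made-up sample texts, for illustration only
docs <- c("Strong earnings lifted the shares.",
          "Weak earnings hurt the shares.",
          "Analysts expect strong growth.")
corpus <- VCorpus(VectorSource(docs))

dtm_custom <- toDocumentTermMatrix(
  corpus,
  language = "english",
  minWordLength = 4,         # drop words shorter than four characters
  sparsity = 0.5,            # maximum allowed sparsity, between zero and one
  removeStopwords = TRUE,
  stemming = FALSE,          # keep full word forms
  weighting = tm::weightTf   # raw term frequencies instead of tf-idf
)

# List the terms that survived preprocessing
Terms(dtm_custom)
```

Any weighting function from tm that accepts a term-document matrix, such as tm::weightBin, can be passed in the same way.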
An object of class DocumentTermMatrix
DocumentTermMatrix
for the underlying class
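Since the return value is a regular DocumentTermMatrix, the usual tm accessors apply to it. The sketch below calls the function with its defaults and then inspects the result. It assumes the function comes from the SentimentAnalysis package and that SnowballC is installed for stemming; the sample texts are made up.

```r
library(tm)
library(SentimentAnalysis)  # assumed home of toDocumentTermMatrix

# Made-up sample texts, for illustration only
docs <- c("The company reported strong quarterly earnings.",
          "Investors were disappointed by the weak outlook.",
          "Earnings growth remained strong despite weak demand.")
corpus <- VCorpus(VectorSource(docs))

# Default preprocessing: stop word removal, stemming, tf-idf weighting
dtm <- toDocumentTermMatrix(corpus)

dim(dtm)             # number of documents, number of terms
m <- as.matrix(dtm)  # dense matrix: one row per document, one column per term
```

Converting to a dense matrix is convenient for small corpora; for larger ones it is usually better to keep the sparse representation.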