This is the main function in textmineR for creating a document term matrix (DTM).
In most cases, all you need to do is import your documents into R as a character vector
and then run this function to get a DTM that is compatible with the rest of
textmineR's functionality and with many other libraries. CreateDtm is built on
top of the excellent text2vec library.
| Argument | Description |
|---|---|
| doc_vec | A character vector of documents. |
| doc_names | A vector of names for your documents. Defaults to |
| ngram_window | A numeric vector of length 2. The first entry is the minimum n-gram size; the second entry is the maximum n-gram size. Defaults to |
| stopword_vec | A character vector of stopwords you would like to remove. Defaults to |
| lower | Do you want all words coerced to lower case? Defaults to |
| remove_punctuation | Do you want to convert all non-alphanumeric characters to spaces? Defaults to |
| remove_numbers | Do you want to convert all numbers to spaces? Defaults to |
| stem_lemma_function | A function that you would like to apply to the documents for stemming, lemmatization, or similar. See examples for usage. |
| verbose | Defaults to |
| ... | Other arguments to be passed to |
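To make ngram_window concrete: with c(1, 2), every single token and every pair of adjacent tokens becomes a candidate term. The sketch below enumerates these for one already-tokenized document using make_ngrams(), a hypothetical helper that is not part of textmineR. Joining the tokens of an n-gram with "_" is also an assumption made here for readability, not a description of CreateDtm's output.

```r
# Hypothetical helper (NOT part of textmineR): list the n-grams that an
# ngram_window of c(min_n, max_n) covers for one tokenized document.
# The "_" separator is an illustrative assumption.
make_ngrams <- function(tokens, ngram_window = c(1, 1)) {
  stopifnot(length(ngram_window) == 2, ngram_window[1] <= ngram_window[2])
  out <- character(0)
  for (n in seq(ngram_window[1], ngram_window[2])) {
    # Stop once n exceeds the number of tokens: no windows of that size exist
    if (n > length(tokens)) break
    # Start positions of every window of n consecutive tokens
    starts <- seq_len(length(tokens) - n + 1)
    grams <- vapply(starts,
                    function(s) paste(tokens[s:(s + n - 1)], collapse = "_"),
                    character(1))
    out <- c(out, grams)
  }
  out
}

# Unigrams followed by bigrams for a three-token document
make_ngrams(c("gene", "expression", "data"), ngram_window = c(1, 2))
```

Widening the window grows the vocabulary quickly, since each document of length L contributes up to L - n + 1 terms for every n in the window.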
A document term matrix of class dgCMatrix (the compressed sparse column matrix
class from the Matrix package). Rows index documents and columns index terms;
entry (i, j) is the number of times term j appears in document i.
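A minimal base-R sketch of what this return value looks like, assuming the Matrix package is installed. It builds a small dgCMatrix of counts from two made-up documents by splitting on whitespace. CreateDtm itself relies on text2vec and also applies the preprocessing controlled by its arguments, so treat this only as an illustration of the output format. Row and column sums then give per-document token counts and per-term frequencies.

```r
# Toy illustration (not textmineR's implementation): a dgCMatrix whose
# entry (i, j) counts how often term j occurs in document i.
library(Matrix)

docs <- c(doc1 = "the cat sat", doc2 = "the cat ate the rat")

# Split each document on whitespace into tokens
tokens <- strsplit(docs, "\\s+")

# Vocabulary: distinct terms, sorted alphabetically
vocab <- sort(unique(unlist(tokens)))

# One (document, term) index pair per token occurrence
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), vocab)

# sparseMatrix() adds up the x values of repeated (i, j) pairs,
# so each cell ends up holding a count
dtm <- sparseMatrix(i = i, j = j, x = 1,
                    dims = c(length(docs), length(vocab)),
                    dimnames = list(names(docs), vocab))

doc_lengths <- rowSums(dtm)      # counted tokens per document
term_freqs  <- colSums(dtm)      # occurrences of each term across documents
doc_freqs   <- colSums(dtm > 0)  # number of documents containing each term
```

Because the matrix is sparse, functions from the Matrix package such as rowSums() and colSums() work on it directly without converting to a dense matrix.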
The lower, remove_punctuation, and remove_numbers transformations are applied
to stopword_vec as well as to doc_vec, so that stopwords are compared against
text that has gone through the same preprocessing.
See stopwords for details on the default value of the stopword_vec argument.
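Why preprocess stopword_vec too? Once documents are lower-cased, a stopword written as "The" would never match the token "the". The sketch below shows this with preprocess(), a hypothetical base-R approximation of the lower, remove_punctuation, and remove_numbers steps; textmineR's actual regular expressions and tokenization may differ.

```r
# Hypothetical approximation (NOT textmineR's code) of three preprocessing steps
preprocess <- function(x, lower = TRUE, remove_punctuation = TRUE,
                       remove_numbers = TRUE) {
  if (lower) x <- tolower(x)
  # Replace every character that is not a letter or digit with a space
  if (remove_punctuation) x <- gsub("[^[:alnum:]]", " ", x)
  # Replace every digit with a space
  if (remove_numbers) x <- gsub("[[:digit:]]", " ", x)
  x
}

doc_vec      <- "The expression OF p53, in 2 cohorts."
stopword_vec <- c("The", "OF")

# Same steps applied to documents and stopwords
clean_doc   <- preprocess(doc_vec)
clean_stops <- preprocess(stopword_vec)

# Whitespace tokenization of the cleaned document
tokens <- strsplit(trimws(clean_doc), "\\s+")[[1]]

# Raw stopwords match nothing; preprocessed ones match "the" and "of"
kept <- tokens[!tokens %in% clean_stops]
```

In this sketch, removing digits also turns "p53" into the token "p", a side effect worth keeping in mind when numbers are converted to spaces.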
```r
## Not run:
# Attach textmineR, which provides CreateDtm() and the nih_sample dataset
library(textmineR)

data(nih_sample)

# DTM of unigrams and bigrams
dtm <- CreateDtm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2))

# DTM of unigrams with Porter's stemmer applied
# (requires the SnowballC package to be installed)
dtm <- CreateDtm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 stem_lemma_function = function(x) SnowballC::wordStem(x, "porter"))
## End(Not run)
```