View source: R/corpus_functions.R
This is the main document term matrix creating function for textmineR. In most cases, all you need to do is import documents as a character vector in R and then run this function to get a document term matrix that is compatible with the rest of textmineR's functionality and many other libraries. CreateDtm is built on top of the excellent text2vec library.
doc_vec | A character vector of documents.
doc_names | A vector of names for your documents. Defaults to
ngram_window | A numeric vector of length 2. The first entry is the minimum n-gram size; the second entry is the maximum n-gram size. Defaults to
stopword_vec | A character vector of stopwords you would like to remove. Defaults to
lower | Do you want all words coerced to lower case? Defaults to
remove_punctuation | Do you want to convert all non-alphanumeric characters to spaces? Defaults to
remove_numbers | Do you want to convert all numbers to spaces? Defaults to
stem_lemma_function | A function that you would like to apply to the documents for stemming, lemmatization, or similar. See examples for usage.
verbose | Defaults to
... | Other arguments to be passed to
A document term matrix of class dgCMatrix. The rows index documents. The columns index terms. The i, j entries represent the count of term j appearing in document i.
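The layout of the returned matrix can be illustrated without textmineR by building a tiny sparse count matrix by hand with the Matrix package. This is only a sketch of the structure described above; the two toy documents, the whitespace tokenization, and the name toy_dtm are assumptions for the example.

```r
library(Matrix)

# Two toy documents, already lower-cased and free of punctuation.
docs <- c(doc1 = "the cat sat", doc2 = "the cat saw the dog")
tokens <- strsplit(docs, " ", fixed = TRUE)
vocab <- sort(unique(unlist(tokens)))

# One (i, j) pair per token: i is the document row, j is the term column.
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), vocab)

# sparseMatrix() adds up the values of duplicated (i, j) pairs,
# so repeated tokens become term counts.
toy_dtm <- sparseMatrix(
  i = i, j = j, x = rep(1, length(i)),
  dims = c(length(docs), length(vocab)),
  dimnames = list(names(docs), vocab)
)

print(class(toy_dtm))
print(toy_dtm["doc2", "the"])  # count of term "the" in document "doc2"
```

Because the matrix is sparse, only the nonzero counts are stored, which is what keeps document term matrices manageable for large vocabularies.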
The following transformations are applied to stopword_vec as well as doc_vec: lower, remove_punctuation, remove_numbers.
See stopwords for details on the default value of the stopword_vec argument.
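A short base R sketch shows why the stopwords need the same cleaning as the documents. The helper clean_text and its regular expressions are assumptions that only approximate the three flags; they are not textmineR's actual implementation.

```r
# Hypothetical helper, not part of textmineR: approximates the three flags.
clean_text <- function(x) {
  x <- tolower(x)                    # lower
  x <- gsub("[^[:alnum:]]", " ", x)  # remove_punctuation: non-alphanumerics to spaces
  x <- gsub("[0-9]", " ", x)         # remove_numbers: digits to spaces
  x
}

doc_tokens <- strsplit(clean_text("Don't panic"), " +")[[1]]
raw_stopword <- "don't"
clean_stopword <- strsplit(clean_text(raw_stopword), " +")[[1]]

# The raw stopword no longer matches anything in the cleaned document,
# while its cleaned pieces do.
print(raw_stopword %in% doc_tokens)
print(clean_stopword %in% doc_tokens)
```

In this sketch the apostrophe becomes a space, so the document contains the tokens "don" and "t"; a stopword list left in its raw form would fail to remove them.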
## Not run:
# Load textmineR, which provides CreateDtm and the nih_sample data
library(textmineR)
data(nih_sample)

# DTM of unigrams and bigrams
dtm <- CreateDtm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 ngram_window = c(1, 2))

# DTM of unigrams with Porter's stemmer applied
dtm <- CreateDtm(doc_vec = nih_sample$ABSTRACT_TEXT,
                 doc_names = nih_sample$APPLICATION_ID,
                 stem_lemma_function = function(x) SnowballC::wordStem(x, "porter"))
## End(Not run)