Vec2Dtm: Convert a character vector to a document term matrix of class...


Description

This function is deprecated. Use CreateDtm instead.

Usage

Vec2Dtm(vec, docnames = names(vec), min.n.gram = 1, max.n.gram = 1,
  remove.stopwords = TRUE, custom.stopwords = NULL, lower = TRUE,
  remove.punctuation = TRUE, remove.numbers = TRUE, stem.document = FALSE,
  ...)

Arguments

vec

A character vector of documents.

docnames

A vector of names for your documents. Defaults to names(vec). If NULL, then docnames is set to 1:length(vec).

min.n.gram

The minimum size of n for creating n-grams. Defaults to 1.

max.n.gram

The maximum size of n for creating n-grams. Defaults to 1. Numbers greater than 3 are discouraged due to risk of overfitting.

remove.stopwords

Do you want to remove standard stopwords from your documents? Defaults to TRUE.

custom.stopwords

A character vector of stopwords to remove from your corpus. Defaults to NULL, in which case no custom stopwords are removed.

lower

Do you want all words coerced to lower case? Defaults to TRUE.

remove.punctuation

Do you want to convert all non-alphanumeric characters to spaces? Defaults to TRUE.

remove.numbers

Do you want to convert all numbers to spaces? Defaults to TRUE.

stem.document

Do you want to stem the words in your document using Porter's word stemmer? Defaults to FALSE.

...

Other arguments to be passed to TmParallelApply.
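
As a rough illustration of what the cleaning and n-gram arguments describe, here is a minimal base R sketch. It is not the package's implementation: clean_text and make_ngrams are hypothetical helpers written for this example, and joining n-gram tokens with "_" is an assumption.

```r
# Hypothetical helper (not part of the package): mimics the effect of
# lower, remove.punctuation and remove.numbers on a character vector.
clean_text <- function(x, lower = TRUE, remove.punctuation = TRUE,
                       remove.numbers = TRUE) {
  if (lower) x <- tolower(x)
  if (remove.punctuation) x <- gsub("[^[:alnum:]]", " ", x)
  if (remove.numbers) x <- gsub("[0-9]", " ", x)
  # Collapse the runs of spaces left behind by the substitutions
  trimws(gsub("\\s+", " ", x))
}

# Hypothetical helper: every n-gram for n in min.n.gram..max.n.gram.
# The "_" separator is an assumption made for this sketch.
make_ngrams <- function(tokens, min.n.gram = 1, max.n.gram = 1) {
  out <- character(0)
  for (n in min.n.gram:max.n.gram) {
    if (length(tokens) < n) next
    starts <- seq_len(length(tokens) - n + 1)
    out <- c(out, vapply(starts, function(s) {
      paste(tokens[s:(s + n - 1)], collapse = "_")
    }, character(1)))
  }
  out
}

cleaned <- clean_text("The 2 cats sat, twice!")
tokens <- strsplit(cleaned, " ")[[1]]
grams <- make_ngrams(tokens, min.n.gram = 1, max.n.gram = 2)
```

With min.n.gram = 1 and max.n.gram = 2, every unigram and every adjacent pair of tokens becomes a candidate term. The vocabulary therefore grows quickly as max.n.gram increases.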

Value

A document term matrix of class dgCMatrix. The rows index documents. The columns index terms. The i, j entries represent the count of term j appearing in document i.
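
To make the shape of the return value concrete, the following sketch builds a tiny document term matrix by hand with Matrix::sparseMatrix. It only illustrates the layout (documents in rows, terms in columns, counts as entries) and is not how Vec2Dtm works internally. The toy documents and the whitespace tokenizer are assumptions made for the example.

```r
library(Matrix)

# Two toy documents (illustrative data, not from the package)
docs <- c(d1 = "the cat sat", d2 = "the cat saw the dog")

# Naive whitespace tokenization; the real function does more cleaning
tokens <- strsplit(tolower(docs), "\\s+")
vocab <- sort(unique(unlist(tokens)))

# One (row, column) pair per token occurrence
i <- rep(seq_along(tokens), lengths(tokens))
j <- match(unlist(tokens), vocab)

# sparseMatrix() sums the values of duplicated (i, j) pairs, which
# turns the per-occurrence 1s into term counts.
dtm_sketch <- sparseMatrix(i = i, j = j, x = rep(1, length(i)),
                           dims = c(length(docs), length(vocab)),
                           dimnames = list(names(docs), vocab))

class(dtm_sketch)        # "dgCMatrix"
dtm_sketch["d2", "the"]  # 2, because "the" appears twice in d2
```

Because the matrix is sparse, only nonzero counts are stored, which matters when the vocabulary is large and most terms are absent from most documents.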

Examples

## Not run: 
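# Load the nih_sample data set
data(nih_sample)

# Build a DTM of unigrams and bigrams, one row per abstract
dtm <- Vec2Dtm(vec = nih_sample$ABSTRACT_TEXT,
               docnames = nih_sample$APPLICATION_ID,
               min.n.gram = 1, max.n.gram = 2)

# Number of documents (rows) and terms (columns)
dim(dtm)

# First few terms
head(colnames(dtm))

# First few document names
head(rownames(dtm))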

## End(Not run)

ChengMengli/topic documentation built on May 31, 2019, 8:44 p.m.