DepluralizeDtm: Run the CorrectS function on columns of a document term...

Description

Converts plural tokens in the column names of a document term matrix to their singular form, then aggregates all columns that now share the same token. See the example below.

Usage

DepluralizeDtm(dtm, ...)

Arguments

dtm

A document term matrix of class dgCMatrix whose colnames are tokens

...

Other arguments to pass to TmParallelApply. See note, below.

Value

Returns a document term matrix of class dgCMatrix whose columns index the de-pluralized tokens of the input document term matrix. Because columns that map to the same singular token are aggregated, the returned matrix will generally have fewer columns than the input matrix.
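
The column aggregation described above can be illustrated with a minimal sketch that uses only the Matrix package. It is not the package's implementation: naive_singular is a hypothetical stand-in for CorrectS that simply strips one trailing "s", and the toy matrix and the mapping-matrix approach are assumptions made for illustration.

```r
library(Matrix)

# Hypothetical stand-in for CorrectS (an assumption, not the real function):
# strips one trailing "s" from tokens longer than three characters.
naive_singular <- function(tokens) {
  ifelse(nchar(tokens) > 3 & grepl("[^s]s$", tokens),
         sub("s$", "", tokens),
         tokens)
}

# Toy sparse document term matrix whose colnames are tokens.
dtm <- Matrix(c(1, 0, 0, 1,
                0, 1, 1, 0),
              nrow = 2, byrow = TRUE, sparse = TRUE,
              dimnames = list(c("doc_1", "doc_2"),
                              c("chickens", "chicken", "horse", "horses")))

# Map each original column to its singular token.
singular   <- naive_singular(colnames(dtm))
new_tokens <- unique(singular)

# Indicator matrix: row = original token, column = de-pluralized token.
map <- sparseMatrix(i = seq_along(singular),
                    j = match(singular, new_tokens),
                    x = rep(1, length(singular)),
                    dims = c(length(singular), length(new_tokens)),
                    dimnames = list(colnames(dtm), new_tokens))

# Multiplying by the indicator matrix sums columns that share a token.
dtm_new <- dtm %*% map
```

Here the four input columns collapse to two ("chicken" and "horse"), which is the sense in which the returned matrix has fewer columns than the input.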

Note

This function performs parallel computation by default. The default behavior is to use all available cores according to detectCores. However, this can be modified by passing the cpus argument when calling this function.

Examples

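## Not run: 
# Four short documents; "chickens"/"chicken" and "horses"/"horse" differ only in plurality
myvec <- c("the quick brown fox eats chickens", 
           "the slow gray fox eats the slow chicken", 
           "look at my horse", "my horses are amazing")

# Name the documents doc_1, doc_2, ...
names(myvec) <- paste("doc", seq_along(myvec), sep = "_")

# Build a unigram document term matrix
dtm <- Vec2Dtm(vec = myvec, min.n.gram = 1, max.n.gram = 1)

# De-pluralize the column tokens and aggregate matching columns
dtm_new <- DepluralizeDtm(dtm = dtm)

## End(Not run)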

ChengMengli/topic documentation built on May 31, 2019, 8:44 p.m.