dtmwrappers: Wrappers to DocumentTermMatrix and TermDocumentMatrix to use...


Description

Wrappers to DocumentTermMatrix and TermDocumentMatrix that use the n-gram tokenization provided by ngramrr.

Usage

dtm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)

tdm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)

Arguments

x

character vector, Source or Corpus to be converted

char

logical, whether to use character n-grams. char = FALSE (the default) denotes word n-grams.

ngmin

integer, minimum order of n-gram

ngmax

integer, maximum order of n-gram

rmEOL

logical, remove n-grams with EOL character

...

Additional options for DocumentTermMatrix or TermDocumentMatrix
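
To make the roles of char, ngmin and ngmax concrete, here is a minimal base R sketch of word and character n-grams. The helper ngrams_sketch is hypothetical and written only for illustration; it is not part of ngramrr and is not necessarily how ngramrr tokenizes internally (for instance, it ignores rmEOL and any punctuation handling).

```r
# Hypothetical helper, NOT part of ngramrr: builds all n-grams of orders
# ngmin..ngmax from a vector of tokens (assumes ngmin <= ngmax).
ngrams_sketch <- function(tokens, ngmin = 1, ngmax = 2, sep = " ") {
  out <- character(0)
  for (n in ngmin:ngmax) {
    if (length(tokens) < n) next  # too few tokens for this order
    starts <- seq_len(length(tokens) - n + 1)
    out <- c(out, vapply(starts, function(i) {
      paste(tokens[i:(i + n - 1)], collapse = sep)
    }, character(1)))
  }
  out
}

# Word n-grams (the analogue of char = FALSE): tokens are words.
words <- strsplit("here we are now", " ", fixed = TRUE)[[1]]
word_ngrams <- ngrams_sketch(words, ngmin = 1, ngmax = 2)

# Character n-grams (the analogue of char = TRUE): tokens are single characters.
chars <- strsplit("here", "")[[1]]
char_ngrams <- ngrams_sketch(chars, ngmin = 2, ngmax = 3, sep = "")

print(word_ngrams)
print(char_ngrams)
```

With ngmin = 1 and ngmax = 2 the word version yields unigrams followed by bigrams; raising ngmax adds higher orders and typically grows the number of terms in the resulting matrix.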

Value

DocumentTermMatrix (from dtm2) or TermDocumentMatrix (from tdm2)

See Also

ngramrr, DocumentTermMatrix, TermDocumentMatrix

Examples

library(ngramrr)

# One short line of text per document
nirvana <- c("hello hello hello how low", "hello hello hello how low",
             "hello hello hello how low", "hello hello hello",
             "with the lights out", "it's less dangerous", "here we are now", "entertain us",
             "i feel stupid", "and contagious", "here we are now", "entertain us",
             "a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")

# Word n-grams of order 1 to 3; removePunctuation is passed on via ...
dtm2(nirvana, ngmax = 3, removePunctuation = TRUE)

Example output

<<DocumentTermMatrix (documents: 18, terms: 25)>>
Non-/sparse entries: 36/414
Sparsity           : 92%
Maximal term length: 10
Weighting          : term frequency (tf)
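
The transposed variant can be sketched the same way with tdm2. The snippet below assumes the ngramrr and tm packages are installed and skips the call otherwise; the docs vector is made up for illustration, and no output is shown because the exact term counts depend on the installed tm version.

```r
# Illustrative input; any character vector, Source or Corpus should work.
docs <- c("here we are now", "entertain us", "here we are now")

tdm <- NULL
if (requireNamespace("ngramrr", quietly = TRUE) &&
    requireNamespace("tm", quietly = TRUE)) {
  # Terms in rows, documents in columns; word n-grams of order 1 to 2.
  tdm <- ngramrr::tdm2(docs, ngmin = 1, ngmax = 2)
  print(dim(tdm))    # number of terms, number of documents
  tm::inspect(tdm)   # summary plus a sample of the matrix
}
```

Compared with dtm2, only the orientation of the matrix changes; the char, ngmin, ngmax and rmEOL arguments are the same in both signatures.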

ngramrr documentation built on May 2, 2019, 11:28 a.m.