dtmwrappers: Wrappers to DocumentTermMatrix and TermDocumentMatrix to use...
In ngramrr: A Simple General Purpose N-Gram Tokenizer


Wrappers to DocumentTermMatrix and TermDocumentMatrix to use n-gram tokenization provided by ngramrr.


dtm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)

tdm2(x, char = FALSE, ngmin = 1, ngmax = 2, rmEOL = TRUE, ...)

`x`	character vector, `Source` or `Corpus` to be converted
`char`	logical, whether to use character n-grams. char = FALSE denotes word n-grams.
`ngmin`	integer, minimum order of n-gram
`ngmax`	integer, maximum order of n-gram
`rmEOL`	logical, remove n-grams with EOL character
`...`	Additional options for `DocumentTermMatrix` or `TermDocumentMatrix`
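As a rough illustration of how `char`, `ngmin`, and `ngmax` interact, the sketch below requests word bigrams only and then character trigrams. The `docs` vector and the variable names are invented for this illustration, and the code assumes that both ngramrr and tm are installed; the exact terms produced may vary with the installed package versions.

```r
library(ngramrr)  # provides dtm2() and tdm2()
library(tm)       # provides nDocs() and Terms() for inspecting the result

# Hypothetical toy input, not taken from the package documentation
docs <- c("here we are now", "entertain us")

# Word n-grams of order 2 only: ngmin = ngmax = 2 is meant to skip unigrams
word_bigrams <- dtm2(docs, ngmin = 2, ngmax = 2)

# Character n-grams of order 3: char = TRUE switches from words to characters
char_trigrams <- dtm2(docs, char = TRUE, ngmin = 3, ngmax = 3)

nDocs(word_bigrams)   # number of documents (rows of the matrix)
Terms(char_trigrams)  # the n-grams used as column names
```

Both calls return a matrix with one row per input document; only the set of terms differs.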

A DocumentTermMatrix (from dtm2) or a TermDocumentMatrix (from tdm2)

ngramrr, DocumentTermMatrix, TermDocumentMatrix

library(ngramrr)

# One short text per element; each element becomes one document
nirvana <- c("hello hello hello how low", "hello hello hello how low",
"hello hello hello how low", "hello hello hello",
"with the lights out", "it's less dangerous", "here we are now", "entertain us",
"i feel stupid", "and contagious", "here we are now", "entertain us",
"a mulatto", "an albino", "a mosquito", "my libido", "yeah", "hey yay")
# Word n-grams of order 1 to 3; removePunctuation is passed on via ...
dtm2(nirvana, ngmax = 3, removePunctuation = TRUE)

<<DocumentTermMatrix (documents: 18, terms: 25)>>
Non-/sparse entries: 36/414
Sparsity           : 92%
Maximal term length: 10
Weighting          : term frequency (tf)
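
The example above uses dtm2 only. A minimal sketch of the transposed variant, tdm2, might look like the following. The two-document `docs` vector is a made-up input, and `as.matrix`, `dim`, and `Terms` are used here simply as one convenient way to look at the result, assuming tm is installed alongside ngramrr.

```r
library(ngramrr)  # provides tdm2()
library(tm)       # provides Terms() and the as.matrix() method

# Hypothetical toy input for illustration
docs <- c("hello hello hello how low", "with the lights out")

# Terms in rows, documents in columns; word n-grams of order 1 to 2
tdm <- tdm2(docs, ngmax = 2)

dim(tdm)            # number of terms, then number of documents
head(Terms(tdm))    # first few n-grams used as row names
m <- as.matrix(tdm) # dense copy, practical only for small inputs
m
```

Converting to a dense matrix is reasonable for a toy input like this one; for a large corpus, the sparse object is usually the better thing to keep around.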