tfidfTransformer: Generate n-grams from a document

View source: R/textMining.R

Description

Splits strings into n-grams with the given minimum and maximum numbers of grams.
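As an illustration of what splitting a string into n-grams means, here is a minimal base R sketch. The helper `make_ngrams` is hypothetical and is not part of RQuant; the tokenization rule (lower-casing and splitting on non-alphanumeric characters) is an assumption and may differ from what `tfidfTransformer` does internally.

```r
# Hypothetical helper, not part of RQuant: build word n-grams in base R.
make_ngrams <- function(text, n = 1) {
  # Lower-case and split on runs of non-alphanumeric characters (assumed rule)
  tokens <- strsplit(tolower(text), "[^[:alnum:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]
  # Not enough tokens to form a single n-gram
  if (length(tokens) < n) return(character(0))
  # Paste each window of n consecutive tokens together
  vapply(seq_len(length(tokens) - n + 1),
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

make_ngrams("AUD/USD rallies after RBA decision", n = 2)
```

With `n = 1` the helper simply returns the individual words, which corresponds to the default `ngrams = 1` in the usage below.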

Usage

tfidfTransformer(text_vector, ngrams = 1, minDocFreq = 2, wordLengths = 3,
  wordLengths_max = 20, idf = TRUE, cores = 6)

Arguments

text_vector

a vector of strings to be tokenized.

ngrams

number of grams (n) used for the n-gram transformation

minDocFreq

minimum document frequency, i.e. the minimum number of documents a term must appear in to be kept

wordLengths

minimum length of a valid word to be kept

wordLengths_max

maximum length of a valid word to be kept

idf

logical; TRUE to apply inverse-document-frequency weighting, FALSE to use plain term frequency

cores

number of cores for parallel computing
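To make the roles of `wordLengths`, `wordLengths_max`, `minDocFreq` and `idf` more concrete, the following base R sketch builds a small term matrix by hand. It is not the RQuant implementation: the toy documents, the tokenization rule, and the natural-log IDF formula `log(N / df)` are all assumptions made for illustration, and real packages often use a different logarithm base or normalization.

```r
# Hypothetical sketch, not the RQuant implementation.
docs <- c("aud usd rallies", "usd falls against aud", "rba holds rates")

# Tokenize (assumed rule: lower-case, split on non-alphanumeric characters)
tokens <- strsplit(tolower(docs), "[^[:alnum:]]+")

# Keep words within the length bounds (cf. wordLengths = 3, wordLengths_max = 20)
tokens <- lapply(tokens, function(w) w[nchar(w) >= 3 & nchar(w) <= 20])

# Term-frequency matrix: one row per document, one column per term
vocab <- sort(unique(unlist(tokens)))
tf <- t(vapply(tokens,
               function(w) as.numeric(table(factor(w, levels = vocab))),
               numeric(length(vocab))))
colnames(tf) <- vocab

# Drop terms that appear in fewer than 2 documents (cf. minDocFreq = 2)
doc_freq <- colSums(tf > 0)
keep <- doc_freq >= 2
tf <- tf[, keep, drop = FALSE]

# idf = TRUE analogue: weight each column by log(N / document frequency)
idf_weights <- log(nrow(tf) / doc_freq[keep])
tfidf <- sweep(tf, 2, idf_weights, `*`)

tfidf
```

In this toy corpus only "aud" and "usd" occur in at least two documents, so they are the only columns that survive the document-frequency filter. Skipping the final `sweep` step and using `tf` directly corresponds to the term-frequency case.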

See Also

tfidfTransformer

Examples

setupTwitterConn()
tweets <- tweet_corpus(search = "audusd", n = 100,
                       since = as.character(Sys.Date() - 7),
                       until = as.character(Sys.Date()))
tfidf.dt <- tfidfTransformer(tweets$d$text, ngrams = 1, minDocFreq = 2,
                             wordLengths = 3, wordLengths_max = 20,
                             idf = TRUE, cores = 6)
head(as.matrix(tfidf.dt))

ivanliu1989/RQuant documentation built on Sept. 13, 2019, 11:53 a.m.