Description

Tokenizes a vector of strings into n-grams, subject to minimum and maximum word lengths, and returns a document-term matrix weighted by term frequency (TF) or term frequency-inverse document frequency (TF-IDF).
Usage

tfidfTransformer(text_vector, ngrams = 1, minDocFreq = 2, wordLengths = 3,
  wordLengths_max = 20, idf = TRUE, cores = 6)
Arguments

text_vector      a vector of strings to be tokenized.

ngrams           number of grams for the n-gram transformation.

minDocFreq       minimum document frequency a term must reach to be kept.

wordLengths      minimum length of a valid word to be kept.

wordLengths_max  maximum length of a valid word to be kept.

idf              if TRUE, apply inverse document frequency weighting
                 (TF-IDF); if FALSE, return raw term frequencies (TF).

cores            number of cores for parallel computation.
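To make the effect of the idf argument concrete, here is a minimal sketch of TF-IDF weighting in base R. It assumes the conventional unsmoothed definition idf(t) = log(N / df(t)); the package's exact smoothing and normalization may differ, and the toy documents below are made up for illustration.

```r
# Toy corpus: two "documents", each already tokenized into unigrams
docs <- list(c("aud", "usd", "aud"), c("usd", "eur"))
vocab <- sort(unique(unlist(docs)))

# Term-frequency matrix: rows = documents, columns = terms
tf <- t(sapply(docs, function(d) table(factor(d, levels = vocab))))

# Document frequency: number of documents containing each term
df <- colSums(tf > 0)
idf <- log(length(docs) / df)

# idf = TRUE corresponds to multiplying tf by idf; idf = FALSE keeps raw tf
tfidf <- sweep(tf, 2, idf, `*`)
```

Note that a term occurring in every document (here "usd") gets idf = log(1) = 0, so its TF-IDF weight vanishes; this is why idf = TRUE down-weights ubiquitous terms relative to the plain TF matrix.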
See Also

tfidfTransformer
Examples

setupTwitterConn()
tweets <- tweet_corpus(search = "audusd", n = 100,
                       since = as.character(Sys.Date() - 7),
                       until = as.character(Sys.Date()))
tfidf.dt <- tfidfTransformer(tweets$d$text, ngrams = 1, minDocFreq = 2,
                             wordLengths = 3, wordLengths_max = 20,
                             idf = TRUE, cores = 6)
head(as.matrix(tfidf.dt))