reduce_dtm_tfidf: Internal Supreme function


Description

reduce_dtm_tfidf selects columns of an unlabeled document-term matrix by deleting the terms whose tf-idf score falls outside the range defined by the inf and sup quantiles of the tf-idf score distribution. reduce_dtm_tfidf is called by the reduce_dtm function.

Usage

reduce_dtm_tfidf(dtm, q = list(inf = 0.25, sup = 0.75), export = FALSE)

Arguments

dtm

a document-term matrix in term frequency format.

q

a list with the inf and sup quantiles of the tf-idf score distribution. The defaults are the first and third quartiles.

export

logical. If TRUE, exports the discarded terms, the vocabulary and the returned object to the built-in directory data/dtm. Default is FALSE.

Value

a list with the reduced dtm and the associated term_tfidf scores. The values of the quantile thresholds are also returned.

Examples

## Not run: 
library(Supreme)
data("dtm")
dtm.tfidf <- reduce_dtm_tfidf(dtm, export = TRUE)

## End(Not run)
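
The filtering rule in the Description can be illustrated with a small base R sketch that does not use the package. It is not the Supreme implementation: the helper name sketch_reduce_tfidf, the dense-matrix input, the mean relative term frequency times log2 inverse document frequency score, and the inclusive comparison at both thresholds are all assumptions made for illustration.

```r
# Hypothetical helper, NOT part of Supreme: keeps the columns of a dense
# term-frequency matrix whose tf-idf score lies between two quantiles.
sketch_reduce_tfidf <- function(tf, q = list(inf = 0.25, sup = 0.75)) {
  stopifnot(is.matrix(tf), q$inf <= q$sup,
            all(rowSums(tf) > 0), all(colSums(tf) > 0))

  n_docs   <- nrow(tf)
  doc_freq <- colSums(tf > 0)        # number of documents containing each term
  rel_tf   <- tf / rowSums(tf)       # term frequency relative to document length

  # Assumed score: mean relative frequency over the documents that contain
  # the term, multiplied by a log2 inverse document frequency.
  term_tfidf <- (colSums(rel_tf) / doc_freq) * log2(n_docs / doc_freq)

  # Thresholds are the inf and sup quantiles of the score distribution.
  thresholds <- quantile(term_tfidf, probs = c(q$inf, q$sup), names = FALSE)
  keep <- term_tfidf >= thresholds[1] & term_tfidf <= thresholds[2]

  list(reduced    = tf[, keep, drop = FALSE],
       term_tfidf = term_tfidf[keep],
       thresholds = thresholds)
}

# Toy matrix: 3 documents (rows) by 4 terms (columns).
tf <- matrix(c(2, 0, 1, 0,
               1, 1, 0, 0,
               3, 0, 0, 1),
             nrow = 3, byrow = TRUE,
             dimnames = list(paste0("doc", 1:3), c("a", "b", "c", "d")))

res <- sketch_reduce_tfidf(tf)
colnames(res$reduced)
```

In this toy matrix the term a occurs in every document, so its score is zero and it falls below the lower threshold, while b has the highest score and falls above the upper one. Widening the range, for example q = list(inf = 0, sup = 1), keeps every term.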

paolofantini/Supreme documentation built on May 24, 2019, 6:14 p.m.