tfidf: tf-idf

Description

Computes term frequency, inverse document frequency (tf-idf) weights from a matrix of token counts.

Usage

tfidf(x,normalize=TRUE)

Arguments

x

A dgCMatrix or matrix of counts, with documents in rows and tokens in columns.

normalize

Whether to normalize term frequency by document totals.

Value

A matrix of the same type as x, with each count replaced by its tf-idf value

f_{ij} * \log[n/(d_j+1)],

where f_{ij} is x_{ij}/m_i if normalize is TRUE and x_{ij} otherwise, m_i is the total count for document i, n is the number of documents, and d_j is the number of documents containing token j.
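
As an illustration of the formula, the following is a minimal base-R sketch for a small dense matrix. It is not the package's implementation: the function name tfidf_sketch and the toy counts matrix are hypothetical, and it assumes that n is the number of rows of x and that m_i is the i-th row sum.

```r
# Hand-rolled tf-idf for a dense count matrix (rows = documents, columns = tokens).
# Assumptions (not taken from the package source): n = nrow(x), m_i = rowSums(x)[i].
tfidf_sketch <- function(x, normalize = TRUE) {
  n <- nrow(x)                        # number of documents
  d <- colSums(x > 0)                 # d_j: number of documents containing token j
  idf <- log(n / (d + 1))             # inverse document frequency, +1 in the denominator
  f <- if (normalize) x / rowSums(x) else x   # f_ij: row-normalized or raw counts
  sweep(f, 2, idf, `*`)               # scale column j by idf_j
}

# Toy data: 4 documents, 3 tokens (hypothetical).
counts <- matrix(c(2, 0, 1,
                   0, 3, 1,
                   1, 1, 0,
                   0, 0, 4),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("doc", 1:4), c("a", "b", "c")))

out <- tfidf_sketch(counts)
out
```

In this toy matrix token c appears in 3 of the 4 documents, so its weight is \log[4/(3+1)] = 0 and its whole column is zero. Because of the +1 in the denominator, a token that appears in every document receives a negative weight.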

Author(s)

Matt Taddy taddy@chicagobooth.edu

See Also

pls, we8there

Examples

library(textir)
data(we8there)
## 20 high-variance tf-idf terms
colnames(we8thereCounts)[
  order(-sdev(tfidf(we8thereCounts)))[1:20]]

TaddyLab/textir documentation built on May 9, 2019, 4:17 p.m.