tfidf: compute tf-idf weights from a dfm

View source: R/dfm_weight.R

Description

Weight a dfm by term frequency-inverse document frequency (tf-idf) using fully sparse methods.

Usage

tfidf(x, scheme_tf = "count", scheme_df = "inverse", base = 10, ...)

Arguments

x

object for which idf or tf-idf will be computed (a document-feature matrix)

scheme_tf

scheme for tf; defaults to "count"

scheme_df

scheme for docfreq; defaults to "inverse"

base

the base for the logarithms in the tf and docfreq calls

...

additional arguments passed to docfreq when calling tfidf

Details

tfidf computes term frequency-inverse document frequency weighting. By default, term frequency is the raw count and is not normalized. To use relative term frequency within each document instead, set scheme_tf = "prop".
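The weighting described above can be sketched in base R, without quanteda, to make the arithmetic visible. This is an illustration under stated assumptions, not the package's implementation: it uses a dense matrix rather than sparse methods, and it assumes the "inverse" scheme means the logarithm of the number of documents divided by the document frequency. The object names (counts, doc_freq, idf, tfidf_count, tfidf_prop) are hypothetical.

```r
# Dense base-R sketch; the real function works on sparse dfm objects.
counts <- matrix(c(1, 1, 2, 1, 0, 0,
                   1, 1, 0, 0, 2, 3),
                 byrow = TRUE, nrow = 2,
                 dimnames = list(docs = c("document1", "document2"),
                                 features = c("this", "is", "a", "sample",
                                              "another", "example")))

# Document frequency: number of documents in which each feature occurs.
doc_freq <- colSums(counts > 0)

# Assumed "inverse" scheme: log(number of documents / document frequency).
idf <- log(nrow(counts) / doc_freq, base = 10)

# Term frequency as raw counts (the scheme_tf = "count" case):
# multiply every column by its idf weight.
tfidf_count <- sweep(counts, 2, idf, `*`)

# Term frequency as within-document proportions (the scheme_tf = "prop" case).
tfidf_prop <- sweep(counts / rowSums(counts), 2, idf, `*`)
```

Features that occur in every document ("this", "is") receive an idf of zero under this assumption, so their weights vanish regardless of the term frequency scheme.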

References

Manning, C. D., Raghavan, P., & Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press.

See Also

tf, docfreq

Examples

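# raw counts for a slice of the example dfm
head(data_dfm_lbgexample[, 5:10])
# tf-idf weights for the same features
head(tfidf(data_dfm_lbgexample)[, 5:10])
# document frequencies for features 5 to 15
docfreq(data_dfm_lbgexample)[5:15]
# term frequencies (default scheme) for the same slice
head(tf(data_dfm_lbgexample)[, 5:10])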

# replication of worked example from
# https://en.wikipedia.org/wiki/Tf-idf#Example_of_tf.E2.80.93idf
(wikiDfm <- new("dfmSparse", 
                Matrix::Matrix(c(1,1,2,1,0,0, 1,1,0,0,2,3),
                   byrow = TRUE, nrow = 2,  
                   dimnames = list(docs = c("document1", "document2"), 
                     features = c("this", "is", "a", "sample", "another",
                                  "example")), sparse = TRUE)))
docfreq(wikiDfm)
tfidf(wikiDfm)

quanteda documentation built on Aug. 16, 2017, 1:03 a.m.