get_idf: Inverse document-frequency scaling matrix

Description


This function creates an inverse-document-frequency (IDF) scaling matrix from a document-term matrix. The IDF is defined as follows: idf = log(# documents in the corpus / (# documents where the term appears + 1))
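The formula can be sketched by hand on a toy matrix. This is a minimal illustration that uses only the Matrix package, not the package's own implementation; it assumes the formula is read as log(N / (df + 1)), and the toy data and variable names are made up for the example.

```r
library(Matrix)

# Toy document-term matrix: 3 documents (rows) x 3 terms (columns).
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 1, 1, 3),
  x = c(2, 1, 1, 3, 1),
  dims = c(3, 3),
  dimnames = list(paste0("doc", 1:3), c("a", "b", "c"))
)

# Number of documents in the corpus.
n_docs <- nrow(dtm)

# Document frequency: number of documents in which each term appears.
doc_freq <- colSums(dtm > 0)

# Smoothed IDF (assumed reading of the formula in the description).
idf <- log(n_docs / (doc_freq + 1))
```

Under this reading, a term that appears in every document (term "a" here) receives a slightly negative weight, because the smoothed denominator exceeds the number of documents.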


Usage

get_idf(dtm, log_scale = log, smooth_idf = TRUE)



Arguments

dtm: a document-term matrix of class dgCMatrix or dgTMatrix.

log_scale: function used to log-scale the weights when calculating the IDF matrix. Usually log is used, but it might be worth trying log2.

smooth_idf: logical; whether to smooth IDF weights by adding one to document frequencies, as if an extra document were seen containing every term in the collection exactly once. This prevents division by zero.
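The effect of the smoothing and of the choice of logarithm can be illustrated with plain arithmetic. The document frequencies below are hypothetical, and the code is a sketch of the formula rather than a call into the package.

```r
# Hypothetical corpus size and document frequencies; "unseen" is a term
# that appears in no document.
n_docs <- 3
doc_freq <- c(a = 3, b = 1, unseen = 0)

# Without smoothing, a zero document frequency divides by zero and yields Inf.
idf_raw <- log(n_docs / doc_freq)

# With smoothing, every weight stays finite.
idf_smooth <- log(n_docs / (doc_freq + 1))

# Switching to log2 rescales every weight by the same constant, 1 / log(2).
idf_smooth_log2 <- log2(n_docs / (doc_freq + 1))
```

Because log2 only rescales the weights by a constant factor, it changes their magnitude but not their relative ordering.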


Value

A ddiMatrix: the diagonal sparse IDF scaling matrix.
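A diagonal scaling matrix of this kind is typically applied by right-multiplying the document-term matrix, which rescales each term column by its weight. The sketch below builds the diagonal matrix by hand with Matrix::Diagonal instead of calling get_idf, so it runs without text2vec; that the returned matrix is meant to be used this way is an assumption, not something the page states.

```r
library(Matrix)

# Toy document-term matrix: 3 documents (rows) x 3 terms (columns).
dtm <- sparseMatrix(
  i = c(1, 1, 2, 3, 3),
  j = c(1, 2, 1, 1, 3),
  x = c(2, 1, 1, 3, 1),
  dims = c(3, 3)
)

# Smoothed IDF weights, one per term (column).
idf_weights <- log(nrow(dtm) / (colSums(dtm > 0) + 1))

# Diagonal sparse matrix holding the weights (class ddiMatrix).
idf_mat <- Diagonal(x = idf_weights)

# Right-multiplication scales column j of dtm by idf_weights[j].
dtm_scaled <- dtm %*% idf_mat
```

Keeping the weights in a diagonal sparse matrix means the scaling is a single sparse matrix product and the result stays sparse.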

See Also

get_tf, get_dtm, create_dtm
