pmi: A function to calculate a number of information-theoretic...


View source: R/pmi.R

Description

A function to calculate a number of information-theoretic measures on terms in a contingency table, including point-wise mutual information.
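For reference, the standard definition of point-wise mutual information between a category c and a term t is log(p(c, t) / (p(c) * p(t))). The base R sketch below computes it on a small made-up table. It is not the package's implementation: the helper name toy_pmi, the natural logarithm, and the toy counts are all assumptions made for illustration.

```r
# Toy contingency table: rows are categories, columns are terms.
# All counts are made up for illustration.
counts <- matrix(c(8, 2, 0,
                   2, 8, 4),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cat_a", "cat_b"),
                                 c("apple", "bond", "cash")))

# Hypothetical helper (not part of SpeedReader): standard PMI, natural log.
toy_pmi <- function(counts) {
  joint <- counts / sum(counts)      # p(category, term)
  p_cat <- rowSums(joint)            # p(category)
  p_term <- colSums(joint)           # p(term)
  expected <- outer(p_cat, p_term)   # p(category) * p(term)
  log(joint / expected)              # log(0) yields -Inf for unseen pairs
}

pmi_table <- toy_pmi(counts)
print(round(pmi_table, 3))
```

A category-term pair with a zero count comes out as -Inf, which is the behaviour the Value section refers to when it discusses sparse output.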

Usage

pmi(contingency_table, display_top_x_terms = 20, term_threshold = 5,
  every_category_counts = FALSE)

Arguments

contingency_table

A contingency table generated by the 'contingency_table()' function.

display_top_x_terms

The number of top-ranked terms to display for each measure. Defaults to 20.

term_threshold

The threshold used to eliminate terms from the contingency table before calculating information-theoretic quantities. Defaults to 5. This works around the problem of terms that appear only once receiving very high PMI.

every_category_counts

Defaults to FALSE. If TRUE, terms are removed if they do not appear at least term_threshold times in every row (category) of the contingency table.
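The interaction between term_threshold and every_category_counts can be sketched in base R as follows. This is only one possible reading of the argument descriptions, not the package's code: filter_terms is a hypothetical helper, the counts are made up, and the behaviour when every_category_counts is FALSE (thresholding on a term's total count across categories) is an assumption.

```r
# Hypothetical helper (not the package's code) illustrating one reading of
# term_threshold and every_category_counts.
filter_terms <- function(counts, term_threshold = 5,
                         every_category_counts = FALSE) {
  if (every_category_counts) {
    # Keep a term only if it reaches the threshold in every row (category).
    keep <- apply(counts >= term_threshold, 2, all)
  } else {
    # ASSUMPTION: otherwise the threshold applies to the term's total count.
    keep <- colSums(counts) >= term_threshold
  }
  counts[, keep, drop = FALSE]
}

# Made-up counts: rows are categories, columns are terms.
counts <- matrix(c(9, 1, 6,
                   7, 0, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("cat_a", "cat_b"),
                                 c("apple", "bond", "cash")))

kept_default <- colnames(filter_terms(counts))
kept_strict <- colnames(filter_terms(counts, every_category_counts = TRUE))
print(kept_default)
print(kept_strict)
```

Under these assumptions the stricter setting keeps fewer terms, because a term that is frequent overall can still fall short of the threshold in a single category.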

Value

A list object containing a number of information-theoretic measures calculated on the contingency table. If a sparse matrix was provided, then a sparse PMI table is returned. Note that the "zero" entries in this sparse matrix are actually -Inf, but they cannot be represented as such by the slam sparse matrix library (which this package uses), so you will need to manually replace the zero entries with -Inf if you want to compare against a dense matrix.
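One way to do that replacement is to start from a matrix filled with -Inf and copy in only the stored entries. The sketch below assumes the sparse PMI table follows the layout of slam's simple_triplet_matrix (fields i, j, v, nrow, ncol, dimnames) and that every unstored entry should be -Inf. The helper sparse_pmi_to_dense is hypothetical, and a plain list with made-up values stands in for the real return value so that the example runs without slam installed.

```r
# Hypothetical helper: expand a triplet-style sparse table to a dense matrix,
# filling every entry that is not stored with -Inf instead of 0.
# `sparse` needs fields i, j, v, nrow, ncol (as in slam's
# simple_triplet_matrix); dimnames may be NULL.
sparse_pmi_to_dense <- function(sparse) {
  dense <- matrix(-Inf, nrow = sparse$nrow, ncol = sparse$ncol,
                  dimnames = sparse$dimnames)
  # Index by (row, column) pairs to overwrite only the stored cells.
  dense[cbind(sparse$i, sparse$j)] <- sparse$v
  dense
}

# Stand-in for a sparse PMI table (made-up values), built as a plain list.
toy_sparse <- list(i = c(1L, 1L, 2L),
                   j = c(1L, 2L, 2L),
                   v = c(0.65, -0.73, 0.32),
                   nrow = 2L, ncol = 3L, dimnames = NULL)

dense_pmi <- sparse_pmi_to_dense(toy_sparse)
print(dense_pmi)
```

Working from the stored index vectors avoids confusing an unstored entry with a stored PMI value that happens to equal zero, which a blanket replacement of zeros in a dense copy would not distinguish.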


matthewjdenny/SpeedReader documentation built on March 25, 2020, 5:32 p.m.