calculateDiscreteContinuousPointwiseConditionalEntropy_KWindow: calculate pointwise mutual information between a categorical value (X) and a continuous value (Y)

View source: R/refactoredCode.R


Calculate pointwise mutual information between a categorical value (X) and a continuous value (Y), i.e. the self-information of X conditioned on each of the possible values of X, using a sliding window and a local entropy measure.

Description

This is based on the technique described in B. C. Ross, "Mutual information between discrete and continuous data sets," PLoS One, vol. 9, no. 2, p. e87357, Feb. 2014 [Online]. Available: http://dx.doi.org/10.1371/journal.pone.0087357, but with the important simplification of using a sliding window K elements wide rather than the k nearest neighbours. This is empirically shown to make little difference on larger datasets and makes the algorithm simple to implement on dbplyr tables.
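For reference, the following is a minimal sketch in base R of the Ross (2014) k-nearest-neighbour estimator that this function approximates. It is not the package implementation (which replaces the nearest-neighbour search with a fixed-width window so that it can be run on dbplyr tables); the name rossMI and its arguments are illustrative only, and it assumes every class of x has at least k + 1 observations.

rossMI <- function(x, y, k = 3L) {
  # x: categorical vector (X); y: continuous vector (Y); k: neighbour count
  n <- length(y)
  psi_nx <- numeric(n)  # digamma of the class size N_x for each point
  psi_m  <- numeric(n)  # digamma of the neighbour count m_i for each point
  for (i in seq_len(n)) {
    same <- which(x == x[i])
    nx <- length(same)
    # distance from y[i] to its k-th nearest neighbour within the same class
    # (the smallest sorted distance is 0, i.e. the point itself)
    d <- sort(abs(y[same] - y[i]))[min(k, nx - 1L) + 1L]
    # m_i: points of any class within that distance, excluding the point itself
    m <- sum(abs(y - y[i]) <= d) - 1L
    psi_nx[i] <- digamma(nx)
    psi_m[i]  <- digamma(m)
  }
  # Ross (2014): I = psi(N) - <psi(N_x)> + psi(k) - <psi(m)>
  digamma(n) - mean(psi_nx) + digamma(k) - mean(psi_m)
}

# e.g. set.seed(1); x <- sample(c("a", "b"), 200, TRUE)
#      y <- rnorm(200, mean = ifelse(x == "a", 0, 1))
#      rossMI(x, y, k = 3L)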

Usage

calculateDiscreteContinuousPointwiseConditionalEntropy_KWindow(
  df,
  discreteVars,
  continuousVar,
  k_05 = 4L,
  ...
)

Arguments

df

- a dataframe, which may be grouped; if grouped, the continuous value is interpreted as a different type of continuous variable in each group

discreteVars

- the column(s) of the categorical value (X) quoted by vars(...)

continuousVar

- the column of the continuous value (Y)

k_05

- half the sliding window width; this should be a small number such as 1, 2 or 3.

Value

a dataframe containing the distinct values of the groups of df and, for each group, a mutual information column (I). If df was not grouped this will contain a single row.
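
Examples

A hypothetical invocation; the dataframe and column names (testData, feature, outcome, value) and the commented package name are assumptions for illustration, not taken from the package itself.

library(dplyr)
# library(tidyinfostats)  # assumed package name for terminological/tidy-info-stats

testData %>%
  group_by(feature) %>%
  calculateDiscreteContinuousPointwiseConditionalEntropy_KWindow(
    discreteVars = vars(outcome),
    continuousVar = value,
    k_05 = 4L
  )
# returns one row per feature group, with a mutual information column (I)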

