adjustMIForAbsentValues: Adjust mutual information calculation when values are...

View source: R/missingValues.R

adjustMIForAbsentValues    R Documentation

Adjust mutual information calculation when values are missing.

Description

This corrects standard mutual information calculations for the information carried by the absence of a variable. This is relevant for sparse data sets with many features, such as NLP terms. Unequal "missingness" of features can contain information about the outcomes. To determine whether a feature is missing for a given sample we need to make some assumptions. These are generally calculated from the data but can alternatively be specified directly.
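As an illustration of why unequal "missingness" matters, the following base R sketch computes the mutual information between a presence/absence flag and an outcome on a small invented data set. It is not the adjustment this function performs. The helper discreteMI and the vectors outcome and present are hypothetical names introduced here and are not part of the package.

```r
# Toy data (invented for illustration): 8 samples with a binary outcome and
# a flag recording whether a given term was observed for that sample.
outcome <- c("pos", "pos", "pos", "pos", "neg", "neg", "neg", "neg")
present <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE, FALSE, TRUE)

# Hypothetical helper, not part of the package: empirical mutual information
# (in bits) between two discrete vectors of equal length.
discreteMI <- function(x, y) {
  joint <- table(x, y) / length(x)                # joint probabilities p(x, y)
  indep <- outer(rowSums(joint), colSums(joint))  # product of marginals p(x) * p(y)
  nz <- joint > 0                                 # skip empty cells, treating 0 * log(0) as 0
  sum(joint[nz] * log2(joint[nz] / indep[nz]))
}

# Information about the outcome carried purely by whether the term is present.
miAbsence <- discreteMI(present, outcome)
print(miAbsence)
```

Here the term is present for three of the four "pos" samples but only one of the four "neg" samples, so presence alone carries roughly 0.19 bits about the outcome. A calculation that looks only at the observed values would not see this contribution.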

Usage

adjustMIForAbsentValues(df, discreteVars, sampleVars, mutualInformationFn, ...)

Arguments

df

- may be grouped, in which case the value is interpreted as different types of continuous variable

discreteVars

- the column(s) of the categorical value (X) quoted by vars(...) (e.g. outcome)

sampleVars

- the column(s) which uniquely identify the sample (e.g. person identifier)

mutualInformationFn

- the function that will calculate the unadjusted MI

...

- the other parameters are passed on to the function specified in mutualInformationFn and to the observedVersusExpected(...) function, in particular sampleCount or sampleCountDf

Value

a dataframe containing the distinct values of the groups of df and, for each group, a mutual information column (I). If df was not grouped, this will be a single entry.


terminological/tidy-info-stats documentation built on Nov. 19, 2022, 11:23 p.m.