CMIM: Minimal conditional mutual information maximisation filter
In praznik: Tools for Information-Based Feature Selection and Scoring

View source: R/algorithms.R

CMIM	R Documentation

Minimal conditional mutual information maximisation filter

Description

The method starts with the feature that has the maximal mutual information with the decision Y. It then greedily adds the feature X that maximises the following criterion:

J(X)=\min(I(X;Y),\min_{W\in S} I(X;Y|W)),

where S is the set of already selected features.
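The greedy procedure above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions, not praznik's implementation: the helper names `entropy`, `mi`, `cmi` and `cmim_sketch` are hypothetical, entropies are measured in nats, inputs are assumed to be already discrete, and stopping as soon as the best score is not positive is an assumption about the early-stop rule.

```r
# Empirical joint entropy (in nats) of one or more discrete vectors.
entropy <- function(...) {
  counts <- as.vector(table(...))
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# I(X;Y) and I(X;Y|W) expressed through joint entropies.
mi <- function(x, y) entropy(x) + entropy(y) - entropy(x, y)
cmi <- function(x, y, w) {
  entropy(x, w) + entropy(y, w) - entropy(x, y, w) - entropy(w)
}

# Hypothetical greedy selector following J(X) from the description.
cmim_sketch <- function(X, Y, k = 3) {
  stopifnot(k <= ncol(X))
  # J starts as I(X;Y) for every feature (S is empty).
  J <- vapply(X, mi, numeric(1), y = Y)
  selected <- integer(0)
  for (step in seq_len(k)) {
    candidates <- setdiff(seq_along(X), selected)
    best <- candidates[which.max(J[candidates])]
    if (J[best] <= 0) break  # assumed stopping rule
    selected <- c(selected, best)
    # Fold the newly selected feature W into the running minimum.
    for (j in setdiff(candidates, best)) {
      J[j] <- min(J[j], cmi(X[[j]], Y, X[[best]]))
    }
  }
  list(selection = setNames(selected, names(X)[selected]),
       score = J[selected])
}

# Toy data: Y is determined by a and b; a_copy duplicates a.
a <- rep(c(0, 0, 1, 1), 25)
b <- rep(c(0, 1, 0, 1), 25)
X <- data.frame(a = factor(a), a_copy = factor(a), b = factor(b))
Y <- factor(a + 2 * b)
res <- cmim_sketch(X, Y, k = 2)
print(res)
```

In this toy example the duplicate column carries no information about Y once `a` is known, so its conditional mutual information drops to zero and the sketch prefers `b` over `a_copy`.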

Usage

CMIM(X, Y, k = 3, threads = 0)

Arguments

`X`	Attribute table, given as a data frame with either factors (preferred), booleans, integers (treated as categorical) or reals (which undergo automatic categorisation; see the Note for details). A single vector is interpreted as a data.frame with one column. `NA`s are not allowed.
`Y`	Decision attribute; should be given as a factor, but other options are accepted, exactly like for attributes. `NA`s are not allowed.
`k`	Number of attributes to select. Must not exceed `ncol(X)`.
`threads`	Number of threads to use; default value, 0, means all available to OpenMP.

Value

A list with two elements: selection, a vector of indices of the selected features in selection order, and score, a vector of the corresponding feature scores. Both vectors are named after the corresponding features in X. Both have a length of at most k, because the selection may stop sooner; if it stops during the initial selection, both vectors are empty.

Note

The method requires discrete input so that it can use empirical estimators of the distribution and, consequently, of information gain or entropy. For a smoother user experience, praznik automatically coerces non-factor vectors in its inputs, but this costs additional time and memory and may yield confusing results; the best practice is to convert the data to factors before passing them to this function.

Real attributes are cut into about 10 equally-spaced bins, following a heuristic often used in the literature. The precise number of cuts depends on the number of objects n; namely, it is n/3, but never less than 2 and never more than 10.

Integers (which technically are also numeric) are treated as categorical variables, for compatibility with similar software, and so are handled in a very different way. Be aware that a genuinely numeric attribute which happens to be stored as an integer could be coerced into an n-level categorical, which would have a perfect mutual information score and would likely become a very disruptive false positive.
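The binning rule and the integer pitfall described in this note can be illustrated with a short base R sketch. The helper name `bin_reals` is hypothetical, and rounding n/3 down with `floor()` and using `cut()` for equal-width bins are assumptions; praznik's internal coercion may differ in detail.

```r
# Hypothetical stand-in for the automatic categorisation of reals:
# n/3 equal-width bins, clamped to the range 2..10.
bin_reals <- function(x) {
  n_bins <- max(2, min(10, floor(length(x) / 3)))
  cut(x, breaks = n_bins)
}

# A numeric identifier that happens to be stored as an integer.
id <- 1:100

# Treated as categorical: one level per object.
levels_as_integer <- nlevels(factor(id))

# Treated as a real: only a handful of bins.
levels_as_real <- nlevels(bin_reals(as.numeric(id)))

print(c(integer = levels_as_integer, real = levels_as_real))
```

Converting such a column explicitly, either to a numeric vector or to a factor with the intended levels, before calling the function avoids the surprise.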

References

"Fast Binary Feature Selection using Conditional Mutual Information Maximisation" F. Fleuret, JMLR (2004).

"Object recognition with informative features and linear classification" M. Vidal-Naquet and S. Ullman, IEEE Conference on Computer Vision and Pattern Recognition (2003).

Examples

library(praznik)
data(MadelonD)
# Select 20 features from the MadelonD example data
CMIM(MadelonD$X, MadelonD$Y, 20)

praznik documentation built on Nov. 11, 2025, 9:06 a.m.
