JMI: Joint mutual information filter
In praznik: Tools for Information-Based Feature Selection and Scoring

View source: R/algorithms.R

JMI	R Documentation

Joint mutual information filter

Description

The method starts with the feature that has maximal mutual information with the decision Y. Then, it greedily adds the feature X that maximises the following criterion:

J(X)=\sum_{W\in S} I(X,W;Y),

where S is the set of already selected features.
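
The greedy procedure above can be sketched in base R. This is a minimal illustration and not praznik's implementation: the helper names mutual_info and jmi_sketch are hypothetical, the logarithm base (natural) is an assumption, all inputs are assumed to be discrete already, and the early stopping mentioned under Value is not reproduced.

```r
# Empirical mutual information I(A;B) for two discrete vectors (natural log).
mutual_info <- function(a, b) {
  joint <- table(a, b) / length(a)                 # empirical joint distribution
  indep <- outer(rowSums(joint), colSums(joint))   # product of the marginals
  nz <- joint > 0                                  # empty cells contribute 0
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

# Greedy selection of k columns of a data frame X of discrete features.
jmi_sketch <- function(X, Y, k = 3) {
  # Step 1: start with the feature of maximal I(X;Y).
  first_scores <- vapply(X, mutual_info, numeric(1), b = Y)
  selection <- which.max(first_scores)
  score <- max(first_scores)
  # Step 2: add the candidate maximising J(X) = sum over W in S of I(X,W;Y).
  while (length(selection) < min(k, ncol(X))) {
    candidates <- setdiff(seq_len(ncol(X)), selection)
    J <- vapply(candidates, function(j) {
      sum(vapply(selection, function(w) {
        # interaction() merges the pair (X, W) into a single joint variable
        mutual_info(interaction(X[[j]], X[[w]], drop = TRUE), Y)
      }, numeric(1)))
    }, numeric(1))
    selection <- c(selection, candidates[which.max(J)])
    score <- c(score, max(J))
  }
  names(selection) <- names(score) <- names(X)[selection]
  list(selection = selection, score = score)
}

# Toy data: the decision is fully determined by A and B; C is irrelevant.
toy <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
toy[] <- lapply(toy, factor)
Y <- interaction(toy$A, toy$B)
res <- jmi_sketch(toy, Y, k = 2)
```

On this toy table the sketch picks A and B and leaves C out, because pairing C with an already selected feature adds no information about Y.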

Usage

JMI(X, Y, k = 3, threads = 0)

Arguments

`X`	Attribute table, given as a data frame with either factors (preferred), booleans, integers (treated as categorical) or reals (which undergo automatic categorisation; see below for details). A single vector is interpreted as a data.frame with one column. `NA`s are not allowed.
`Y`	Decision attribute; should be given as a factor, but other options are accepted, exactly as for attributes. `NA`s are not allowed.
`k`	Number of attributes to select. Must not exceed `ncol(X)`.
`threads`	Number of threads to use; default value, 0, means all available to OpenMP.

Value

A list with two elements: selection, a vector of indices of the selected features in selection order, and score, a vector of the corresponding feature scores. The names of both vectors correspond to the names of the features in X. Both vectors have length at most k, because the selection may stop sooner, even during the initial selection, in which case both vectors are empty.
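
As a hedged illustration of working with the returned list, the snippet below uses only what this page documents (the JMI signature, the MadelonD data set, and the selection and score elements). The variable names res and selected are illustrative, and the code runs only when praznik is installed.

```r
if (requireNamespace("praznik", quietly = TRUE)) {
  data(MadelonD, package = "praznik")
  res <- praznik::JMI(MadelonD$X, MadelonD$Y, k = 5)
  print(res$selection)   # named indices of the chosen features, in selection order
  print(res$score)       # scores aligned with res$selection
  # Keep only the selected columns; the result may hold fewer than k
  # features (or none) if the selection stopped early.
  selected <- MadelonD$X[, res$selection, drop = FALSE]
}
```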

Note

DISR is a normalised version of JMI; JMIM and NJMIM are modifications of JMI and DISR in which minimal joint information over already selected features is used instead of a sum.

The method requires discrete input in order to use empirical estimators of distribution and, consequently, of information gain or entropy. For a smoother user experience, praznik automatically coerces non-factor vectors in inputs; this costs additional time and memory and may yield confusing results, so the best practice is to convert data to factors before passing them to this function.

Real attributes are cut into about 10 equally-spaced bins, following a heuristic often used in the literature. The precise number of cuts depends on the number of objects n: it is n/3, but never less than 2 and never more than 10. Integers (which technically are also numeric) are treated as categorical variables (for compatibility with similar software), that is, in a very different way. One should be aware that an actually numeric attribute which happens to be stored as an integer could be coerced into an n-level categorical, which would have a perfect mutual information score and would likely become a very disruptive false positive.
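
The binning rule and the integer pitfall can be illustrated with a small base R sketch. The helpers bin_count and discretise are hypothetical names, rounding n/3 down is an assumption, and the bin edges produced by cut() may differ from those praznik uses internally.

```r
# Number of bins for a real attribute observed on n objects:
# n/3, clamped to the range [2, 10]. Rounding down is an assumption.
bin_count <- function(n) min(max(floor(n / 3), 2), 10)

# Cut a real vector into equally spaced bins and return a factor.
discretise <- function(x) {
  cut(x, breaks = bin_count(length(x)), include.lowest = TRUE)
}

# An integer column with one distinct value per object:
ids <- 1:30
nlevels(factor(ids))                   # one level per object when read as categorical
nlevels(discretise(as.numeric(ids)))   # a handful of bins when read as a real
```

Converting such a column explicitly, either to a real vector or to a factor with meaningful levels, before calling the function avoids the false positive described above.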

References

"Data Visualization and Feature Selection: New Algorithms for Nongaussian Data" H. Yang and J. Moody, NIPS (1999)

Examples

library(praznik)
data(MadelonD)
# Select the 20 top features according to the JMI criterion
JMI(MadelonD$X, MadelonD$Y, 20)

praznik documentation built on Nov. 11, 2025, 9:06 a.m.
