# jmiScores: Calculate joint mutual information of all features

In mbq/praznik: Collection of Information-Based Feature Selection Filters

## Description

Calculates the mutual information between each attribute, taken jointly with some other vector `Z`, and the decision, that is

I(X,Z;Y).

This is the same as conditional mutual information between X and Y plus a constant that depends on Y and Z, that is

I(X,Z;Y)=I(X;Y|Z)+I(Y;Z).
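The identity can be checked by expanding each term into Shannon entropies. The entropy notation H is introduced here for illustration and does not appear in the original page:

$$
\begin{aligned}
I(X,Z;Y) &= H(X,Z) + H(Y) - H(X,Y,Z),\\
I(X;Y|Z) &= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z),\\
I(Y;Z) &= H(Y) + H(Z) - H(Y,Z).
\end{aligned}
$$

Adding the last two lines cancels H(Y,Z) and H(Z) and leaves exactly the first line. Because I(Y;Z) does not involve X, ranking attributes by I(X,Z;Y) gives the same order as ranking them by I(X;Y|Z).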

## Usage

```r
jmiScores(X, Y, Z, threads = 0)
```

## Arguments

| Argument | Description |
|---|---|
| `X` | Attribute table, given as a data frame with either factors (preferred), booleans, integers (treated as categorical) or reals (which undergo automatic categorisation; see below for details). `NA`s are not allowed. |
| `Y` | Decision attribute; should be given as a factor, but other options are accepted, exactly like for attributes. `NA`s are not allowed. |
| `Z` | Other vector; should be given as a factor, but other options are accepted, as for attributes. |
| `threads` | Number of threads to use; default value, 0, means all available to OpenMP. |

## Value

A numerical vector with joint mutual information scores, with names copied from `X`.
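To make the shape of the result concrete, here is a minimal base-R sketch that computes I(X,Z;Y) for every column with a plug-in (empirical) entropy estimator. It is not praznik's implementation: the helpers `entropy` and `jmi_reference` are hypothetical names, the natural logarithm (nats) is an assumption, and all inputs are treated as categorical without any binning, so the numbers are not guaranteed to match `jmiScores` output.

```r
# Hypothetical helper: plug-in entropy (in nats, an assumption) of the joint
# distribution of one or more discrete vectors.
entropy <- function(...) {
  counts <- as.vector(table(...))
  p <- counts[counts > 0] / sum(counts)  # drop empty cells before taking logs
  -sum(p * log(p))
}

# Hypothetical reference scorer: I(X,Z;Y) = H(X,Z) + H(Y) - H(X,Y,Z),
# evaluated for each column of X; vapply keeps the column names.
jmi_reference <- function(X, Y, Z) {
  vapply(X, function(x) entropy(x, Z) + entropy(Y) - entropy(x, Y, Z),
         numeric(1))
}

# Small, already-discrete columns from the built-in mtcars data set.
X <- mtcars[, c("cyl", "gear", "carb")]
Y <- mtcars$am
Z <- mtcars$vs

scores <- jmi_reference(X, Y, Z)
print(scores)  # one score per attribute, named after the columns of X
```

The result is a plain named numeric vector, so it can be sorted or indexed by attribute name in the usual way, for example `sort(scores, decreasing = TRUE)`.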

## Note

The method requires the input to be discrete in order to use empirical estimators of distribution and, consequently, of information gain or entropy. To allow a smoother user experience, praznik automatically coerces non-factor vectors in `X` and `Y`, which requires additional time and space and may yield confusing results; the best practice is to convert data to factors before passing them to this function.

Real attributes are cut into about 10 equally-spaced bins, following a heuristic often used in the literature. The precise number of cuts depends on the number of objects n; namely, it is n/3, but never less than 2 and never more than 10.

Integers (which technically are also numeric) are treated as categorical variables, for compatibility with similar software, and so are handled in a very different way. Be aware that an actually numeric attribute which happens to be stored as an integer could be coerced into an n-level categorical, which would have a perfect mutual information score and would likely become a very disruptive false positive.
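The advice to convert data to factors beforehand can be sketched in base R as follows. This only approximates the rule quoted in the note and is not praznik's own code: the helper names `bin_count` and `discretise` are hypothetical, reading the "number of cuts" as the number of bins is an assumption, and so is rounding n/3 down with `floor`, since the note does not say how the value is rounded.

```r
# Hypothetical helper: number of bins for n objects, following the quoted
# rule n/3 clamped to the range [2, 10]; floor() is an assumption.
bin_count <- function(n) max(2, min(10, floor(n / 3)))

# Hypothetical helper: cut a real vector into equal-width bins, giving a factor.
discretise <- function(x) cut(x, breaks = bin_count(length(x)))

# Discretise every real column of iris explicitly, before any scoring.
X <- as.data.frame(lapply(iris[, -5], discretise))
str(X)

# The integer pitfall: an integer id column kept as categorical has one level
# per object, whereas binning it like a real attribute gives only a few levels.
id <- seq_len(nrow(iris))
nlevels(factor(id))
nlevels(discretise(as.numeric(id)))
```

Doing the conversion explicitly keeps the binning visible and under the caller's control, and it avoids an integer-coded numeric attribute being silently treated as a categorical with one level per object.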

## Examples

```r
library(praznik)
jmiScores(iris[, -5], iris$Species, iris$Sepal.Length)
```

mbq/praznik documentation built on May 9, 2018, 12:59 a.m.