jmiScores: Calculate joint mutual information of all features


View source: R/scorers.R

Description

Calculates the mutual information between the decision Y and each attribute X taken jointly with some other vector Z, that is

I(X,Z;Y).

This equals the conditional mutual information between X and Y given Z, plus a term that depends only on Y and Z and is therefore the same for every attribute, that is

I(X,Z;Y)=I(X;Y|Z)+I(Y;Z).
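The identity above can be checked numerically with empirical entropies. The following is a minimal base-R sketch, not praznik's implementation: the helper `entropy` is hypothetical, the natural logarithm is chosen only for illustration, and the continuous `iris` columns are binned by hand with `cut` so that every input is a factor.

```r
# Hypothetical helper: empirical Shannon entropy (natural log) of the
# joint distribution of one or more factors of equal length.
entropy <- function(...) {
  counts <- table(...)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}

# Discretise two continuous columns by hand; Species is already a factor.
x <- cut(iris$Petal.Length, breaks = 10)
z <- cut(iris$Sepal.Length, breaks = 10)
y <- iris$Species

# I(X,Z;Y) = H(X,Z) + H(Y) - H(X,Y,Z)
jmi <- entropy(x, z) + entropy(y) - entropy(x, y, z)

# I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)
cmi <- entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)

# I(Y;Z) = H(Y) + H(Z) - H(Y,Z)
mi_yz <- entropy(y) + entropy(z) - entropy(y, z)

# The two sides of the identity agree up to floating-point error.
identity_holds <- isTRUE(all.equal(jmi, cmi + mi_yz))
print(identity_holds)
```

Because I(Y;Z) does not involve X, ranking attributes by I(X,Z;Y) gives the same order as ranking them by I(X;Y|Z).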

Usage

jmiScores(X, Y, Z, threads = 0)

Arguments

X

Attribute table, given as a data frame with either factors (preferred), booleans, integers (treated as categorical) or reals (which undergo automatic categorisation; see below for details). NAs are not allowed.

Y

Decision attribute; should be given as a factor, but other options are accepted, exactly like for attributes. NAs are not allowed.

Z

Other vector; should be given as a factor, but other options are accepted, as for attributes.

threads

Number of threads to use; default value, 0, means all available to OpenMP.

Value

A numerical vector with joint mutual information scores, with names copied from X.

Note

The method requires input to be discrete in order to use empirical estimators of distribution and, consequently, of information gain or entropy. To allow a smoother user experience, praznik automatically coerces non-factor vectors in X and Y; this requires additional time and space and may yield confusing results, so the best practice is to convert data to factors before passing them to this function.

Real attributes are cut into about 10 equally-spaced bins, following a heuristic often used in the literature. The precise number of bins depends on the number of objects n; namely, it is n/3, but never less than 2 and never more than 10.

Integers (which technically are also numeric) are treated as categorical variables for compatibility with similar software, so they are handled in a very different way. One should be aware that an actually numeric attribute which happens to be stored as an integer could be coerced into an n-level categorical, which would have a perfect mutual information score and would likely become a very disruptive false positive.
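The binning rule and the integer pitfall described in this note can be illustrated in base R. This is a sketch under stated assumptions, not praznik's code: `n_bins` and `entropy` are hypothetical helpers, rounding n/3 down is an assumption, and the bin edges produced by `cut` may differ from the ones praznik uses internally.

```r
# Hypothetical helper: number of bins for n objects, following the rule
# "n/3, but never less than 2 and never more than 10".
# Rounding n/3 down is an assumption made for this sketch.
n_bins <- function(n) max(2L, min(10L, n %/% 3L))

# A real-valued attribute with 150 objects ends up in 10 equal-width bins.
x <- iris$Sepal.Length
x_binned <- cut(x, breaks = n_bins(length(x)))
print(nlevels(x_binned))

# Hypothetical helper: empirical Shannon entropy (natural log).
entropy <- function(...) {
  counts <- table(...)
  p <- counts[counts > 0] / sum(counts)
  -sum(p * log(p))
}
mi <- function(a, b) entropy(a) + entropy(b) - entropy(a, b)

# Integer pitfall: a column of unique integers, treated as categorical,
# has one level per object and therefore determines Y completely,
# so its mutual information with Y reaches the maximum possible, H(Y).
y <- iris$Species
id <- factor(seq_along(y))
id_is_perfect <- isTRUE(all.equal(mi(id, y), entropy(y)))
print(id_is_perfect)
```

If an integer column really is a measurement, converting it with `as.numeric` before the call (so that it is binned) or binning it explicitly with `cut` avoids this effect.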

Examples

jmiScores(iris[,-5],iris$Species,iris$Sepal.Length)
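The call above passes real-valued columns and relies on automatic categorisation. Following the advice in the Note, a variant that converts everything to factors first might look like the sketch below. The choice of 10 bins via `cut` is an assumption made for illustration, and the call to `jmiScores` is guarded so the snippet also runs where praznik is not installed.

```r
# Convert every real-valued attribute to a factor explicitly,
# instead of relying on automatic coercion inside jmiScores.
X <- as.data.frame(lapply(iris[, -5], cut, breaks = 10))
Y <- iris$Species
Z <- cut(iris$Sepal.Length, breaks = 10)

# All inputs are now factors without missing values.
inputs_ok <- all(vapply(X, is.factor, logical(1))) &&
  is.factor(Y) && is.factor(Z) &&
  !anyNA(X) && !anyNA(Y) && !anyNA(Z)
print(inputs_ok)

# Score the attributes only if praznik is available.
if (requireNamespace("praznik", quietly = TRUE)) {
  scores <- praznik::jmiScores(X, Y, Z)
  print(sort(scores, decreasing = TRUE))
}
```

Making the discretisation explicit keeps the number and placement of bins under the caller's control and makes the resulting scores easier to reproduce.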

mbq/praznik documentation built on May 9, 2018, 12:59 a.m.