praznik-package: Tools for information-based feature selection and scoring

Tools for information-based feature selection and scoring

Description

Praznik is a collection of tools for information theory-based feature selection and scoring.

Details

The first part of the package consists of efficient implementations of several popular information filters: feature selection approaches based on greedy optimisation of a certain feature inclusion criterion. In a nutshell, an algorithm of this class requires an information system (X,Y) and a predefined number of features to select, k, and works as follows. To start, it estimates the mutual information between each feature and the decision, finds the feature with the maximal such score and stores it as the first element of the list of selected features, S. Then, it estimates the value of a certain criterion function J(X,Y,S) for each feature X; this function also depends on Y and the set of already selected features S. As in the first step, the previously unselected feature with the greatest value of the criterion function is selected next. This is repeated until k features have been gathered or, in the case of some methods, until no more informative features can be found. The methods implemented in praznik consider the following criteria; a brief usage example follows the list.

The mutual information maximisation filter, MIM, simply selects the top k features with the highest mutual information with the decision, that is

J_{MIM}=I(X;Y).

The minimal conditional mutual information maximisation proposed by F. Fleuret, CMIM, uses

J_{CMIM}(X)=\min(I(X;Y),\min_{W\in S} I(X;Y|W));

this method is also effectively identical to the information fragments method.

The minimum redundancy maximal relevance filter proposed by H. Peng et al., MRMR, uses

J_{MRMR}=I(X;Y)-\frac{1}{|S|}\sum_{W\in S} I(X;W).

The joint mutual information filter by H. Yang and J. Moody, JMI, uses

J_{JMI}=\sum_{W\in S} I(X,W;Y).

The double input symmetrical relevance filter by P. Meyer and G. Bontempi, DISR, uses

J_{DISR}(X)=\sum_{W\in S} \frac{I(X,W;Y)}{H(X,W,Y)}.

The minimal joint mutual information maximisation filter by M. Bennasar, Y. Hicks and R. Setchi, JMIM, uses

J_{JMIM}=\min_{W\in S} I(X,W;Y).

The minimal normalised joint mutual information maximisation filter by the same authors, NJMIM, uses

J_{NJMIM}=\min_{W\in S} \frac{I(X,W;Y)}{H(X,W,Y)}.

The third-order joint mutual information filter by Sechidis et al., JMI3, uses

J(X)=\frac{1}{2}\sum_{(U,W)\in S^2; U\neq W} I(X,U,W;Y).

Because CMIM, JMIM and NJMIM only consider the minimal value of the criterion over already selected features, they can use a somewhat more sophisticated and faster algorithm.
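
For illustration, here is a minimal usage sketch on the standard iris data. It assumes the common selector interface of the form CMIM(X, Y, k), shared by MIM, MRMR, JMI, DISR, JMIM, NJMIM and JMI3, which returns a list with the indices and criterion scores of the selected features; please consult the individual help pages for the exact signatures.

library(praznik)

X <- iris[, -5]      # features; numeric columns are silently discretised
Y <- iris$Species    # decision

CMIM(X, Y, k = 3)    # greedy selection with the minimal conditional MI criterion
JMI(X, Y, k = 3)     # greedy selection with the joint MI criterion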

The second part of the package provides methods for scoring features, useful both on their own and as building blocks for more sophisticated algorithms. In particular, the package exposes the following functions:

hScores returns

H(X).

jhScores returns

H(X,Y).

miScores returns

I(X;Y).

cmiScores returns, for a given condition vector Z,

I(X;Y|Z).

jmiScores returns

I(X,Z;Y).

njmiScores returns

\frac{I(X,Z;Y)}{H(X,Y,Z)}.

minCmiScores, maxCmiScores and minMaxCmiScores return

\min_j I(X_i;Y|X_j)

and/or

\max_j I(X_i;Y|X_j).

maxJmiScores returns

\max_{j\neq i} I(X_i,X_j;Y).

triScores returns, for every triple of features,

I(X_i;X_j;X_k).

These functions generally also have *Matrix counterparts, which efficiently build a matrix of scores between all pairs of features. This is especially useful for network inference tasks.
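
For illustration, a brief sketch of the scoring interface; it assumes the miScores(X, Y), cmiScores(X, Y, Z) and miMatrix(X) signatures, so please check the respective help pages for details.

library(praznik)

X <- iris[, -5]
Y <- iris$Species

miScores(X, Y)                    # I(X;Y) for each feature
cmiScores(X, Y, X$Petal.Width)    # I(X;Y|Z) for a given condition vector Z
miMatrix(X)                       # pairwise mutual information between all features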

Estimation of mutual information and its generalisations is a hard task; still, praznik aims at speed and simplicity and hence only offers a basic maximum likelihood estimator, applicable to discrete data. For convenience, praznik automatically and silently coerces non-factor inputs into about ten equally-spaced bins, following a heuristic often used in the literature.
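
This coercion is roughly equivalent to equal-width binning, as sketched below with base R's cut; the exact internal rule may differ.

table(cut(iris$Sepal.Length, breaks = 10))   # roughly what happens to a numeric input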

Furthermore, praznik provides the kTransform function for converting continuous features into discrete ones using the Kendall transformation, a novel approach based on the Kendall correlation coefficient which allows for multivariate reasoning based on monotonicity agreement.
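
A short sketch, assuming the kTransform(x) interface; each ordered pair of observations is mapped to whether the value increases, decreases or ties between them.

library(praznik)
head(kTransform(iris$Sepal.Length))   # a factor over ordered pairs of observations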

Additionally, praznik has limited, experimental support for replacing entropy-based statistics with Gini impurity-based ones; in this framework, entropy is replaced by the Gini impurity

g(X):=1-\sum_x p_x^2,

which leads to an impurity gain

G(X;Y):=g(Y)-E(g(Y)|X)=\sum_{xy}\frac{p_{xy}^2}{p_x}-\sum_{y} p_y^2,

a counterpart of mutual information or information gain. It does not possess most of the elegant properties of mutual information, yet the values of both are usually highly correlated; moreover, the Gini gain is computationally cheaper to calculate, hence it often replaces MI in performance-sensitive applications, such as optimising splits in decision trees.
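
The definitions above can be transcribed directly into R; the sketch below is an unoptimised illustration of the formulas, not praznik's own implementation.

giniImpurity <- function(y) {
  p <- prop.table(table(y))          # class probabilities p_y
  1 - sum(p^2)                       # g(Y)
}

giniGain <- function(x, y) {
  pxy  <- prop.table(table(x, y))    # joint probabilities p_xy
  px   <- rowSums(pxy)               # marginal p_x
  py   <- colSums(pxy)               # marginal p_y
  keep <- px > 0                     # skip empty bins to avoid 0/0
  sum(pxy[keep, , drop = FALSE]^2 / px[keep]) - sum(py^2)   # G(X;Y)
}

giniGain(cut(iris$Sepal.Length, 10), iris$Species)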

In the present version, praznik includes impScores for generating values of G for all features (an analogue of miScores), as well as JIM, a Gini gain-based feature selection method otherwise identical to JMI.
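
A sketch assuming that impScores(X, Y) and JIM(X, Y, k) mirror the miScores and JMI interfaces; see their help pages for the exact signatures.

library(praznik)
impScores(iris[, -5], iris$Species)    # Gini gain G for each feature
JIM(iris[, -5], iris$Species, k = 3)   # Gini gain-based analogue of JMI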

Author(s)

Maintainer: Miron B. Kursa m@mbq.me (ORCID)

References

"Praznik: High performance information-based feature selection" M.B. Kursa. SoftwareX 16, 100819 (2021).

"Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection" G. Brown et al. JMLR (2012).

