praznik-package
Praznik is a collection of tools for information theory-based feature selection and scoring.
The first part of the package consists of efficient implementations of several popular information filters: feature selection approaches based on greedy optimisation of a certain feature inclusion criterion. In a nutshell, an algorithm of this class requires an information system (X,Y) and a predefined number of features to select, k, and works as follows. To start, it estimates the mutual information between each feature and the decision, finds the feature with the maximal such score and stores it as the first element of the list of selected features, S. Then, it estimates the value of a certain criterion function J(X,Y,S) for each feature X; this function also depends on Y and the set of already selected features S. As in the first step, the previously unselected feature with the greatest value of the criterion function is selected next. This is repeated until k features have been gathered, or, in the case of some methods, until no more informative features can be found. The methods implemented in praznik consider the following criteria.
The mutual information maximisation filter, MIM, simply selects the top k features with the best mutual information, that is J_{MIM}=I(X;Y).
The minimal conditional mutual information maximisation proposed by F. Fleuret, CMIM, uses J_{CMIM}(X)=\min(I(X;Y),\min_{W\in S} I(X;Y|W)); this method is also effectively identical to the information fragments method.
The minimum redundancy maximal relevancy proposed by H. Peng et al., MRMR, uses J_{MRMR}=I(X;Y)-\frac{1}{|S|}\sum_{W\in S} I(X;W).
The joint mutual information filter by H. Yang and J. Moody, JMI, uses J_{JMI}=\sum_{W\in S} I(X,W;Y).
The double input symmetrical relevance filter by P. Meyer and G. Bontempi, DISR, uses J_{DISR}(X)=\sum_{W\in S} \frac{I(X,W;Y)}{H(X,W,Y)}.
The minimal joint mutual information maximisation filter by M. Bennasar, Y. Hicks and R. Setchi, JMIM, uses J_{JMIM}=\min_{W\in S} I(X,W;Y).
The minimal normalised joint mutual information maximisation filter by the same authors, NJMIM, uses J_{NJMIM}=\min_{W\in S} \frac{I(X,W;Y)}{H(X,W,Y)}.
The third-order joint mutual information filter by Sechidis et al., JMI3, uses J(X)=\frac{1}{2}\sum_{(U,W)\in S^2; U\neq W} I(X,U,W;Y).
Because CMIM, JMIM and NJMIM consider the minimal value over the already selected features, they can use a somewhat more sophisticated and faster algorithm.
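All the filters share the same interface; as a minimal usage sketch, the call below selects features with the JMI criterion, assuming the MadelonD example data bundled with the package (any discrete feature matrix X and factor decision Y would work the same way):

library(praznik)
data(MadelonD)
# Greedily pick 20 features maximising the JMI criterion; the result lists
# the selected feature indices together with their criterion values
JMI(MadelonD$X, MadelonD$Y, k = 20)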
The second part of the package provides methods for scoring features, useful on their own as well as serving as building blocks for more sophisticated algorithms. In particular, the package exposes the following functions:
hScores returns H(X).
jhScores returns H(X,Y).
miScores returns I(X;Y).
cmiScores returns, for a given condition vector Z, I(X;Y|Z).
jmiScores returns I(X,Z;Y).
njmiScores returns \frac{I(X,Z;Y)}{H(X,Y,Z)}.
minCmiScores, maxCmiScores and minMaxCmiScores return \min_j I(X_i;Y|X_j) and/or \max_j I(X_i;Y|X_j).
maxJmiScores returns \max_{j\neq i} I(X_i,X_j;Y).
triScores returns, for every triple of features, I(X_i;X_j;X_k).
These functions generally also have their *Matrix counterparts, which efficiently build a matrix of scores between all pairs of features; this is especially useful for network inference tasks.
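To illustrate, a brief sketch of the scoring interface, again assuming the bundled MadelonD data (miMatrix is used here as one instance of the *Matrix counterparts mentioned above):

# Mutual information between each feature and the decision
miScores(MadelonD$X, MadelonD$Y)
# The same, conditioned on a chosen feature
cmiScores(MadelonD$X, MadelonD$Y, MadelonD$X[, 1])
# Pairwise mutual information between all features, e.g. for network inference
miMatrix(MadelonD$X)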
Estimation of mutual information and its generalisations is a hard task; still, praznik aims at speed and simplicity, and hence only offers a basic maximum likelihood estimator applicable to discrete data. For convenience, praznik automatically and silently coerces non-factor inputs into about ten equally-spaced bins, following a heuristic often used in the literature.
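Consequently, continuous inputs can be passed directly; a minimal sketch using the standard iris data:

# The numeric columns of iris are silently binned before MI estimation
miScores(iris[, 1:4], iris$Species)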
Furthermore, praznik provides the kTransform function for converting continuous features into discrete ones with the Kendall transformation, a novel approach based on the Kendall correlation coefficient which allows for multivariate reasoning based on monotonicity agreement.
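A minimal sketch of its use (the exact encoding of the output is described in kTransform's own documentation):

# Each continuous column becomes a discrete vector indexed by ordered pairs
# of observations, encoding the direction of change between them
Xk <- kTransform(iris[, 1:4])
head(Xk)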
Additionally, praznik has limited, experimental support for replacing entropic statistics with Gini impurity-based ones; in such a framework, entropy is replaced by the Gini impurity g(X):=1-\sum_x p_x^2, which leads to the impurity gain G(X;Y):=g(Y)-E(g(Y)|X)=\sum_{xy}\frac{p_{xy}^2}{p_x}-\sum_{y} p_y^2, a counterpart of mutual information or information gain. It does not possess most of the elegant properties of mutual information, yet the values of both are usually highly correlated; moreover, Gini gain is computationally easier to calculate, hence it often replaces MI in performance-sensitive applications, like optimising splits in decision trees.
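To make the formula concrete, a small standalone sketch in plain R (not part of praznik's API; giniGain is a hypothetical helper written only to mirror the equation above):

giniGain <- function(x, y) {
  p_xy <- table(x, y) / length(x)  # joint probabilities p_{xy}
  p_x <- rowSums(p_xy)             # marginal probabilities p_x
  p_y <- colSums(p_xy)             # marginal probabilities p_y
  sum(p_xy^2 / p_x) - sum(p_y^2)   # G(X;Y) as defined above
}
giniGain(iris$Species, iris$Species)  # equals g(Y), since X fully determines Y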
In the present version, praznik includes impScores for generating values of G for all features (an analog of miScores), as well as JIM, a Gini gain-based feature selection method otherwise identical to JMI.
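Their interfaces mirror the entropic versions, so, under the same MadelonD assumption as before:

# Gini impurity gain of each feature with respect to the decision
impScores(MadelonD$X, MadelonD$Y)
# JMI-style greedy selection driven by Gini gain instead of MI
JIM(MadelonD$X, MadelonD$Y, k = 20)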
Maintainer: Miron B. Kursa <m@mbq.me>
"Praznik: High performance information-based feature selection" M.B. Kursa. SoftwareX 16, 100819 (2021).
"Conditional Likelihood Maximisation: A Unifying Framework for Information Theoretic Feature Selection" G. Brown et al. JMLR (2012).