njmiMatrix | R Documentation |
Calculates the normalised mutual information between each feature and the joint mix of every other feature with a given feature Z, that is
\frac{I(X_i;X_j,Z)}{H(X_i,X_j,Z)}.
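The score above can be reproduced by hand for a single pair of features, which may help when checking what one cell of the matrix means. The following is a minimal base R sketch, not praznik's implementation: the helpers `entropy` and `njmi_pair` are hypothetical names introduced here, and the natural logarithm is an arbitrary choice (the ratio does not depend on the base). It relies on the identity I(A;B) = H(A) + H(B) - H(A,B), with B taken as the joint variable (X_j, Z).

```r
# Empirical Shannon entropy (in nats) of the joint distribution of
# one or more discrete vectors of equal length.
entropy <- function(...) {
  counts <- table(interaction(..., drop = TRUE))
  counts <- counts[counts > 0]          # guard against empty cells
  p <- counts / sum(counts)
  -sum(p * log(p))
}

# Hypothetical helper (not part of praznik): I(a; b, z) / H(a, b, z).
njmi_pair <- function(a, b, z) {
  h_abz <- entropy(a, b, z)
  (entropy(a) + entropy(b, z) - h_abz) / h_abz
}

# Toy data, made up for illustration only.
x1 <- factor(c("a", "a", "b", "b", "a", "b"))
x2 <- factor(c("u", "v", "u", "v", "v", "u"))
z  <- factor(c("p", "p", "p", "q", "q", "q"))

score <- njmi_pair(x1, x2, z)
```

Because I(A;B) never exceeds H(A,B), the resulting score always lies between 0 and 1.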
njmiMatrix(X, Z, zeroDiag = TRUE, threads = 0)
X |
Attribute table, given as a data frame with either factors (preferred), booleans, integers (treated as categorical) or reals (which undergo automatic categorisation; see below for details).
A single vector will be interpreted as a data.frame with one column.
|
Z |
Condition; should be given as a factor, but other options are accepted, as for features. |
zeroDiag |
Boolean flag, whether the diagonal should be filled with zeroes, or with degenerate scores for two identical copies of a feature. |
threads |
Number of threads to use; default value, 0, means all available to OpenMP. |
A numerical matrix with scores, with row and column names copied from X.
The method requires discrete input in order to use empirical estimators of the distribution and, consequently, of information gain or entropy. To allow a smoother user experience, praznik automatically coerces non-factor vectors in its inputs, which costs additional time and memory and may yield confusing results; the best practice is to convert data to factors before passing them to this function. Real attributes are cut into about 10 equally-spaced bins, following a heuristic often used in the literature. The precise number of cuts depends on the number of objects n; namely, it is n/3, but never less than 2 and never more than 10. Integers (which technically are also numeric) are treated as categorical variables (for compatibility with similar software), so in a very different way. One should be aware that an actually numeric attribute which happens to be stored as integers could be coerced into an n-level categorical, which would have a perfect mutual information score and would likely become a very disruptive false positive.
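The coercion rules described above can be approximated in base R, which makes it possible to discretise data explicitly before calling the function. This is only a sketch: `bin_real` is a hypothetical helper, and the rounding of n/3 and the exact bin edges are assumptions rather than praznik's documented behaviour.

```r
# Hypothetical helper mimicking the described heuristic: n/3 bins,
# clamped to the range [2, 10]. Flooring n/3 is an assumption.
bin_real <- function(x) {
  n_bins <- max(2, min(10, floor(length(x) / 3)))
  cut(x, breaks = n_bins)               # equal-width bins over range(x)
}

# A real-valued column of iris (150 rows) ends up with 10 levels.
binned <- bin_real(iris$Sepal.Length)
nlevels(binned)

# An integer identifier column, treated as categorical, gets one level
# per row, which is the false-positive risk described above.
id <- seq_len(nrow(iris))
nlevels(factor(id))
```

Converting such columns deliberately, either with a binning of one's own choosing or by dropping identifier-like integers, avoids relying on the automatic coercion.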
library(praznik)
njmiMatrix(iris[, -5], iris[, 5])