binda.ranking: Binary Discriminant Analysis: Variable Ranking
In binda: Multi-Class Discriminant Analysis using Binary Predictors



binda.ranking ranks the predictors by computing, for each variable, the t-scores between each group mean and the pooled mean.

plot.binda.ranking provides a graphical visualization of the top-ranking variables.


binda.ranking(Xtrain, L, lambda.freqs, verbose=TRUE)
## S3 method for class 'binda.ranking'
plot(x, top=40, arrow.col="blue", zeroaxis.col="red", ylab="Variables", main, ...)

`Xtrain`	A matrix containing the training data set. Note that the rows correspond to observations and the columns to variables.
`L`	A factor with the class labels of the training samples.
`lambda.freqs`	Shrinkage intensity for the class frequencies. If not specified it is estimated from the data. `lambda.freqs=0` implies no shrinkage (i.e. empirical frequencies) and `lambda.freqs=1` complete shrinkage (i.e. uniform frequencies).
`verbose`	If `TRUE` (the default), print progress information while computing.
`x`	A "binda.ranking" object – this is produced by the binda.ranking() function.
`top`	The number of top-ranking variables shown in the plot (default: 40).
`arrow.col`	Color of the arrows in the plot (default is `"blue"`).
`zeroaxis.col`	Color for the center zero axis (default is `"red"`).
`ylab`	Label written next to feature list (default is `"Variables"`).
`main`	Main title (if missing, `"The", top, "Top Ranking Variables"` is used).
`...`	Other options passed on to generic plot().
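
The effect of `lambda.freqs` at its two documented end points (0 gives empirical frequencies, 1 gives uniform frequencies) can be illustrated with a short base-R sketch. It assumes linear shrinkage toward the uniform distribution; `shrink_freqs` is a hypothetical helper written for this illustration, not a function of the binda package, and the way the package estimates `lambda.freqs` when it is left unspecified is not shown here.

```r
# Hypothetical helper (not part of binda): shrink the empirical class
# frequencies toward the uniform distribution with intensity lambda.
# Assumption: the shrinkage is a linear interpolation between the two.
shrink_freqs <- function(L, lambda) {
  counts <- table(L)
  empirical <- as.numeric(counts) / sum(counts)
  target <- rep(1 / length(counts), length(counts))
  shrunk <- lambda * target + (1 - lambda) * empirical
  names(shrunk) <- names(counts)
  shrunk
}

# unbalanced labels: three "Control" samples, one "Treatment" sample
L <- factor(c("Treatment", "Control", "Control", "Control"))

shrink_freqs(L, lambda = 0)    # empirical frequencies: 0.75, 0.25
shrink_freqs(L, lambda = 1)    # uniform frequencies:   0.5, 0.5
shrink_freqs(L, lambda = 0.5)  # halfway in between:    0.625, 0.375
```

For balanced classes the empirical frequencies are already uniform, so any value of `lambda` gives the same result in this sketch.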

The overall ranking of a feature is determined by computing a weighted sum of the squared t-scores. This is approximately equivalent to the mutual information between the response and each variable. The same criterion is used in dichotomize. For precise details see Gibb and Strimmer (2015).
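
As a rough illustration of the weighted sum, the sketch below recomputes the `score` column from the t-scores printed in the Examples on this page. It assumes that the weights are the class frequencies (0.5 each for the two balanced groups in that example); this reproduces the printed scores, but the package's exact weighting is described in Gibb and Strimmer (2015).

```r
# t-scores copied from the printed example output (rounded to 6 decimals)
t_scores <- matrix(c(-2,         2,
                     -2,         2,
                      2,        -2,
                      2,        -2,
                     -1.154701,  1.154701,
                      0,         0),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("V2", "V4", "V5", "V6", "V3", "V1"),
                                   c("t.Control", "t.Treatment")))

# Assumption: the weights are the class frequencies (balanced groups here)
freqs <- c(Control = 0.5, Treatment = 0.5)

# weighted sum of squared t-scores, one value per variable
score <- as.vector(t_scores^2 %*% freqs)
names(score) <- rownames(t_scores)
score  # approximately 4, 4, 4, 4, 1.333, 0
```

Because the input t-scores are rounded, the recomputed score for V3 agrees with the printed 1.333333 only up to rounding error.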

binda.ranking returns a matrix with the following columns:

`idx`	original feature number
`score`	the score determining the overall ranking of a variable
`t`	for each group and feature the t-score of the class mean versus the pooled mean
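
To make the `t` columns more concrete, the following base-R sketch reproduces the t-scores printed in the Examples from the raw training data. It is an assumption-laden illustration, not the package's code: `t_score_sketch` is a hypothetical helper, the pooled variance of a binary variable is taken to be mean * (1 - mean), and variables with zero variance are given a t-score of 0 to match the printed result for V1. The actual implementation may estimate variances or treat degenerate columns differently.

```r
# training data and labels from the Examples section
Xtrain <- matrix(c(1, 1, 0, 1, 0, 0,
                   1, 1, 1, 1, 0, 0,
                   1, 0, 0, 0, 1, 1,
                   1, 0, 0, 0, 1, 1), nrow = 4, byrow = TRUE)
colnames(Xtrain) <- paste0("V", 1:ncol(Xtrain))
L <- factor(c("Treatment", "Treatment", "Control", "Control"))

# Hypothetical helper (not part of binda): t-score of each class mean
# versus the pooled mean, for binary predictors.
t_score_sketch <- function(X, L) {
  n <- nrow(X)
  mu_pool <- colMeans(X)              # pooled mean of each variable
  v_pool <- mu_pool * (1 - mu_pool)   # assumed pooled variance
  tab <- sapply(levels(L), function(k) {
    idx <- L == k
    n_k <- sum(idx)
    mu_k <- colMeans(X[idx, , drop = FALSE])
    tk <- (mu_k - mu_pool) / sqrt((1 / n_k - 1 / n) * v_pool)
    tk[v_pool == 0] <- 0              # assumption: constant columns score 0
    tk
  })
  colnames(tab) <- paste0("t.", levels(L))
  tab
}

t_score_sketch(Xtrain, L)
```

The rows of the result are in the original variable order (V1 to V6), whereas binda.ranking sorts its rows by decreasing score.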

Sebastian Gibb and Korbinian Strimmer (https://strimmerlab.github.io).

Gibb, S., and K. Strimmer. 2015. Differential protein expression and peak selection in mass spectrometry data by binary discriminant analysis. Bioinformatics 31:3156-3162. <DOI:10.1093/bioinformatics/btv334>

binda, predict.binda, dichotomize.

# load "binda" library
library("binda")

# training data set with labels
Xtrain = matrix(c(1, 1, 0, 1, 0, 0,
             1, 1, 1, 1, 0, 0,
             1, 0, 0, 0, 1, 1,
             1, 0, 0, 0, 1, 1), nrow=4, byrow=TRUE)
colnames(Xtrain) = paste0("V", 1:ncol(Xtrain))
is.binaryMatrix(Xtrain) # TRUE
L = factor(c("Treatment", "Treatment", "Control", "Control") )

# ranking variables
br = binda.ranking(Xtrain, L)
br
#   idx    score t.Control t.Treatment
#V2   2 4.000000 -2.000000    2.000000
#V4   4 4.000000 -2.000000    2.000000
#V5   5 4.000000  2.000000   -2.000000
#V6   6 4.000000  2.000000   -2.000000
#V3   3 1.333333 -1.154701    1.154701
#V1   1 0.000000  0.000000    0.000000
#attr(,"class")
#[1] "binda.ranking"
#attr(,"cl.count")
#[1] 2

# show plot
plot(br)

# result: variable V1 is irrelevant for distinguishing the two groups