fs.techniques: Implementation of Feature Ranking Techniques


Description

Implementation of feature ranking techniques.

Usage

  fs.anova(x,y,...)
  fs.auc(x,y)
  fs.bw(x,y)
  fs.kruskal(x,y,...)
  fs.mi(x,y)
  fs.relief(x,y)
  fs.rf(x,y,...)
  fs.snr(x,y)
  fs.welch(x,y,...)

Arguments

x

A data frame or matrix containing the data set.

y

A factor or vector of class labels.

...

Optional arguments to be passed to the feature ranking method.

Details

Several techniques are implemented in the current package:

fs.anova:

Wrapper for function oneway.test. Performs an analysis of variance to test whether means from normal distributions are identical, without assuming that group variances are equal. The F value is used to compute feature ranks. Two-class and multi-class problems are both allowed.
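As an illustration of the approach described above, here is a minimal sketch of F-statistic ranking built directly on oneway.test. The function and variable names are illustrative only, not the package's actual implementation:

```r
## Sketch of F-statistic feature ranking via oneway.test; its default
## var.equal = FALSE means unequal group variances are allowed
anova.rank <- function(x, y) {
  stats <- apply(x, 2, function(v) oneway.test(v ~ y)$statistic)
  pval  <- apply(x, 2, function(v) oneway.test(v ~ y)$p.value)
  fs.rank <- rank(-stats, ties.method = "random")
  list(fs.rank = fs.rank, fs.order = order(fs.rank),
       stats = stats, pval = pval)
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- factor(rep(c("a", "b"), each = 10))
x[y == "b", "f2"] <- x[y == "b", "f2"] + 3   ## make f2 discriminative
res <- anova.rank(x, y)
res$fs.order[1]   ## index of the top-ranked feature (f2)
```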

fs.auc:

Computes the area under the simple ROC curve (x-axis: false positive rate, y-axis: true positive rate) for each individual feature. The actual value of the AUC (if class 1 > class 2) or its complement (if class 1 < class 2) is used to derive the feature ranking. Two-class problems only.
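The per-feature AUC can be sketched via the Wilcoxon rank-sum identity; the package's actual implementation and tie handling may differ:

```r
## AUC of a single feature via the rank-sum identity; taking the maximum of
## the AUC and its complement makes the score direction-free
auc.one <- function(v, y) {
  lev <- levels(y)
  n1  <- sum(y == lev[1])
  n2  <- sum(y == lev[2])
  r   <- rank(v)                     ## mid-ranks handle ties
  auc <- (sum(r[y == lev[1]]) - n1 * (n1 + 1) / 2) / (n1 * n2)
  max(auc, 1 - auc)
}

v <- c(1, 2, 3, 10, 11, 12)
y <- factor(rep(c("a", "b"), each = 3))
auc.one(v, y)   ## perfectly separated feature -> 1
```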

fs.bw:

Computes the ratio of between-group to within-group sums of squares for each feature, without assuming any particular data distribution. Two-class and multi-class problems are both allowed.
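The BW ratio of Dudoit et al. (2002) can be sketched per feature as follows (illustrative code, not the package's implementation):

```r
## Between-group to within-group sum-of-squares ratio for one feature
bw.one <- function(v, y) {
  m  <- mean(v)                 ## overall mean
  mk <- tapply(v, y, mean)      ## per-class means, in level order
  B  <- sum((mk[y] - m)^2)      ## between-group SS, one term per observation
  W  <- sum((v - mk[y])^2)      ## within-group SS
  B / W
}

bw.one(c(0, 1, 4, 5), factor(c("a", "a", "b", "b")))   ## 16
```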

fs.kruskal:

Wrapper for function kruskal.test. A non-parametric alternative that handles both two-class and multi-class problems.
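A minimal sketch of such a wrapper (argument passing simplified; names are illustrative):

```r
## Sketch of Kruskal-Wallis feature ranking: rank features by the
## Kruskal-Wallis chi-squared statistic of each column
kruskal.rank <- function(x, y, ...) {
  kt    <- lapply(seq_len(ncol(x)), function(j) kruskal.test(x[, j], y, ...))
  stats <- sapply(kt, function(z) unname(z$statistic))
  pval  <- sapply(kt, function(z) z$p.value)
  names(stats) <- names(pval) <- colnames(x)
  fs.rank <- rank(-stats, ties.method = "random")
  list(fs.rank = fs.rank, fs.order = order(fs.rank),
       stats = stats, pval = pval)
}

set.seed(2)
x <- matrix(rnorm(90), nrow = 30, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- factor(rep(c("a", "b", "c"), each = 10))   ## three classes are allowed
x[y == "c", "f3"] <- x[y == "c", "f3"] + 4      ## make f3 discriminative
res <- kruskal.rank(x, y)
res$fs.order[1]   ## index of the top-ranked feature (f3)
```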

fs.mi:

Computes the mutual information between each feature and the two classes. Two-class problems only.
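Mutual information requires a discrete feature; a median split is assumed here purely for illustration (the package may bin differently):

```r
## Mutual information (in bits) between a median-split feature and the class
mi.one <- function(v, y) {
  b   <- factor(v > median(v))           ## two-bin discretization
  p   <- table(b, y) / length(v)         ## joint distribution
  px  <- rowSums(p)
  py  <- colSums(p)
  idx <- p > 0                           ## skip empty cells (0 * log 0 = 0)
  sum(p[idx] * log2(p[idx] / outer(px, py)[idx]))
}

v <- c(1, 2, 3, 10, 11, 12)
y <- factor(rep(c("a", "b"), each = 3))
mi.one(v, y)   ## fully informative split -> 1 bit
```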

fs.relief:

Implementation of the RELIEF algorithm, which calculates relevance scores in a multivariate fashion. Two-class and multi-class problems are both allowed.

fs.rf:

Wrapper for the randomForest function, which computes importance scores in a multivariate fashion. The mean decrease in accuracy is used to calculate feature scores. Further arguments for the random forests algorithm can also be passed. Two-class and multi-class problems are both allowed.

fs.snr:

Computes the signal-to-noise ratio for each feature. The absolute value of the SNR is reported and used for assessing feature ranks. Two-class problems only.
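The SNR of Golub et al. (1999) for a single feature can be sketched as follows, assuming the usual difference-of-means over sum-of-standard-deviations definition (illustrative, not the package's implementation):

```r
## Signal-to-noise ratio of one feature: difference of class means over the
## sum of class standard deviations, reported as an absolute value
snr.one <- function(v, y) {
  lev <- levels(y)
  v1  <- v[y == lev[1]]
  v2  <- v[y == lev[2]]
  abs(mean(v1) - mean(v2)) / (sd(v1) + sd(v2))
}

snr.one(c(0, 2, 4, 6), factor(c("a", "a", "b", "b")))   ## 4 / (2 * sqrt(2))
```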

fs.welch:

Performs a univariate Welch t-test to test whether group means from normal distributions are identical, without assuming that group variances are equal. The absolute value of the t statistic is returned and used to compute feature ranks. Two-class problems only.
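A minimal sketch using t.test, whose default var.equal = FALSE gives the Welch variant (illustrative names, not the package's implementation):

```r
## Welch t-statistic ranking; the absolute t value orders the features
welch.rank <- function(x, y, ...) {
  tt    <- lapply(seq_len(ncol(x)), function(j) t.test(x[, j] ~ y, ...))
  stats <- abs(sapply(tt, function(z) unname(z$statistic)))
  pval  <- sapply(tt, function(z) z$p.value)
  names(stats) <- names(pval) <- colnames(x)
  fs.rank <- rank(-stats, ties.method = "random")
  list(fs.rank = fs.rank, fs.order = order(fs.rank),
       stats = stats, pval = pval)
}

set.seed(3)
x <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- factor(rep(c("a", "b"), each = 10))
x[y == "b", "f1"] <- x[y == "b", "f1"] + 3   ## make f1 discriminative
res <- welch.rank(x, y)
res$fs.order[1]   ## index of the top-ranked feature (f1)
```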

Value

A list with components:

fs.rank

A vector of feature ranks.

fs.order

A vector of feature ids in decreasing order of saliency.

stats

A vector of the original statistic/quantity describing feature saliency.

pval

A vector of p-values, if calculated by the feature ranking method.

Author(s)

David Enot and Wanchang Lin.

References

Dudoit, S., Fridlyand, J. and Speed, T.P. (2002). Comparison of discrimination methods for classification of tumors using gene expression data. Journal of the American Statistical Association. Vol.97, No.457, 77-87.

Kira, K. and Rendell, L. (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. Proc. Tenth National Conference on Artificial Intelligence, MIT Press, 129-134.

Kononenko, I., Simec, E. and Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, Vol.7, 1, 39-55.

Jeffery, I. B., Higgins, D. G. and Culhane, A. C. (2006). Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics, 7:359.

Chen, D., Liu, Z., Ma, X. and Hua, D. (2005). Selecting Genes by Test Statistics. Journal of Biomedicine and Biotechnology. 2005:2, 132-138.

Golub, T. R., et al., (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531-537.

See Also

oneway.test, kruskal.test, randomForest, feat.rank.re.

Examples

## prepare data set
data(abr1)
y <- factor(abr1$fact$class)
x <- preproc(abr1$pos, y = y, method = c("log10", "TICnorm"), add = 1)[, 110:500]

## only test for classes 1 and 2
dat <- dat.sel(x, y, choices = c("1", "2"))
mat <- dat$dat[[1]]
cl  <- dat$cl[[1]]

## apply SNR method for feature ranking
res <- fs.snr(mat, cl)
names(res)


## Template R function for a user-defined feature ranking method, which can
## be used in the re-sampling based feature selection function feat.rank.re.
fs.custom <- function(x, y)
{
  ## -------- user-defined feature selection method goes here ----------
  ## As an example, generate random importance scores
  stats        <- abs(rnorm(ncol(x)))
  names(stats) <- colnames(x)    ## works for matrices and data frames
  ## -------------------------------------------------------------------

  ## Generate rank and order; the importance score is in decreasing order
  fs.rank  <- rank(-stats, na.last = TRUE, ties.method = "random")
  fs.order <- order(fs.rank, na.last = TRUE)
  names(fs.rank) <- names(stats)

  ## return results
  list(fs.rank = fs.rank, fs.order = fs.order, stats = stats)
}

## apply fs.custom for feature ranking
res <- fs.custom(mat, cl)
names(res)

wilsontom/FIEmspro documentation built on Feb. 19, 2018, 9:03 a.m.