Implementation of feature ranking techniques.
x |
A data frame or matrix containing the data set. |
y |
A factor or vector of class labels. |
... |
Optional arguments to be passed to the feature ranking method. |
Several techniques are implemented in the current package:
Wrapper for function oneway.test. Performs an analysis of variance to test whether means from normal distributions are identical. It assumes that group variances are not necessarily equal. The F value is used to compute feature ranks - Two and multiple class problems both allowed.
Compute the area under the simple ROC curve (x-axis: false positive rate, y-axis: true positive rate) for each individual feature. The actual value of the AUC (if class 1 > class 2) or its complement (if class 1 < class 2) is used to get the feature ranking - Two class problems only.
Compute the ratio of between-group to within-group sums of squares for each feature without assuming any particular data distributions - Two and multiple class problems both allowed.
Wrapper for function kruskal.test - Non-parametric alternative that handles two and multiple class problems.
Compute the mutual information between the two classes - Two class problems only.
Implementation of the RELIEF algorithm to calculate relevance scores in a multivariate fashion - Two and multiple class problems both allowed.
Wrapper for randomForest function to compute importance scores in a multivariate fashion. The mean decrease in accuracy is used to calculate feature scores. Further arguments related to the random forests algorithm can also be passed - Two and multiple class problems both allowed.
Compute the signal to noise ratio for each feature. The absolute value of the SNR is reported and used to compute feature ranks - Two class problems only.
Performs a univariate t-test to test whether group means from normal distributions are identical, without assuming that group variances are equal. The absolute value of the t-test statistic is returned and used to compute feature ranks - Two class problems only.
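As a rough illustration of how two of the statistics described above can be obtained per feature with base R, the sketch below computes the oneway.test F value and an absolute signal to noise ratio on simulated data. It is not the package's own code: the data, the object names (f.stat, snr) and the SNR form |mean1 - mean2| / (sd1 + sd2), taken from Golub et al. (1999), are assumptions made for the example.

```r
## Minimal sketch on simulated data; not the package's own implementation.
set.seed(1)
x <- matrix(rnorm(40 * 5), nrow = 40, ncol = 5,
            dimnames = list(NULL, paste0("f", 1:5)))
y <- factor(rep(c("a", "b"), each = 20))
x[y == "b", 1] <- x[y == "b", 1] + 3   # make feature f1 informative

## F statistic of oneway.test for each feature
## (var.equal = FALSE, the default, does not assume equal group variances)
f.stat <- apply(x, 2, function(v) {
  unname(oneway.test(v ~ y, var.equal = FALSE)$statistic)
})

## Absolute signal to noise ratio, assuming the form
## |mean1 - mean2| / (sd1 + sd2)
g1 <- x[y == "a", , drop = FALSE]
g2 <- x[y == "b", , drop = FALSE]
snr <- abs(colMeans(g1) - colMeans(g2)) /
  (apply(g1, 2, sd) + apply(g2, 2, sd))

## Rank 1 = most discriminative feature
rank(-f.stat)
rank(-snr)
```

Both statistics are "larger is more salient", which is why the ranks are taken on the negated values.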
A list with components:
fs.rank |
A vector of feature ranks. |
fs.order |
A vector of feature ids in decreasing order of saliency. |
stats |
A vector of the original statistic/quantity describing feature saliency. |
pval |
A vector of p values if calculated by the feature ranking method. |
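The relationship between the returned components can be sketched with a small hand-made example. The scores below are hypothetical values chosen for illustration, and the construction mirrors the template function in the Examples rather than any specific ranking method.

```r
## Hypothetical saliency scores for four features (illustrative values only)
stats <- c(f1 = 0.2, f2 = 1.5, f3 = 0.9, f4 = 0.1)

fs.rank  <- rank(-stats, ties.method = "first")  # rank 1 = most salient
fs.order <- order(fs.rank)                       # feature ids, most salient first

res <- list(fs.rank = fs.rank, fs.order = fs.order, stats = stats)

names(stats)[res$fs.order]   # feature names from most to least salient
res$stats[res$fs.order]      # scores in decreasing order
```

Indexing stats (or the columns of x) with fs.order is usually the most convenient way to select the top features.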
David Enot dle@aber.ac.uk and Wanchang Lin wll@aber.ac.uk.
Dudoit, S., Fridlyand, J. and Speed, T.P. (2002). Comparison of discrimination methods for classification of tumors using gene expression data. Journal of the American Statistical Association. Vol.97, No.457, 77-87.
Kira, K. and Rendell, L. (1992). The Feature Selection Problem: Traditional Methods and a new algorithm. Proc. Tenth National Conference on Artificial Intelligence, MIT Press, 129-134.
Kononenko, I., Simec, E., and Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, Vol.7, 1, 39-55.
Jeffery, I. B., Higgins, D. G. and Culhane, A. C. (2006). Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics, 7:359.
Chen, D., Liu, Z., Ma, X. and Hua, D. (2005). Selecting Genes by Test Statistics. Journal of Biomedicine and Biotechnology. 2005:2, 132-138.
Golub, T. R., et al., (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531-537.
oneway.test, kruskal.test, randomForest, feat.rank.re.
## prepare data set
data(abr1)
y <- factor(abr1$fact$class)
x <- preproc(abr1$pos , y=y, method=c("log10","TICnorm"),add=1)[,110:500]
## Only test for class 1 and 2
dat <- dat.sel(x, y, choices=c("1","2"))
mat <- dat$dat[[1]]
cl <- dat$cl[[1]]
## apply SNR method for feature ranking
res <- fs.snr(mat,cl)
names(res)
## Template R function for a user defined feature ranking function,
## which can be used in re-sampling based feature selection
## function: feat.rank.re.
fs.custom <- function(x, y)
{
  ### -------- user defined feature selection method goes here ----------
  ## As an example, generate random importance score
  stats <- abs(rnorm(ncol(x)))
  ## colnames() works for both matrices and data frames
  names(stats) <- colnames(x)
  ### --------------------------------------------------------------------
  ### Generate rank and order
  ### Here the importance score is in decreasing order
  fs.rank <- rank(-stats, na.last = TRUE, ties.method = "random")
  fs.order <- order(fs.rank, na.last = TRUE)
  names(fs.rank) <- names(stats)
  ### return results
  list(fs.rank = fs.rank, fs.order = fs.order, stats = stats)
}
## apply fs.custom for feature ranking
res <- fs.custom(mat,cl)
names(res)
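
The template can be filled with a real statistic instead of random scores. The sketch below uses the ratio of between-group to within-group sums of squares described in the Details. The function name fs.bw.sketch and the simulated data are hypothetical, and this is an illustration written with base R only, not the package's own implementation.

```r
## User-defined ranking function based on the between-group to within-group
## sum of squares ratio. fs.bw.sketch is a hypothetical name.
fs.bw.sketch <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  overall <- colMeans(x)
  bss <- wss <- numeric(ncol(x))
  for (g in levels(y)) {
    xg <- x[y == g, , drop = FALSE]
    mg <- colMeans(xg)
    bss <- bss + nrow(xg) * (mg - overall)^2   # between-group part
    wss <- wss + colSums(sweep(xg, 2, mg)^2)   # within-group part
  }
  stats <- bss / wss
  names(stats) <- colnames(x)
  ## larger ratio = more salient, so rank on the negated score
  fs.rank <- rank(-stats, na.last = TRUE, ties.method = "random")
  fs.order <- order(fs.rank, na.last = TRUE)
  list(fs.rank = fs.rank, fs.order = fs.order, stats = stats)
}

## Simulated two-class data in which only feature f2 separates the classes
set.seed(42)
xs <- matrix(rnorm(30 * 4), nrow = 30,
             dimnames = list(NULL, paste0("f", 1:4)))
ys <- factor(rep(c("1", "2"), each = 15))
xs[ys == "2", 2] <- xs[ys == "2", 2] + 4

res.bw <- fs.bw.sketch(xs, ys)
names(res.bw)
res.bw$fs.order[1]   # index of the top-ranked feature
```

Because the function returns the same fs.rank, fs.order and stats components as the template, it should be usable wherever the template is, although that has not been checked against feat.rank.re here.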