fs.techniques: Implementation of Feature Ranking Techniques


Description

Implementation of feature ranking techniques.

Usage

  fs.anova(x,y,...)
  fs.auc(x,y)
  fs.bw(x,y)
  fs.kruskal(x,y,...)
  fs.mi(x,y)
  fs.relief(x,y)
  fs.rf(x,y,...)
  fs.snr(x,y)
  fs.welch(x,y,...)

Arguments

x

A data frame or matrix containing the data set.

y

A factor or vector of class labels.

...

Optional arguments to be passed to the feature ranking method.

Details

Several techniques are implemented in the current package:

fs.anova:

Wrapper for function oneway.test. Performs an analysis of variance to test whether means from normal distributions are identical, without assuming that group variances are equal. The F value is used to compute feature ranks. Two-class and multi-class problems are both allowed.
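As an illustration of the approach described above, here is a minimal sketch of F-statistic ranking built directly on oneway.test. The function and variable names are illustrative only, not the package's actual implementation:

```r
## Sketch of F-statistic feature ranking via oneway.test; its default
## var.equal = FALSE means unequal group variances are allowed
anova.rank <- function(x, y) {
  stats <- apply(x, 2, function(v) oneway.test(v ~ y)$statistic)
  pval  <- apply(x, 2, function(v) oneway.test(v ~ y)$p.value)
  fs.rank <- rank(-stats, ties.method = "random")
  list(fs.rank = fs.rank, fs.order = order(fs.rank),
       stats = stats, pval = pval)
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- factor(rep(c("a", "b"), each = 10))
x[y == "b", "f2"] <- x[y == "b", "f2"] + 3   ## make f2 discriminative
res <- anova.rank(x, y)
res$fs.order[1]   ## index of the top-ranked feature (f2)
```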

fs.auc:

Computes the area under the simple ROC curve (x-axis: false positive rate, y-axis: true positive rate) for each individual feature. The actual value of the AUC (if class 1 > class 2) or its complement (if class 1 < class 2) is used to derive the feature ranking. Two-class problems only.
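The per-feature AUC can be sketched via the Wilcoxon rank-sum identity; the package's actual implementation and tie handling may differ:

```r
## AUC of a single feature via the rank-sum identity; taking the maximum of
## the AUC and its complement makes the score direction-free
auc.one <- function(v, y) {
  lev <- levels(y)
  n1  <- sum(y == lev[1])
  n2  <- sum(y == lev[2])
  r   <- rank(v)                     ## mid-ranks handle ties
  auc <- (sum(r[y == lev[1]]) - n1 * (n1 + 1) / 2) / (n1 * n2)
  max(auc, 1 - auc)
}

v <- c(1, 2, 3, 10, 11, 12)
y <- factor(rep(c("a", "b"), each = 3))
auc.one(v, y)   ## perfectly separated feature -> 1
```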

fs.bw:

Computes the ratio of between-group to within-group sums of squares for each feature, without assuming any particular data distribution. Two-class and multi-class problems are both allowed.
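The BW ratio of Dudoit et al. (2002) can be sketched per feature as follows (illustrative code, not the package's implementation):

```r
## Between-group to within-group sum-of-squares ratio for one feature
bw.one <- function(v, y) {
  m  <- mean(v)                 ## overall mean
  mk <- tapply(v, y, mean)      ## per-class means, in level order
  B  <- sum((mk[y] - m)^2)      ## between-group SS, one term per observation
  W  <- sum((v - mk[y])^2)      ## within-group SS
  B / W
}

bw.one(c(0, 1, 4, 5), factor(c("a", "a", "b", "b")))   ## 16
```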

fs.kruskal:

Wrapper for function kruskal.test. A non-parametric alternative that handles both two-class and multi-class problems.
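A minimal sketch of such a wrapper (argument passing simplified; names are illustrative):

```r
## Sketch of Kruskal-Wallis feature ranking: rank features by the
## Kruskal-Wallis chi-squared statistic of each column
kruskal.rank <- function(x, y, ...) {
  kt    <- lapply(seq_len(ncol(x)), function(j) kruskal.test(x[, j], y, ...))
  stats <- sapply(kt, function(z) unname(z$statistic))
  pval  <- sapply(kt, function(z) z$p.value)
  names(stats) <- names(pval) <- colnames(x)
  fs.rank <- rank(-stats, ties.method = "random")
  list(fs.rank = fs.rank, fs.order = order(fs.rank),
       stats = stats, pval = pval)
}

set.seed(2)
x <- matrix(rnorm(90), nrow = 30, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- factor(rep(c("a", "b", "c"), each = 10))   ## three classes are allowed
x[y == "c", "f3"] <- x[y == "c", "f3"] + 4      ## make f3 discriminative
res <- kruskal.rank(x, y)
res$fs.order[1]   ## index of the top-ranked feature (f3)
```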

fs.mi:

Computes the mutual information between each feature and the two classes. Two-class problems only.
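Mutual information requires a discrete feature; a median split is assumed here purely for illustration (the package may bin differently):

```r
## Mutual information (in bits) between a median-split feature and the class
mi.one <- function(v, y) {
  b   <- factor(v > median(v))           ## two-bin discretization
  p   <- table(b, y) / length(v)         ## joint distribution
  px  <- rowSums(p)
  py  <- colSums(p)
  idx <- p > 0                           ## skip empty cells (0 * log 0 = 0)
  sum(p[idx] * log2(p[idx] / outer(px, py)[idx]))
}

v <- c(1, 2, 3, 10, 11, 12)
y <- factor(rep(c("a", "b"), each = 3))
mi.one(v, y)   ## fully informative split -> 1 bit
```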

fs.relief:

Implementation of the RELIEF algorithm, which calculates relevance scores in a multivariate fashion. Two-class and multi-class problems are both allowed.

fs.rf:

Wrapper for the randomForest function, which computes importance scores in a multivariate fashion. The mean decrease in accuracy is used to calculate feature scores. Further arguments for the random forests algorithm can also be passed. Two-class and multi-class problems are both allowed.

fs.snr:

Computes the signal-to-noise ratio for each feature. The absolute value of the SNR is reported and used for assessing feature ranks. Two-class problems only.
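The SNR of Golub et al. (1999) for a single feature can be sketched as follows, assuming the usual difference-of-means over sum-of-standard-deviations definition (illustrative, not the package's implementation):

```r
## Signal-to-noise ratio of one feature: difference of class means over the
## sum of class standard deviations, reported as an absolute value
snr.one <- function(v, y) {
  lev <- levels(y)
  v1  <- v[y == lev[1]]
  v2  <- v[y == lev[2]]
  abs(mean(v1) - mean(v2)) / (sd(v1) + sd(v2))
}

snr.one(c(0, 2, 4, 6), factor(c("a", "a", "b", "b")))   ## 4 / (2 * sqrt(2))
```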

fs.welch:

Performs a univariate Welch t-test to test whether group means from normal distributions are identical, without assuming that group variances are equal. The absolute value of the t statistic is returned and used to compute feature ranks. Two-class problems only.
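A minimal sketch using t.test, whose default var.equal = FALSE gives the Welch variant (illustrative names, not the package's implementation):

```r
## Welch t-statistic ranking; the absolute t value orders the features
welch.rank <- function(x, y, ...) {
  tt    <- lapply(seq_len(ncol(x)), function(j) t.test(x[, j] ~ y, ...))
  stats <- abs(sapply(tt, function(z) unname(z$statistic)))
  pval  <- sapply(tt, function(z) z$p.value)
  names(stats) <- names(pval) <- colnames(x)
  fs.rank <- rank(-stats, ties.method = "random")
  list(fs.rank = fs.rank, fs.order = order(fs.rank),
       stats = stats, pval = pval)
}

set.seed(3)
x <- matrix(rnorm(60), nrow = 20, dimnames = list(NULL, c("f1", "f2", "f3")))
y <- factor(rep(c("a", "b"), each = 10))
x[y == "b", "f1"] <- x[y == "b", "f1"] + 3   ## make f1 discriminative
res <- welch.rank(x, y)
res$fs.order[1]   ## index of the top-ranked feature (f1)
```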

Value

A list with components:

fs.rank

A vector of feature ranks.

fs.order

A vector of feature ids in decreasing order of saliency.

stats

A vector of the original statistic/quantity describing feature saliency.

pval

A vector of p-values, if calculated by the feature ranking method.

Author(s)

David Enot and Wanchang Lin.

References

Dudoit, S., Fridlyand, J. and Speed, T.P. (2002). Comparison of discrimination methods for classification of tumors using gene expression data. Journal of the American Statistical Association. Vol.97, No.457, 77-87.

Kira, K. and Rendell, L. (1992). The Feature Selection Problem: Traditional Methods and a New Algorithm. Proc. Tenth National Conference on Artificial Intelligence, MIT Press, 129-134.

Kononenko, I., Simec, E. and Robnik-Sikonja, M. (1997). Overcoming the myopia of inductive learning algorithms with RELIEFF. Applied Intelligence, Vol.7, 1, 39-55.

Jeffery, I. B., Higgins, D. G. and Culhane, A. C. (2006). Comparison and evaluation of methods for generating differentially expressed gene lists from microarray data. BMC Bioinformatics, 7:359.

Chen, D., Liu, Z., Ma, X. and Hua, D. (2005). Selecting Genes by Test Statistics. Journal of Biomedicine and Biotechnology. 2005:2, 132-138.

Golub, T. R., et al., (1999). Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science, 286:531-537.

See Also

oneway.test, kruskal.test, randomForest, feat.rank.re.

Examples

## prepare data set
data(abr1)
y <- factor(abr1$fact$class)
x <- preproc(abr1$pos, y = y, method = c("log10", "TICnorm"), add = 1)[, 110:500]

## only test for classes 1 and 2
dat <- dat.sel(x, y, choices = c("1", "2"))
mat <- dat$dat[[1]]
cl  <- dat$cl[[1]]

## apply SNR method for feature ranking
res <- fs.snr(mat, cl)
names(res)


## Template R function for a user-defined feature ranking method, which can
## be used in the re-sampling based feature selection function feat.rank.re.
fs.custom <- function(x, y)
{
  ## -------- user-defined feature selection method goes here ----------
  ## As an example, generate random importance scores
  stats        <- abs(rnorm(ncol(x)))
  names(stats) <- colnames(x)    ## works for matrices and data frames
  ## -------------------------------------------------------------------

  ## Generate rank and order; the importance score is in decreasing order
  fs.rank  <- rank(-stats, na.last = TRUE, ties.method = "random")
  fs.order <- order(fs.rank, na.last = TRUE)
  names(fs.rank) <- names(stats)

  ## return results
  list(fs.rank = fs.rank, fs.order = fs.order, stats = stats)
}

## apply fs.custom for feature ranking
res <- fs.custom(mat, cl)
names(res)

wilsontom/FIEmspro documentation built on Feb. 19, 2018, 9:03 a.m.