
Implementation of feature ranking techniques.


| Argument | Description |
|---|---|
| `x` | A data frame or matrix containing the data set, with one column per feature. |
| `y` | A factor or vector of class labels. |
| `...` | Optional arguments to be passed to the feature ranking method. |

Several techniques are implemented in the current package:

**fs.anova**: Wrapper for function `oneway.test`. Performs an analysis of variance to test whether means from normal distributions are identical. It assumes that group variances are not necessarily equal. The F value is used to compute feature ranks. Two-class and multi-class problems are both allowed.

**fs.auc**: Computes the area under the simple ROC curve (x-axis: false positive rate, y-axis: true positive rate) for each individual feature. The actual value of the AUC (if class 1 > class 2) or its complement (if class 1 < class 2) is used to get the feature ranking. Two-class problems only.
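The AUC scoring described for `fs.auc` can be sketched in base R using the rank-sum form of the AUC. This is a minimal illustration, not the package's code: the helper name `auc_per_feature` is hypothetical, and taking `max(auc, 1 - auc)` is an assumed reading of "the AUC or its complement".

```r
# Hypothetical helper (not part of the package): rank-based AUC per feature.
auc_per_feature <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  pos <- y == levels(y)[2]            # treat the second level as "positive"
  n1 <- sum(pos)
  n0 <- sum(!pos)
  apply(x, 2, function(v) {
    r <- rank(v)                      # mid-ranks, so ties are handled
    auc <- (sum(r[pos]) - n1 * (n1 + 1) / 2) / (n1 * n0)
    max(auc, 1 - auc)                 # assumed: direction-free score in [0.5, 1]
  })
}

# Toy data: two perfectly separating features and one constant feature
y <- factor(rep(c("a", "b"), each = 10))
x <- cbind(up = 1:20, down = 20:1, flat = rep(1, 20))
auc_per_feature(x, y)  # up = 1, down = 1, flat = 0.5
```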

**fs.bw**: Computes the ratio of between-group to within-group sums of squares for each feature without assuming any particular data distribution. Two-class and multi-class problems are both allowed.
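As a rough sketch of the between-group to within-group ratio, the following base R function computes it column by column. The name `bw_ratio` is hypothetical and the code is an illustration of the statistic as described, not the package's implementation.

```r
# Hypothetical helper (not part of the package): BW ratio per feature.
bw_ratio <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  apply(x, 2, function(v) {
    grand    <- mean(v)
    grp_mean <- as.vector(tapply(v, y, mean))    # one mean per class
    grp_n    <- as.vector(tapply(v, y, length))  # one size per class
    between  <- sum(grp_n * (grp_mean - grand)^2)
    within   <- sum((v - ave(v, y))^2)           # ave() gives each sample its class mean
    between / within
  })
}

# Toy data: class means 2 and 6, grand mean 4
y <- factor(rep(c("a", "b"), each = 3))
x <- cbind(f1 = c(1, 2, 3, 5, 6, 7))
bw_ratio(x, y)  # f1 = 24 / 4 = 6
```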

**fs.kruskal**: Wrapper for function `kruskal.test`. Non-parametric alternative that handles two-class and multi-class problems.

**fs.mi**: Computes the mutual information between the two classes. Two-class problems only.

**fs.relief**: Implementation of the RELIEF algorithm to calculate relevance scores in a multivariate fashion. Two-class and multi-class problems are both allowed.

**fs.rf**: Wrapper for the `randomForest` function to compute importance scores in a multivariate fashion. The mean decrease in accuracy is used to calculate feature scores. Further arguments related to the random forests algorithm can also be passed. Two-class and multi-class problems are both allowed.

**fs.snr**: Computes the signal-to-noise ratio for each feature. The absolute value of the SNR is reported and used for assessing feature ranks. Two-class problems only.
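For illustration, a signal-to-noise score can be sketched in base R as below. The formula, the difference of class means divided by the sum of class standard deviations, is the form commonly attributed to Golub et al. (1999); whether `fs.snr` uses exactly this form is an assumption, and `snr_per_feature` is a hypothetical name.

```r
# Hypothetical helper (not part of the package): |SNR| per feature.
# Assumed definition: (mean1 - mean2) / (sd1 + sd2).
snr_per_feature <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  g1 <- x[y == levels(y)[1], , drop = FALSE]
  g2 <- x[y == levels(y)[2], , drop = FALSE]
  snr <- (colMeans(g1) - colMeans(g2)) /
    (apply(g1, 2, sd) + apply(g2, 2, sd))
  abs(snr)                                   # absolute value, as described above
}

# Toy data: class means 2 and 6, both standard deviations equal to 1
y <- factor(rep(c("a", "b"), each = 3))
x <- cbind(f1 = c(1, 2, 3, 5, 6, 7))
snr_per_feature(x, y)  # f1 = |2 - 6| / (1 + 1) = 2
```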

**fs.welch**: Performs a univariate t-test to test whether group means from normal distributions are identical, without assuming that group variances are equal. The absolute value of the t statistic is returned and used to compute feature ranks. Two-class problems only.
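The univariate tests above all follow the same pattern: apply a test to each column and keep its statistic and p value. A minimal sketch of that pattern for the Welch case, using base R's `t.test` (which defaults to `var.equal = FALSE`), might look as follows. The name `welch_per_feature` is hypothetical and this is not the package's code.

```r
# Hypothetical helper (not part of the package): Welch t-test per feature.
welch_per_feature <- function(x, y) {
  x <- as.matrix(x)
  y <- factor(y)
  stopifnot(nlevels(y) == 2)
  res <- apply(x, 2, function(v) {
    tt <- t.test(v ~ y)                      # Welch's test: unequal variances by default
    c(stats = abs(unname(tt$statistic)), pval = tt$p.value)
  })
  list(stats = res["stats", ], pval = res["pval", ])
}

# Toy data with two features
y <- factor(rep(c("a", "b"), each = 3))
x <- cbind(f1 = c(1, 2, 3, 5, 6, 7), f2 = c(1, 3, 2, 2, 1, 3))
out <- welch_per_feature(x, y)
out$stats  # f1 is sqrt(24), f2 is 0 (equal class means)
```

The same skeleton would serve for `oneway.test` or `kruskal.test` by swapping the call inside `apply`.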

A list with components:

| Component | Description |
|---|---|
| `fs.rank` | A vector of feature ranks. |
| `fs.order` | A vector of feature ids in decreasing order of saliency. |
| `stats` | A vector of the original statistic/quantity describing feature saliency. |
| `pval` | A vector of p values if calculated by the feature ranking method. |
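The relationship between `stats`, `fs.rank` and `fs.order` can be seen on a small made-up score vector. The values and feature names below are purely illustrative.

```r
# Illustrative scores for three features (made-up values)
stats <- c(f1 = 0.2, f2 = 1.5, f3 = 0.9)

# Rank 1 goes to the largest score
fs.rank <- rank(-stats, ties.method = "random")
fs.rank                  # f1 = 3, f2 = 1, f3 = 2

# Feature ids sorted from most to least salient
fs.order <- order(fs.rank)
fs.order                 # 2 3 1
names(stats)[fs.order]   # "f2" "f3" "f1"
```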

David Enot and Wanchang Lin.

Dudoit, S., Fridlyand, J. and Speed, T.P. (2002). Comparison of discrimination methods
for classification of tumors using gene expression data.
*Journal of the American Statistical Association*. Vol.97, No.457, 77-87.

Kira, K. and Rendell, L. (1992).
The Feature Selection Problem: Traditional Methods and a New Algorithm.
*Proc. Tenth National Conference on Artificial Intelligence*, MIT Press,
129-134.

Kononenko, I., Šimec, E., and Robnik-Šikonja, M. (1997).
Overcoming the myopia of inductive learning algorithms with RELIEFF.
*Applied Intelligence*, Vol.7, No.1, 39-55.

Jeffery, I. B., Higgins, D. G. and Culhane, A. C. (2006). Comparison and
evaluation of methods for generating differentially expressed gene lists
from microarray data. *BMC Bioinformatics*, 7:359.

Chen, D., Liu, Z., Ma, X. and Hua, D. (2005). Selecting Genes by Test Statistics.
*Journal of Biomedicine and Biotechnology*. 2005:2, 132-138.

Golub, T. R., et al. (1999). Molecular classification of cancer: class
discovery and class prediction by gene expression monitoring.
*Science*, 286:531-537.

`oneway.test`, `kruskal.test`, `randomForest`, `feat.rank.re`.

```
## Load the package that provides abr1, preproc and dat.sel
library(FIEmspro)

## prepare data set
data(abr1)
y <- factor(abr1$fact$class)
x <- preproc(abr1$pos, y = y, method = c("log10", "TICnorm"), add = 1)[, 110:500]

## Only test for class 1 and 2
dat <- dat.sel(x, y, choices = c("1", "2"))
mat <- dat$dat[[1]]
cl  <- dat$cl[[1]]

## apply SNR method for feature ranking
res <- fs.snr(mat, cl)
names(res)

## Template R function for a user defined feature ranking function,
## which can be used in re-sampling based feature selection
## function: feat.rank.re.
fs.custom <- function(x, y) {
  ### -------- user defined feature selection method goes here ----------
  ## As an example, generate random importance score
  stats <- abs(rnorm(ncol(x)))
  names(stats) <- colnames(x)  # colnames() works for matrices and data frames
  ### --------------------------------------------------------------------

  ### Generate rank and order
  ### Here the importance score is in decreasing order
  fs.rank <- rank(-stats, na.last = TRUE, ties.method = "random")
  fs.order <- order(fs.rank, na.last = TRUE)
  names(fs.rank) <- names(stats)

  ### return results
  list(fs.rank = fs.rank, fs.order = fs.order, stats = stats)
}

## apply fs.custom for feature ranking
res <- fs.custom(mat, cl)
names(res)
```

wilsontom/FIEmspro documentation built on Feb. 19, 2018, 9:03 a.m.
