pruners: Feature Selection


Description

Functions to create functions that perform feature selection (or at least feature reduction) using statistics that access class labels.

Usage

keepAll(data, group)
fsTtest(fdr, ming=500)
fsModifiedFisher(q)
fsPearson(q = NULL, rho)
fsSpearman(q = NULL, rho)
fsMedSplitOddsRatio(q = NULL, OR)
fsChisquared(q = NULL, cutoff)
fsEntropy(q = 0.9, kind=c("information.gain", "gain.ratio", "symmetric.uncertainty"))
fsFisherRandomForest(q)
fsTailRank(specificity=0.9, tolerance=0.5, confidence=0.5)

Arguments

data

A matrix containing the data; columns are samples and rows are features.

group

A factor with two levels defining the sample classes.

fdr

A real number between 0 and 1 specifying the target false discovery rate (FDR).

ming

An integer specifying the minimum number of features to return; overrides the FDR.

q

A real number between 0.5 and 1 specifying the fraction of features to discard.

rho

A real number between 0 and 1 specifying the absolute value of the correlation coefficient used to filter features.

OR

A real number specifying the desired odds ratio for filtering features.

cutoff

A real number specifying the cutoff applied to the statistic when filtering features.

kind

The kind of information metric to use for filtering features.

specificity

See TailRankTest.

tolerance

See TailRankTest.

confidence

See TailRankTest.

Details

Following the usual conventions introduced in the world of gene expression microarrays, a typical data matrix is constructed from columns representing samples on which we want to make predictions and rows representing the features used to construct the predictive model. In this context, we define a feature selector or pruner to be a function that accepts a data matrix and a two-level factor as its only arguments and returns a logical vector, whose length equals the number of rows in the matrix, where 'TRUE' indicates features that should be retained. Most pruning functions belong to parametrized families. We implement this idea using a set of function-generating functions, whose arguments are the parameters that pick out the desired member of the family; the return value is an instantiation of a particular filtering function. We define things this way so that the methods can be applied in cross-validation (or other) loops while ensuring that the same feature selection rule is used each time.
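
To make the pattern concrete, here is a minimal sketch of the function-generating idea. The function makeMeanDiffFilter is a hypothetical illustration (it is not one of the pruners in this package); its parameter q picks out one member of a family of mean-difference filters:

makeMeanDiffFilter <- function(q) {
  function(data, group) {
    ## absolute difference in group means, one value per feature (row)
    delta <- abs(rowMeans(data[, group == levels(group)[1]]) -
                 rowMeans(data[, group == levels(group)[2]]))
    delta >= quantile(delta, q)  # TRUE marks the features to retain
  }
}
fsel <- makeMeanDiffFilter(q = 0.75)  # instantiate one member of the family
## fsel(data, group) is a logical vector of length nrow(data)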

We have implemented the following algorithms:

keepAll: retains every feature.

fsTtest: a two-sample t-test, keeping features that meet the target FDR (but always returning at least ming features).

fsModifiedFisher: a modified Fisher statistic for separating the two groups, keeping the top-ranked features.

fsPearson, fsSpearman: the absolute Pearson or Spearman correlation between each feature and the class indicator, filtered either by the quantile q or by a fixed threshold rho.

fsMedSplitOddsRatio: splits each feature at its median and filters by the odds ratio with respect to class membership, using either the quantile q or a fixed OR.

fsChisquared: a chi-squared statistic, filtered by the quantile q or a fixed cutoff.

fsEntropy: an information-theoretic metric (information gain, gain ratio, or symmetric uncertainty).

fsFisherRandomForest: combines the Fisher statistic with random forests, discarding the fraction q of lowest-ranked features.

fsTailRank: the tail-rank test; see TailRankTest for the meaning of specificity, tolerance, and confidence.

Value

The keepAll function is a "pruner"; it takes the data matrix and grouping factor as arguments, and returns a logical vector indicating which features to retain.

Each of the other nine functions described here uses its arguments to construct and return a pruning function, f, that has the same interface as keepAll.
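
For instance, assuming a data matrix and grouping factor as described under Arguments, a freshly instantiated pruner can be applied directly to reduce the data matrix (a brief sketch, using fsPearson as in the Examples below):

fsel <- fsPearson(q = 0.9)  # instantiate a pruner from the fsPearson family
kept <- fsel(data, group)   # logical vector with one entry per feature
reduced <- data[kept, ]     # retain only the selected rows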

Author(s)

Kevin R. Coombes <krc@silicovore.com>

See Also

See Modeler-class and Modeler for details about how to train and test models.

Examples

set.seed(246391)
data <- matrix(rnorm(1000*36), nrow=1000, ncol=36)  # 1000 features, 36 samples
data[1:50, 1:18] <- data[1:50, 1:18] + 1            # shift first 50 features in group A
status <- factor(rep(c("A", "B"), each=18))

fsel <- fsPearson(q = 0.9)   # discard the 90% of features with lowest |correlation|
summary(fsel(data, status))
fsel <- fsPearson(rho=0.3)   # keep features with |correlation| >= 0.3
summary(fsel(data, status))

fsel <- fsEntropy(kind="gain.ratio")  # filter by the gain-ratio information metric
summary(fsel(data, status))
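
Since keepAll is itself a pruner rather than a generator, it can be called directly; every feature is retained:

summary(keepAll(data, status))  # all TRUE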
