fs.absT: support for feature selection in cross-validation


Description

Support for feature selection in cross-validation with MLearn.

Usage

fs.absT(N)
fs.probT(p)
fs.topVariance(p)

Arguments

N

number of features to retain; features are ordered by descending absolute value of the two-sample t statistic, and the top N are used (a by-hand sketch of this ranking follows the argument list).

p

cumulative probability (in (0,1)) in the distribution of absolute t statistics; features whose absolute t statistics lie above this quantile are retained.
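As a hedged, by-hand illustration of the ranking that both arguments act on (plain t.test calls on the crabs data, not the package internals):

library(MASS)
data(crabs)
num = crabs[, sapply(crabs, is.numeric)]   # numeric candidate features
tstats = sapply(num, function(v)
    abs(as.numeric(t.test(v[crabs$sp == "B"], v[crabs$sp == "O"])$statistic)))
sort(tstats, decreasing = TRUE)   # N = 3 keeps the top three of this ranking;
                                  # p thresholds the same ranking by quantile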

Details

This function returns a function that will be used as a parameter to xvalSpec in applications of MLearn.
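For example, a minimal sketch of passing the selector to MLearn through xvalSpec; the fsFun argument name and the knnI and balKfold.xvspec helpers are assumed here from the MLInterfaces cross-validation interface:

library(MLInterfaces)
library(MASS)
data(crabs)
# 5-fold balanced cross-validation of a k-nearest-neighbour learner; within
# each training fold the 3 features with the largest absolute two-sample t
# statistics are retained (fsFun assumed to be xvalSpec's selection argument)
fit = MLearn(sp ~ . - sex, data = crabs, knnI(k = 3),
    xvalSpec("LOG", 5, balKfold.xvspec(5), fsFun = fs.absT(3)))
confuMat(fit)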

Value

a function is returned that will itself return a formula consisting of the selected features, for use with MLearn.

Note

The functions fs.absT and fs.probT are two examples of approaches to embedded feature selection that make sense for two-sample prediction problems. For selection based on linear models or other discrimination measures, you will need to create your own selection helper, following the code in these functions as examples.

fs.topVariance performs non-specific feature selection based on the variance. Argument p is the variance percentile beneath which features are discarded.
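As a hedged illustration of that contract, the sketch below defines a hypothetical selector, fs.absWilcox (not part of MLInterfaces), which ranks features by the absolute centered two-sample Wilcoxon statistic and, like fs.absT, returns a closure that yields a reduced formula:

fs.absWilcox = function(N) {
    function(formula, data) {
        mf = model.frame(formula, data)
        y = model.response(mf)            # two-level factor response
        x = mf[, -1, drop = FALSE]        # candidate features
        g1 = y == levels(factor(y))[1]
        # centered Wilcoxon rank-sum statistic as the discrimination measure
        stat = sapply(x, function(v)
            abs(as.numeric(wilcox.test(v[g1], v[!g1], exact = FALSE)$statistic) -
                sum(g1) * sum(!g1) / 2))
        keep = names(sort(stat, decreasing = TRUE))[seq_len(min(N, ncol(x)))]
        # return a formula restricted to the selected features, as fs.absT does
        as.formula(paste(names(mf)[1], "~", paste(keep, collapse = "+")))
    }
}

The resulting closure is used exactly like fs.absT in the Examples below.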

Author(s)

VJ Carey <stvjc@channing.harvard.edu>

See Also

MLearn

Examples

library("MASS")
data(crabs)
# we will demonstrate this procedure with the crabs data.
# first, create the closure to pick 3 features
demFS = fs.absT(3)
# run it on the entire dataset with features excluding sex
demFS(sp~.-sex, crabs)
# emulate cross-validation by excluding last 50 records
demFS(sp~.-sex, crabs[1:150,])
# emulate cross-validation by excluding first 50 records -- different features retained
demFS(sp~.-sex, crabs[51:200,])
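# A further hedged sketch, assuming the fs.probT(p) and fs.topVariance(p)
# signatures shown under Usage: the same pattern applies to the companion selectors
demFSp = fs.probT(0.75)        # keep features with abs(t) above the 0.75 quantile
demFSp(sp~.-sex, crabs)
demFSv = fs.topVariance(0.5)   # discard features below the median variance
demFSv(sp~.-sex, crabs)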
