CrossVal: Creating CrossValidate objects


Description

Given a model classifier and a data set, this function performs cross-validation by repeatedly splitting the data into training and testing subsets in order to estimate the performance of this kind of classifier on new data.

Usage

CrossValidate(model, data, status, frac, nLoop, prune=keepAll, verbose=TRUE)

Arguments

model

An object of the Modeler-class.

data

A matrix containing the data to be used for cross-validation. As is conventional for gene expression data, columns are the independent samples (observations) and rows are the measured features.

status

A factor with two levels giving the class of each sample (column of data); these are the classes to be predicted.

frac

A number between 0 and 1; the fraction of the data that should be used in each iteration to train the model.

nLoop

An integer; the number of times to split the data into training and test sets.

prune

A function that takes two inputs, a data matrix and a factor with two levels, and returns a logical vector whose length equals the number of rows (features) in the data matrix.

verbose

A logical value; should the cross-validation routine report interim progress?
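For illustration, a custom prune function with the required shape might look like the sketch below. The name pruneByTtest and the 0.05 cutoff are hypothetical and not part of the package; the sketch also assumes that TRUE marks a row (feature) to keep, which is what the name of the default, keepAll, suggests.

```r
# Hypothetical prune function (not part of CrossValidate): keep the rows
# (features) whose two-sample t-test p-value is below a fixed cutoff.
# Assumption: TRUE means "keep this feature".
pruneByTtest <- function(data, status, cutoff = 0.05) {
  stopifnot(is.matrix(data), nlevels(status) == 2, ncol(data) == length(status))
  grp <- levels(status)
  pvals <- apply(data, 1, function(x) {
    # Welch two-sample t-test (the t.test default) comparing the two classes
    t.test(x[status == grp[1]], x[status == grp[2]])$p.value
  })
  pvals < cutoff  # logical vector, one entry per row of data
}
```

Because its only required inputs are the data matrix and the two-level factor, it could be supplied as prune=pruneByTtest in place of the default.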

Details

The CrossValidate package provides generic tools for performing cross-validation on classification methods in the context of high-throughput data sets, such as those produced by gene expression microarrays. To use a classifier with this implementation of cross-validation, you must first prepare a pair of functions: one for learning models from training data and one for making predictions on test data. These functions, along with any required meta-parameters, are used to create an object of the Modeler-class. That object is then passed to the CrossValidate function along with the full data set. The full data set is repeatedly split into training and test subsets; you specify the fraction used for training (frac) and the number of iterations (nLoop). The result is a detailed look at the accuracy, sensitivity, specificity, and positive and negative predictive values of the model, as estimated by cross-validation.
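The splitting procedure described above can be sketched in base R. This is a hypothetical illustration, not the package's implementation: all function names are made up, a toy nearest-centroid learner stands in for a Modeler object, each split is a simple unstratified random sample, only validation statistics are computed, and the second factor level is assumed to be the "positive" class.

```r
# Hypothetical sketch of repeated random-split cross-validation.
# NOT the CrossValidate package's code; all names are illustrative.

# Toy learner: one centroid (mean feature profile) per class.
learnCentroid <- function(data, status) {
  grp <- levels(status)
  centers <- vapply(grp, function(g) rowMeans(data[, status == g, drop = FALSE]),
                    numeric(nrow(data)))
  list(levels = grp, centers = matrix(centers, nrow = nrow(data)))
}

# Toy predictor: assign each sample (column) to the nearest centroid.
predictCentroid <- function(fit, newdata) {
  k <- length(fit$levels)
  d <- vapply(seq_len(k), function(j) colSums((newdata - fit$centers[, j])^2),
              numeric(ncol(newdata)))
  d <- matrix(d, ncol = k)  # keep matrix shape even for a single test sample
  factor(fit$levels[apply(d, 1, which.min)], levels = fit$levels)
}

# Confusion-matrix statistics; assumes the second level is "positive".
confusionStats <- function(pred, truth) {
  pos <- levels(truth)[2]
  tp <- sum(pred == pos & truth == pos)
  tn <- sum(pred != pos & truth != pos)
  fp <- sum(pred == pos & truth != pos)
  fn <- sum(pred != pos & truth == pos)
  c(sens = tp / (tp + fn), spec = tn / (tn + fp),
    acc = (tp + tn) / length(truth),
    ppv = tp / (tp + fp), npv = tn / (tn + fn))
}

# Repeat nLoop random splits, training on a fraction frac of the samples.
crossValidateSketch <- function(data, status, frac, nLoop) {
  n <- ncol(data)
  nTrain <- round(frac * n)
  stats <- replicate(nLoop, {
    train <- sample(n, nTrain)                 # unstratified random split
    fit <- learnCentroid(data[, train, drop = FALSE], status[train])
    pred <- predictCentroid(fit, data[, -train, drop = FALSE])
    confusionStats(pred, status[-train])       # validation statistics only
  })
  t(stats)  # one row per split, one column per statistic
}

# Usage on pure noise, mirroring the Examples section.
set.seed(1)
dataset <- matrix(rnorm(50 * 100), nrow = 50)
pseudoclass <- factor(rep(c("A", "B"), each = 50))
summary(crossValidateSketch(dataset, pseudoclass, 0.5, 10))
```

Returning one row per split keeps the per-iteration results, so summary() can report their spread rather than a single averaged number.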

Value

An object of the CrossValidate-class.

Author(s)

Kevin R. Coombes krcoombes@mdanderson.org

References

See the manual page for the CrossValidate-class for a list of related references.

See Also

See CrossValidate-class for a description of the slots in the object created by this function.

Examples

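library(CrossValidate)
# 50 features (rows) by 100 samples (columns) of pure noise
dataset <- matrix(rnorm(50*100), nrow=50)
# class labels unrelated to the data, so validation accuracy should be near 0.5
pseudoclass <- factor(rep(c("A", "B"), each=50))
model <- modelerCCP # obviously, other models can be used
numTimes <- 10 # and more is probably better
cv <- CrossValidate(model, dataset, pseudoclass, 0.5, numTimes)
summary(cv)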

Example output

Loading required package: Modeler
Loading required package: ClassDiscovery
Loading required package: cluster
Loading required package: oompaBase
Loading required package: ClassComparison
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10
---------------
Cross-validation was performed using 50 percent of the data for
training. The data set was randomly split into training and testing
sets 10 times.

Training Accuracy:
         sens  spec   acc       ppv       npv
Min.    0.800 0.800 0.800 0.8000000 0.8000000
1st Qu. 0.800 0.850 0.860 0.8562802 0.8214286
Median  0.860 0.900 0.870 0.8968531 0.8660714
Mean    0.852 0.892 0.872 0.8898102 0.8593029
3rd Qu. 0.880 0.920 0.900 0.9147727 0.8878205
Max.    0.920 0.960 0.920 0.9565217 0.9166667

Validation Accuracy:
         sens  spec  acc       ppv       npv
Min.    0.400 0.320 0.36 0.3703704 0.3478261
1st Qu. 0.410 0.440 0.50 0.5000000 0.5000000
Median  0.540 0.480 0.53 0.5284900 0.5264946
Mean    0.512 0.508 0.51 0.5136091 0.5085646
3rd Qu. 0.560 0.610 0.54 0.5537634 0.5390625
Max.    0.680 0.680 0.58 0.5909091 0.5789474
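The five columns in the tables above are standard confusion-matrix statistics. Assuming the usual definitions, with TP, FP, TN and FN the counts of true positives, false positives, true negatives and false negatives for whichever class is treated as positive (this page does not say which), and N = TP + FP + TN + FN:

```latex
\begin{aligned}
\text{sens} &= \frac{TP}{TP + FN}, &
\text{spec} &= \frac{TN}{TN + FP}, &
\text{acc}  &= \frac{TP + TN}{N}, \\
\text{ppv}  &= \frac{TP}{TP + FP}, &
\text{npv}  &= \frac{TN}{TN + FN}.
\end{aligned}
```

In this example the gap between mean training accuracy (0.872) and mean validation accuracy (0.51) is the optimism that cross-validation is meant to expose.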

CrossValidate documentation built on May 7, 2019, 1:02 a.m.