Description
Given a model classifier and a data set, this function performs cross-validation by repeatedly splitting the data into training and testing subsets in order to estimate the performance of this kind of classifier on new data.
Usage

CrossVal(model, data, status, frac, nLoop, verbose=TRUE)
Arguments

model
    An object of the Modeler-class, containing the paired functions for learning a model from training data and making predictions on test data.
data
    A matrix containing the full data set, with features (e.g., genes) as rows and samples as columns.
status
    A vector of outcome classes, one entry per column (sample) of data.
frac
    The fraction of the samples to use for training in each split.
nLoop
    The number of times to repeat the training-test split.
verbose
    A logical value; if TRUE, the iteration number is printed as the cross-validation proceeds.
Details

The CrossVal package provides generic tools for performing cross-validation on classification methods in the context of high-throughput data sets, such as those produced by gene expression microarrays. In order to use a classifier with this implementation of cross-validation, you must first prepare a pair of functions: one for learning models from training data, and one for making predictions on test data. These functions, along with any required meta-parameters, are used to create an object of the Modeler-class. That object is then passed to the CrossVal function along with the full training data set. The full data set is then repeatedly split into its own training and test sets; you can specify the fraction to be used for training and the number of iterations. The result is a detailed look at the accuracy, sensitivity, specificity, and positive and negative predictive values of the model, as estimated by cross-validation.
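The workflow described above can be sketched as follows. This is a hypothetical illustration, not taken from this page: the names learnFn, predictFn, and dataset are placeholders, and the exact arguments of the Modeler constructor are an assumption.

```
## Sketch only: learnFn, predictFn, and dataset are hypothetical names,
## and the Modeler constructor arguments are assumed, not documented here.
library(Modeler)

## pair the user-supplied learning and prediction functions
myModeler <- Modeler(learnFn, predictFn)

## 'dataset' has features in rows and samples in columns;
## 'status' gives the outcome class of each column
cv <- CrossVal(myModeler, dataset, status, frac=2/3, nLoop=100)
summary(cv)
```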
Value

An object of the CrossVal-class.
Author(s)

Kevin R. Coombes <krcoombes@mdanderson.org>
References

See the manual page for the CrossVal-package for a list of related references.
See Also

See the manual page for the CrossVal-package for a list of classifiers that have been adapted to work with this cross-validation mechanism. See CrossVal-class for a description of the slots in the object created by this function.
Examples

##---- Should be DIRECTLY executable !! ----
##-- ==> Define data, use random,
##-- or do help(data=index) for the standard data sets.
## The function is currently defined as
function(model, data, status, frac, nLoop, verbose=TRUE) {
    if (length(status) != ncol(data)) {
        stop("The length of the status vector must match the size of the data set.")
    }
    temp <- balancedSplit(status, frac) # just to compute sizes
    nTrain <- sum(temp)
    nTest <- sum(!temp)
    # allocate space to hold the results
    trainOutcome <- data.frame(matrix(NA, ncol=nLoop, nrow=nTrain))
    validOutcome <- data.frame(matrix(NA, ncol=nLoop, nrow=nTest))
    trainPredict <- data.frame(matrix(NA, ncol=nLoop, nrow=nTrain))
    validPredict <- data.frame(matrix(NA, ncol=nLoop, nrow=nTest))
    extras <- list()
    for (i in 1:nLoop) {
        # show that we are still alive
        if (verbose) print(i)
        # split into training and test
        tr <- balancedSplit(status, frac)
        # record the true status for each split so we can get
        # statistics on the performance later
        trainOutcome[, i] <- status[tr]
        validOutcome[, i] <- status[!tr]
        # train the model
        thisModel <- learn(model, data[, tr], status[tr])
        # record anything interesting about the model
        extras[[i]] <- thisModel@extras
        # save the predictions on the training set
        trainPredict[, i] <- predict(thisModel)
        # now make the predictions on the held-out test set
        validPredict[, i] <- predict(thisModel, newdata=data[, !tr])
    }
    new("CrossVal",
        nIterations=nLoop,
        trainPercent=frac,
        outcome=status,
        trainOutcome=trainOutcome,
        validOutcome=validOutcome,
        trainPredict=trainPredict,
        validPredict=validPredict,
        extras=extras)
}
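The helper balancedSplit used above is supplied by the package and is not defined on this page. Conceptually, it appears to draw a stratified random sample, selecting roughly the fraction frac of samples within each outcome class for training; a minimal sketch under that assumption:

```
## Sketch only: an assumed reimplementation of what balancedSplit does,
## not the package's actual code. Returns a logical vector in which
## TRUE marks a training sample, sampled separately within each class.
balancedSplitSketch <- function(status, frac) {
    train <- rep(FALSE, length(status))
    for (cls in unique(status)) {
        idx <- which(status == cls)
        train[sample(idx, round(frac * length(idx)))] <- TRUE
    }
    train
}
```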