The ML-based classification model is trained and tested using N-fold cross validation.
cross_validation(seed = 1, method = c("randomForest", "svm", "nnet"),
                 featureMat, positives, negatives, cross = 5, cpus = 1, ...)
seed |
an integer specifying the random seed used to randomly partition the dataset. |
method |
a character string specifying machine learning method. Possible values are "randomForest", "nnet" or "svm" |
featureMat |
a numeric feature matrix. |
positives |
a character vector recording positive samples. |
negatives |
a character vector recording negative samples. |
cross |
number of folds for cross validation. |
cpus |
an integer number specifying the number of cpus to be used for parallel computing. |
... |
Further parameters passed to cross validation. These are the same as the parameters used in the classifier function. |
In machine learning, the cross validation method has been widely used to evaluate the performance of ML-based classification models (classifiers).
For N-fold cross validation, positive and negative samples are randomly partitioned into N groups with approximately equal numbers of samples, and each group is used in turn to test the performance of the ML-based classifier trained on the other N-1 groups of positive and negative samples.
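The partitioning step described above can be sketched in base R. This is a minimal illustration only, not the package's internal code: `make_folds` is a hypothetical helper, and the sample names are made up.

```r
# Hypothetical helper (not part of the package): split sample names into
# n_folds groups of approximately equal size.
make_folds <- function(samples, n_folds, seed = 1) {
  set.seed(seed)
  shuffled <- sample(samples)
  # Assign fold labels cyclically so that fold sizes differ by at most one.
  fold_id <- rep(seq_len(n_folds), length.out = length(shuffled))
  split(shuffled, fold_id)
}

# Made-up sample names, for illustration only.
positives <- paste0("pos", 1:12)
negatives <- paste0("neg", 1:12)
pos_folds <- make_folds(positives, n_folds = 5, seed = 1)
neg_folds <- make_folds(negatives, n_folds = 5, seed = 2)

# Each fold serves once as the testing set; the rest form the training set.
for (k in seq_along(pos_folds)) {
  test_set  <- c(pos_folds[[k]], neg_folds[[k]])
  train_set <- setdiff(c(positives, negatives), test_set)
  cat("fold", k, ": train =", length(train_set),
      ", test =", length(test_set), "\n")
}
```

Positives and negatives are partitioned separately here so that every fold contains both classes.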
For each round of cross validation, the prediction accuracy of the ML-based classifier is assessed using receiver operating characteristic (ROC) curve analysis. The ROC curve is a two-dimensional plot of the true positive rate (TPR, y-axis) against the false positive rate (FPR, x-axis) at all possible thresholds. The area under the ROC curve (AUC) is used to quantitatively score the prediction accuracy of the ML-based classifier. The AUC value ranges from 0 to 1.0, with a higher AUC value indicating better prediction accuracy.
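To make the AUC score concrete, the sketch below computes it directly from classifier scores in base R. It uses the rank-sum (Mann-Whitney) identity rather than the ROCR package that this function relies on; `auc_from_scores` is a hypothetical helper and the scores are made up.

```r
# Hypothetical helper: AUC as the fraction of (positive, negative) pairs in
# which the positive sample receives the higher score (ties count as 0.5).
auc_from_scores <- function(pos_scores, neg_scores) {
  n_pos <- length(pos_scores)
  n_neg <- length(neg_scores)
  # rank() assigns midranks to ties, which matches the 0.5 tie weight.
  r <- rank(c(pos_scores, neg_scores))
  (sum(r[seq_len(n_pos)]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Made-up scores: perfect separation, then one misordered pair out of nine.
auc_perfect <- auc_from_scores(c(0.9, 0.8, 0.7), c(0.3, 0.2, 0.1))
auc_partial <- auc_from_scores(c(0.9, 0.4, 0.7), c(0.5, 0.2, 0.1))
print(c(auc_perfect, auc_partial))
```

In the second call, eight of the nine positive-negative pairs are ordered correctly, so the AUC is 8/9.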
After all N groups have been used in turn as the testing set, the N sets of (FPR, TPR) pairs are passed to the R package ROCR to visualize the ROC curves. The mean of the N AUC values is then computed as the overall performance of the ML-based classification model.
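A rough sketch of this last step with ROCR is given below. It assumes the ROCR package is installed and uses simulated scores and labels in place of real cross validation output; it illustrates the ROCR calls and is not a reproduction of what cross_validation does internally.

```r
library(ROCR)

set.seed(1)
n_folds <- 5

# Simulated data for illustration only: per fold, 20 positive samples with
# scores drawn around 1 and 20 negative samples with scores drawn around 0.
scores <- lapply(seq_len(n_folds), function(k) {
  c(rnorm(20, mean = 1), rnorm(20, mean = 0))
})
labels <- lapply(seq_len(n_folds), function(k) rep(c(1, 0), each = 20))

# ROCR accepts lists, treating each list element as one cross validation run.
pred <- prediction(scores, labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")

plot(perf, col = "grey")                           # one ROC curve per fold
plot(perf, avg = "vertical", add = TRUE, lwd = 2)  # vertically averaged curve

# One AUC per fold, then the mean as the overall score.
auc_vec <- unlist(performance(pred, measure = "auc")@y.values)
mean_auc <- mean(auc_vec)
print(mean_auc)
```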
A list recording the results from each fold of cross validation, with the following components:
positives.train |
positive samples used to train prediction model. |
negatives.train |
negative samples used to train prediction model. |
positives.test |
positive samples used to test prediction model. |
negatives.test |
negative samples used to test prediction model. |
ml |
machine learning method. |
classifier |
prediction model constructed with the best parameters obtained from training dataset. |
positives.train.score |
scores of positive samples in training dataset predicted by classifier. |
negatives.train.score |
scores of negative samples in training dataset predicted by classifier. |
positives.test.score |
scores of positive samples in testing dataset predicted by classifier. |
negatives.test.score |
scores of negative samples in testing dataset predicted by classifier. |
train.AUC |
AUC value of the ML-based classifier on training dataset. |
test.AUC |
AUC value of the ML-based classifier on testing dataset. |
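As a sketch of how the returned list might be summarized, the code below tabulates train.AUC and test.AUC per fold. The `cvRes` object here is a mock containing only three of the documented components, and its values are made up for illustration; a real result would come from a call to cross_validation.

```r
# Mock result with the documented per-fold shape (values are invented).
cvRes <- list(
  list(ml = "randomForest", train.AUC = 0.99, test.AUC = 0.81),
  list(ml = "randomForest", train.AUC = 0.98, test.AUC = 0.78),
  list(ml = "randomForest", train.AUC = 0.99, test.AUC = 0.84)
)

# Collect the per-fold AUC values into one table.
aucTable <- data.frame(
  fold      = seq_along(cvRes),
  train.AUC = vapply(cvRes, function(res) res$train.AUC, numeric(1)),
  test.AUC  = vapply(cvRes, function(res) res$test.AUC, numeric(1))
)

# A large gap between training and testing AUC can hint at overfitting.
aucTable$gap <- aucTable$train.AUC - aucTable$test.AUC
print(aucTable)

meanTestAUC <- mean(aucTable$test.AUC)
```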
Chuang Ma, Xiangfeng Wang
## Not run:
##generate expression feature matrix
sampleVec1 <- c(1, 2, 3, 4, 5, 6)
sampleVec2 <- c(1, 2, 3, 4, 5, 6)
featureMat <- expFeatureMatrix( expMat1 = ControlExpMat, sampleVec1 = sampleVec1,
                                expMat2 = SaltExpMat, sampleVec2 = sampleVec2,
                                logTransformed = TRUE, base = 2,
                                features = c("zscore", "foldchange", "cv", "expression"))

##positive samples
positiveSamples <- as.character(sampleData$KnownSaltGenes)
##unlabeled samples
unlabelSamples <- setdiff( rownames(featureMat), positiveSamples )
idx <- sample(length(unlabelSamples))
##randomly selecting a set of unlabeled samples as negative samples
negativeSamples <- unlabelSamples[idx[1:length(positiveSamples)]]

##five-fold cross validation
seed <- randomSeed() #generate a random seed
cvRes <- cross_validation(seed = seed, method = "randomForest",
                          featureMat = featureMat,
                          positives = positiveSamples,
                          negatives = negativeSamples,
                          cross = 5, cpus = 1,
                          ntree = 100 ) ##parameters for random forest algorithm

##get AUC values for five rounds of cross validation
aucVec <- rep(0, 5)
for( i in 1:5 )
  aucVec[i] <- cvRes[[i]]$test.AUC

##average AUC values as the final performance of the ML-based classifier
mean(aucVec)
## End(Not run)