cross_validation: Cross validation method


Description

The ML-based classification model is trained and tested with N-fold cross validation method.

Usage

cross_validation(seed = 1, method = c("randomForest", "svm", "nnet" ), 
                 featureMat, positives, negatives, cross = 5, cpus = 1, ...)

Arguments

seed

an integer specifying the random seed used to randomly partition the dataset into folds.

method

a character string specifying the machine learning method. Possible values are "randomForest", "nnet" or "svm".

featureMat

a numeric feature matrix.

positives

a character vector recording positive samples.

negatives

a character vector recording negative samples.

cross

the number of folds for cross validation.

cpus

an integer specifying the number of CPUs to be used for parallel computing.

...

further parameters passed on to cross validation; these are the same as the parameters accepted by the classifier function.

Details

In machine learning, the cross validation method has been widely used to evaluate the performance of ML-based classification models (classifiers).

For N-fold cross validation, positive and negative samples are randomly partitioned into N groups with approximately equal numbers of samples, and each group is used in turn to test the performance of the ML-based classifier trained on the other N-1 groups of positive and negative samples.
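This random partitioning step can be sketched in base R as follows (a minimal illustration with made-up sample names and N = 5; this is not the package's internal implementation):

```r
## Sketch of randomly partitioning samples into N folds of roughly equal size
## (illustrative only; sample names and N are hypothetical)
set.seed(1)                           # analogous to the 'seed' argument
positives <- paste0("pos", 1:23)
negatives <- paste0("neg", 1:23)
N <- 5

## sample() shuffles the names; cut() assigns each position to one of N groups
foldOf <- function(samples, N) {
  split(sample(samples), cut(seq_along(samples), breaks = N, labels = FALSE))
}
posFolds <- foldOf(positives, N)
negFolds <- foldOf(negatives, N)

## Fold i serves as the test set; the remaining N-1 folds form the training set
i <- 1
positives.test  <- posFolds[[i]]
positives.train <- unlist(posFolds[-i], use.names = FALSE)
```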

For each round of cross validation, the prediction accuracy of the ML-based classifier is assessed with receiver operating characteristic (ROC) curve analysis. The ROC curve is a two-dimensional plot of the false positive rate (FPR, x-axis) against the true positive rate (TPR, y-axis) at all possible score thresholds. The area under the ROC curve (AUC) is used to quantitatively score the prediction accuracy of the ML-based classifier. The AUC value ranges from 0 to 1.0, with a higher AUC indicating better prediction accuracy.

After all N groups have been used in turn as the testing set, the N sets of (FPR, TPR) pairs are imported into the R package ROCR to visualize the ROC curves. The mean of the N AUC values is then taken as the overall performance of the ML-based classification model.
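For one fold, the (FPR, TPR) pairs and the AUC can be computed with ROCR roughly as below (the prediction scores are hypothetical, and the exact calls inside cross_validation may differ):

```r
library(ROCR)

## Hypothetical classifier scores for one test fold
positives.test.score <- c(0.9, 0.8, 0.75, 0.6)
negatives.test.score <- c(0.7, 0.4, 0.3, 0.2)

scores <- c(positives.test.score, negatives.test.score)
labels <- c(rep(1, length(positives.test.score)),   # 1 = positive
            rep(0, length(negatives.test.score)))   # 0 = negative

pred <- prediction(scores, labels)
perf <- performance(pred, measure = "tpr", x.measure = "fpr")  # (FPR, TPR) pairs
auc  <- performance(pred, measure = "auc")@y.values[[1]]

plot(perf)   # ROC curve for this fold
```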

Value

A list recording the results from each fold of cross validation, with each element containing the components:

positves.train

positive samples used to train prediction model.

negatives.train

negative samples used to train prediction model.

positives.test

positive samples used to test prediction model.

negatives.test

negative samples used to test prediction model.

ml

machine learning method.

classifier

prediction model constructed with the best parameters obtained from training dataset.

positives.train.score

scores of positive samples in training dataset predicted by classifier.

negatives.train.score

scores of negative samples in training dataset predicted by classifier.

positives.test.score

scores of positive samples in testing dataset predicted by classifier.

negatives.test.score

scores of negative samples in testing dataset predicted by classifier.

train.AUC

AUC value of the ML-based classifier on the training dataset.

test.AUC

AUC value of the ML-based classifier on the testing dataset.

Author(s)

Chuang Ma, Xiangfeng Wang

Examples

## Not run: 

   ##generate expression feature matrix
   sampleVec1 <- c(1, 2, 3, 4, 5, 6)
   sampleVec2 <- c(1, 2, 3, 4, 5, 6)
   featureMat <- expFeatureMatrix( expMat1 = ControlExpMat, sampleVec1 = sampleVec1, 
                                   expMat2 = SaltExpMat, sampleVec2 = sampleVec2, 
                                   logTransformed = TRUE, base = 2,
                                   features = c("zscore", "foldchange", "cv", "expression"))

   ##positive samples
   positiveSamples <- as.character(sampleData$KnownSaltGenes)
   ##unlabeled samples
   unlabelSamples <- setdiff( rownames(featureMat), positiveSamples )
   idx <- sample(length(unlabelSamples))
   ##randomly selecting a set of unlabeled samples as negative samples
   negativeSamples <- unlabelSamples[idx[1:length(positiveSamples)]]

   ##five-fold cross validation
   seed <- randomSeed() #generate a random seed
   cvRes <- cross_validation(seed = seed, method = "randomForest", 
                             featureMat = featureMat, 
                             positives = positiveSamples, 
                             negatives = negativeSamples, 
                             cross = 5, cpus = 1,
                             ntree = 100 ) ##parameters for random forest algorithm

   ##get AUC values for five rounds of cross validation
   aucVec <- rep(0, 5)
   for( i in 1:5 )
     aucVec[i] <- cvRes[[i]]$test.AUC
  
   
   ##average AUC values as the final performance of the ML-based classifier
   mean(aucVec)

 

## End(Not run)

mlDNA documentation built on May 2, 2019, 2:15 p.m.