Fitting Classification Models to Sequencing Data
Description
This function fits classification algorithms to sequencing data and measures model performance using various statistics.
Usage
Arguments
data 

method 
There are four methods available to perform classification: 
normalize 
Normalization of count data for classification. 
deseqTransform 
Transformation method applied after normalization. 
cv 
Number of cross-validation folds. 
rpt 
Number of complete sets of folds for computation. 
B 
Number of bootstrap samples for 
ref 
User defined reference class. Default is 
... 
Optional arguments for 
Details
In RNA-Seq studies, normalization is used to adjust for between-sample differences before further analysis. In this package, the "deseq" and "tmm" normalization methods are available. "deseq" estimates size factors by dividing each sample's counts by the geometric means of the transcript counts across samples and taking the median of these ratios. "tmm" trims the lower and upper tails of the data by log-fold changes, to minimize the log-fold changes between samples, and by absolute intensity. After normalization, it is useful to transform the data for classification. The MLSeq package provides the "voomCPM" and "vst" transformation methods. "voomCPM" applies a logarithmic transformation (log-cpm) to the normalized count data. "vst" uses an error model and the concept of variance-stabilizing transformations to estimate the mean-dispersion relationship of the data.
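The size-factor idea behind "deseq" normalization and a voom-style log-cpm transformation can be sketched in base R. This is only an illustration on a made-up count matrix with no zero counts; the object names (counts, size_factors, log_cpm) are hypothetical, the log-cpm offsets (0.5 and 1) follow the voom formula, and the sketch is not claimed to match the internal code of MLSeq.

```r
# Toy count matrix (made-up values): rows are transcripts, columns are samples.
# Sample s2 has twice, and s3 four times, the sequencing depth of s1.
counts <- matrix(c(10,  20,  40,
                   100, 200, 400,
                   5,   10,  20,
                   50,  100, 200),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("g", 1:4), paste0("s", 1:3)))

# Median-of-ratios size factors: for each sample, take the median ratio of
# its counts to the per-transcript geometric mean across samples.
log_geo_means <- rowMeans(log(counts))   # log geometric mean per transcript
size_factors <- apply(counts, 2, function(s) {
  exp(median(log(s) - log_geo_means))
})

# Normalized counts: divide each sample (column) by its size factor.
norm_counts <- sweep(counts, 2, size_factors, "/")

# voom-style log-cpm: log2 counts per million with small offsets.
lib_size <- colSums(counts)
log_cpm <- log2(t((t(counts) + 0.5) / (lib_size + 1) * 1e6))

print(size_factors)
print(norm_counts)
```

In this toy example the samples differ only by depth, so the normalized columns coincide; with real data, zero counts need separate handling before taking logarithms.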
For model validation, k-fold cross-validation (the "cv" option in the MLSeq package) is a widely used technique. With this technique, the training data are randomly split into k non-overlapping, equally sized subsets. A classification model is trained on (k-1) subsets and tested on the remaining subset, with each subset serving once as the test set. The MLSeq package also provides a repeat option, "rpt", to obtain more generalizable models. Given m repeats, the cross-validation procedure is applied m times.
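The splitting scheme described above can be sketched in base R as follows. The sample size n and the names fold_ids, train_idx and test_idx are hypothetical and chosen only for illustration; the sketch shows how repeated k-fold partitions can be generated, not how MLSeq builds them internally.

```r
set.seed(1)   # only to make the illustration reproducible
n <- 46       # hypothetical number of training samples
k <- 5        # number of folds (the role of "cv")
m <- 3        # number of repeats (the role of "rpt")

# One column per repeat: each sample receives a fold label 1..k,
# reshuffled independently for every repeat.
fold_ids <- replicate(m, sample(rep(seq_len(k), length.out = n)))

n_fits <- 0
for (r in seq_len(m)) {
  for (f in seq_len(k)) {
    test_idx  <- which(fold_ids[, r] == f)        # held-out subset
    train_idx <- setdiff(seq_len(n), test_idx)    # remaining (k-1) subsets
    # A model would be fitted on train_idx and evaluated on test_idx here.
    n_fits <- n_fits + 1
  }
}
n_fits   # k * m model fits in total
```

Performance statistics are then typically averaged over all k * m held-out subsets.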
For more details, see the vignette.
Value
model 
fitted classification model 
method 
used classification method 
normalization 
used normalization method 
deseqTransform 
deseq transformation if 
confusionMat 
cross-tabulation of observed and predicted classes and corresponding statistics 
ref 
reference class 
Author(s)
Gokmen Zararsiz, Dincer Goksuluk, Selcuk Korkmaz, Vahap Eldem, Izzet Parug Duru, Turgay Unver, Ahmet Ozturk
References
Kuhn M. (2008). Building predictive models in R using the caret package. Journal of Statistical Software, 28(5) (http://www.jstatsoft.org/v28/i05/).
Anders S, Huber W. (2010). Differential expression analysis for sequence count data. Genome Biology, 11:R106.
Witten DM. (2011). Classification and clustering of sequencing data using a Poisson model. The Annals of Applied Statistics, 5(4), 2493-2518.
Law CW. et al. (2014). Voom: precision weights unlock linear model analysis tools for RNA-Seq read counts. Genome Biology, 15:R29, doi:10.1186/gb-2014-15-2-r29.
Witten D. et al. (2010). Ultra-high throughput sequencing-based small RNA discovery and discrete statistical biomarker analysis in a collection of cervical tumours and matched controls. BMC Biology, 8:58.
Robinson MD, Oshlack A. (2010). A scaling normalization method for differential expression analysis of RNA-Seq data. Genome Biology, 11:R25, doi:10.1186/gb-2010-11-3-r25.
See Also
predictClassify
Examples
library(MLSeq) # provides classify() and the cervical data set
data(cervical)
data = cervical[c(1:150),] # a subset of cervical data with first 150 features.
class = data.frame(condition=factor(rep(c("N","T"),c(29,29))))# defining sample classes.
n = ncol(data) # number of samples
p = nrow(data) # number of features
nTest = ceiling(n*0.2) # number of samples for test set (20% test, 80% train).
ind = sample(n,nTest,FALSE)
# train set: all samples NOT drawn for the test set
data.train = data[,-ind]
data.train = as.matrix(data.train + 1)
classtr = data.frame(condition=class[-ind,])
# train set in S4 class
data.trainS4 = DESeqDataSetFromMatrix(countData = data.train,
colData = classtr, formula(~ condition))
data.trainS4 = DESeq(data.trainS4, fitType="local")
# Classification and Regression Tree (CART) Classification
cart = classify(data = data.trainS4, method = "cart", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
cart
# Random Forest (RF) Classification
rf = classify(data = data.trainS4, method = "randomforest", normalize = "deseq", deseqTransform = "vst", cv = 5, rpt = 3, ref="T")
rf
