Perform model selection with one or multiple sequence kernels on one or multiple SVMs with one or multiple SVM parameter sets.
## kbsvm(...., kernel=..., pkg=..., svm=..., cost=..., ....,
##       cross=0, noCross=1, ...., nestedCross=0, noNestedCross=1, ....)

## For details see below. With parameter nestedCross > 1 model selection is
## performed, the other parameters are handled identically to grid search.
For this and other parameters see kbsvm.
Model selection in KeBABS is based on nested k-fold cross validation (CV)
(for details see performCrossValidation). The inner cross
validation is used to determine the best parameter settings (kernel
parameters and SVM parameters) and the outer cross validation to verify
the performance on data that was not included in the selection of the
best model. The training folds of the outer CV are used to run a grid
search with the inner cross validation running for each point of the
grid (see performGridSearch) to find the best performing model.
Once this model is selected, its performance on the held-out
fold of the outer CV is determined. Different model parameter settings
can occur for different held-out folds of the outer CV. This means that
model selection does not deliver a performance estimate for a single
best model but for the complete model selection process.
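The nested procedure described above can be sketched in base R. This is a minimal illustration, not the KeBABS implementation: the toy 1-D data, the k-nearest-neighbour learner knnPredict, and the helpers makeFolds and cvError are hypothetical stand-ins for the sequence data, the SVM, and the KeBABS internals. The neighbour count k plays the role of the hyperparameter grid.

```r
## Nested cross validation sketch (illustration only, not KeBABS code)
set.seed(42)
n <- 120
y <- rep(c(-1, 1), each = n / 2)
x <- rnorm(n, mean = ifelse(y == 1, 1, -1))

## hypothetical toy learner: majority vote among the k nearest training points
knnPredict <- function(xTrain, yTrain, xTest, k) {
  sapply(xTest, function(x0) {
    nearest <- order(abs(xTrain - x0))[seq_len(k)]
    if (sum(yTrain[nearest]) >= 0) 1 else -1
  })
}

## randomly assign the indices 1..n to nFolds folds
makeFolds <- function(n, nFolds) split(sample(n), rep_len(seq_len(nFolds), n))

## mean error of one grid point over an nFolds cross validation
cvError <- function(x, y, k, nFolds) {
  folds <- makeFolds(length(y), nFolds)
  mean(sapply(folds, function(test) {
    pred <- knnPredict(x[-test], y[-test], x[test], k)
    mean(pred != y[test])
  }))
}

grid <- c(1, 5, 15)            # hyperparameter grid (odd k avoids ties)
outerFolds <- makeFolds(n, 3)  # 3 fold outer CV

result <- lapply(outerFolds, function(heldOut) {
  xTr <- x[-heldOut]
  yTr <- y[-heldOut]
  ## inner 5 fold CV on the outer training folds for every grid point
  innerErr <- sapply(grid, function(k) cvError(xTr, yTr, k, nFolds = 5))
  bestK <- grid[which.min(innerErr)]
  ## performance of the selected setting on the held-out outer fold
  pred <- knnPredict(xTr, yTr, x[heldOut], bestK)
  list(bestK = bestK, outerError = mean(pred != y[heldOut]))
})

selected <- sapply(result, `[[`, "bestK")       # may differ between outer folds
outerErr <- sapply(result, `[[`, "outerError")
mean(outerErr)  # estimate for the whole selection process, not for one model
```

Because each outer fold runs its own inner grid search, the selected values can differ from fold to fold, which is why the final average describes the selection process rather than a single model.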
For each run of the outer CV KeBABS stores the selected parameter setting
for the best performing model. The default performance objective for
selecting the best parameter setting is based on minimizing the CV error
on the inner CV. Through a parameter of
kbsvm the balanced accuracy or the Matthews correlation
coefficient can be used instead, in which case the parameter setting with the
maximal value is selected. The parameter setting of the best performing
model for each fold in the outer CV can be retrieved from the KeBABS model
with the accessor
modelSelResult. The performance values on
the outer CV are retrieved from the model with the accessor cvResult.
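For orientation, the performance measures mentioned here can be written out in base R using their textbook definitions. The helper perfMeasures and the small table innerCV are hypothetical and use made-up numbers purely for illustration; they are not part of KeBABS. The sketch shows why the error is minimized while balanced accuracy and the Matthews correlation coefficient are maximized.

```r
## Textbook performance measures for labels in {-1, +1}
## (hypothetical helper, not a KeBABS function)
perfMeasures <- function(pred, truth) {
  tp <- as.numeric(sum(pred == 1 & truth == 1))
  tn <- as.numeric(sum(pred == -1 & truth == -1))
  fp <- as.numeric(sum(pred == 1 & truth == -1))
  fn <- as.numeric(sum(pred == -1 & truth == 1))
  sens <- tp / (tp + fn)  # true positive rate
  spec <- tn / (tn + fp)  # true negative rate
  denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  c(ACC  = (tp + tn) / length(truth),
    BACC = (sens + spec) / 2,
    MCC  = if (denom == 0) 0 else (tp * tn - fp * fn) / denom)
}

truth <- c(1, 1, 1, 1, -1, -1, -1, -1, -1, -1)
pred  <- c(1, 1, 1, -1, -1, -1, -1, -1, 1, 1)
perf <- perfMeasures(pred, truth)

## made-up inner CV results for three grid points (illustration only)
innerCV <- data.frame(cost    = c(50, 100, 150),
                      cvError = c(0.30, 0.26, 0.28),
                      BACC    = c(0.68, 0.71, 0.73))
bestByError <- innerCV$cost[which.min(innerCV$cvError)]  # error: smaller is better
bestByBACC  <- innerCV$cost[which.max(innerCV$BACC)]     # BACC: larger is better
```

With unbalanced classes the two criteria can pick different grid points, as in the made-up table above.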
Model selection is invoked through the method kbsvm with parameter
nestedCross > 1. For the parameters
pkg, svm and SVM hyperparameters the handling is identical to grid search
(see performGridSearch). The parameter cost in the usage
section above is just one representative of SVM hyperparameters to indicate
their relevance for model selection. The complete model selection process
can be repeated multiple times by setting
noNestedCross to the
number of desired repetitions. Nested cross validation used in model
selection is computationally more demanding than grid search. Concerning runtime
please see the runtime hints for grid search (performGridSearch).
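A rough way to see the extra cost is to count training runs. The sketch below assumes one training per grid point and inner fold, plus one training of the selected setting per outer fold; this is a simplifying assumption for illustration and ignores any reuse of intermediate results inside KeBABS. The function names nGridSearchFits and nNestedFits are hypothetical.

```r
## Rough count of training runs (simplifying assumption, not measured runtime)
nGridSearchFits <- function(nGrid, cross, noCross = 1) {
  nGrid * cross * noCross
}

nNestedFits <- function(nGrid, cross, nestedCross,
                        noNestedCross = 1, noCross = 1) {
  ## per outer fold: a full inner grid search plus one fit of the selected setting
  noNestedCross * nestedCross * (nGridSearchFits(nGrid, cross, noCross) + 1)
}

## six cost values, 10 fold CV: plain grid search
gridOnly <- nGridSearchFits(nGrid = 6, cross = 10)
## same grid with 3 fold outer CV, repeated three times
nested <- nNestedFits(nGrid = 6, cross = 10, nestedCross = 3, noNestedCross = 3)
```

Under these assumptions the count grows with the product of outer folds and repetitions, so keeping the grid small matters more for model selection than for a single grid search.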
Model selection stores the results in the KeBABS model. They can be
retrieved with the accessor modelSelResult. The results
from the outer cross validation are extracted from the model with the
accessor cvResult.
Johannes Palme <[email protected]>
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.
## load transcription factor binding site data
data(TFBS)
enhancerFB

## The C-svc implementation from LiblineaR is chosen for most of the
## examples because it is the fastest SVM. With SVMs from other packages
## slightly better results could be achievable. Because of the higher
## runtime needed for nested cross validation please run the examples
## below manually. All samples of the data set are used in the examples.
train <- sample(1:length(enhancerFB), length(enhancerFB))

## model selection with single kernel object and multiple
## hyperparameter values, 3 fold inner CV and 2 fold outer CV
## create gappy pair kernel with normalization
gappyK1M3 <- gappyPairKernel(k=1, m=3)

## show details of single gappy pair kernel object
gappyK1M3

pkg <- "LiblineaR"
svm <- "C-svc"
cost <- c(50,100,150,200,250,300)
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappyK1M3,
               pkg=pkg, svm=svm, cost=cost, explicit="yes", cross=3,
               nestedCross=2, showProgress=TRUE)

## show best parameter settings
modelSelResult(model)

## show model selection result which is the result of the outer CV
cvResult(model)

## Not run:
## repeated model selection, 10 fold inner CV and 3 fold outer CV,
## complete process repeated 3 times
pkg <- "LiblineaR"
svm <- "C-svc"
cost <- c(50,100,150,200,250,300)
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappyK1M3,
               pkg=pkg, svm=svm, cost=cost, explicit="yes", cross=10,
               nestedCross=3, noNestedCross=3, showProgress=TRUE)

## show best parameter settings
modelSelResult(model)

## show model selection result which is the result of the outer CV
cvResult(model)

## plot CV result
plot(cvResult(model))

## End(Not run)