Perform model selection with one or multiple sequence kernels on one or multiple SVMs with one or multiple SVM parameter sets.
## kbsvm(...., kernel=..., pkg=..., svm=..., cost=..., ....,
##       cross=0, noCross=1, ...., nestedCross=0, noNestedCross=1, ....)
## For details see below. With parameter nestedCross > 1 model selection is
## performed, the other parameters are handled identically to grid search.
nestedCross |
for this and other parameters see |
Overview
Model selection in KeBABS is based on nested k-fold cross validation (CV)
(for details see performCrossValidation). The inner cross validation is
used to determine the best parameter settings (kernel parameters and SVM
parameters), and the outer cross validation verifies the performance on
data that was not included in the selection of the best model. The
training folds of the outer CV are used to run a grid search, with the
inner cross validation running for each point of the grid (see
performGridSearch), to find the best performing model.
Once this model is selected, its performance on the held-out fold of the
outer CV is determined. Different parameter settings can be selected for
different held-out folds of the outer CV. This means that model selection
does not deliver a performance estimate for a single best model but for
the complete model selection process.
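The nested scheme described above can be sketched in base R. This is only an illustration of the control flow, not the KeBABS implementation: the toy one-dimensional data, the k-nearest-neighbour classifier and the helper names `knnPredict`, `makeFolds` and `cvError` are assumptions made for this sketch.

```r
## Minimal nested CV sketch in base R (illustration only, not KeBABS code).
## Toy model: k-nearest-neighbour classifier on one numeric feature;
## the hyperparameter grid is the number of neighbours.
set.seed(42)
n <- 60
x <- c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 2))
y <- rep(c(-1, 1), each = n / 2)

## predict labels for xTest by majority vote among the k nearest
## training points (k is odd, so the vote cannot tie)
knnPredict <- function(xTrain, yTrain, xTest, k) {
    sapply(xTest, function(x0) {
        nearest <- order(abs(xTrain - x0))[seq_len(k)]
        if (sum(yTrain[nearest]) >= 0) 1 else -1
    })
}

## assign each sample randomly to one of nFolds folds of nearly equal size
makeFolds <- function(nSamples, nFolds) {
    sample(rep(seq_len(nFolds), length.out = nSamples))
}

## mean k-fold CV error for one hyperparameter value
cvError <- function(x, y, k, nFolds) {
    folds <- makeFolds(length(x), nFolds)
    mean(sapply(seq_len(nFolds), function(f) {
        pred <- knnPredict(x[folds != f], y[folds != f], x[folds == f], k)
        mean(pred != y[folds == f])
    }))
}

grid <- c(1, 3, 5, 7)            # hyperparameter grid
outerFolds <- makeFolds(n, 3)    # 3-fold outer CV
result <- data.frame(fold = 1:3, bestK = NA, outerError = NA)

for (f in 1:3) {
    trainIdx <- outerFolds != f
    ## inner CV: grid search on the outer training folds only
    innerErr <- sapply(grid, function(k)
        cvError(x[trainIdx], y[trainIdx], k, nFolds = 5))
    bestK <- grid[which.min(innerErr)]
    ## evaluate the selected setting on the held-out outer fold
    pred <- knnPredict(x[trainIdx], y[trainIdx], x[!trainIdx], bestK)
    result$bestK[f] <- bestK
    result$outerError[f] <- mean(pred != y[!trainIdx])
}

result                     # selected setting and error per outer fold
mean(result$outerError)    # estimate for the whole selection process
```

The selected value in `bestK` may differ between outer folds, which is why the averaged outer error describes the selection procedure rather than one fixed model.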
For each run of the outer CV KeBABS stores the selected parameter setting
for the best performing model. By default the best parameter setting is
the one that minimizes the CV error on the inner CV. With the parameter
perfObjective in kbsvm the balanced accuracy or the Matthews correlation
coefficient can be used instead; in that case the parameter setting with
the maximal value is selected. The parameter setting of the best
performing model for each fold in the outer CV can be retrieved from the
KeBABS model with the accessor modelSelResult. The performance values on
the outer CV are retrieved from the model with the accessor cvResult.
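For readers unfamiliar with the alternative performance objectives, the following base R sketch computes the error rate, the balanced accuracy and the Matthews correlation coefficient from their standard textbook definitions. The function names `confusionCounts`, `balancedAccuracy` and `matthewsCC`, the -1/+1 label coding and the toy label vectors are assumptions of this sketch; KeBABS computes these values internally and may handle edge cases differently.

```r
## Confusion-matrix based measures for labels coded as -1 / +1
## (illustration only, not KeBABS code).
confusionCounts <- function(truth, pred) {
    counts <- c(tp = sum(truth == 1 & pred == 1),
                tn = sum(truth == -1 & pred == -1),
                fp = sum(truth == -1 & pred == 1),
                fn = sum(truth == 1 & pred == -1))
    storage.mode(counts) <- "double"   # avoid integer overflow in products
    counts
}

## mean of sensitivity and specificity
balancedAccuracy <- function(truth, pred) {
    cc <- confusionCounts(truth, pred)
    sens <- cc[["tp"]] / (cc[["tp"]] + cc[["fn"]])
    spec <- cc[["tn"]] / (cc[["tn"]] + cc[["fp"]])
    (sens + spec) / 2
}

## Matthews correlation coefficient
matthewsCC <- function(truth, pred) {
    cc <- confusionCounts(truth, pred)
    tp <- cc[["tp"]]; tn <- cc[["tn"]]; fp <- cc[["fp"]]; fn <- cc[["fn"]]
    denom <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if (denom == 0) return(0)   # common convention when a margin is empty
    (tp * tn - fp * fn) / denom
}

## toy labels: 4 positive and 6 negative samples
truth <- c(1, 1, 1, 1, -1, -1, -1, -1, -1, -1)
pred  <- c(1, 1, 1, -1, -1, -1, -1, -1, 1, 1)

mean(truth != pred)             # error rate, to be minimized
balancedAccuracy(truth, pred)   # to be maximized
matthewsCC(truth, pred)         # to be maximized
```

On unbalanced data the error rate can look good for a classifier that mostly predicts the majority class, which is the usual motivation for switching to one of the two other measures.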
Model selection is invoked through the method kbsvm by setting the
parameter nestedCross > 1. For the parameters kernel, pkg, svm and the
SVM hyperparameters the handling is identical to grid search (see
performGridSearch). The parameter cost in the usage section above is just
one representative of the SVM hyperparameters, shown to indicate their
relevance for model selection. The complete model selection process can
be repeated multiple times by setting noNestedCross to the number of
desired repetitions. Nested cross validation used in model selection is
computationally more demanding than grid search. Concerning runtime
please see the runtime hints for performGridSearch.
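To get a feeling for the additional effort, the number of SVM training runs can be estimated with a back-of-the-envelope count. The helper `trainingRuns` is hypothetical and rests on simplifying assumptions that are not taken from the KeBABS sources: every grid point costs one training per inner fold, and each outer fold adds one training of the selected setting. Actual runtime also depends on fold sizes, the kernel and the SVM package.

```r
## Rough count of SVM training runs (hypothetical helper, not KeBABS code).
## Assumptions: one training per grid point and inner fold, plus one
## training of the selected setting per outer fold.
trainingRuns <- function(nGrid, cross, nestedCross = 0, noNestedCross = 1) {
    gridSearch <- nGrid * cross        # one k-fold CV per grid point
    if (nestedCross <= 1) return(gridSearch)
    ## per outer fold: a full inner grid search plus the final training
    noNestedCross * nestedCross * (gridSearch + 1)
}

## settings with 6 cost values, as used in the examples of this page
trainingRuns(nGrid = 6, cross = 3)                    # plain grid search
trainingRuns(nGrid = 6, cross = 3, nestedCross = 2)   # model selection
trainingRuns(nGrid = 6, cross = 10, nestedCross = 3,
             noNestedCross = 3)                       # repeated selection
```

Under these assumptions the count grows with the product of grid size, inner folds, outer folds and repetitions, so a small grid and a fast SVM are advisable for first experiments.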
Model selection stores the results in the KeBABS model. They can be
retrieved with the accessor modelSelResult. Results from the outer cross
validation are extracted from the model with the accessor cvResult.
Johannes Palme <kebabs@bioinf.jku.at>
http://www.bioinf.jku.at/software/kebabs
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package
for kernel-based analysis of biological sequences.
Bioinformatics, 31(15):2574-2576, 2015.
DOI: 10.1093/bioinformatics/btv176.
kbsvm, performGridSearch, modelSelResult, cvResult
## load transcription factor binding site data
data(TFBS)
enhancerFB
## The C-svc implementation from LiblineaR is chosen for most of the
## examples because it is the fastest SVM. With SVMs from other packages
## slightly better results could be achievable. Because of the higher
## runtime needed for nested cross validation please run the examples
## below manually. All samples of the data set are used in the examples.
train <- sample(1:length(enhancerFB), length(enhancerFB))
## model selection with single kernel object and multiple
## hyperparameter values, 3-fold inner CV (cross=3) and
## 2-fold outer CV (nestedCross=2)
## create gappy pair kernel with normalization
gappyK1M3 <- gappyPairKernel(k=1, m=3)
## show details of single gappy pair kernel object
gappyK1M3
pkg <- "LiblineaR"
svm <- "C-svc"
cost <- c(50,100,150,200,250,300)
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappyK1M3,
pkg=pkg, svm=svm, cost=cost, explicit="yes", cross=3,
nestedCross=2, showProgress=TRUE)
## show best parameter settings
modelSelResult(model)
## show model selection result which is the result of the outer CV
cvResult(model)
## Not run:
## repeated model selection
pkg <- "LiblineaR"
svm <- "C-svc"
cost <- c(50,100,150,200,250,300)
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=gappyK1M3,
pkg=pkg, svm=svm, cost=cost, explicit="yes", cross=10,
nestedCross=3, noNestedCross=3, showProgress=TRUE)
## show best parameter settings
modelSelResult(model)
## show model selection result which is the result of the outer CV
cvResult(model)
## plot CV result
plot(cvResult(model))
## End(Not run)