KeBABS - An R package for kernel based analysis
of biological sequences
Package Overview
The package provides functionality for kernel-based analysis of DNA, RNA, and amino acid sequences via SVM-based methods. As core functionality, kebabs contains the following sequence kernels: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel. Apart from an efficient implementation of the position-independent variants, the kernels are extended in a novel way to take the position of patterns into account in the similarity measure. Because of the flexibility of the kernel formulation, other kernels such as the weighted degree kernel or the shifted weighted degree kernel are included as special cases. An annotation-specific variant of the kernels uses annotation information placed along the sequence together with the patterns in the sequence.

The package allows generation of a kernel matrix or an explicit representation for all available kernels, which can be used with methods implemented in other R packages. With a focus on SVM-based methods, kebabs provides a framework that simplifies the usage of existing SVM implementations in kernlab, e1071, and LiblineaR. Binary and multiclass classification as well as regression tasks can be handled in a unified way without having to deal with the different functions, parameters, and formats of the selected SVM. To support the choice of hyperparameters, the package provides cross validation, grid search, and model selection functions.

For easier biological interpretation of the results, the package computes feature weights for all SVMs and prediction profiles. Prediction profiles show the contribution of individual sequence positions to the prediction result and indicate the relevance of sequence sections for the learning result and the underlying biological functions.
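To make the idea behind the position-independent spectrum kernel more concrete, the following base-R sketch counts all overlapping k-mers of each sequence and uses the normalized dot product of the count vectors as the similarity value. It is an illustration only and not the kebabs implementation; the helper names `kmerCounts`, `specKernelRaw`, and `specKernel` and the toy sequences are assumptions made for this example.

```r
## Minimal base-R sketch of a normalized spectrum kernel (illustration only,
## not the kebabs implementation; all helper names are hypothetical)

## count all overlapping k-mers of a single sequence
kmerCounts <- function(s, k) {
    stopifnot(nchar(s) >= k)
    starts <- seq_len(nchar(s) - k + 1)
    table(substring(s, starts, starts + k - 1))
}

## unnormalized kernel value: dot product of the two k-mer count vectors
specKernelRaw <- function(s1, s2, k) {
    c1 <- kmerCounts(s1, k)
    c2 <- kmerCounts(s2, k)
    shared <- intersect(names(c1), names(c2))
    sum(as.numeric(c1[shared]) * as.numeric(c2[shared]))
}

## normalized kernel value: k(x, y) / sqrt(k(x, x) * k(y, y))
specKernel <- function(s1, s2, k) {
    specKernelRaw(s1, s2, k) /
        sqrt(specKernelRaw(s1, s1, k) * specKernelRaw(s2, s2, k))
}

## toy sequences (made up for this example)
seqs <- c("ACGTACGT", "ACGTTTTT", "GGGGCCCC")

## kernel matrix of all pairwise similarities for k = 3
km <- outer(seq_along(seqs), seq_along(seqs),
            Vectorize(function(i, j) specKernel(seqs[i], seqs[j], k = 3)))
km
```

With normalization, every sequence has similarity 1 to itself, and sequences that share no k-mer have similarity 0. A kernel matrix of this kind is the sort of object that can be passed on to kernel-based learning methods.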
see above
Johannes Palme <kebabs@bioinf.jku.at>
http://www.bioinf.jku.at/software/kebabs
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package
for kernel-based analysis of biological sequences.
Bioinformatics, 31(15):2574-2576.
DOI: 10.1093/bioinformatics/btv176.
library(kebabs)
## load package provided sequence dataset
data(TFBS)
## display sequences
enhancerFB
## display part of label vector
head(yFB, 20)
## display no of samples of positive and negative class
table(yFB)
## split dataset into training and test samples
train <- sample(1:length(enhancerFB), 0.7*length(enhancerFB))
test <- c(1:length(enhancerFB))[-train]
## create the kernel object for the normalized spectrum kernel
spec <- spectrumKernel(k=5)
## train model
## pass sequence subset, label subset, kernel object, the package and
## svm which should be used for training together with the SVM parameters
model <- kbsvm(x=enhancerFB[train], y=yFB[train], kernel=spec,
               pkg="LiblineaR", svm="C-svc", cost=10)
## predict the test samples
pred <- predict(model, enhancerFB, sel=test)
## evaluate the prediction result
evaluatePrediction(pred, yFB[test], allLabels=unique(yFB))
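For readers who want to see how typical performance figures of a binary classifier are derived, the following base-R sketch computes accuracy, sensitivity, specificity, and balanced accuracy from predicted and true labels. It is a hedged illustration, not the code behind evaluatePrediction; the helper name `binaryMetrics` is hypothetical, the labels are assumed to be coded as 1 and -1, and the toy vectors merely stand in for `pred` and `yFB[test]`.

```r
## Illustrative base-R sketch of common binary classification metrics
## (hypothetical helper, not part of kebabs; labels assumed to be 1 / -1)
binaryMetrics <- function(pred, truth, positive = 1) {
    stopifnot(length(pred) == length(truth))
    ## entries of the confusion matrix
    tp <- sum(pred == positive & truth == positive)
    tn <- sum(pred != positive & truth != positive)
    fp <- sum(pred == positive & truth != positive)
    fn <- sum(pred != positive & truth == positive)
    sensitivity <- tp / (tp + fn)
    specificity <- tn / (tn + fp)
    c(accuracy = (tp + tn) / length(truth),
      sensitivity = sensitivity,
      specificity = specificity,
      balancedAccuracy = (sensitivity + specificity) / 2)
}

## toy label vectors (made up for this example)
truthToy <- c(1, 1, 1, -1, -1, -1, -1, -1)
predToy  <- c(1, 1, -1, -1, -1, -1, 1, -1)
m <- binaryMetrics(predToy, truthToy)
m
```

Balanced accuracy averages sensitivity and specificity, which makes it less sensitive than plain accuracy to unequal numbers of positive and negative samples.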