sequenceKernel: Sequence Kernel


Description

Create the kernel matrix for a kernel object

Retrieve kernel parameters from the kernel object

Usage

seqKernelAsChar(from)

getKernelMatrix(kernel, x, y, selx, sely)

## S4 method for signature 'SpectrumKernel'
kernelParameters(object)

## S4 method for signature 'MismatchKernel'
kernelParameters(object)

## S4 method for signature 'GappyPairKernel'
kernelParameters(object)

## S4 method for signature 'MotifKernel'
kernelParameters(object)

## S4 method for signature 'SymmetricPairKernel'
kernelParameters(object)

## S4 method for signature 'SequenceKernel'
isUserDefined(object)

Arguments

from

a sequence kernel object

kernel

one kernel object of class SequenceKernel or one kernlab string kernel (see stringdot)

x

one or multiple biological sequences in the form of a DNAStringSet, RNAStringSet, AAStringSet (or as BioVector)

y

one or multiple biological sequences in the form of a DNAStringSet, RNAStringSet, AAStringSet (or as BioVector); if this parameter is specified, a rectangular kernel matrix with the samples in x as rows and the samples in y as columns is generated; otherwise a square kernel matrix with the samples in x as both rows and columns is computed; default=NULL

selx

subset of indices into x; when this parameter is present the kernel matrix is generated for the specified subset of x only; default=NULL

sely

subset of indices into y; when this parameter is present the kernel matrix is generated for the specified subset of y only; default=NULL

object

a sequence kernel object

Details

Sequence Kernel

A sequence kernel determines similarity values between biological sequences based on patterns occurring in the sequences. The kernels in this package were written specifically for the biological domain. The corresponding term in the kernlab package is string kernel, a domain-independent implementation of the same functionality that is often used in other domains, for example in text classification. The sequence kernels in this package take DNA, RNA or amino acid sequences as input, which use a reduced character set compared to regular text.
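
As an illustration of pattern-based similarity, the following is a minimal base-R sketch of an unnormalized, spectrum-style similarity between two sequences. It is not the kebabs implementation; the helper names kmerCounts and spectrumSimilarity are hypothetical and exist only for this example. Every substring of length k is counted regardless of where it occurs, and the similarity is the sum over shared substrings of the product of their counts.

```r
# Base-R sketch, NOT the kebabs implementation.
# kmerCounts and spectrumSimilarity are hypothetical helper names.

# Count all overlapping substrings of length k in a single string
kmerCounts <- function(s, k) {
  stopifnot(nchar(s) >= k)
  starts <- seq_len(nchar(s) - k + 1)
  table(substring(s, starts, starts + k - 1))
}

# Similarity = sum over shared k-mers of the product of their counts
spectrumSimilarity <- function(s1, s2, k) {
  c1 <- kmerCounts(s1, k)
  c2 <- kmerCounts(s2, k)
  shared <- intersect(names(c1), names(c2))
  sum(as.numeric(c1[shared]) * as.numeric(c2[shared]))
}

# "ACG" and "CGT" occur twice in the first and once in the second sequence
spectrumSimilarity("ACGTACGT", "ACGTTT", k = 3)  # 4
```

The built-in kernels additionally support normalization and other options that this sketch leaves out.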

In string kernels, the actual position of a pattern in the sequence or text is irrelevant; only the number of occurrences of the pattern matters for the similarity. The kernels provided in this package can be created in a position-independent or position-dependent way. Position-dependent kernels use the positions of patterns in the pair of sequences to determine the contribution of a pattern match to the similarity value. For details see the help page for positionMetadata. A second method of specializing the similarity measure in a kernel is to use annotation information placed along the sequences. For details see annotationMetadata. The following kernels are available: spectrum kernel, mismatch kernel, gappy pair kernel, and motif kernel.

These kernels are provided in a position-independent variant. For all kernels except the mismatch kernel, the position-dependent and annotation-specific variants are also supported. In addition, the spectrum and gappy pair kernels can be created as mixture kernels, with the weighted degree kernel and the shifted weighted degree kernel being two specific examples of such mixture kernels. The functions described below apply to any kind of kernel in this package.

Retrieving kernel parameters from the kernel object

The function kernelParameters retrieves the kernel parameters and returns them as a list. The function seqKernelAsChar converts a sequence kernel object into a character string.
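
A short sketch of how these accessors might be used, assuming the kebabs package is installed and attached. The calls follow the signatures listed under Usage; the names and contents of the returned list depend on the kernel, so no output is shown here.

```r
library(kebabs)

## create a kernel object (same settings as in the Examples section)
spec <- spectrumKernel(k=3, normalized=FALSE)

## retrieve the kernel parameters as a list and inspect its structure
params <- kernelParameters(spec)
str(params)

## character representation of the kernel object
seqKernelAsChar(spec)

## check whether the kernel is user-defined
isUserDefined(spec)
```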

Generation of kernel matrix

The function getKernelMatrix creates a kernel matrix for the specified kernel and one or two given sets of sequences. The matrix contains similarity values between pairs of samples. If one set of sequences is used, the square kernel matrix contains the pairwise similarity values for this set. For two sets of sequences, the similarities are calculated between the two sets, resulting in a rectangular kernel matrix. The kernel matrix is always created as a dense matrix of class KernelMatrix. Alternatively, the kernel matrix can be generated by calling the kernel object directly as a function (see the examples below).

Generation of explicit representation

With the function getExRep, an explicit representation for a specified kernel and a given set of sequences can be generated in sparse or dense form. Applying the linear kernel to the explicit representation with the function linearKernel also yields a dense kernel matrix.
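
The relation between an explicit representation and the kernel matrix can be sketched in base R. This is an illustration of the concept under simplifying assumptions (unnormalized k-mer counts, plain character strings), not the kebabs implementation; kmersOf, exRep and the toy sequences are hypothetical names and data chosen for this example. Each row of the explicit representation holds the k-mer counts of one sequence, and the linear kernel is the matrix of inner products between rows.

```r
# Base-R sketch, NOT the kebabs implementation.
# kmersOf, exRep and the toy sequences are hypothetical.
seqs <- c(S1 = "ACGTACGT", S2 = "ACGTTT", S3 = "TTTTAC")
k <- 3

# All overlapping substrings of length k in a single string
kmersOf <- function(s, k) {
  starts <- seq_len(nchar(s) - k + 1)
  substring(s, starts, starts + k - 1)
}

kmerLists <- lapply(seqs, kmersOf, k = k)
features <- sort(unique(unlist(kmerLists)))

# Explicit representation: rows are sequences, columns are k-mers,
# entries are occurrence counts
exRep <- t(sapply(kmerLists,
                  function(km) table(factor(km, levels = features))))

# Linear kernel on the explicit representation: K = X %*% t(X)
km <- tcrossprod(exRep)
km
```

The diagonal of km holds the similarity of each sequence with itself, and the off-diagonal entries hold the pairwise similarities, for example km["S1", "S2"] is 4.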

Value

getKernelMatrix: upon successful completion, the function returns a kernel matrix of class KernelMatrix which contains similarity values between pairs of the biological sequences.

kernelParameters: the kernel parameters as a list

isUserDefined: logical value indicating whether the kernel is user-defined

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576. DOI: 10.1093/bioinformatics/btv176.

See Also

as.KernelMatrix, KernelMatrix, spectrumKernel, mismatchKernel, gappyPairKernel, motifKernel

Examples

## load the package (also attaches Biostrings, which provides DNAStringSet)
library(kebabs)

## instead of user-provided sequences in XStringSet format,
## a set of DNA sequences is created for this example;
## RNA or AA sequences can be used as well with the spectrum kernel
dnaseqs <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC",
                          "CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC",
                          "GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC"))
names(dnaseqs) <- paste0("S", seq_along(dnaseqs))

## create the kernel object with the spectrum kernel
spec <- spectrumKernel(k=3, normalized=FALSE)

## generate the kernel matrix
km <- getKernelMatrix(spec, dnaseqs)
dim(km)
km[1:5,1:5]

## alternative way to generate the kernel matrix
km <- spec(dnaseqs)
km[1:5,1:5]

## generate rectangular kernel matrix
km <- getKernelMatrix(spec, x=dnaseqs, selx=1:3, y=dnaseqs, sely=4:5)
dim(km)
km[1:3,1:2]

## generate a sparse explicit representation
er <- getExRep(dnaseqs, spec)
er[1:5, 1:8]

## generate kernel matrix from explicit representation
km <- linearKernel(er)
km[1:5,1:5]

kebabs documentation built on Nov. 8, 2020, 7:38 p.m.