mismatchKernel: Mismatch Kernel

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/mismatch.R

Description

Create a mismatch kernel object and the kernel matrix

Usage

1
2
3
4
5
mismatchKernel(k = 3, m = 1, r = 1, normalized = TRUE, exact = TRUE,
  ignoreLower = TRUE, presence = FALSE)

## S4 method for signature 'MismatchKernel'
getFeatureSpaceDimension(kernel, x)

Arguments

k

length of the substrings also called kmers; this parameter defines the size of the feature space, i.e. the total number of features considered in this kernel is |A|^k, with |A| as the size of the alphabet (4 for DNA and RNA sequences and 21 for amino acid sequences). Default=3

m

number of maximal mismatch per kmer. The allowed value range is between 1 and k-1. The processing effort for this kernel is highly dependent on the value of m and only small values will allow efficient processing. Default=1

r

exponent which must be > 0 (see details section in spectrumKernel). Default=1

normalized

a kernel matrix or explicit representation generated with this kernel will be normalized(details see below). Default=TRUE

exact

use exact character set for the evaluation (details see below). Default=TRUE

ignoreLower

ignore lower case characters in the sequence. If the parameter is not set lower case characters are treated like uppercase. Default=TRUE

presence

if this parameter is set only the presence of a kmers will be considered, otherwise the number of occurances of the kmer is used. Default=FALSE

kernel

a sequence kernel object

x

one or multiple biological sequences in the form of a DNAStringSet, RNAStringSet, AAStringSet (or as BioVector)

Details

Creation of kernel object

The function 'mismatchKernel' creates a kernel object for the mismatch kernel. This kernel object can then be used with a set of DNA-, RNA- or AA-sequences to generate a kernel matrix or an explicit representation for this kernel. For values different from 1 (=default value) parameter r leads to a transfomation of similarities by taking each element of the similarity matrix to the power of r. If normalized=TRUE, the feature vectors are scaled to the unit sphere before computing the similarity value for the kernel matrix. For two samples with the feature vectors x and y the similarity is computed as:

s=(x^T y)/(|x| |y|)

For an explicit representation generated with the feature map of a normalized kernel the rows are normalized by dividing them through their Euclidean norm. For parameter exact=TRUE the sequence characters are interpreted according to an exact character set. If the flag is not set ambigous characters from the IUPAC characterset are also evaluated. The annotation specific variant (for details see positionMetadata) and the position dependent variant (for details see annotationMetadata) are not available for this kernel.

Creation of kernel matrix

The kernel matrix is created with the function getKernelMatrix or via a direct call with the kernel object as shown in the examples below.

Value

mismatchKernel: upon successful completion, the function returns a kernel object of class MismatchKernel.

of getDimFeatureSpace: dimension of the feature space as numeric value

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

(Leslie, 2002) – C. Leslie, E. Eskin, J. Weston and W.S. Noble. Mismatch String Kernels for SVM Protein Classification.

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576, 2015. DOI: 10.1093/bioinformatics/btv176.

See Also

kernelParameters, getKernelMatrix, getExRep, spectrumKernel, gappyPairKernel, motifKernel, MismatchKernel

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
## instead of user provided sequences in XStringSet format
## for this example a set of DNA sequences is created
## RNA- or AA-sequences can be used as well with the mismatch kernel
dnaseqs <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC",
                          "CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC",
                          "GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC"))
names(dnaseqs) <- paste("S", 1:length(dnaseqs), sep="")

## create the kernel object with one mismatch per kmer
mm <- mismatchKernel(k=2, m=1, normalized=FALSE)
## show details of kernel object
mm

## generate the kernel matrix with the kernel object
km <- mm(dnaseqs)
dim(km)
km[1:5, 1:5]

## alternative way to generate the kernel matrix
km <- getKernelMatrix(mm, dnaseqs)
km[1:5,1:5]

## Not run: 
## plot heatmap of the kernel matrix
heatmap(km, symm=TRUE)

## End(Not run)

Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

Loading required package: kernlab

Attaching package: 'kernlab'

The following object is masked from 'package:Biostrings':

    type

Mismatch Kernel: k=2, m=1, normalized=FALSE
[1] 5 5
An object of class "KernelMatrix"
     S1   S2   S3   S4   S5
S1 6978 6198 6726 6361 6198
S2 6198 6802 6191 6501 6802
S3 6726 6191 6898 6493 6191
S4 6361 6501 6493 6608 6501
S5 6198 6802 6191 6501 6802
An object of class "KernelMatrix"
     S1   S2   S3   S4   S5
S1 6978 6198 6726 6361 6198
S2 6198 6802 6191 6501 6802
S3 6726 6191 6898 6493 6191
S4 6361 6501 6493 6608 6501
S5 6198 6802 6191 6501 6802

kebabs documentation built on Nov. 8, 2020, 7:38 p.m.