motifKernel: Motif Kernel

Description

Create a motif kernel object and the kernel matrix

Usage

motifKernel(motifs, r = 1, annSpec = FALSE, distWeight = numeric(0),
  normalized = TRUE, exact = TRUE, ignoreLower = TRUE, presence = FALSE)

## S4 method for signature 'MotifKernel'
getFeatureSpaceDimension(kernel, x)

Arguments

motifs

a set of motif patterns specified as character vector. The order in which the patterns are passed for creation of the kernel object also determines the order of the features in the explicit representation. Lowercase characters in motifs are always converted to uppercase. For details concerning the definition of motif patterns see below and in the examples section.

r

exponent which must be > 0 (see details section in spectrumKernel). Default=1

annSpec

boolean that indicates whether sequence annotation should be taken into account (for details see the help page for annotationMetadata). Default=FALSE

distWeight

a numeric distance weight vector or a distance weighting function (for details see the help page for gaussWeight). Default=numeric(0)

normalized

generated data from this kernel will be normalized (details see below). Default=TRUE

exact

use exact character set for the evaluation (details see below). Default=TRUE

ignoreLower

ignore lowercase characters in the sequence. If the parameter is not set, lowercase characters are treated like uppercase. Default=TRUE

presence

if this parameter is set, only the presence of a motif is considered; otherwise the number of occurrences of the motif is used. Default=FALSE

kernel

a sequence kernel object

x

one or multiple biological sequences in the form of a DNAStringSet, RNAStringSet, AAStringSet (or as BioVector)

Details

Creation of kernel object

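The function motifKernel creates a kernel object for the motif kernel for a set of given DNA, RNA or AA motifs. This kernel object can then be used to generate a kernel matrix or an explicit representation for this kernel. The individual patterns in the set of motifs are built similarly to regular expressions by concatenating pattern elements in arbitrary order. The patterns in the examples section show such elements: a specific character (e.g. A), the wildcard character '.', and a group of characters in square brackets (e.g. [CG]), which can be negated with a leading '^' (e.g. [^A]).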

For values different from 1 (the default value), parameter r leads to a transformation of similarities by taking each element of the similarity matrix to the power of r. For the annotation specific variant of this kernel see annotationMetadata, for the distance weighted variants see positionMetadata. If normalized=TRUE, the feature vectors are scaled to the unit sphere before computing the similarity value for the kernel matrix. For two samples with the feature vectors x and y the similarity is computed as:

s=(x^T y)/(|x| |y|)

For an explicit representation generated with the feature map of a normalized kernel, the rows are normalized by dividing them by their Euclidean norm. For parameter exact=TRUE the sequence characters are interpreted according to an exact character set. If the flag is not set, ambiguous characters from the IUPAC character set are also evaluated.
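The normalization and the exponent r described above can be sketched in base R. This is a minimal illustration, not the kebabs implementation: the count vectors x and y and the helper euclid_norm are hypothetical names chosen for this sketch.

```r
# Hypothetical motif count vectors for two sequences (one entry per motif)
x <- c(2, 1, 0)
y <- c(1, 1, 3)

# Euclidean norm of a feature vector (helper name is an assumption)
euclid_norm <- function(v) sqrt(sum(v^2))

# Similarity for normalized = TRUE: s = (x^T y) / (|x| |y|)
s <- sum(x * y) / (euclid_norm(x) * euclid_norm(y))

# Equivalent view: scale each feature vector to unit length first,
# as is done for the rows of a normalized explicit representation
x_unit <- x / euclid_norm(x)
y_unit <- y / euclid_norm(y)
s_unit <- sum(x_unit * y_unit)

# Exponent r: the similarity value is raised to the power of r
r <- 2
s_r <- s^r
```

Both s and s_unit hold the same value, which shows that scaling the feature vectors to the unit sphere and dividing the dot product by the two norms are the same operation.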

The annotation specific variant (for details see annotationMetadata) and the position dependent variants (for details see positionMetadata) either in the form of a position specific or a distance weighted kernel are supported for the motif kernel. The generation of an explicit representation is not possible for the position dependent variants of this kernel.

Hint: For a normalized motif kernel whose features are a subset of the features of a normalized spectrum kernel, the explicit representation will not be identical to the corresponding subset of the explicit representation for the spectrum kernel, because the motif kernel is not aware of the other k-mers that the spectrum kernel additionally uses for normalization.
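The effect described in this hint can be reproduced with plain numbers in base R. The k-mer counts below are invented for illustration, and the helper unit_scale is a hypothetical name; the sketch only assumes that the motif kernel sees two of the three features that the spectrum kernel uses.

```r
# Hypothetical counts of three k-mers in one sequence (spectrum kernel view)
spectrum_counts <- c(AC = 3, CG = 4, GT = 12)

# Assume the motif kernel uses only a subset of these features
motif_counts <- spectrum_counts[c("AC", "CG")]

# Scale a feature vector to unit Euclidean length
unit_scale <- function(v) v / sqrt(sum(v^2))

# Spectrum kernel: normalize over all features, then take the subset
subset_of_spectrum <- unit_scale(spectrum_counts)[c("AC", "CG")]

# Motif kernel: normalize over the subset only
motif_exrep <- unit_scale(motif_counts)

subset_of_spectrum  # 3/13 and 4/13
motif_exrep         # 3/5 and 4/5
```

The two results differ because the norm of the full vector (13) includes the third k-mer, while the norm of the subset (5) does not.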

Creation of kernel matrix

The kernel matrix is created with the function getKernelMatrix or via a direct call with the kernel object as shown in the examples below.

Value

motifKernel: upon successful completion, the function returns a kernel object of class MotifKernel.

getFeatureSpaceDimension: dimension of the feature space as numeric value

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

(Ben-Hur, 2003) – A. Ben-Hur, and D. Brutlag. Remote homology detection: a motif based approach.

(Bodenhofer, 2009) – U. Bodenhofer, K. Schwarzbauer, M. Ionescu and S. Hochreiter. Modelling position specificity in sequence kernels by fuzzy equivalence relations.

(Mahrenholz, 2011) – C.C. Mahrenholz, I.G. Abfalter, U. Bodenhofer, R. Volkmer and S. Hochreiter. Complex networks govern coiled-coil oligomerizations - predicting and profiling by means of a machine learning approach.

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576, 2015. DOI: 10.1093/bioinformatics/btv176.

See Also

kernelParameters-method, getKernelMatrix, getExRep, spectrumKernel, mismatchKernel, gappyPairKernel

Examples

## load the package that provides motifKernel
library(kebabs)

## instead of user provided sequences in XStringSet format
## for this example a set of DNA sequences is created
## RNA- or AA-sequences can be used as well with the motif kernel
dnaseqs <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC",
                          "CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC",
                          "GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC",
                          "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC"))
names(dnaseqs) <- paste0("S", seq_along(dnaseqs))

## create the kernel object with the motif patterns
mot <- motifKernel(c("A[CG]T","C.G","G[^A][AT]"), normalized=FALSE)
## show details of kernel object
mot

## generate the kernel matrix with the kernel object
km <- mot(dnaseqs)
dim(km)
km

## alternative way to generate the kernel matrix
km <- getKernelMatrix(mot, dnaseqs)

## Not run: 
## plot heatmap of the kernel matrix
heatmap(km, symm=TRUE)

## generate rectangular kernel matrix
km <- mot(x=dnaseqs, selx=1:3, y=dnaseqs, sely=4:5)
dim(km)
km

## End(Not run)

Example output

Motif Kernel:

Motifs:
A[CG]T 
C.G 
G[^A][AT] 

Kernel Parameters:
normalized=FALSE
[1] 5 5
An object of class "KernelMatrix"
   S1 S2 S3 S4 S5
S1 46 24 48 35 24
S2 24 13 27 18 13
S3 48 27 61 38 27
S4 35 18 38 29 18
S5 24 13 27 18 13
[1] 3 2
An object of class "KernelMatrix"
   S4 S5
S1 35 24
S2 18 13
S3 38 27
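The unnormalized kernel values printed above can be cross-checked by hand in base R, without kebabs. The sketch below is not how the package computes the kernel: it assumes that, for these simple DNA patterns, a Perl-style regular expression matches the same positions as the motif, and count_motif is a hypothetical helper that counts overlapping matches.

```r
# First two sequences and the motifs from the example above
seqs <- c(S1 = "AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
          S2 = "ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC")
motifs <- c("A[CG]T", "C.G", "G[^A][AT]")

# Count matches of one pattern, including overlapping ones, by wrapping
# it in a zero-width lookahead so that every start position is tested
count_motif <- function(pattern, s) {
  hits <- gregexpr(paste0("(?=", pattern, ")"), s, perl = TRUE)[[1]]
  sum(hits > 0)  # gregexpr returns -1 when there is no match
}

# Feature matrix: one row per sequence, one column per motif
counts <- t(sapply(seqs, function(s) {
  vapply(motifs, count_motif, integer(1), s = s)
}))

# Unnormalized kernel matrix = pairwise dot products of the count vectors
km_check <- counts %*% t(counts)
km_check
```

For S1 the counts are 1, 3 and 6, and for S2 they are 0, 2 and 3, so km_check contains 46, 24 and 13, in agreement with the S1 and S2 entries of the kernel matrix in the example output.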

kebabs documentation built on Nov. 8, 2020, 7:38 p.m.