Assign annotation metadata to sequences and create a kernel
object which evaluates annotation information
Show biological sequence together with annotation
showAnnotatedSeq(x, sel = 1, ann = TRUE, pos = TRUE, start = 1,
end = width(x)[sel], width = NA)
## S4 method for signature 'XStringSet'
## annotationMetadata(x, annCharset= ...) <- value
## S4 method for signature 'BioVector'
## annotationMetadata(x, annCharset= ...) <- value
## S4 replacement method for signature 'BioVector'
annotationMetadata(x, ...) <- value
## S4 method for signature 'XStringSet'
annotationMetadata(x)
## S4 method for signature 'BioVector'
annotationMetadata(x)
## S4 method for signature 'XStringSet'
annotationCharset(x)
## S4 method for signature 'BioVector'
annotationCharset(x)
x: biological sequences in the form of a
sel: single index into x for displaying a specific sequence (default: 1)
ann: show annotation information along with the sequence
pos: show position information
start: first position to be displayed; by default the full sequence is shown
end: last position to be displayed; alternatively use the parameter 'width'
width: number of positions to be displayed; alternatively use the parameter 'end'
...: additional parameters which are passed on transparently
value: character vector of annotation strings with the same length as the number of sequences. Each annotation string must have the same number of characters as the corresponding sequence. In addition to the characters defined in the annotation character set, the character "-" can be used in the annotation strings for masking sequence parts.
annCharset: character string listing all characters used in the annotation, sorted in ascending order according to the C locale; up to 32 characters are possible
Annotation information for sequences
For the annotation-specific kernel, additional annotation information is
added to the sequence data. The annotation for one sequence consists of a
character string with a single annotation character per position, i.e.
the annotation string has the same length as the sequence. The character
set used for annotation is user-defined at the XStringSet level and may
contain up to 32 different characters. Each biological sequence needs
an associated annotation string consisting of characters from
this character set. The evaluation of annotation information as part of
the kernel processing during generation of a kernel matrix or an explicit
representation can be activated per kernel object.
Assignment of annotation information
The annotation character set is a character string listing all
allowed annotation characters in ascending order (according to the C
locale). Any single-byte ASCII character from the decimal range between 32
and 126, except 45, is allowed. The character '-' (ASCII dec. 45) is used
for masking sequence parts which should not be evaluated. Because it is
reserved for this masking function, it must not be used in annotation
character sets.
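The rules above can be sketched as a small check in base R. The helpers isValidAnnCharset and isValidAnnotation are hypothetical names used only for illustration; they are not part of the package, and the checks that annotationMetadata actually performs may differ. Sequences are passed as plain character vectors here so that the sketch runs without any additional package.

```r
## Hypothetical helper (not part of kebabs): check an annotation character
## set against the rules described above.
isValidAnnCharset <- function(annCharset) {
    if (!is.character(annCharset) || length(annCharset) != 1 ||
        is.na(annCharset))
        return(FALSE)
    codes <- utf8ToInt(annCharset)
    length(codes) >= 1 && length(codes) <= 32 &&  # up to 32 characters
        all(codes >= 32 & codes <= 126) &&        # ASCII range 32 to 126
        !any(codes == 45) &&                      # '-' is reserved for masking
        all(diff(codes) > 0)                      # ascending, no duplicates
}

## Hypothetical helper: check annotation strings against the sequences
## (plain character vectors) and the annotation character set.
isValidAnnotation <- function(seqs, annot, annCharset) {
    allowed <- c(strsplit(annCharset, "")[[1]], "-")  # '-' masks positions
    length(seqs) == length(annot) &&
        all(nchar(seqs) == nchar(annot)) &&
        all(unlist(strsplit(annot, "")) %in% allowed)
}

isValidAnnCharset("ei")    # TRUE
isValidAnnCharset("ie")    # FALSE: not in ascending order
isValidAnnCharset("-ei")   # FALSE: contains the masking character
isValidAnnotation(c("ACGT", "GGCA"), c("eeii", "--ee"), "ei")   # TRUE
```

For single-byte ASCII characters, ascending order in the C locale is the same as ascending order of the character codes, which is why the sketch compares code points directly.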
The annotation character set is assigned to the sequence set with the
annotationMetadata function (see below). It is stored in the metadata list
as the named element annotationCharset and can be stored along with other
metadata assigned to the sequence set. The annotation strings for the
individual sequences are represented as a character vector and are assigned
to the XStringSet, together with the annotation character set, as
element-related metadata. Element-related metadata is stored in a DataFrame
whose columns represent the different types of metadata that can be
assigned in parallel. The column name for the sequence-related annotation
information is "annotation" (see the Examples section for an example of
annotation metadata assignment). Annotation metadata can be assigned to a
sequence set together with position metadata (see positionMetadata).
Annotation Specific Kernel Processing
The annotation-specific variant of a kernel, e.g. the spectrum kernel,
appends the annotation characters corresponding to a specific k-mer to this
k-mer and treats the resulting pattern as one feature, the basic unit for
similarity determination. The full feature space of an annotation-specific
spectrum kernel is the Cartesian product of the set of all possible sequence
patterns with the set of all possible annotation patterns. Depending on the
number of characters in the annotation character set, the feature space
grows drastically compared to the normal spectrum kernel. Through
annotation, however, the similarity consideration between two sequences can
be split into independent parts that are considered separately, e.g.
coding/non-coding or exon/intron. For amino acid sequences, for example, a
heptad annotation (a usually periodic pattern of 7 characters, a to g) can
be used, as in the prediction of coiled coil structures (see reference
Mahrenholz, 2011).
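To make the feature construction concrete, the following base R sketch counts annotation-specific k-mers for a single sequence and computes an unnormalized kernel value from two such count vectors. It is an illustration only and not the implementation used in the package: annSpecKmerCounts and annSpecKernelValue are hypothetical names, and dropping every k-mer that overlaps a masked position is an assumption about how masking behaves.

```r
## Hypothetical helper (base R only, not the kebabs implementation):
## count annotation-specific k-mers of one sequence.
annSpecKmerCounts <- function(bioseq, annot, k = 3) {
    stopifnot(nchar(bioseq) == nchar(annot), nchar(bioseq) >= k)
    starts <- seq_len(nchar(bioseq) - k + 1)
    kmers  <- substring(bioseq, starts, starts + k - 1)
    annots <- substring(annot,  starts, starts + k - 1)
    ## assumption: k-mers touching a masked position ('-') are skipped
    keep <- !grepl("-", annots, fixed = TRUE)
    ## a feature is the k-mer followed by its annotation characters
    table(paste0(kmers[keep], annots[keep]))
}

## Hypothetical helper: unnormalized kernel value as the dot product of
## two feature count vectors over their shared features.
annSpecKernelValue <- function(countsA, countsB) {
    shared <- intersect(names(countsA), names(countsB))
    sum(as.numeric(countsA[shared]) * as.numeric(countsB[shared]))
}

a <- annSpecKmerCounts("ACGTACG", "eeeiiii", k = 3)
b <- annSpecKmerCounts("ACGACG",  "iiiiii",  k = 3)
names(a)                   # "ACG" occurs as "ACGeee" and as "ACGiii"
annSpecKernelValue(a, b)   # only "ACGiii" is shared by both sequences

## size of the full feature space for DNA, k = 3, two annotation characters
4^3 * 2^3
```

In the first sequence the k-mer ACG occurs twice, but with different annotation, so it contributes two distinct features. This is the splitting of similarity into separately considered parts that the text describes, and the last line shows how the feature space grows from 64 to 512 features for k = 3 with a two-character annotation character set.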
The flag annSpec passed during creation of a kernel object controls
whether annotation information is evaluated by the kernel (see functions
spectrumKernel, gappyPairKernel, motifKernel). In this way, annotated
sequences can be evaluated both with and without their annotation by using
two different kernel objects (see examples below). The annotation-specific
kernel variant is available for all kernels in this package except for the
mismatch kernel.
annotationMetadata function
With this function, annotation metadata can be assigned to sequences defined
as XStringSet (or BioVector). The sequence annotation strings are stored
as element-related information and can be retrieved with the method mcols.
The characters used for annotation are stored as the annotation character
set for the sequence set and can be retrieved with the method metadata. For
the assignment of annotation metadata to biological sequences, this function
should be used instead of the lower-level functions metadata and mcols. The
function annotationMetadata performs several checks and also ensures that
other metadata or element metadata assigned to the object is kept.
Annotation metadata are deleted if the parameters annCharset and annotation
are set to NULL.
showAnnotatedSeq function
This function displays individual sequences aligned with the annotation
string, with 50 positions per line. The two header lines show the start
position for each block of 10 characters.
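The idea of the blockwise layout can be sketched in base R as follows. The function alignedBlocks is a hypothetical helper used only for illustration; it does not reproduce the output format of showAnnotatedSeq (in particular, it omits the two header lines) and only shows how a sequence and its annotation string stay aligned when both are cut at the same positions.

```r
## Hypothetical helper (not the output format of showAnnotatedSeq): cut a
## sequence and its annotation string into aligned blocks.
alignedBlocks <- function(bioseq, annot, lineWidth = 50) {
    stopifnot(nchar(bioseq) == nchar(annot), nchar(bioseq) >= 1)
    starts <- seq(1, nchar(bioseq), by = lineWidth)
    ends   <- pmin(starts + lineWidth - 1, nchar(bioseq))
    data.frame(start      = starts,
               sequence   = substring(bioseq, starts, ends),
               annotation = substring(annot,  starts, ends),
               stringsAsFactors = FALSE)
}

## a 60-position toy sequence yields one block of 50 and one of 10 positions
blocks <- alignedBlocks(strrep("ACGTA", 12),
                        paste0(strrep("e", 35), strrep("i", 25)))
blocks
```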
Accessor-like methods
The method annotationMetadata<- assigns annotation metadata to a sequence
set. The annotation character set must also be specified in the assignment.
Annotation characters which are not listed in the character set are treated
like invalid sequence characters: they interrupt open patterns and lead
to a restart of the pattern search at this position.
annotationMetadata: a character vector with the annotation strings
annotationCharset: a character vector with the annotation
Johannes Palme <kebabs@bioinf.jku.at>
http://www.bioinf.jku.at/software/kebabs
C.C. Mahrenholz, I.G. Abfalter, U. Bodenhofer, R. Volkmer and
S. Hochreiter (2011) Complex networks govern coiled coil
oligomerization - predicting and profiling by means of a machine
learning approach. Mol. Cell. Proteomics.
DOI: 10.1074/mcp.M110.004994.
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package
for kernel-based analysis of biological sequences.
Bioinformatics, 31(15):2574-2576, 2015.
DOI: 10.1093/bioinformatics/btv176.
spectrumKernel, gappyPairKernel, motifKernel, positionMetadata, metadata, mcols
## load the package that provides the functions used below
library(kebabs)

## create a set of annotated DNA sequences
## instead of user provided sequences in XStringSet format
## for this example a set of DNA sequences is created
x <- DNAStringSet(c("AGACTTAAGGGACCTGGTCACCACGCTCGGTGAGGGGGACGGGGTGT",
"ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC",
"CAGGAATCAGCACAGGCAGGGGCACGGCATCCCAAGACATCTGGGCC",
"GGACATATACCCACCGTTACGTGTCATACAGGATAGTTCCACTGCCC",
"ATAAAGGTTGCAGACATCATGTCCTTTTTGTCCCTAATTATTTCAGC"))
names(x) <- paste("S", 1:length(x), sep="")
## define the character set used in annotation
## the masking character '-' is not part of the character set
anncs <- "ei"
## annotation strings for each sequence as character vector
## in the third and fourth sample a part of the sequence is masked
annotStrings <- c("eeeeeeeeeeeeiiiiiiiiieeeeeeeeeeeeeeeeiiiiiiiiii",
"eeeeeeeeeiiiiiiiiiiiiiiiiiiieeeeeeeeeeeeeeeeeee",
"---------eeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiiiiiii",
"eeeeeeeeeeeeeeeeeeeeeeeiiiiiiiiiiiiiiiiiiii----",
"eeeeeeeeeeeeiiiiiiiiiiiiiiiiiiiiiiieeeeeeeeeeee")
## assign metadata to DNAStringSet object
annotationMetadata(x, annCharset=anncs) <- annotStrings
## show annotation
annotationMetadata(x)
annotationCharset(x)
## show sequence 3 aligned with annotation string
showAnnotatedSeq(x, sel=3)
## create annotation specific spectrum kernel
speca <- spectrumKernel(k=3, annSpec=TRUE, normalized=FALSE)
## show details of kernel object
kernelParameters(speca)
## this kernel object can now be used in a classification or regression
## task in the usual way or you can use the kernel for example to generate
## the kernel matrix for use with another learning method in another R
## package.
kma <- speca(x)
kma[1:5,1:5]
## generate a dense explicit representation for annotation-specific kernel
era <- getExRep(x, speca, sparse=FALSE)
era[1:5,1:8]
## when a standard spectrum kernel is used with annotated
## sequences the annotation information is not evaluated
spec <- spectrumKernel(k=3, normalized=FALSE)
km <- spec(x)
km[1:5,1:5]
## finally delete annotation metadata if no longer needed
annotationMetadata(x) <- NULL
## show empty metadata
annotationMetadata(x)
annotationCharset(x)