Description Usage Arguments Details Value Accessor-like methods Subsetting and concatenation Coercion methods Note Author(s) References See Also Examples
Create an object containing a set of DNA-, RNA- or amino acid sequences
x |
character vector containing a set of sequences as uppercase characters or in mixed uppercase/lowercase form. |
i |
numeric vector with indices or character vector with element names |
use.names |
when set to |
The class DNAVector is used for storing DNA sequences, RNAVector for RNA
sequences and AAVector for amino acid sequences. The class BioVector is
derived from the R base type character representing a vector of character
strings. It is an abstract class which cannot be instantiated. BioVector is
the parent class for DNAVector, RNAVector and AAVector. For the three derived
classes, identically named functions exist which are constructors. It should
be noted that the constructors only wrap the sequence data into a class
without copying or recoding the data.
The functions provided for the DNAVector, RNAVector and AAVector classes are
only a very small subset compared to those of XStringSet but are designed
along their counterparts from the Biostrings package. Assignment of metadata
and element metadata via mcols is supported for the DNAVector, RNAVector and
AAVector objects similar to objects of XStringSet derived classes (for
details on metadata assignment see annotationMetadata and positionMetadata).
In contrast to XStringSet, the BioVector derived classes also support the
storage of lowercase characters. This can be relevant for repeat regions,
which are often coded in lowercase characters. During the creation of
XStringSet derived classes the lowercase characters are converted to
uppercase automatically and the information about repeat regions is lost.
For BioVector derived classes the user can specify during creation of a
sequence kernel object whether lowercase characters should be included as
uppercase characters or whether repeat regions should be ignored during
sequence analysis. In this way it is possible to perform both types of
analysis on the same set of sequences by defining one kernel object which
accepts lowercase characters and another one which ignores them.
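The two kinds of analysis described above can be sketched as follows. This is a minimal illustration, assuming that the kebabs package is installed and that the lowercase handling is controlled through the ignoreLower argument of spectrumKernel; the sequences and the variable names (xv, specIgnore, specUpper) are made up for the example.

```r
library(kebabs)

## two short sequences, the first one with a lowercase (repeat) region
xv <- DNAVector(c("ACGTACGTacgtacgtACGTACGT",
                  "ACGTACGTTTTTTTTTACGTACGT"))
names(xv) <- c("Sample1", "Sample2")

## kernel object which ignores lowercase regions (assumed argument: ignoreLower)
specIgnore <- spectrumKernel(k = 3, ignoreLower = TRUE)

## kernel object which treats lowercase characters as uppercase
specUpper <- spectrumKernel(k = 3, ignoreLower = FALSE)

## both kernel matrices are computed on the same DNAVector object
kmIgnore <- getKernelMatrix(specIgnore, xv)
kmUpper <- getKernelMatrix(specUpper, xv)

kmIgnore
kmUpper
```

Because the lowercase information is kept in the DNAVector object itself, the choice between the two treatments is made per kernel object and the sequence set does not need to be stored twice.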
The constructors DNAVector, RNAVector and AAVector each return a sequence
set of the identically named class.
In the code snippets below, x is a BioVector.
length(x): the number of sequences in x.
width(x): vector of integer values with the number of bases/amino acids for
each sequence in the set.
names(x): character vector of sample names.
In the code snippets below, x is a BioVector.
x[i]: return a BioVector object that only contains the samples selected with
the subsetting parameter i. This parameter can be a numeric vector with
indices or a character vector which is matched against the names of x.
Element related metadata is subsetted accordingly if available.
c(x, ...): return a sequence set that is a concatenation of the given
sequence sets.
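A short sketch of subsetting and concatenation, assuming the kebabs package is installed. The sequences are the ones used in the Examples; the variable names (first, byName, both) are chosen only for illustration.

```r
library(kebabs)

xv <- DNAVector(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
                  "GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
names(xv) <- c("Sample1", "Sample2")

## subset by numeric index
first <- xv[1]

## subset by element name, matched against names(xv)
byName <- xv["Sample2"]

## concatenate the two single-sequence sets into one sequence set
both <- c(first, byName)

length(both)
both
```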
In the code snippets below, x is a BioVector.
as.character(x, use.names=TRUE): return the sequence set as named or unnamed
character vector dependent on the use.names parameter.
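The coercion can be tried out as sketched below, assuming the kebabs package is installed. The variable names (named, unnamed) are illustrative only.

```r
library(kebabs)

xv <- DNAVector(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
                  "GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
names(xv) <- c("Sample1", "Sample2")

## coerce to a plain character vector with and without use.names
named <- as.character(xv, use.names = TRUE)
unnamed <- as.character(xv, use.names = FALSE)

## inspect the names attribute of both results
names(named)
names(unnamed)
```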
Sequence data can be processed by KeBABS in XStringSet and BioVector based
format. Within KeBABS, except for the treatment of lowercase characters, both
formats are equivalent. It is recommended to use XStringSet based formats
whenever the support of lowercase characters is not of interest, because
these classes provide in general much richer functionality than the BioVector
classes. String kernels provided in the kernlab package (see stringdot) do
not support XStringSet derived objects. The usage of these kernels is
possible in KeBABS with sequence data in BioVector based format.
Johannes Palme <kebabs@bioinf.jku.at>
http://www.bioinf.jku.at/software/kebabs
J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package
for kernel-based analysis of biological sequences.
Bioinformatics, 31(15):2574-2576.
DOI: 10.1093/bioinformatics/btv176.
metadata, elementMetadata, XStringSet, DNAStringSet, RNAStringSet, AAStringSet
## in general DNAStringSet should be preferred as described above
## create DNAStringSet object for a set of sequences
x <- DNAStringSet(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
"GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
## assign names to the sequences
names(x) <- c("Sample1", "Sample2")
## to show the different handling of lowercase characters
## create DNAVector object for the same set of sequences and assign names
xv <- DNAVector(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
"GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
names(xv) <- c("Sample1", "Sample2")
## show DNAStringSet object - lowercase characters were translated
x
## in the DNAVector object lowercase characters are unmodified
## their handling can be defined at the level of the sequence kernel
xv
## show number of the sequences in the set and their number of characters
length(xv)
width(xv)
nchar(xv)
A DNAStringSet instance of length 2
width seq names
[1] 45 AACCGCGATTATCGATATATATATATATATTGGAAGCTAGGACTA Sample1
[2] 42 GACTTACCCGAGAGAGAGAGAGACATGAGAGGGAAGCTAGTA Sample2
A DNAVector instance of length 2
width seq names
[1] 45 AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA Sample1
[2] 42 GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA Sample2
[1] 2
[1] 45 42
[1] 45 42