BioVector: DNAVector, RNAVector, AAVector Objects and BioVector Class

Description Usage Arguments Details Value Accessor-like methods Subsetting and concatination Coercion methods Note Author(s) References See Also Examples

Description

Create an object containing a set of DNA-, RNA- or amino acid sequences

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
## Constructors:

RNAVector(x = character())

AAVector(x = character())

## Accessor-like methods: see below

## S4 method for signature 'BioVector,index,missing,ANY'
x[i]

## S4 method for signature 'BioVector'
as.character(x, use.names = TRUE)

Arguments

x

character vector containing a set of sequences as uppercase characters or in mixed uppercase/lowercase form.

i

numeric vector with indicies or character with element names

use.names

when set to TRUE the names are preserved

Details

The class DNAVector is used for storing DNA sequences, RNAVector for RNA sequences and AAVector for amino acid sequences. The class BioVector is derived from the R base type character representing a vector of character strings. It is an abstract class which can not be instantiated. BioVector is the parent class for DNAVector, RNAVector and AAVector. For the three derived classes identically named functions exist which are constructors. It should be noted that the constructors only wrap the sequence data into a class without copying or recoding the data.

The functions provided for DNAVector, RNAVector and AAVector classes are only a very small subset compared to those of XStringSet but are designed along their counterparts from the Biostrings package. Assignment of metadata and element metadata via mcols is supported for the DNAVector, RNAVector and AAVector objects similar to objects of XStringSet derived classes (for details on metadata assignment see annotationMetadata and positionMetadata).

In contrast to XStringSet the BioVector derived classes also support the storage of lowercase characters. This can be relevant for repeat regions which are often coded in lowercase characters. During the creation of XStringSet derived classes the lowercase characters are converted to uppercase automatically and the information about repeat regions is lost. For BioVector derived classes the user can specify during creation of a sequence kernel object whether lowercase characters should be included as uppercase characters or whether repeat regions should be ignored during sequence analysis. In this way it is possible to perform both types of analysis on the same set of sequences through defining one kernel object which accepts lowercase characters and another one which ignores them.

Value

constructors DNAVector, RNAVector, AAVector return a sequence set of identical class name

Accessor-like methods

In the code snippets below, x is a BioVector.

length(x): the number of sequences in x.

width(x): vector of integer values with the number of bases/amino acids for each sequence in the set.

names(x): character vector of sample names.

Subsetting and concatination

In the code snippets below, x is a BioVector.

x[i]: return a BioVector object that only contains the samples selected with the subsetting parameter i. This parameter can be a numeric vector with indices or a character vector which is matched against the names of x. Element related metadata is subsetted accordingly if available.

c(x, ...): return a sequence set that is a concatination of the given sequence sets.

Coercion methods

In the code snippets below, x is a BioVector.

as.character(x, use.names=TRUE): return the sequence set as named or unnamed character vector dependent on the use.names parameter.

Note

Sequence data can be processed by KeBABS in XStringSet and BioVector based format. Within KeBABS except for treatment of lowercase characters both formats are equivalent. It is recommended to use XStringSet based formats whenever the support of lowercase characters is not of interest because these classes provide in general much richer functionality than the BioVector classes. String kernels provided in the kernlab package (see stringdot) do not support XStringSet derived objects. The usage of these kernels is possible in KeBABS with sequence data in BioVector based format.

Author(s)

Johannes Palme <kebabs@bioinf.jku.at>

References

http://www.bioinf.jku.at/software/kebabs

J. Palme, S. Hochreiter, and U. Bodenhofer (2015) KeBABS: an R package for kernel-based analysis of biological sequences. Bioinformatics, 31(15):2574-2576, 2015. DOI: 10.1093/bioinformatics/btv176.

See Also

metadata, elementMetadata, XStringSet, DNAStringSet, RNAStringSet, AAStringSet

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
## in general DNAStringSet should be prefered as described above
## create DNAStringSet object for a set of sequences
x <- DNAStringSet(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
                    "GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
## assign names to the sequences
names(x) <- c("Sample1", "Sample2")

## to show the different handling of lowercase characters
## create DNAVector object for the same set of sequences and assign names
xv <- DNAVector(c("AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA",
                  "GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA"))
names(xv) <- c("Sample1", "Sample2")

## show DNAStringSet object - lowercase characters were translated
x
## in the DNAVector object lowercase characters are unmodified
## their handling can be defined at the level of the sequence kernel
xv

## show number of the sequences in the set and their number of characters
length(xv)
width(xv)
nchar(xv)

Example output

Loading required package: Biostrings
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colMeans, colSums, colnames,
    dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
    rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

Loading required package: kernlab

Attaching package: 'kernlab'

The following object is masked from 'package:Biostrings':

    type

  A DNAStringSet instance of length 2
    width seq                                               names               
[1]    45 AACCGCGATTATCGATATATATATATATATTGGAAGCTAGGACTA     Sample1
[2]    42 GACTTACCCGAGAGAGAGAGAGACATGAGAGGGAAGCTAGTA        Sample2
  A DNAVector instance of length 2 
    width seq                                               names               
[1]    45 AACCGCGATTATCGatatatatatatatatTGGAAGCTAGGACTA     Sample1             
[2]    42 GACTTACCCgagagagagagagaCATGAGAGGGAAGCTAGTA        Sample2             
[1] 2
[1] 45 42
[1] 45 42

kebabs documentation built on Nov. 8, 2020, 7:38 p.m.