conversion: Convert sequences between binary and character string...

Description

These functions convert DNA and amino acid sequences in "DNAbin" or "AAbin" format to concatenated character strings, and vice versa.

Usage

dna2char(x)

aa2char(x)

char2dna(z, simplify = FALSE)

char2aa(z, simplify = FALSE)

Arguments

x

a "DNAbin" or "AAbin" object.

z

a vector of character strings, each holding one concatenated DNA or amino acid sequence in upper case.

simplify

logical indicating whether length-one "DNAbin" or "AAbin" objects should be simplified to vectors. Defaults to FALSE.

Details

These functions convert concatenated character strings (e.g. "TAACGC") to binary format and vice versa. To convert "DNAbin" and "AAbin" objects to non-concatenated character objects (e.g. c("T", "A", "A", "C", "G", "C")), use the ape package functions as.character.DNAbin and as.character.AAbin. Likewise, the ape package functions as.DNAbin and as.AAbin convert non-concatenated character objects to binary format.
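The difference between the two character representations can be sketched in base R alone, without insect or ape. The variable names below are illustrative only and are not part of either package.

```r
# Concatenated form: one string per sequence (the form char2dna/char2aa expect).
z <- "TAACGC"

# Non-concatenated form: one element per residue (the form ape's as.DNAbin expects).
residues <- strsplit(z, split = "")[[1]]
print(residues)

# Collapsing the residues recovers the concatenated string.
rejoined <- paste(residues, collapse = "")
print(rejoined)
```

This only illustrates the character-level layouts; it does not perform any conversion to or from the binary format.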

Value

dna2char and aa2char return vectors of upper case character strings. char2dna and char2aa return "DNAbin" and "AAbin" objects, respectively. These will be lists unless the input object has length one and simplify = TRUE, in which case the returned object will be a vector.
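A minimal sketch of the simplify argument, assuming the insect package is installed (the block is skipped otherwise). It relies only on the behavior described above; the names as_list and as_vector are illustrative.

```r
# Assumes the insect package is installed; the block does nothing otherwise.
if (requireNamespace("insect", quietly = TRUE)) {
  # Default (simplify = FALSE): a length-one input is returned as a list.
  as_list <- insect::char2dna("TAACGC")
  print(is.list(as_list))

  # simplify = TRUE: the same length-one input is returned as a vector.
  as_vector <- insect::char2dna("TAACGC", simplify = TRUE)
  print(is.list(as_vector))
}
```

Keeping the default list form may be convenient when downstream code iterates over sequences, since single and multiple sequences then share one structure.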

Author(s)

Shaun Wilkinson

References

Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20, 289-290.

Paradis E (2007) A bit-level coding scheme for nucleotides. http://ape-package.ird.fr/misc/BitLevelCodingScheme.html.

Paradis E (2012) Analysis of Phylogenetics and Evolution with R (Second Edition). Springer, New York.

Examples

  char2dna("TAACGC")
  char2aa("VGAHAGEY")
  dna2char(char2dna("TAACGC"))
  aa2char(char2aa("VGAHAGEY"))
  char2dna(list(seq1 = "TAACGC", seq2 = "ATTGCG"))
  char2aa(list(seq1 = "VGAHAGEY", seq2 = "VNVDEV"))

Example output

1 DNA sequence in binary format stored in a list.

Sequence length: 6 

Label:

Base composition:
    a     c     g     t 
0.333 0.333 0.167 0.167 
(Total: 6 bases)
1 amino acid sequence in a list

All sequences of the same length: 8 

[1] "TAACGC"
[1] "VGAHAGEY"
2 DNA sequences in binary format stored in a list.

All sequences of same length: 6 

Labels:
seq1
seq2

Base composition:
   a    c    g    t 
0.25 0.25 0.25 0.25 
(Total: 12 bases)
2 amino acid sequences in a list

Mean sequence length: 7 
   Shortest sequence: 6 
    Longest sequence: 8 

insect documentation built on Aug. 9, 2021, 5:07 p.m.