phyDat: Generic functions for class phyDat

print.phyDat R Documentation

Generic functions for class phyDat

Description

These functions help to manipulate alignments of class phyDat.

Usage

## S3 method for class 'phyDat'
print(x, ...)

## S3 method for class 'phyDat'
subset(x, subset, select, site.pattern = TRUE, ...)

## S3 method for class 'phyDat'
x[i, j, ..., drop = FALSE]

## S3 method for class 'phyDat'
unique(x, incomparables = FALSE, identical = TRUE, ...)

removeUndeterminedSites(x, ...)

removeAmbiguousSites(x)

allSitePattern(n, levels = NULL, names = NULL, type = "DNA", code = 1)

Arguments

x

An object containing sequences.

...

further arguments passed to or from other methods.

subset

a subset of taxa.

select

a subset of characters.

site.pattern

whether select refers to site patterns (TRUE) or to sites, i.e. columns of the original alignment (FALSE); see Details.

i, j

indices of the rows and/or columns to select or to drop. They may be numeric, logical, or character (in the same way as for standard R objects).

drop

for compatibility with the generic (unused).

incomparables

for compatibility with unique.

identical

if TRUE (default), sequences have to be identical; if FALSE, sequences are considered duplicates if the distance between them is zero (this happens frequently with ambiguous sites).

n

Number of sequences.

levels

Level attributes.

names

Names of sequences.

type

Type of sequences ("DNA", "AA" or "USER").

code

The NCBI genetic code number for translation. By default the standard genetic code is used.

Details

allSitePattern generates all possible site patterns and can be useful in simulation studies. For further details see the vignette AdvancedFeatures.
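As a minimal sketch of what allSitePattern returns, the snippet below builds all patterns for three tips and compares the count with 4^n. It assumes the default type = "DNA" with the four unambiguous nucleotides as states; the variable names n_tips and pat are illustrative only.

```r
library(phangorn)

# All possible DNA site patterns for 3 tips (illustrative choice of n)
n_tips <- 3
pat <- allSitePattern(n_tips)

length(pat)       # number of sequences (one per tip)
attr(pat, "nr")   # number of site patterns stored in the object
4^n_tips          # number of patterns expected for 4 states
```

Because the number of patterns grows as 4^n, this is only practical for a small number of tips.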

The generic function c can be used to combine sequences, and unique to get all unique sequences or unique haplotypes.
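A small sketch of both generics on a toy alignment follows. It assumes that c() on phyDat objects concatenates the sites of alignments sharing the same taxa (the behaviour of cbind in recent phangorn versions); the objects x, y, yy and u are made up for illustration.

```r
library(phangorn)

# Toy alignment: t1 and t2 are identical, t3 differs
x <- matrix(c("a", "a", "c", "g",
              "a", "a", "c", "g",
              "a", "t", "c", "c"), nrow = 3, byrow = TRUE,
            dimnames = list(c("t1", "t2", "t3"), 1:4))
y <- phyDat(x)

# Combine the alignment with itself (assumed to append sites, like cbind)
yy <- c(y, y)
sum(attr(yy, "weight"))   # total number of sites after combining

# Keep only one copy of each distinct sequence
u <- unique(y)
names(u)
```

With identical = TRUE (the default), only exact duplicates such as t1 and t2 are collapsed.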

phyDat stores identical columns of an alignment only once and keeps an index of their original positions. This saves memory and, above all, computation, since most calculations need to be done only once per site pattern. In the example below, the matrix x has 8 columns, but columns 1 and 2 are identical, as are columns 3 and 5. The phyDat object y therefore has only 6 site patterns. With site.pattern=FALSE, indexing behaves as on the original matrix x. site.pattern=TRUE selects site patterns instead and can be useful inside functions.
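The compressed storage described above can be inspected through the attributes of a phyDat object. The sketch below uses the same 8-column matrix as the Examples section and assumes the attribute names "nr", "weight" and "index" used by current phangorn versions.

```r
library(phangorn)

x <- matrix(c("a", "a", "c", "g", "c", "t", "a", "g",
              "a", "a", "c", "g", "c", "t", "a", "g",
              "a", "a", "c", "c", "c", "t", "t", "g"), nrow = 3, byrow = TRUE,
            dimnames = list(c("t1", "t2", "t3"), 1:8))
y <- phyDat(x)

ncol(x)             # columns in the original matrix
attr(y, "nr")       # number of distinct site patterns
attr(y, "weight")   # how many columns share each pattern
attr(y, "index")    # site pattern assigned to each original column
```

The weights sum to the number of original columns, which is how the full alignment can be recovered from the stored patterns.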

Value

The functions return an object of class phyDat.

Author(s)

Klaus Schliep klaus.schliep@gmail.com

See Also

DNAbin, as.DNAbin, baseFreq, glance.phyDat, dna2codon, read.dna, read.aa, read.nexus.data, chapter 1 of vignette("AdvancedFeatures", package="phangorn"), and the example of pmlMix for the use of allSitePattern.

Examples


data(Laurasiatherian)
class(Laurasiatherian)
Laurasiatherian
# base frequencies
baseFreq(Laurasiatherian)
# subsetting phyDat objects
# the first 5 sequences
subset(Laurasiatherian, subset=1:5)
# the first 5 characters
subset(Laurasiatherian, select=1:5, site.pattern = FALSE)
# subsetting with []
Laurasiatherian[1:5, 1:20]
# short for
subset(Laurasiatherian, subset=1:5, select=1:20, site.pattern = FALSE)
# the first 5 site patterns (often more than 5 characters)
subset(Laurasiatherian, select=1:5, site.pattern = TRUE)

x <- matrix(c("a", "a", "c", "g", "c", "t", "a", "g",
              "a", "a", "c", "g", "c", "t", "a", "g",
              "a", "a", "c", "c", "c", "t", "t", "g"), nrow=3, byrow = TRUE,
            dimnames = list(c("t1", "t2", "t3"), 1:8))
(y <- phyDat(x))

subset(y, 1:2)
subset(y, 1:2, compress=TRUE)

subset(y, select=1:3, site.pattern = FALSE) |> as.character()
subset(y, select=1:3, site.pattern = TRUE) |> as.character()
y[,1:3] # same as subset(y, select=1:3, site.pattern = FALSE)

# Compute all possible site patterns
# for nucleotides there are 4^(number of tips) patterns
allSitePattern(5)


phangorn documentation built on Jan. 23, 2023, 5:37 p.m.