solveAmbiguousBases: Solve Ambiguous Bases in DNA Sequences
In ape: Analyses of Phylogenetics and Evolution

Description

Replaces ambiguous bases in DNA sequences (R, Y, W, ...) by A, G, C, or T.

Usage

solveAmbiguousBases(x, method = "columnwise", random = TRUE)

Arguments

`x`	a matrix of class `"DNAbin"`; a list is accepted and is converted into a matrix.
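`method`	the method used; "columnwise" is currently the only choice (see Details).
`random`	a logical value: if TRUE (the default), replacements are drawn at random; if FALSE, the most frequent compatible base is used (see Details).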

Details

Ambiguous bases are replaced columnwise. First, the base frequencies in the column are counted; if the column contains no ambiguous base, nothing is done. By default (i.e., if random = TRUE), each ambiguous base is replaced by random sampling among the compatible, non-ambiguous bases, using their observed frequencies in the column as probabilities. For instance, an ambiguous Y is replaced by either C or T in proportion to their observed frequencies. If random = FALSE, the compatible base with the greatest frequency is used.

If no compatible base is observed in the column, equal probabilities are used. For instance, if the ambiguous base is R and only C and T are observed, it is replaced by either A or G with equal probabilities.
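
The columnwise procedure described above can be sketched in base R for a single column. This is an illustrative sketch only, not the code used by ape: the helper `resolve_column` and the lookup table `iupac` are hypothetical names, the column is assumed to be a character vector of uppercase IUPAC codes, and the way ties are broken when random = FALSE (first candidate wins) is an assumption of the sketch.

```r
# Hypothetical lookup table: IUPAC ambiguity codes and their compatible bases.
iupac <- list(
  R = c("A", "G"), Y = c("C", "T"), W = c("A", "T"), S = c("C", "G"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)

# Hypothetical helper (not part of ape): resolve the ambiguity codes of one
# alignment column given as a character vector of uppercase codes.
resolve_column <- function(col, random = TRUE) {
  ambiguous <- col %in% names(iupac)
  if (!any(ambiguous)) return(col)  # no ambiguous base: nothing is done

  # Frequencies of the non-ambiguous bases; gaps and other symbols are ignored.
  freq <- table(factor(col[!ambiguous], levels = c("A", "C", "G", "T")))

  for (i in which(ambiguous)) {
    candidates <- iupac[[col[i]]]
    w <- as.numeric(freq[candidates])
    # No compatible base observed in the column: fall back to equal weights.
    if (sum(w) == 0) w <- rep(1, length(candidates))
    col[i] <- if (random) {
      sample(candidates, 1, prob = w)  # sample with observed frequencies
    } else {
      candidates[which.max(w)]         # most frequent; first candidate on ties (assumption)
    }
  }
  col
}

resolve_column(c("A", "G", "G", "R"), random = FALSE)  # "A" "G" "G" "G"
```

Frequencies are counted once before any replacement, so a resolved base does not influence how later ambiguous bases in the same column are resolved; whether ape behaves the same way is not stated here. Gap characters are neither counted nor modified in this sketch.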

Alignment gaps are not changed; see the function latag2n to change the leading and trailing gaps.

Value

a matrix of class "DNAbin".

Author(s)

Emmanuel Paradis

Examples

library(ape)
X <- as.DNAbin(matrix(c("A", "G", "G", "R"), ncol = 1))
alview(solveAmbiguousBases(X)) # R replaced by either A or G
alview(solveAmbiguousBases(X, random = FALSE)) # R always replaced by G

ape documentation built on April 3, 2025, 7:53 p.m.