IUPAC_CODE_MAP: The IUPAC Extended Genetic Alphabet
In Bioconductor/Biostrings: Efficient manipulation of biological strings

Description

The IUPAC_CODE_MAP named character vector contains the mapping from the IUPAC nucleotide ambiguity codes to their meaning.

The mergeIUPACLetters function provides the reverse mapping: given a string of IUPAC letters, it returns the single ambiguity code that represents them.

Usage

IUPAC_CODE_MAP
mergeIUPACLetters(x)

Arguments

`x`	A vector of non-empty character strings made of IUPAC letters (upper or lower case).

Details

IUPAC nucleotide ambiguity codes represent positions in a nucleotide sequence where the exact nucleotide is not known with certainty. Each code stands for a set of possible bases: for example, "R" stands for A or G, and "N" stands for any of A, C, G or T.
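
To illustrate how an ambiguity code stands for a set of bases, here is a minimal base R sketch that turns an IUPAC string into a regular expression. It does not use Biostrings: the `iupac` table is a hand-written copy of the standard code table, assumed to match IUPAC_CODE_MAP, and `iupac_to_regex` is a hypothetical helper introduced only for this example.

```r
# Hand-written table of the 15 IUPAC nucleotide codes (illustrative copy;
# assumed to hold the same values as Biostrings' IUPAC_CODE_MAP).
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical helper: replace each IUPAC letter with a character class
# listing the bases that the letter stands for.
iupac_to_regex <- function(s) {
  codes <- strsplit(toupper(s), "")[[1]]
  stopifnot(all(codes %in% names(iupac)))
  paste0("[", iupac[codes], "]", collapse = "")
}

pattern <- iupac_to_regex("GAYN")
pattern
# "[G][A][CT][ACGT]"

# "GAGT" fails because Y only allows C or T at the third position.
hits <- grepl(paste0("^", pattern, "$"), c("GACT", "GATA", "GAGT"))
hits
# TRUE TRUE FALSE
```

For real sequence matching, the pattern-matching functions in Biostrings are likely a better fit than regular expressions; the sketch only shows what the codes mean.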

Value

IUPAC_CODE_MAP is a named character vector where the names are the IUPAC nucleotide ambiguity codes and the values are their corresponding meanings. The meaning of each code is described by a string that enumerates the base letters ("A", "C", "G" or "T") associated with the code.

The value returned by mergeIUPACLetters is an unnamed character vector of the same length as its argument x. Each element is the single IUPAC nucleotide ambiguity code that represents the set of bases covered by the letters of the corresponding string in x.
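
The reverse mapping can be pictured as a set union followed by a table lookup. The base R sketch below is an illustration under stated assumptions, not the Biostrings implementation: `iupac` is a hand-written copy of the code table, and `merge_letters` is a hypothetical stand-in for mergeIUPACLetters.

```r
# Hand-written table of the 15 IUPAC nucleotide codes (illustrative copy;
# assumed to hold the same values as Biostrings' IUPAC_CODE_MAP).
iupac <- c(A = "A", C = "C", G = "G", T = "T",
           M = "AC", R = "AG", W = "AT", S = "CG", Y = "CT", K = "GT",
           V = "ACG", H = "ACT", D = "AGT", B = "CGT", N = "ACGT")

# Hypothetical stand-in for mergeIUPACLetters(): for each input string,
# expand every letter to its bases, take the union, and look up the code
# whose meaning equals that union.
merge_letters <- function(x) {
  vapply(x, function(s) {
    letters_in <- strsplit(toupper(s), "")[[1]]
    stopifnot(all(letters_in %in% names(iupac)))
    bases <- unlist(strsplit(iupac[letters_in], ""))
    key <- paste(sort(unique(bases)), collapse = "")
    names(iupac)[match(key, iupac)]
  }, character(1), USE.NAMES = FALSE)
}

merged <- merge_letters(c("Ca", "AB", "mk", "G"))
merged
# "M" "N" "N" "G"
```

Input letters may themselves be ambiguity codes: "AB" merges A with C, G and T, so every base is covered and the result is "N".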

Author(s)

H. Pagès

References

http://www.chick.manchester.ac.uk/SiteSeer/IUPAC_codes.html

IUPAC-IUB SYMBOLS FOR NUCLEOTIDE NOMENCLATURE: Cornish-Bowden (1985) Nucl. Acids Res. 13: 3021-3030.

Examples

  IUPAC_CODE_MAP
  some_iupac_codes <- c("R", "M", "G", "N", "V")
  IUPAC_CODE_MAP[some_iupac_codes]
  mergeIUPACLetters(IUPAC_CODE_MAP[some_iupac_codes])

  mergeIUPACLetters(c("Ca", "Acc", "aA", "MAAmC", "gM", "AB", "bS", "mk"))

Bioconductor/Biostrings documentation built on June 10, 2025, 1:14 p.m.