StockholmMultipleAlignment-class: StockholmMultipleAlignment objects

StockholmMultipleAlignment-class    R Documentation

StockholmMultipleAlignment objects

Description

The StockholmMultipleAlignment class contains a multiple sequence alignment along with its annotations, as defined for the Stockholm file format.

Usage

StockholmDNAMultipleAlignment(
  x = character(),
  start = NA,
  end = NA,
  width = NA,
  use.names = TRUE,
  rowmask = NULL,
  colmask = NULL,
  GF = character(),
  GS = list(),
  GR = list(),
  GC = character()
)

StockholmRNAMultipleAlignment(
  x = character(),
  start = NA,
  end = NA,
  width = NA,
  use.names = TRUE,
  rowmask = NULL,
  colmask = NULL,
  GF = character(),
  GS = list(),
  GR = list(),
  GC = character()
)

StockholmAAMultipleAlignment(
  x = character(),
  start = NA,
  end = NA,
  width = NA,
  use.names = TRUE,
  rowmask = NULL,
  colmask = NULL,
  GF = character(),
  GS = list(),
  GR = list(),
  GC = character()
)

Arguments

x

(character vector, aligned XStringSet or MultipleAlignment of an appropriate type) multiple sequence alignment without Stockholm extensions

start, end, width, use.names, rowmask, colmask

Passed to the appropriate MultipleAlignment constructor, unless x is already a MultipleAlignment.

GF

(named character vector or BStringSet) Free-text annotations that belong to the alignment file as a whole. The name of each element is a tag identifying the type of data (see Details).

GS

(named list of named character vectors or BStringSetList) Free-text annotations that belong to the individual sequences in the alignment. The names of the outer list or BStringSetList are tags identifying the type of data for each element (see Details). Names of the inner character vectors or BStringSets match the names of sequences in the alignment, but not every sequence needs to be annotated for every tag.

GR

(named list of named character vectors or BStringSetList) Annotations for individual residues in the alignment. The names of the outer list or BStringSetList are tags identifying the type of data for each element (see Details). Names of the inner character vectors or BStringSets match the names of sequences in the alignment, but not every sequence needs to be annotated for every tag. Unlike GS annotations, all elements must have the same width, which must match the width of the alignment.

GC

(named character vector or BStringSet) Annotations that belong to each column of the alignment as a whole. The name of each element is a tag identifying the type of data (see Details). Unlike GF annotations, all elements must have the same width, which must match the width of the alignment.
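The width rule for GR and GC can be checked before calling a constructor. The following is a minimal base R sketch, not part of the package: check_annotation_widths() is a hypothetical helper, and it assumes x, GR and GC are given as plain named character vectors or lists rather than Biostrings objects.

```r
# Hypothetical helper (not provided by inferrnal): list GR/GC annotations
# whose width differs from the alignment width, and GR annotations that
# name a sequence absent from the alignment.
check_annotation_widths <- function(x, GR = list(), GC = character()) {
    aln_width <- unique(nchar(x))
    stopifnot(length(aln_width) == 1L)  # aligned sequences share one width

    problems <- character()
    for (tag in names(GR)) {
        ann <- GR[[tag]]
        bad_width <- names(ann)[nchar(ann) != aln_width]
        unknown <- setdiff(names(ann), names(x))
        if (length(bad_width) > 0L) {
            problems <- c(problems,
                          paste0("GR ", tag, ": wrong width for ", bad_width))
        }
        if (length(unknown) > 0L) {
            problems <- c(problems,
                          paste0("GR ", tag, ": unknown sequence ", unknown))
        }
    }
    bad_gc <- names(GC)[nchar(GC) != aln_width]
    if (length(bad_gc) > 0L) {
        problems <- c(problems, paste0("GC ", bad_gc, ": wrong width"))
    }
    problems
}

# Toy data: the SS annotation for seq2 is two characters too short.
x  <- c(seq1 = "GGGAAACCC", seq2 = "GGCAAAGCC")
GR <- list(SS = c(seq1 = "(((...)))", seq2 = "((...))"))
GC <- c(SS_cons = "(((...)))")

problems <- check_annotation_widths(x, GR, GC)
print(problems)
```

An empty result means the GR and GC inputs are at least width-consistent; the constructors may apply further validation of their own.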

Details

Although the StockholmMultipleAlignment class is agnostic about the specific tags used, the following tags are the most likely to be recognized by Infernal or other software which reads or writes Stockholm files:

Type  Tag       Description
GF    ID        IDentifier
GF    AC        ACcession
GF    DE        DEscription
GF    AU        AUthor
GF    GA        GAthering threshold
GF    NC        Noise Cutoff
GF    TC        Trusted Cutoff
GS    WT        WeighT
GS    AC        ACcession number
GS    DE        DEscription
GS    DR        Database Reference
GS    OS        OrganiSm (species)
GS    OC        Organism Classification (clade, etc.)
GS    LO        Look (Color, etc.)
GR    SS        Secondary Structure
GR    SA        Surface Accessibility
GR    TM        TransMembrane
GR    PP        Posterior Probability
GR    LI        LIgand binding
GR    AS        Active Site
GR    pAS       AS - Pfam predicted
GR    sAS       AS - from SwissProt
GR    IN        INtron (in or after)
GC    RF        ReFerence
GC    SS_cons   Secondary Structure consensus
GC    SA_cons   Surface Accessibility consensus
GC    TM_cons   TransMembrane consensus
GC    PP_cons   Posterior Probability consensus
GC    LI_cons   LIgand binding consensus
GC    AS_cons   Active Site consensus
GC    pAS_cons  AS - Pfam predicted consensus
GC    sAS_cons  AS - from SwissProt consensus
GC    IN_cons   INtron (in or after) consensus
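To show how the four annotation types relate to the file format, the following base R sketch renders them as Stockholm markup lines (#=GF, #=GS, #=GR, #=GC). It is an illustration only: stockholm_lines() is a hypothetical helper, not the package's writer, and it assumes plain named character vectors and lists as input. Real files usually pad the fields into aligned columns, which this sketch does not attempt.

```r
# Hypothetical helper (not provided by inferrnal): format an alignment and
# its annotations as the lines of a minimal Stockholm file.
stockholm_lines <- function(x, GF = character(), GS = list(),
                            GR = list(), GC = character()) {
    # File-level annotations: "#=GF <tag> <text>"
    gf <- if (length(GF) > 0L) paste("#=GF", names(GF), GF) else character()
    # Per-sequence annotations: "#=GS <seqname> <tag> <text>"
    gs <- unlist(lapply(names(GS), function(tag) {
        paste("#=GS", names(GS[[tag]]), tag, GS[[tag]])
    }))
    # Per-residue annotations: "#=GR <seqname> <tag> <one character per column>"
    gr <- unlist(lapply(names(GR), function(tag) {
        paste("#=GR", names(GR[[tag]]), tag, GR[[tag]])
    }))
    # Per-column annotations: "#=GC <tag> <one character per column>"
    gc <- if (length(GC) > 0L) paste("#=GC", names(GC), GC) else character()

    c("# STOCKHOLM 1.0", gf, gs, paste(names(x), x), gr, gc, "//")
}

# Toy data; the names and values are made up for illustration.
lines <- stockholm_lines(
    x  = c(seq1 = "GGGAAACCC", seq2 = "GGCAAAGCC"),
    GF = c(ID = "demo"),
    GS = list(OS = c(seq1 = "Homo sapiens")),
    GR = list(SS = c(seq1 = "(((...)))")),
    GC = c(SS_cons = "(((...)))")
)
writeLines(lines)
```

In this layout the tag comes directly after the markup type for GF and GC, and after the sequence name for GS and GR.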

Value

a new StockholmMultipleAlignment object

Slots

GF

BStringSet. Free-text annotations that belong to the alignment file as a whole. The name of each element is a tag identifying the type of data (see Details).

GS

BStringSetList. Free-text annotations that belong to the individual sequences in the alignment. The name of each BStringSet is a tag identifying the type of data (see Details). Names of individual BString elements match the names of sequences in the alignment, but not every sequence needs to be annotated for every tag.

GR

BStringSetList. Annotations for individual residues in the alignment. The name of each BStringSet is a tag identifying the type of data (see Details). Names of individual BString elements match the names of sequences in the alignment, but not every sequence needs to be annotated for every tag. Unlike GS annotations, all elements must have the same width, which must match the width of the alignment.

GC

BStringSet. Annotations that belong to each column of the alignment as a whole. The name of each element is a tag identifying the type of data (see Details). Unlike GF annotations, all elements must have the same width, which must match the width of the alignment.
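The slots can be read back from a constructed object with R's standard S4 slot operator. This sketch assumes the inferrnal package (with Biostrings) is installed; the sequences and the "demo_hairpin" identifier are made up. The package may also offer accessor functions, but none are documented on this page, so only @ is used here.

```r
library(inferrnal)

# A two-sequence toy RNA alignment with one file-level (GF) and one
# per-column (GC) annotation; all values are invented for illustration.
aln <- StockholmRNAMultipleAlignment(
    x  = c(seq1 = "GGGAAACCC", seq2 = "GGCAAAGCC"),
    GF = c(ID = "demo_hairpin"),
    GC = c(SS_cons = "(((...)))")
)

# GF and GC are stored as BStringSet objects whose names are the tags.
gc_tags <- names(aln@GC)
gc_tags

# Convert back to plain named character vectors for further processing.
as.character(aln@GF)
as.character(aln@GC)
```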

Examples

# Typically a StockholmMultipleAlignment object is read from a file created
# by other software, but it can also be created manually.
# This example reproduces the example file given in the Stockholm format
# definition.
samp <- StockholmAAMultipleAlignment(
    x = c(
        "O83071/192-246" = "MTCRAQLIAVPRASSLAE..AIACAQKM....RVSRVPVYERS",
        "O83071/259-312" = "MQHVSAPVFVFECTRLAY..VQHKLRAH....SRAVAIVLDEY",
        "O31698/18-71"   = "MIEADKVAHVQVGNNLEH..ALLVLTKT....GYTAIPVLDPS",
        "O31698/88-139"  = "EVMLTDIPRLHINDPIMK..GFGMVINN......GFVCVENDE",
        "O31699/88-139"  = "EVMLTDIPRLHINDPIMK..GFGMVINN......GFVCVENDE"
    ),
    GF = c(
        ID = "CBS",
        AC = "PF00571",
        AU = "Bateman A",
        CC = paste("CBS domains are small intracellular modules mostly",
                   "found in 2 or four copies within a protein."),
        SQ = "67"
    ),
    GS = list(
        # ACcession number
        AC = c(
            "O31698/18-71" = "O31698",
            "O83071/192-246" = "O83071",
            "O83071/259-312" = "O83071",
            "O31698/88-139" = "O31698"
        ),
        # OrganiSm
        OS = c("O31698/88-139" = "Bacillus subtilis")
    ),
    GR = list(
        # Surface Accessibility
        SA = c(
            "O83071/192-246" = "999887756453524252..55152525....36463774777"
        ),
        # Secondary Structure
        SS = c(
            "O83071/259-312" = "CCCCCHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEEEE",
            "O31698/18-71"   = "CCCHHHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEHHH",
            "O31698/88-139"  = "CCCCCCCHHHHHHHHHHH..HEEEEEEE....EEEEEEEEEEH"
        ),
        # Active Site
        AS = c(
            "O31699/88-139"  = "________________*__________________________"
        ),
        # INtron
        IN = c(
            "O31699/88-139"  = "____________1______________2__________0____"
        )
    ),
    GC = c(
        # Secondary Structure consensus
        SS_cons = "CCCCCHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEEEH"
    )
)
samp

brendanf/inferrnal documentation built on Feb. 4, 2023, 4:49 p.m.