StockholmMultipleAlignment-class | R Documentation |
The StockholmMultipleAlignment class contains a multiple sequence alignment along with its annotations, as defined for the Stockholm file format.
StockholmDNAMultipleAlignment(
x = character(),
start = NA,
end = NA,
width = NA,
use.names = TRUE,
rowmask = NULL,
colmask = NULL,
GF = character(),
GS = list(),
GR = list(),
GC = character()
)
StockholmRNAMultipleAlignment(
x = character(),
start = NA,
end = NA,
width = NA,
use.names = TRUE,
rowmask = NULL,
colmask = NULL,
GF = character(),
GS = list(),
GR = list(),
GC = character()
)
StockholmAAMultipleAlignment(
x = character(),
start = NA,
end = NA,
width = NA,
use.names = TRUE,
rowmask = NULL,
colmask = NULL,
GF = character(),
GS = list(),
GR = list(),
GC = character()
)
x | ( |
start, end, width, use.names, rowmask, colmask | passed to the appropriate |
GF | (named |
GS | (named |
GR | (named |
GC | (named |
Although the StockholmMultipleAlignment class is agnostic about the specific tags used, the following tags are the most likely to be recognized by Infernal or other software which reads or writes Stockholm files:
Type | Tag | Description |
GF | ID | IDentifier |
GF | AC | ACcession |
GF | DE | DEscription |
GF | AU | AUthor |
GF | GA | GAthering threshold |
GF | NC | Noise Cutoff |
GF | TC | Trusted Cutoff |
GS | WT | WeighT |
GS | AC | ACcession number |
GS | DE | DEscription |
GS | DR | Database Reference |
GS | OS | OrganiSm (species) |
GS | OC | Organism Classification (clade, etc.) |
GS | LO | Look (Color, etc.) |
GR | SS | Secondary Structure |
GR | SA | Surface Accessibility |
GR | TM | TransMembrane |
GR | PP | Posterior Probability |
GR | LI | LIgand binding |
GR | AS | Active Site |
GR | pAS | AS - Pfam predicted |
GR | sAS | AS - from SwissProt |
GR | IN | INtron (in or after) |
GC | RF | ReFerence |
GC | SS_cons | Secondary Structure consensus |
GC | SA_cons | Surface Accessibility consensus |
GC | TM_cons | TransMembrane consensus |
GC | PP_cons | Posterior Probability consensus |
GC | LI_cons | LIgand binding consensus |
GC | AS_cons | Active Site consensus |
GC | pAS_cons | AS - Pfam predicted consensus |
GC | sAS_cons | AS - from SwissProt consensus |
GC | IN_cons | INtron (in or after) consensus |
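As a rough illustration of how the four annotation types in the table appear in a Stockholm file, the sketch below renders them as markup lines of the form `#=GF <tag> <text>`, `#=GS <seqname> <tag> <text>`, `#=GR <seqname> <tag> <text>` and `#=GC <tag> <text>`. The helper `stockholm_markup()` is hypothetical and is not part of this package; it uses base R only, takes plain named character vectors and lists rather than a StockholmMultipleAlignment object, and does not try to reproduce the line order or column padding that real writers use.

```r
# Hypothetical helper (NOT part of the package): render annotations as
# Stockholm markup lines, using base R only.
stockholm_markup <- function(GF = character(), GS = list(), GR = list(),
                             GC = character()) {
    # Per-file (GF) and per-column (GC) annotations: "#=<type> <tag> <text>"
    per_file <- function(type, ann) {
        if (length(ann) == 0) return(character())
        paste0("#=", type, " ", names(ann), " ", unname(ann))
    }
    # Per-sequence (GS) and per-residue (GR) annotations:
    # "#=<type> <seqname> <tag> <text>"
    per_seq <- function(type, ann) {
        unlist(lapply(names(ann), function(tag) {
            values <- ann[[tag]]
            if (length(values) == 0) return(character())
            paste0("#=", type, " ", names(values), " ", tag, " ",
                   unname(values))
        }), use.names = FALSE)
    }
    c(per_file("GF", GF), per_seq("GS", GS),
      per_seq("GR", GR), per_file("GC", GC))
}

lines <- stockholm_markup(
    GF = c(ID = "CBS", AC = "PF00571"),
    GS = list(OS = c("O31698/88-139" = "Bacillus subtilis")),
    GC = c(SS_cons = "CCCCCHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEEEH")
)
writeLines(lines)
```

Because the class itself is agnostic about tags, the same helper works for any tag name, not only those listed in the table.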
A new StockholmMultipleAlignment object.
GF
BStringSet. Free-text annotations which belong to the alignment file as a whole. The name of each element is a tag identifying the type of data. (See Details).
GS
BStringSetList. Free-text annotations which belong to the individual sequences in the alignment. The name of each BStringSet is a tag identifying the type of data. (See Details). Names of individual BString elements match the names of sequences in the alignment, but there is no requirement that every sequence must be annotated for every tag.
GR
BStringSetList. Annotations for individual residues in the alignment. The name of each BStringSet is a tag identifying the type of data. (See Details). Names of individual BString elements match the names of sequences in the alignment, but there is no requirement that every sequence must be annotated for every tag. Unlike GS tags, the width of all elements must be the same, and must match the width of the alignment.
GC
BStringSet. Annotations which belong to each column of the alignment as a whole. The name of each element is a tag identifying the type of data. (See Details). Unlike GF tags, the width of all elements must be the same, and must match the width of the alignment.
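The width constraint on GR and GC annotations can be checked on plain character vectors before an object is constructed. The sketch below is only an illustration in base R: `check_annotation_widths()` is a hypothetical helper, not a function of this package, and it is not a description of the validity checks the class itself performs.

```r
# Hypothetical checker (NOT part of the package): report GR and GC
# annotations whose width differs from the alignment width.
check_annotation_widths <- function(x, GR = list(), GC = character()) {
    aln_width <- unique(nchar(x))
    if (length(aln_width) != 1) {
        stop("alignment rows must all have the same, single width")
    }
    problems <- character()
    # GR: a named list of named character vectors, one vector per tag
    for (tag in names(GR)) {
        bad <- nchar(GR[[tag]]) != aln_width
        if (any(bad)) {
            problems <- c(problems,
                          paste0("GR ", tag, ": ", names(GR[[tag]])[bad]))
        }
    }
    # GC: a named character vector, one element per tag
    bad <- nchar(GC) != aln_width
    if (any(bad)) {
        problems <- c(problems, paste0("GC ", names(GC)[bad]))
    }
    problems
}

x <- c(seq1 = "AC..GU", seq2 = "ACGGGU")
ok <- check_annotation_widths(
    x,
    GR = list(SS = c(seq1 = "<<..>>")),
    GC = c(SS_cons = "<<..>>")
)
too_short <- check_annotation_widths(x, GC = c(RF = "xx.x"))
```

Here `ok` is an empty character vector, while `too_short` names the offending GC tag. GF and GS annotations are free text, so no width check applies to them.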
# Typically a StockholmMultipleAlignment object is read from a file created
# by other software, but it can also be created manually.
# This example reproduces the example file given in the Stockholm format
# definition.
samp <- StockholmAAMultipleAlignment(
    x = c(
        "O83071/192-246" = "MTCRAQLIAVPRASSLAE..AIACAQKM....RVSRVPVYERS",
        "O83071/259-312" = "MQHVSAPVFVFECTRLAY..VQHKLRAH....SRAVAIVLDEY",
        "O31698/18-71" = "MIEADKVAHVQVGNNLEH..ALLVLTKT....GYTAIPVLDPS",
        "O31698/88-139" = "EVMLTDIPRLHINDPIMK..GFGMVINN......GFVCVENDE",
        "O31699/88-139" = "EVMLTDIPRLHINDPIMK..GFGMVINN......GFVCVENDE"
    ),
    GF = c(
        ID = "CBS",
        AC = "PF00571",
        AU = "Bateman A",
        CC = paste("CBS domains are small intracellular modules mostly",
                   "found in 2 or four copies within a protein."),
        SQ = "67"
    ),
    GS = list(
        # ACcession number
        AC = c(
            "O31698/18-71" = "O31698",
            "O83071/192-246" = "O83071",
            "O83071/259-312" = "O83071",
            "O31698/88-139" = "O31698"
        ),
        # OrganiSm
        OS = c("O31698/88-139" = "Bacillus subtilis")
    ),
    GR = list(
        # Surface Accessibility
        SA = c(
            "O83071/192-246" = "999887756453524252..55152525....36463774777"
        ),
        # Secondary Structure
        SS = c(
            "O83071/259-312" = "CCCCCHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEEEE",
            "O31698/18-71" = "CCCHHHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEHHH",
            "O31698/88-139" = "CCCCCCCHHHHHHHHHHH..HEEEEEEE....EEEEEEEEEEH"
        ),
        # Active Site
        AS = c(
            "O31699/88-139" = "________________*__________________________"
        ),
        # INtron
        IN = c(
            "O31699/88-139" = "____________1______________2__________0____"
        )
    ),
    GC = c(
        # Secondary Structure consensus
        SS_cons = "CCCCCHHHHHHHHHHHHH..EEEEEEEE....EEEEEEEEEEH"
    )
)
samp