This small dataset contains aligned protein sequences for seven alleles of the aryl hydrocarbon receptor (AhR).
The format is a character matrix in which column i represents the i-th position in the alignment and contains either an amino acid code or "-", indicating a gap (indel). Row names give the animal species.
A DNA or protein sequence of length n has an associated index set {1, 2, ..., n} that labels the positions of its nucleotides or amino acids (AA). This index set can be partitioned so that all positions holding the same nucleotide or AA fall in the same block of the partition. For example, given the sequence ATGTA and its index set {1, 2, ..., 5}, the "A" block is {1, 5}, the "T" block is {2, 4}, and the "G" block is {3}.
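The partition in the ATGTA example can be reproduced with a few lines of code. The following is a minimal Python sketch for illustration only; the helper name `partition_by_symbol` is hypothetical and is not part of the package that ships this dataset.

```python
from collections import defaultdict

def partition_by_symbol(sequence):
    """Group the 1-based positions of `sequence` by the symbol found there.

    Hypothetical helper for illustration; not an API of any package.
    """
    blocks = defaultdict(list)
    for position, symbol in enumerate(sequence, start=1):
        blocks[symbol].append(position)
    return dict(blocks)

partition = partition_by_symbol("ATGTA")
print(partition)  # {'A': [1, 5], 'T': [2, 4], 'G': [3]}
```

Each dictionary value is one block of the partition, and the blocks together cover every position exactly once.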
Given two aligned sequences and their respective partitions of the index set, a metric distance between these partitions can be computed. See partitionMetric for such a metric, along with an example of clustering this AhR dataset.
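To make the idea of a distance between partitions concrete, the sketch below computes the Mirkin distance, a standard metric on partitions of the same set. This is a swapped-in example chosen for its simplicity: it is an assumption that it resembles the metric in partitionMetric, and the function name `mirkin_distance` is hypothetical. The gap symbol "-" is treated here like any other symbol, which is also an assumption.

```python
from collections import Counter

def mirkin_distance(seq_a, seq_b):
    """Mirkin distance between the partitions induced by two aligned sequences.

    Computed as sum|A_i|^2 + sum|B_j|^2 - 2 * sum|A_i & B_j|^2, which equals
    twice the number of position pairs grouped together by one partition
    but separated by the other. Illustrative only; not the package's API.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")

    def sum_of_squares(counts):
        return sum(n * n for n in counts.values())

    block_sizes_a = Counter(seq_a)            # block sizes of partition A
    block_sizes_b = Counter(seq_b)            # block sizes of partition B
    overlaps = Counter(zip(seq_a, seq_b))     # sizes of pairwise intersections
    return (sum_of_squares(block_sizes_a)
            + sum_of_squares(block_sizes_b)
            - 2 * sum_of_squares(overlaps))

print(mirkin_distance("ATGTA", "ATGTA"))  # 0
print(mirkin_distance("ATGTA", "AAAAA"))  # 16
```

Because the distance depends only on how positions are grouped, relabeling the symbols of one sequence (for example ATGTA versus CGTGC) leaves it at zero.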
This dataset was derived from NCBI HomoloGene:1224.
Mark Hahn, Aryl hydrocarbon receptors: diversity and evolution. Chem Biol Interact, 2002, 141, 131-160