bamboo.validation.astral30: Validation (Test) Dataset named astral30


Description

This validation dataset gives the names, primary structures (amino acid sequences), and secondary structures of 3,344 individual proteins from the SCOPe 2.03 dataset, filtered at 30% sequence identity, as used in the paper cited below.

Usage

data(bamboo.validation.astral30)

Format

A data frame containing 3,344 observations on the following 3 variables.

  1. name: protein name;

  2. primary: protein primary structure (amino acid sequence) in 20 letters denoting the 20 amino acids;

  3. hetc: secondary structure in 4 letters denoting the 4 structure types: helix (H), strand (E), turn (T) and coil (C).
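The column layout above can be explored with a few lines of base R. The following is a minimal sketch on a toy data frame that only mimics the three documented columns; the protein names and sequences are invented for illustration, and the check that primary and hetc have equal length per protein is an assumption (one structure letter per residue), not something this page states.

```r
# Toy stand-in with the same three columns as the documented data frame.
# The names and sequences below are invented for illustration only.
toy <- data.frame(
  name    = c("toyA", "toyB"),
  primary = c("MKVLAAGIT", "GSHMEELLK"),
  hetc    = c("CEEEECTTC", "CCHHHHHHC"),
  stringsAsFactors = FALSE
)

# Assumption: one structure letter per residue, so the lengths should agree.
stopifnot(all(nchar(toy$primary) == nchar(toy$hetc)))

# Count each structure letter (H, E, T, C) across all proteins.
letters_all <- unlist(strsplit(toy$hetc, ""))
composition <- table(factor(letters_all, levels = c("H", "E", "T", "C")))
print(composition)

# Per-protein sequence lengths, keyed by protein name.
seq_length <- setNames(nchar(toy$primary), toy$name)
```

With the bamboo package installed, the same steps should apply to the real data after loading it with data(bamboo.validation.astral30, package = "bamboo"), assuming the loaded object carries the dataset's name.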

Source

Fox NK, Brenner SE, Chandonia JM (2013) SCOPe: Structural Classification of Proteins extended, integrating SCOP and ASTRAL data and classification of new structures. Nucleic Acids Research 42: D304-D309. <DOI:10.1093/nar/gkt1240>

References

Q. Li, D. B. Dahl, M. Vannucci, H. Joo, J. W. Tsai (2014), Bayesian Model of Protein Primary Sequence for Secondary Structure Prediction, PLOS ONE, 9(10), e109832. <DOI:10.1371/journal.pone.0109832>


bamboo documentation built on April 14, 2020, 6:53 p.m.