Predict Protein Secondary Structure as Helix, Beta-Sheet, or Coil

Description

Predicts 3-state protein secondary structure based on the primary (amino acid) sequence using the GOR IV method (Garnier et al., 1996).

Usage

PredictHEC(myAAStringSet,
           type = "states",
           windowSize = 7,
           background = c(H = -0.12, E = -0.25, C = 0.23),
           HEC_MI1 = NULL,
           HEC_MI2 = NULL)

Arguments

myAAStringSet

An AAStringSet object of sequences.

type

Character string indicating the type of results desired. This should be (an unambiguous abbreviation of) one of "states", "scores", or "probabilities".

windowSize

Numeric specifying the number of residues on each side of the center position to use in the prediction.

background

Numeric vector with the background “scores” for each of the three states (H, E, and C).

HEC_MI1

An array of dimensions 20 x 21 x 3 giving the mutual information for single residues.

HEC_MI2

An array of dimensions 20 x 20 x 21 x 21 x 3 giving the mutual information for pairs of residues.

Details

The GOR (Garnier-Osguthorpe-Robson) method is an information-theory method for predicting secondary structure from the primary sequence of a protein. Version IV of the method makes 3-state predictions based on the mutual information contained in single residues and pairs of residues within windowSize residues of the position being assigned. This approach is about 65% accurate, and is one of the most accurate methods for assigning secondary structure that use only a single sequence. This implementation of GOR IV does not use decision constants or the number of contiguous states when assigning the final state. Note that characters other than the standard 20 amino acids are not assigned a state.
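
The windowed scoring idea described above can be illustrated with a small base R sketch. This is not DECIPHER's implementation: it uses only a single-residue term (no residue pairs), the information table toy_info is filled with random numbers rather than values estimated from known structures, and the helper score_sequence is a hypothetical name introduced here. It also assumes that the background values act as an additive starting score for each state.

```r
# Toy illustration of GOR-style single-residue scoring (NOT DECIPHER's code).
set.seed(1)
aa_letters <- c("A", "R", "N", "D", "C", "Q", "E", "G", "H", "I",
                "L", "K", "M", "F", "P", "S", "T", "W", "Y", "V")
states <- c("H", "E", "C")
w <- 7  # residues on each side of the center, analogous to windowSize

# Hypothetical information table: residue x window offset x state.
# Real tables are estimated from proteins of known structure; these are random.
toy_info <- array(rnorm(20 * (2 * w + 1) * 3, sd = 0.1),
                  dim = c(20, 2 * w + 1, 3),
                  dimnames = list(aa_letters, as.character(-w:w), states))
background <- c(H = -0.12, E = -0.25, C = 0.23)

# Hypothetical helper: sum the information contributed by every residue in
# the window around each position, starting from the background scores.
score_sequence <- function(seq, info, background, w) {
  res <- strsplit(seq, "")[[1]]
  n <- length(res)
  scores <- matrix(background, nrow = 3, ncol = n,
                   dimnames = list(names(background), NULL))
  for (i in seq_len(n)) {
    for (offset in -w:w) {
      j <- i + offset
      if (j < 1 || j > n) next                        # window runs off the end
      if (!(res[j] %in% dimnames(info)[[1]])) next    # skip non-standard letters
      scores[, i] <- scores[, i] + info[res[j], as.character(offset), ]
    }
  }
  scores
}

toy_scores <- score_sequence("MKTAYIAKQR", toy_info, background, w)
# Assign each position the state with the highest score.
toy_states <- states[apply(toy_scores, 2, which.max)]
paste(toy_states, collapse = "")
```

Because the table is random, the resulting string carries no biological meaning; the sketch only shows how contributions from a window of neighboring residues accumulate into one score per state.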

Value

If type is "states" (the default), then the output is a character vector with the secondary structure assignment ("H", "E", or "C") for each residue in myAAStringSet.

Otherwise, the output is a list with one element for each sequence in myAAStringSet. Each list element contains a matrix of dimension 3 (H, E, or C) by the number of residues in the sequence. If type is "scores", then values in the matrix represent log-odds “scores”. If type is "probabilities" then the values represent the normalized probabilities of the three states at a position.
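
As a rough sketch of how one such matrix might be post-processed, the base R snippet below works on a hand-made 3 x 3 matrix standing in for a single list element returned with type = "probabilities". The values and the row names are assumptions chosen for illustration; check the row order of the actual output before relying on names.

```r
# Hypothetical stand-in for one list element (3 states x 3 residues).
toy_probs <- matrix(c(0.7, 0.1, 0.2,   # residue 1
                      0.2, 0.5, 0.3,   # residue 2
                      0.1, 0.2, 0.7),  # residue 3
                    nrow = 3,
                    dimnames = list(c("H", "E", "C"), NULL))

# Each column holds the probabilities of the three states at one position.
colSums(toy_probs)

# Most probable state per residue, and the probability attached to it.
calls <- rownames(toy_probs)[apply(toy_probs, 2, which.max)]
confidence <- apply(toy_probs, 2, max)
calls  # "H" "E" "C"
```

Keeping the per-residue maximum alongside the call makes it possible to flag positions where no state clearly dominates.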

Author(s)

Erik Wright DECIPHER@cae.wisc.edu

References

Garnier, J., Gibrat, J. F., & Robson, B. (1996). GOR method for predicting protein secondary structure from amino acid sequence. Methods in Enzymology, 266, 540-553.

See Also

HEC_MI1, HEC_MI2, PredictDBN

Examples

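library(DECIPHER)

# locate the example FASTA file shipped with the package
fas <- system.file("extdata", "50S_ribosomal_protein_L2.fas", package="DECIPHER")
dna <- readDNAStringSet(fas)
aa <- translate(dna) # translate the DNA sequences into amino acids
hec <- PredictHEC(aa) # default type = "states"
head(hec)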
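
A possible follow-up, sketched in base R only, is to summarize the state assignments. The snippet assumes that each element of the result is a single string of "H", "E", and "C" characters with one character per residue, and uses a made-up string (toy_hec) in place of real output.

```r
# Hypothetical state string standing in for one element of hec.
toy_hec <- "CCHHHHHHCCEEEECC"
chars <- strsplit(toy_hec, "")[[1]]

# Count of residues assigned to each state.
state_counts <- table(factor(chars, levels = c("H", "E", "C")))
state_counts

# Fraction of the sequence in each state.
state_fractions <- state_counts / length(chars)

# Contiguous segments: run-length encoding gives each element's extent.
segments <- rle(chars)
data.frame(state = segments$values, length = segments$lengths)
```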
