extractGeary: Geary Autocorrelation Descriptor

View source: R/desc-06-Geary.R

extractGearyR Documentation

Geary Autocorrelation Descriptor

Description

This function calculates the Geary autocorrelation descriptor (dim: length(props) * nlag).

Usage

extractGeary(
  x,
  props = c("CIDH920105", "BHAR880101", "CHAM820101", "CHAM820102", "CHOC760101",
    "BIGC670101", "CHAM810101", "DAYM780201"),
  nlag = 30L,
  customprops = NULL
)

Arguments

x

A character vector, as the input protein sequence.

props

A character vector, specifying the Accession Number of the target properties. 8 properties are used by default, as listed below:

AccNo. CIDH920105

Normalized average hydrophobicity scales (Cid et al., 1992)

AccNo. BHAR880101

Average flexibility indices (Bhaskaran-Ponnuswamy, 1988)

AccNo. CHAM820101

Polarizability parameter (Charton-Charton, 1982)

AccNo. CHAM820102

Free energy of solution in water, kcal/mole (Charton-Charton, 1982)

AccNo. CHOC760101

Residue accessible surface area in tripeptide (Chothia, 1976)

AccNo. BIGC670101

Residue volume (Bigelow, 1967)

AccNo. CHAM810101

Steric parameter (Charton, 1981)

AccNo. DAYM780201

Relative mutability (Dayhoff et al., 1978b)

nlag

Maximum value of the lag parameter. Default is 30.

customprops

A n x 21 named data frame contains n customized property. Each row contains one property. The column order for different amino acid types is 'AccNo', 'A', 'R', 'N', 'D', 'C', 'E', 'Q', 'G', 'H', 'I', 'L', 'K', 'M', 'F', 'P', 'S', 'T', 'W', 'Y', 'V', and the columns should also be exactly named like this. The AccNo column contains the properties' names. Then users should explicitly specify these properties with these names in the argument props. See the examples below for a demonstration. The default value for customprops is NULL.

Value

A length length(props) * nlag named vector.

Note

For this descriptor type, users need to intelligently evaluate the underlying details of the descriptors provided, instead of using this function with their data blindly. It would be wise to use some negative and positive control comparisons where relevant to help guide interpretation of the results.

Author(s)

Nan Xiao <https://nanx.me>

References

AAindex: Amino acid index database. https://www.genome.jp/dbget/aaindex.html

Feng, Z.P. and Zhang, C.T. (2000) Prediction of membrane protein types based on the hydrophobic index of amino acids. Journal of Protein Chemistry, 19, 269-275.

Horne, D.S. (1988) Prediction of protein helix content from an autocorrelation analysis of sequence hydrophobicities. Biopolymers, 27, 451-477.

Sokal, R.R. and Thomson, B.A. (2006) Population structure inferred by local spatial autocorrelation: an usage from an Amerindian tribal population. American Journal of Physical Anthropology, 129, 121-131.

See Also

See extractMoreauBroto and extractMoran for Moreau-Broto autocorrelation descriptors and Moran autocorrelation descriptors.

Examples

x <- readFASTA(system.file("protseq/P00750.fasta", package = "protr"))[[1]]
extractGeary(x)

myprops <- data.frame(
  AccNo = c("MyProp1", "MyProp2", "MyProp3"),
  A = c(0.62, -0.5, 15), R = c(-2.53, 3, 101),
  N = c(-0.78, 0.2, 58), D = c(-0.9, 3, 59),
  C = c(0.29, -1, 47), E = c(-0.74, 3, 73),
  Q = c(-0.85, 0.2, 72), G = c(0.48, 0, 1),
  H = c(-0.4, -0.5, 82), I = c(1.38, -1.8, 57),
  L = c(1.06, -1.8, 57), K = c(-1.5, 3, 73),
  M = c(0.64, -1.3, 75), F = c(1.19, -2.5, 91),
  P = c(0.12, 0, 42), S = c(-0.18, 0.3, 31),
  T = c(-0.05, -0.4, 45), W = c(0.81, -3.4, 130),
  Y = c(0.26, -2.3, 107), V = c(1.08, -1.5, 43)
)

# Use 4 properties in the AAindex database, and 3 cutomized properties
extractGeary(
  x,
  customprops = myprops,
  props = c(
    "CIDH920105", "BHAR880101",
    "CHAM820101", "CHAM820102",
    "MyProp1", "MyProp2", "MyProp3"
  )
)

protr documentation built on Sept. 12, 2024, 6:44 a.m.