These functions calculate chemical metrics of proteins given a data frame of amino acid compositions.
AAcomp: data frame, amino acid compositions
nothing: dummy argument
basis: character, basis species
Columns in AAcomp should be named with the three-letter abbreviations for the amino acids (Ala, Arg, ...).
Abbreviations are matched without regard to case (e.g. ALA is the same as ala).
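A minimal sketch of how case-insensitive matching of column names could be done in base R. The helper name match_AA_columns is hypothetical and is not part of the package; the package's own matching code may differ.

```r
# Hypothetical helper (not part of the package): locate the columns of a
# data frame that hold amino acid counts, ignoring case
match_AA_columns <- function(AAcomp) {
  AA3 <- c("Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
           "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val")
  # Compare lower-cased names so that "ALA", "Ala" and "ala" are equivalent
  icol <- match(tolower(AA3), tolower(colnames(AAcomp)))
  names(icol) <- AA3
  # Keep only the amino acids that are present in the data frame
  icol[!is.na(icol)]
}

# Column names in mixed case, plus one non-amino-acid column
AAcomp <- data.frame(protein = "example", GLY = 2, ala = 1)
icol <- match_AA_columns(AAcomp)
```

Here icol holds the column positions of Ala and Gly, so AAcomp[, icol] selects the counts regardless of how the column names are capitalized.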
The metrics are described below:
ZCAA
Average oxidation state of carbon (ZC) (Dick, 2014).
nothing is an extra argument that does nothing.
It is provided so that do.call can be used to run ZCAA or H2OAA with the same number of arguments.
This metric is independent of the choice of basis species.
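As an illustration of the metric itself, ZC can be computed from elemental composition with the formula ZC = (Z - nH + 3 nN + 2 nO + 2 nS) / nC, where Z is the net charge (Dick, 2014). The base R sketch below applies it to the Gly-Ala-Gly tripeptide (C7H13N3O4); the function name ZC_from_elements is hypothetical, and this is not the package's implementation.

```r
# Hypothetical helper: average oxidation state of carbon from element counts
ZC_from_elements <- function(C, H, N = 0, O = 0, S = 0, Z = 0) {
  (Z - H + 3 * N + 2 * O + 2 * S) / C
}

ZC_Gly <- ZC_from_elements(C = 2, H = 5, N = 1, O = 2)   # glycine, C2H5NO2
ZC_Ala <- ZC_from_elements(C = 3, H = 7, N = 1, O = 2)   # alanine, C3H7NO2
ZC_GAG <- ZC_from_elements(C = 7, H = 13, N = 3, O = 4)  # tripeptide, 4/7

# The tripeptide value equals the carbon-number-weighted average of the residues
ZC_weighted <- (2 * 2 * ZC_Gly + 1 * 3 * ZC_Ala) / (2 * 2 + 1 * 3)
```

Removing H2O to form peptide bonds does not change the numerator, because the H and O terms of water cancel; no basis species enter the calculation.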
H2OAA
Stoichiometric hydration state (nH2O) per residue.
The available basis species are:
QEC - glutamine, glutamic acid, cysteine, H2O, O2 (Dick et al., 2020) (this is the default for getOption("basis"))
QCa - glutamine, cysteine, acetic acid, H2O, O2
Any other valid basis specification for basis, such as CHNOS for CO2, NH3, H2S, H2O, and O2
O2AA
Stoichiometric oxidation state (nO2) per residue. The basis species also affect this calculation.
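To show how the basis species determine the hydration and oxidation states, the following base R sketch balances the formula of the Gly-Ala-Gly tripeptide against the QEC basis by solving a linear system. It is an illustration under the assumption that each metric is the stoichiometric coefficient of H2O or O2 divided by the number of residues; the package may obtain the same numbers differently, for example from precomputed per-amino-acid values.

```r
# Elemental composition (C, H, N, O, S) of the QEC basis species
basis_QEC <- rbind(
  glutamine       = c(C = 5, H = 10, N = 2, O = 3, S = 0),
  "glutamic acid" = c(C = 5, H = 9,  N = 1, O = 4, S = 0),
  cysteine        = c(C = 3, H = 7,  N = 1, O = 2, S = 1),
  H2O             = c(C = 0, H = 2,  N = 0, O = 1, S = 0),
  O2              = c(C = 0, H = 0,  N = 0, O = 2, S = 0)
)

# Gly-Ala-Gly including the terminal H and OH: C7H13N3O4
GAG <- c(C = 7, H = 13, N = 3, O = 4, S = 0)

# Find the coefficients of the basis species whose elements sum to the formula
coeffs <- as.numeric(solve(t(basis_QEC), GAG))
names(coeffs) <- rownames(basis_QEC)

# Divide by the number of residues to get per-residue values
nH2O_residue <- coeffs[["H2O"]] / 3  # about -0.2
nO2_residue <- coeffs[["O2"]] / 3    # about 0.1
```

Swapping in a different set of five basis species changes the matrix, and therefore the coefficients of H2O and O2, even though the protein formula is unchanged.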
GRAVY
Grand average of hydropathicity. Values of the hydropathy index for individual amino acids are from Kyte and Doolittle (1982).
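A minimal sketch of the GRAVY calculation as a count-weighted mean of the Kyte and Doolittle (1982) hydropathy index. The function name GRAVY_sketch is hypothetical, and for brevity the sketch assumes that column names are already capitalized as shown and that no other numeric columns need to be excluded.

```r
# Kyte-Doolittle hydropathy index for the 20 standard amino acids
KD <- c(Ala = 1.8, Arg = -4.5, Asn = -3.5, Asp = -3.5, Cys = 2.5,
        Gln = -3.5, Glu = -3.5, Gly = -0.4, His = -3.2, Ile = 4.5,
        Leu = 3.8, Lys = -3.9, Met = 1.9, Phe = 2.8, Pro = -1.6,
        Ser = -0.8, Thr = -0.7, Trp = -0.9, Tyr = -1.3, Val = 4.2)

# Hypothetical helper: one GRAVY value per row of the data frame
GRAVY_sketch <- function(AAcomp) {
  # Select the amino acid columns that are present
  present <- names(KD)[names(KD) %in% colnames(AAcomp)]
  counts <- as.matrix(AAcomp[, present, drop = FALSE])
  # Sum of hydropathy values divided by the number of residues
  as.numeric(counts %*% KD[present] / rowSums(counts))
}

AAcomp <- data.frame(Gly = 2, Ala = 1)
GRAVY_GAG <- GRAVY_sketch(AAcomp)  # (2 * -0.4 + 1 * 1.8) / 3
```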
pI
Isoelectric point.
The net charge for each ionizable group was pre-calculated from pH 0 to 14 at intervals of 0.01.
The isoelectric point is found as the pH where the sum of charges of all groups in the protein is closest to zero.
The pK values for the terminal groups and sidechains are taken from Bjellqvist et al. (1993) and Bjellqvist et al. (1994); note that the calculation does not implement position-specific adjustments described in the latter paper.
The number of N- and C-terminal groups is taken to be one, unless a value for chains (number of polypeptide chains) is given in AAcomp.
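The grid search described above can be sketched in base R as follows. The pK values here are assumptions included only to make the sketch runnable; they should be checked against the cited papers and are not copied from the package. The function name pI_sketch is hypothetical, and unlike the package the sketch recomputes charges on each call instead of using a pre-calculated table.

```r
# ASSUMED pK values, for illustration only (verify against the cited papers)
pK <- c(Cterm = 3.55, Nterm = 7.5, Asp = 4.05, Glu = 4.45, His = 5.98,
        Cys = 9.0, Tyr = 10.0, Lys = 10.0, Arg = 12.0)
# Charge carried by each group when it is ionized
charge_sign <- c(Cterm = -1, Nterm = 1, Asp = -1, Glu = -1, His = 1,
                 Cys = -1, Tyr = -1, Lys = 1, Arg = 1)

# Hypothetical helper; counts is a named vector with the number of each group
pI_sketch <- function(counts, pH = seq(0, 14, by = 0.01)) {
  groups <- names(counts)
  net <- vapply(pH, function(x) {
    # Fraction ionized: acids ionize above their pK, bases below it
    frac <- 1 / (1 + 10^(charge_sign[groups] * (x - pK[groups])))
    sum(counts * charge_sign[groups] * frac)
  }, numeric(1))
  # The pI is the grid point where the net charge is closest to zero
  pH[which.min(abs(net))]
}

# A peptide with no ionizable sidechains: one C-terminus and one N-terminus
pI_termini <- pI_sketch(c(Cterm = 1, Nterm = 1))
# Adding a lysine sidechain shifts the isoelectric point upward
pI_with_Lys <- pI_sketch(c(Cterm = 1, Nterm = 1, Lys = 1))
```

With only the two terminal groups, the result falls midway between the two assumed terminal pK values, to within the 0.01 grid spacing.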
MWAA
Molecular weight per residue.
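As a rough illustration, the per-residue molecular weight of the Gly-Ala-Gly tripeptide can be worked out from its formula in base R. The sketch assumes that the terminal H and OH are included in the total and uses rounded standard atomic weights, so small differences from the package's values are possible.

```r
# Standard atomic weights, rounded to three decimal places
atomic_weight <- c(C = 12.011, H = 1.008, N = 14.007, O = 15.999)

# Gly-Ala-Gly including the terminal H and OH: C7H13N3O4
GAG <- c(C = 7, H = 13, N = 3, O = 4)

# Molecular weight of the whole tripeptide, then the per-residue average
MW_GAG <- sum(GAG * atomic_weight[names(GAG)])
MW_residue <- MW_GAG / 3
```

Multiplying a per-residue value by the protein length recovers the molecular weight of the whole protein.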
Note that ZC is a per-carbon average, but nH2O is a per-residue average. The contribution of H2O from the terminal groups of proteins is counted, so shorter proteins have slightly greater nH2O.
Tests for a few proteins (see examples) indicate that GRAVY and pI are equal to those calculated with the ProtParam tool (https://web.expasy.org/protparam/; Gasteiger et al., 2005).
basis.text is used in the vignettes to generate a textual description of the names of the basis species, except H2O and O2, for one of the keywords QEC or QCa.
Bjellqvist, B., Hughes, G. J., Pasquali, C., Paquet, N., Ravier, F., Sanchez, J.-C., Frutiger, S. and Hochstrasser, D. (1993) The focusing positions of polypeptides in immobilized pH gradients can be predicted from their amino acid sequences. Electrophoresis 14, 1023–1031. doi: 10.1002/elps.11501401163
Bjellqvist, B., Basse, B., Olsen, E. and Celis, J. E. (1994) Reference points for comparisons of two-dimensional maps of proteins from different human cell types defined in a pH scale where isoelectric points correlate with polypeptide compositions. Electrophoresis 15, 529–539. doi: 10.1002/elps.1150150171
Dick, J. M. (2014) Average oxidation state of carbon in proteins. J. R. Soc. Interface 11, 20131095. doi: 10.1098/rsif.2013.1095
Dick, J. M., Yu, M. and Tan, J. (2020) Uncovering chemical signatures of salinity gradients through compositional analysis of protein sequences. Biogeosciences 17, 6145–6162. doi: 10.5194/bg-17-6145-2020
Gasteiger, E., Hoogland, C., Gattiker, A., Duvaud, S., Wilkins, M. R., Appel, R. D. and Bairoch, A. (2005) Protein identification and analysis tools on the ExPASy server. In J. M. Walker (Ed.), The Proteomics Protocols Handbook (pp. 571–607). Totowa, NJ: Humana Press Inc. doi: 10.1385/1-59259-890-0:571
Kyte, J. and Doolittle, R. F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132. doi: 10.1016/0022-2836(82)90515-0
For calculation of ZC from a chemical formula instead of amino acid composition, see the ZC function in CHNOSZ.
# we need CHNOSZ for these examples
require(CHNOSZ)
# for reference, compute ZC of alanine and glycine "by hand"
ZC.Gly <- ZC("C2H5NO2")
ZC.Ala <- ZC("C3H7NO2")
# define the composition of a Gly-Ala-Gly tripeptide
AAcomp <- data.frame(Gly = 2, Ala = 1)
# calculate the ZC of the tripeptide (value: 0.571)
ZC.GAG <- ZCAA(AAcomp)
# this is equal to the carbon-number-weighted average of the amino acids
nC.Gly <- 2 * 2
nC.Ala <- 1 * 3
ZC.average <- (nC.Gly * ZC.Gly + nC.Ala * ZC.Ala) / (nC.Ala + nC.Gly)
stopifnot(all.equal(ZC.GAG, ZC.average))
# compute the per-residue nH2O of Gly-Ala-Gly
basis("QEC")
nH2O.GAG <- species("Gly-Ala-Gly")$H2O
# divide by the length to get residue average (we keep the terminal H-OH)
nH2O.residue <- nH2O.GAG / 3
# compare with the value calculated by H2OAA() (-0.2)
nH2O.H2OAA <- H2OAA(AAcomp, "QEC")
stopifnot(all.equal(nH2O.residue, nH2O.H2OAA))
# calculate GRAVY for a few proteins
# first get the protein index in CHNOSZ's list of proteins
iprotein <- pinfo(c("LYSC_CHICK", "RNAS1_BOVIN", "AMYA_PYRFU"))
# then get the amino acid compositions
AAcomp <- pinfo(iprotein)
# then calculate GRAVY
Gcalc <- as.numeric(GRAVY(AAcomp))
# these are equal to values obtained with ProtParam on uniprot.org
# https://web.expasy.org/cgi-bin/protparam/protparam1?P00698@19-147@
# https://web.expasy.org/cgi-bin/protparam/protparam1?P61823@27-150@
# https://web.expasy.org/cgi-bin/protparam/protparam1?P49067@2-649@
Gref <- c(-0.472, -0.663, -0.325)
stopifnot(all.equal(round(Gcalc, 3), Gref))
# also calculate molecular weight of the proteins
MWcalc <- as.numeric(MWAA(AAcomp)) * protein.length(iprotein)
MWref <- c(14313.14, 13690.29, 76178.25)
stopifnot(all.equal(round(MWcalc, 2), MWref))
# calculate pI for a few proteins
iprotein <- pinfo(c("LYSC_CHICK", "RNAS1_BOVIN", "AMYA_PYRFU", "CSG_HALJP"))
AAcomp <- pinfo(iprotein)
pI_calc <- pI(AAcomp)
# reference values calculated with ProtParam on uniprot.org
# LYSC_CHICK: residues 19-147 (sequence v1)
# RNAS1_BOVIN: residues 27-150 (sequence v1)
# AMYA_PYRFU: residues 2-649 (sequence v2)
# CSG_HALJP: residues 35-862 (sequence v1)
pI_ref <- c(9.32, 8.64, 5.46, 3.37)
stopifnot(all.equal(as.numeric(pI_calc), pI_ref))