library(knitr)
## use pngquant to reduce size of PNG images
knit_hooks$set(pngquant = hook_pngquant)
pngquant <- "--speed=1 --quality=0-25"
# in case pngquant isn't available
if (!nzchar(Sys.which("pngquant"))) pngquant <- NULL 

## colorize messages 20171031
## adapted from https://gist.github.com/yihui/2629886#file-knitr-color-msg-rnw
color_block = function(color) {
  function(x, options) sprintf('<pre style="color:%s">%s</pre>', color, x)
}
knit_hooks$set(warning = color_block('magenta'), error = color_block('red'), message = color_block('blue'))
options(width = 80)
# https://stackoverflow.com/questions/595365/how-to-render-narrow-non-breaking-spaces-in-html-for-windows
logfO2 <- "log&#x202F;<i>f</i>O<sub>2</sub>"
logaH2O <- "log&#x202F;<i>a</i>H<sub>2</sub>O"
nH2O <- "<i>n</i>H<sub>2</sub>O"
Zc <- "<i>Z</i><sub>C</sub>"

This vignette runs the code to make the plots from the following paper first published by Springer Nature:

Dick JM. 2022. A thermodynamic model for water activity and redox potential in evolution and development. Journal of Molecular Evolution 90(2): 182--199. doi: 10.1007/s00239-022-10051-7

Use this link for full-text access to a view-only version of the paper: https://rdcu.be/cITho. A preprint of the paper is available on bioRxiv at doi: 10.1101/2021.01.29.428804.

On 2023-12-18, Figure 3a was modified from the original publication to use chemical metrics computed from the sum of amino acid compositions of proteins in each gene age category. The original publication used mean values of pre-computed chemical metrics for all proteins in each gene age category. The tables of chemical metrics for all proteins were removed to save space in the current version of the package; they remain available in the Zenodo archive up to JMDplots version 1.2.18 (https://doi.org/10.5281/zenodo.8207128). Compared to the original publication, the summation of amino acid compositions gives greater weight to longer proteins. The lines shift somewhat because of this revision, but the overall trends are unchanged.
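
The length weighting described above can be seen in a toy calculation. This is a minimal sketch with made-up numbers for a hypothetical two-protein age category; the `proteins` data frame and its columns are illustrative assumptions, not the package's data structures or code.

```r
# Hypothetical two-protein "age category": one short and one long protein.
# Each row holds the carbon number (nC) and the numerator of a ratio-type
# metric (metric * nC) for one protein; the numbers are made up.
proteins <- data.frame(
  nC        = c(100, 1000),
  numerator = c(-10, -200)
)

# Original approach: compute the metric per protein, then average
per_protein <- proteins$numerator / proteins$nC
mean_of_metrics <- mean(per_protein)

# Revised approach: sum the compositions first, then compute the metric once
metric_of_sum <- sum(proteins$numerator) / sum(proteins$nC)

print(c(mean_of_metrics = mean_of_metrics, metric_of_sum = metric_of_sum))
```

In this sketch the value computed from the summed composition lies closer to that of the longer protein, because the longer protein contributes more atoms to the sum.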

This vignette was compiled on `r Sys.Date()` with JMDplots `r packageDescription("JMDplots")$Version`, CHNOSZ `r packageDescription("CHNOSZ")$Version`, and canprot `r packageDescription("canprot")$Version`.

To reduce running time, the plots in this vignette are made with 99 bootstrap replicates. To reproduce the plots in the paper, the value of boot.R in the function calls below should be changed to 999.
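
To give a feel for the trade-off between 99 and 999 replicates, here is a minimal base-R sketch of a percentile bootstrap interval for a mean. The helper `boot_mean_ci()` and the simulated sample are hypothetical, and the resampling scheme used inside the `evdevH2O` functions may differ.

```r
set.seed(1)                # fixed seed so the sketch is repeatable
x <- rnorm(50, mean = 10)  # made-up sample, not data from the paper

# Hypothetical helper: percentile bootstrap interval for the mean
boot_mean_ci <- function(x, R, probs = c(0.025, 0.975)) {
  # Resample with replacement R times and record the mean each time
  boot_means <- replicate(R, mean(sample(x, replace = TRUE)))
  quantile(boot_means, probs = probs)
}

ci_99  <- boot_mean_ci(x, R = 99)   # replicate count used in this vignette
ci_999 <- boot_mean_ci(x, R = 999)  # replicate count stated for the paper
print(rbind(ci_99, ci_999))
```

More replicates make the interval endpoints less sensitive to resampling noise, at roughly ten times the running cost.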

library(JMDplots)

Comparison of different sets of basis species (Figure 1)

evdevH2O1()

Data source: UniProt reference proteomes (https://uniprot.org).

Protein length and chemical metrics for phylostratigraphic age groups (Figure 2)

evdevH2O2(boot.R = 99)

Data sources: Phylostrata are from @TPPG17. Consensus gene ages are from @LMM16.

Evolution of protein `r Zc` in eukaryotic lineages (Figure 3)

evdevH2O3()

Data sources: a Consensus gene ages are from @LMM16. Divergence times of the human lineage are from @KSSH17. b Amino acid compositions of homology groups for Pfam domains are from @JWN+20 and @JWN+21.

MaximAct: Thermodynamic analysis of optimal `r logaH2O` and `r logfO2` for target proteins (Figure 4)

evdevH2O4()

Optimal `r logaH2O` and `r logfO2` and virtual Eh for target proteins (Figure 5)

evdevH2O5()

Data sources: Blood plasma and subcellular redox potentials (E~GSH~) are from @JS15 and @SDMM16.

Chemical metrics for, and thermodynamic parameters with, different background proteomes (Figure 6)

evdevH2O6()

Chemical and thermodynamic analysis of B. subtilis biofilm transcriptome and proteome (Figure 7)

evdevH2O7(boot.R = 99)

Data source: Transcriptomic and proteomic data are from @FOK+21.

Organismal water content, proteomic `r nH2O`, and optimal `r logaH2O` for fruit fly development (Figure 8)

evdevH2O8(boot.R = 99)

Data sources: a Whole-organism water content is from @CR66. b Stoichiometric hydration state of proteins is calculated in this study using proteomic data from @CBS+17. d `r Zc` and `r nH2O` of differentially expressed proteins in embryos and adult flies are calculated in this study using proteomic data from @FKL+19.

Specific values mentioned in the text

Total and unmapped numbers of genes in @TPPG17 phylostrata dataset

file <- system.file("extdata/evdevH2O/phylostrata/TPPG17.csv.xz", package = "JMDplots")
dat <- read.csv(file)
message(paste("Total number of genes:", nrow(dat)))
message("Total gene count in each phylostratum:")
table(dat$Phylostrata)
message(paste("Genes not mapped to UniProt:", sum(is.na(dat$Entry))))
message("Unmapped genes in each phylostratum:")
table(dat$Phylostrata[is.na(dat$Entry)])
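
The counting pattern used above, `table()` on a column subset by `is.na()`, can be tried on a small made-up data frame. The `toy` object below is purely illustrative and only mimics the `Phylostrata` and `Entry` column names.

```r
# Made-up stand-in for the phylostrata table; NA in Entry means "not mapped"
toy <- data.frame(
  Phylostrata = c(1, 1, 2, 2, 2, 3),
  Entry       = c("P00001", NA, "P00002", NA, NA, "P00003")
)

# Gene count in each phylostratum
total_by_ps <- table(toy$Phylostrata)

# Unmapped gene count in each phylostratum
unmapped_by_ps <- table(toy$Phylostrata[is.na(toy$Entry)])

print(total_by_ps)
print(unmapped_by_ps)
```

Because `Phylostrata` is numeric rather than a factor here, a phylostratum with no unmapped genes (3 in this toy case) is absent from the second table instead of being listed with a count of zero.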

Calculating `r Zc` of chicken egg-white lysozyme from the chemical formula

(pf <- protein.formula("LYSC_CHICK"))
CHNOSZ::ZC(pf)
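
For readers who want to see the arithmetic, the carbon oxidation state can be computed by hand from elemental counts. This is a minimal sketch: `Zc_from_formula()` is a hypothetical helper, and the elemental counts for lysozyme (C613H959N193O185S10) are an assumption that should be checked against the `protein.formula()` output above.

```r
# Hypothetical helper: average oxidation state of carbon from elemental counts.
# Zc = (Z - nH + 3*nN + 2*nO + 2*nS) / nC, where Z is the net charge.
Zc_from_formula <- function(nC, nH = 0, nN = 0, nO = 0, nS = 0, Z = 0) {
  (Z - nH + 3 * nN + 2 * nO + 2 * nS) / nC
}

# Assumed elemental counts for chicken lysozyme (C613H959N193O185S10);
# verify against the protein.formula() output before relying on them.
Zc_lysozyme <- Zc_from_formula(nC = 613, nH = 959, nN = 193, nO = 185, nS = 10)
print(Zc_lysozyme)

# Sanity checks on simple molecules: methane (CH4) and carbon dioxide (CO2)
print(Zc_from_formula(nC = 1, nH = 4))
print(Zc_from_formula(nC = 1, nO = 2))
```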

Calculating `r Zc` of lysozyme from the amino acid composition

(aa <- pinfo(pinfo("LYSC_CHICK")))
canprot::Zc(aa)

Per-residue chemical formula, formation reaction, equilibrium constant, and activity product

LYSC_example()

Number of background proteins from the human proteome

TPPG17_file <- "extdata/evdevH2O/phylostrata/TPPG17.csv.xz"
LMM16_file <- "extdata/evdevH2O/phylostrata/LMM16.csv.xz"
TPPG17 <- read.csv(system.file(TPPG17_file, package = "JMDplots"), as.is = TRUE)
LMM16 <- read.csv(system.file(LMM16_file, package = "JMDplots"), as.is = TRUE)
UniProt_IDs <- na.omit(intersect(TPPG17$Entry, LMM16$UniProt))
length(UniProt_IDs)

Data sources: @TPPG17 (TPPG17) and @LMM16 (LMM16).
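
The `na.omit()` wrapper in the code above matters because `intersect()` treats `NA` as a value that both vectors can share. A minimal sketch with made-up identifiers (not real entries from either dataset):

```r
# Made-up identifier columns; NA marks a gene with no UniProt mapping
ids_a <- c("P00001", NA, "P00002", "P00003")
ids_b <- c("P00002", "P00003", NA, "P00004")

shared_with_na <- intersect(ids_a, ids_b)  # NA counts as a shared value
shared <- na.omit(shared_with_na)          # drop it before counting

print(length(shared_with_na))
print(length(shared))
```

Without `na.omit()`, the count of shared identifiers would be inflated by one whenever both columns contain unmapped genes.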

References



jedick/JMDplots documentation built on April 12, 2025, 1:35 p.m.