library(knitr)
## use pngquant to reduce size of PNG images
knit_hooks$set(pngquant = hook_pngquant)
pngquant <- "--speed=1 --quality=0-25"
# in case pngquant isn't available
if (!nzchar(Sys.which("pngquant"))) pngquant <- NULL 

## colorize messages 20171031
## adapted from https://gist.github.com/yihui/2629886#file-knitr-color-msg-rnw
color_block = function(color) {
  function(x, options) sprintf('<pre style="color:%s">%s</pre>', color, x)
}
knit_hooks$set(warning = color_block('magenta'), error = color_block('red'), message = color_block('blue'))
options(width = 80)
# https://stackoverflow.com/questions/595365/how-to-render-narrow-non-breaking-spaces-in-html-for-windows
logfO2 <- "log&#x202F;<i>f</i>O<sub>2</sub>"
logaH2O <- "log&#x202F;<i>a</i>H<sub>2</sub>O"
nH2O <- "<i>n</i>H<sub>2</sub>O"
Zc <- "<i>Z</i><sub>C</sub>"

This vignette runs the code to make the plots from the following paper first published by Springer Nature:

Dick JM, Tan J. 2023. Chemical links between redox conditions and estimated community proteomes from 16S rRNA and reference protein sequences. Microbial Ecology 85(4): 1338--1355. doi: 10.1007/s00248-022-01988-9

Use this link for full-text access to a view-only version of the paper: https://rdcu.be/cMCDa. A preprint of the paper is available on bioRxiv at doi: 10.1101/2021.05.31.446500.

This vignette was compiled on r Sys.Date() with JMDplots r packageDescription("JMDplots")$Version and chem16S r packageDescription("chem16S")$Version.

library(JMDplots)

Distinct chemical parameters of reference proteomes for major taxonomic groups (Figure 1)

Table_S5 <- geo16S1()

Data source: NCBI Reference Sequence (RefSeq) database [@OWB+16]. Numbered symbols: (1) Methanococci, (2) Archaeoglobi, (3) Thermococci, (4) Halobacteria, (5) Clostridia.

Specific values mentioned in the text

r Zc for reference proteomes of genera that are abundant in produced fluids of shale gas wells:

datadir <- system.file("RefDB/RefSeq_206", package = "JMDplots")
taxon_metrics <- read.csv(file.path(datadir, "taxon_metrics.csv.xz"), as.is = TRUE)
subset(taxon_metrics, group %in% c("Halanaerobium", "Thermoanaerobacter"))

r Zc for reference proteomes of Halanaerobium species (numeric names are NCBI taxids):

datadir <- system.file("RefDB/RefSeq_206", package = "JMDplots")
refseq <- read.csv(file.path(datadir, "genome_AA.csv.xz"))
Zc.refseq <- Zc(refseq)
names(Zc.refseq) <- refseq$organism

names <- read.csv(file.path(datadir, "taxonomy.csv.xz"))
is.Halanaerobium <- names$genus %in% "Halanaerobium" & !is.na(names$species)
(Zc.Halanaerobium <- round(Zc.refseq[is.Halanaerobium], 3))
range(Zc.Halanaerobium)

Estimated community proteomes from different environments have distinct chemical signatures (Figure 2)

Table_S6 <- geo16S2()

Data sources: Guerrero Negro mat [@HCW+13], Yellowstone hot springs [@BGPF13], Baltic Sea water [@HLA+16], Lake Fryxell mat [@JHM+16], Tibetan Plateau lakes [@ZLM+16], Manus Basin vents [@MPB+17], Qarhan Salt Lake soils [@XDZ+17], Black Sea water [@SVH+19].

Lower carbon oxidation state is tied to oxygen depletion in water columns (Figure 3)

Table_S7 <- geo16S3()

Data sources: Black Sea [@SVH+19], Swiss lakes (Lake Zug and Lake Lugano) [@MZG+20], Eastern Tropical North Pacific (ETNP) [@GBL+15], Sansha Yongle Blue Hole [@HXZ+20], Ursu Lake [@BCA+21].

Common trends of carbon oxidation state of estimated community proteomes for shale gas wells and hydrothermal systems (Figure 4)

Table_S8 <- geo16S4()

Data sources: Northwestern Pennsylvania stream water and sediment [@UKD+18], Pennsylvania State Forests stream water in spring and fall [@MMA+20], Marcellus Shale [@CHM+14], Denver--Julesburg Basin [@HRR+18], Duvernay Formation [@ZLF+19].

Comparison of protein r Zc from metagenomic or metatranscriptomic data with estimates from 16S and reference sequences (Figure 5)

Table_S9 <- geo16S5()

Data sources: A. Guerrero Negro mat metagenome [@KRH+08], 16S [@HCW+13]; Bison Pool metagenome [@HRM+11], 16S [@SMS+12]; Eastern Tropical North Pacific metagenome [@GKG+15], metatranscriptome and 16S [@GBL+15]; Mono Lake metatranscriptome [@EH17], 16S [@EH18]. B. Marcellus Shale metagenome [@DBW+16], 16S [@CHM+14]. C. Manus Basin vents [@MPB+17], Black Sea metagenome [@VMW+21], 16S [@SVH+19]. D. Human Microbiome Project [@HMP12]. E. Soils [@FLA+12]; mammalian guts [@MKK+11].

RefSeq and 16S rRNA data processing outline (Figure S1)

geo16S_S1()

Scatterplots of r Zc and r nH2O for bacterial and archaeal genera vs higher taxonomic levels (Figure S2)

geo16S_S2()

r nH2O-r Zc plots for major phyla and their genera (Figure S3)

geo16S_S3()

Venn diagrams for phylum and genus names in the RefSeq (NCBI), RDP, and SILVA taxonomies (Figure S4)

Table_S10 <- geo16S_S4()

Data sources: RefSeq (NCBI): Names of taxa with protein sequences in RefSeq as listed in system.file("RefDB/RefSeq_206/taxonomy.csv.xz", package = "JMDplots"); RDP: trainset18_062020_speciesrank.fa in https://sourceforge.net/projects/rdp-classifier/files/RDP_Classifier_TrainingData/RDPClassifier_16S_trainsetNo18_rawtrainingdata.zip; SILVA: https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/SILVA_138.1_SSURef_NR99_tax_silva.fasta.gz.

Correlations between r Zc estimated from metagenomes and 16S rRNA sequences (Figure S5)

geo16S_S5()

Correlation of r Zc with GC content of metagenomic and 16S amplicon reads (Figure S6)

geo16S_S6()

Data source: https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR*******, where SRR******* is the SRA Run accession for metagenomic or 16S amplicon sequences.

Supplementary Table files

This code shows how the files for each of the Supplementary Tables is saved. The dat* objects are created by running the code blocks above, but the following code block is not run in this vignette in order to avoid cluttering the working directory.

write.csv(Table_S5, "Table_S5.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S6, "Table_S6.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S7, "Table_S7.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S8, "Table_S8.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S9, "Table_S9.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S10, "Table_S10.csv", row.names = FALSE, quote = FALSE)

References



jedick/JMDplots documentation built on April 12, 2025, 1:35 p.m.