```{r}
library(knitr)
## use pngquant to reduce size of PNG images
knit_hooks$set(pngquant = hook_pngquant)
pngquant <- "--speed=1 --quality=0-25"
# in case pngquant isn't available
if (!nzchar(Sys.which("pngquant"))) pngquant <- NULL

## colorize messages 20171031
## adapted from https://gist.github.com/yihui/2629886#file-knitr-color-msg-rnw
color_block <- function(color) {
  function(x, options) sprintf('<pre style="color:%s">%s</pre>', color, x)
}
knit_hooks$set(warning = color_block('magenta'), error = color_block('red'),
  message = color_block('blue'))
options(width = 80)
```
```{r}
# https://stackoverflow.com/questions/595365/how-to-render-narrow-non-breaking-spaces-in-html-for-windows
logfO2 <- "log <i>f</i>O<sub>2</sub>"
logaH2O <- "log <i>a</i>H<sub>2</sub>O"
nH2O <- "<i>n</i>H<sub>2</sub>O"
Zc <- "<i>Z</i><sub>C</sub>"
```
This vignette runs the code to make the plots from the following paper first published by Springer Nature:
Dick JM, Tan J. 2023. Chemical links between redox conditions and estimated community proteomes from 16S rRNA and reference protein sequences. Microbial Ecology 85(4): 1338--1355. doi: 10.1007/s00248-022-01988-9
Use this link for full-text access to a view-only version of the paper: https://rdcu.be/cMCDa. A preprint of the paper is available on bioRxiv at doi: 10.1101/2021.05.31.446500.
This vignette was compiled on `r Sys.Date()` with JMDplots `r packageDescription("JMDplots")$Version` and chem16S `r packageDescription("chem16S")$Version`.
```{r}
library(JMDplots)
```
```{r}
Table_S5 <- geo16S1()
```
Data source: NCBI Reference Sequence (RefSeq) database [@OWB+16]. Numbered symbols: (1) Methanococci, (2) Archaeoglobi, (3) Thermococci, (4) Halobacteria, (5) Clostridia.
`r Zc` for reference proteomes of genera that are abundant in produced fluids of shale gas wells:

```{r}
datadir <- system.file("RefDB/RefSeq_206", package = "JMDplots")
taxon_metrics <- read.csv(file.path(datadir, "taxon_metrics.csv.xz"), as.is = TRUE)
subset(taxon_metrics, group %in% c("Halanaerobium", "Thermoanaerobacter"))
```
`r Zc` for reference proteomes of Halanaerobium species (numeric names are NCBI taxids):

```{r}
datadir <- system.file("RefDB/RefSeq_206", package = "JMDplots")
# amino acid compositions of RefSeq reference proteomes
refseq <- read.csv(file.path(datadir, "genome_AA.csv.xz"))
Zc.refseq <- Zc(refseq)
names(Zc.refseq) <- refseq$organism
# taxonomic names (called 'taxonomy' to avoid masking base::names)
taxonomy <- read.csv(file.path(datadir, "taxonomy.csv.xz"))
# species-level entries in the genus Halanaerobium
is.Halanaerobium <- taxonomy$genus %in% "Halanaerobium" & !is.na(taxonomy$species)
(Zc.Halanaerobium <- round(Zc.refseq[is.Halanaerobium], 3))
range(Zc.Halanaerobium)
```
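The `Zc()` call above returns the average oxidation state of carbon in each reference proteome. As a rough illustration of what that number means, here is a minimal base R sketch; it is not the chem16S or canprot implementation, and the names `aa_formulas`, `Zc_formula`, `Zc_aa` and `Zc_composition` are hypothetical. It assumes uncharged amino acids and oxidation states of +1 for H, -3 for N, -2 for O and -2 for S. Because the oxidation states of all atoms sum to the net charge Z, the carbon value is Z_C = (Z - n_H + 3 n_N + 2 n_O + 2 n_S) / n_C. Linking amino acids into a protein removes H2O, which leaves Z_C unchanged, so the Z_C of a protein is the carbon-weighted mean of the Z_C of its amino acids.

```r
# Minimal sketch (not the chem16S/canprot code): Zc from elemental formulas,
# assuming net charge Z = 0 and oxidation states H = +1, N = -3, O = -2, S = -2.

# Hypothetical lookup table: elemental composition of the 20 amino acids
aa_formulas <- data.frame(
  aa = c("Ala", "Cys", "Asp", "Glu", "Phe", "Gly", "His", "Ile", "Lys", "Leu",
         "Met", "Asn", "Pro", "Gln", "Arg", "Ser", "Thr", "Val", "Trp", "Tyr"),
  C = c(3, 3, 4, 5, 9, 2, 6, 6, 6, 6, 5, 4, 5, 5, 6, 3, 4, 5, 11, 9),
  H = c(7, 7, 7, 9, 11, 5, 9, 13, 14, 13, 11, 8, 9, 10, 14, 7, 9, 11, 12, 11),
  N = c(1, 1, 1, 1, 1, 1, 3, 1, 2, 1, 1, 2, 1, 2, 4, 1, 1, 1, 2, 1),
  O = c(2, 2, 4, 4, 2, 2, 2, 2, 2, 2, 2, 3, 2, 3, 2, 3, 3, 2, 2, 3),
  S = c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0)
)

# Zc of one formula: the oxidation states of all atoms sum to the net charge Z
Zc_formula <- function(C, H, N, O, S, Z = 0) {
  (Z - H + 3 * N + 2 * O + 2 * S) / C
}

Zc_aa <- with(aa_formulas, Zc_formula(C, H, N, O, S))
names(Zc_aa) <- aa_formulas$aa

# Zc of a protein or proteome from amino acid counts named by 3-letter code.
# Removing H2O on polymerization does not change Zc, so the result is the
# carbon-weighted mean of the amino acid Zc values.
Zc_composition <- function(counts) {
  idx <- match(names(counts), aa_formulas$aa)
  if (anyNA(idx)) stop("unknown amino acid name(s)")
  carbon <- counts * aa_formulas$C[idx]
  sum(carbon * Zc_aa[idx]) / sum(carbon)
}

round(Zc_aa[c("Ala", "Cys", "Glu", "Leu")], 3)  # about 0, 0.667, 0.4 and -1
Zc_composition(c(Ala = 1, Glu = 1))             # (3 * 0 + 5 * 0.4) / 8 = 0.25
```

Weighting by carbon rather than by residue count is what makes the result identical to Z_C computed from the summed elemental formula of the whole protein.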
```{r}
Table_S6 <- geo16S2()
```
Data sources: Guerrero Negro mat [@HCW+13], Yellowstone hot springs [@BGPF13], Baltic Sea water [@HLA+16], Lake Fryxell mat [@JHM+16], Tibetan Plateau lakes [@ZLM+16], Manus Basin vents [@MPB+17], Qarhan Salt Lake soils [@XDZ+17], Black Sea water [@SVH+19].
```{r}
Table_S7 <- geo16S3()
```
Data sources: Black Sea [@SVH+19], Swiss lakes (Lake Zug and Lake Lugano) [@MZG+20], Eastern Tropical North Pacific (ETNP) [@GBL+15], Sansha Yongle Blue Hole [@HXZ+20], Ursu Lake [@BCA+21].
```{r}
Table_S8 <- geo16S4()
```
Data sources: Northwestern Pennsylvania stream water and sediment [@UKD+18], Pennsylvania State Forests stream water in spring and fall [@MMA+20], Marcellus Shale [@CHM+14], Denver--Julesburg Basin [@HRR+18], Duvernay Formation [@ZLF+19].
`r Zc` from metagenomic or metatranscriptomic data compared with estimates from 16S and reference sequences (Figure 5)

```{r}
Table_S9 <- geo16S5()
```
Data sources: A. Guerrero Negro mat metagenome [@KRH+08], 16S [@HCW+13]; Bison Pool metagenome [@HRM+11], 16S [@SMS+12]; Eastern Tropical North Pacific metagenome [@GKG+15], metatranscriptome and 16S [@GBL+15]; Mono Lake metatranscriptome [@EH17], 16S [@EH18]. B. Marcellus Shale metagenome [@DBW+16], 16S [@CHM+14]. C. Manus Basin vents [@MPB+17], Black Sea metagenome [@VMW+21], 16S [@SVH+19]. D. Human Microbiome Project [@HMP12]. E. Soils [@FLA+12]; mammalian guts [@MKK+11].
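The 16S-based estimates in Figure 5 combine taxonomic abundances with reference proteomes. A minimal sketch of that combination is below, under a stated assumption: the community amino acid composition is taken to be the abundance-weighted sum of per-taxon reference compositions (the weighting actually used by chem16S may differ). Because Zc is a ratio of sums that are linear in the atom counts, the community value then reduces to a mean of taxon Zc values weighted by abundance times the carbon content of each reference composition. The function `community_Zc` and the toy numbers are hypothetical and are not data from the paper.

```r
# Hypothetical sketch: Zc of an estimated community proteome.
# Assumes community composition = sum_i(abundance_i * reference_composition_i),
# so Zc_community = sum(a_i * C_i * Zc_i) / sum(a_i * C_i), where C_i is the
# number of carbon atoms in the reference composition of taxon i.
community_Zc <- function(abundance, nC, Zc) {
  stopifnot(length(abundance) == length(nC), length(nC) == length(Zc))
  stopifnot(all(abundance >= 0), sum(abundance) > 0)
  w <- abundance * nC
  sum(w * Zc) / sum(w)
}

# Toy example with two taxa (made-up values)
community_Zc(abundance = c(70, 30), nC = c(1500, 1500), Zc = c(-0.20, -0.10))
# -0.17
```

If every taxon has the same Zc, the community value equals that Zc whatever the abundances; shifting abundance toward a taxon with higher Zc raises the community estimate.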
```{r}
geo16S_S1()
```
`r Zc` and `r nH2O` for bacterial and archaeal genera vs higher taxonomic levels (Figure S2)

```{r}
geo16S_S2()
```
`r nH2O`-`r Zc` plots for major phyla and their genera (Figure S3)

```{r}
geo16S_S3()
```
```{r}
Table_S10 <- geo16S_S4()
```
Data sources: RefSeq (NCBI): names of taxa with protein sequences in RefSeq as listed in `system.file("RefDB/RefSeq_206/taxonomy.csv.xz", package = "JMDplots")`; RDP: `trainset18_062020_speciesrank.fa` in https://sourceforge.net/projects/rdp-classifier/files/RDP_Classifier_TrainingData/RDPClassifier_16S_trainsetNo18_rawtrainingdata.zip; SILVA: https://www.arb-silva.de/fileadmin/silva_databases/release_138_1/Exports/SILVA_138.1_SSURef_NR99_tax_silva.fasta.gz.
`r Zc` estimated from metagenomes and 16S rRNA sequences (Figure S5)

```{r}
geo16S_S5()
```
`r Zc` compared with GC content of metagenomic and 16S amplicon reads (Figure S6)

```{r}
geo16S_S6()
```
Data source: `https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR*******`, where `SRR*******` is the SRA Run accession for metagenomic or 16S amplicon sequences.
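Figure S6 relates Zc to the GC content of sequencing reads. A minimal base R sketch of GC content follows, assuming reads are supplied as plain character strings and that ambiguous symbols such as N are left out of the denominator (other tools may count them differently). The function name `gc_content` is hypothetical, and this is not the code used to make the figure.

```r
# Hypothetical helper: GC content of each read, i.e. the fraction of G + C
# among the called bases A, C, G, T; ambiguous symbols such as N are ignored.
gc_content <- function(reads) {
  vapply(strsplit(toupper(reads), ""), function(bases) {
    called <- bases[bases %in% c("A", "C", "G", "T")]
    if (length(called) == 0) return(NA_real_)
    mean(called %in% c("G", "C"))
  }, numeric(1))
}

gc_content(c("GGCCAATT", "gcgcNNat", "NNNN"))
# 0.5, 0.667 (4 of 6 called bases), NA
```

Returning `NA` for reads with no called bases keeps them from being silently counted as 0% GC.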
This code shows how the file for each Supplementary Table is saved. The `Table_S*` objects are created by running the code blocks above, but the following code block is not run in this vignette in order to avoid cluttering the working directory.
```{r, eval = FALSE}
write.csv(Table_S5, "Table_S5.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S6, "Table_S6.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S7, "Table_S7.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S8, "Table_S8.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S9, "Table_S9.csv", row.names = FALSE, quote = FALSE)
write.csv(Table_S10, "Table_S10.csv", row.names = FALSE, quote = FALSE)
```
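To save the tables without writing into the working directory, one option is to loop over the table objects and write them to a temporary directory. The sketch below is hypothetical (the helper `save_tables` is not part of JMDplots) and uses a small stand-in data frame so that it runs on its own; after running the code blocks above, `mget(paste0("Table_S", 5:10))` would collect the real tables into a named list.

```r
# Hypothetical helper: write each data frame in a named list to <dir>/<name>.csv
# with the same write.csv() settings as the block above.
save_tables <- function(tables, dir = tempdir()) {
  stopifnot(is.list(tables), !is.null(names(tables)), all(nzchar(names(tables))))
  files <- file.path(dir, paste0(names(tables), ".csv"))
  for (i in seq_along(tables)) {
    write.csv(tables[[i]], files[i], row.names = FALSE, quote = FALSE)
  }
  invisible(files)
}

# Stand-in table so the example is self-contained
demo <- list(Table_demo = data.frame(sample = c("a", "b"), Zc = c(-0.15, -0.12)))
paths <- save_tables(demo)
read.csv(paths[1])

# With the vignette objects in the workspace:
# save_tables(mget(paste0("Table_S", 5:10)))
```

Note that `quote = FALSE`, kept from the original block, produces a malformed CSV if any field contains a comma; dropping that argument lets `write.csv()` quote strings.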