First, let's load some packages including HTSSIP
.
library(dplyr) library(ggplot2) library(HTSSIP)
See HTSSIP introduction vignette for a description on why dataset parsing (all treatment-control comparisons) is needed.
Let's see the already parsed dataset
physeq_S2D2_l
Let's set some parameters used later.
# adjusted P-value cutoff padj_cutoff = 0.1 # number of cores for parallel processing (increase depending on your computational hardware) ncores = 2
First, we'll just run HR-SIP on 1 treatment-control comparison. Let's get the individual phyloseq object.
physeq = physeq_S2D2_l[[1]]
physeq
Let's check that the samples belong to either a 13C-treatment or 12C-control.
physeq %>% sample_data %>% .$Substrate %>% table
OK, we should be ready to run HR-SIP!
Note that the design
parameter for HRSIP()
is the experimental design parameter for calculating log2 fold change (l2fc) values with DESeq. Here, it's used to distinguish label-treatment and unlabel-control samples.
df_l2fc = HRSIP(physeq, design = ~Substrate, padj_cutoff = padj_cutoff, sparsity_threshold = c(0,0.15,0.3)) # just using 3 thresholds to reduce time df_l2fc %>% head(n=3)
How many "incorporators"" (rejected hypotheses)?
df_l2fc %>% filter(padj < padj_cutoff) %>% group_by() %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length)
Let's plot a breakdown of incorporators for each phylum.
# summarizing df_l2fc_s = df_l2fc %>% filter(padj < padj_cutoff) %>% mutate(Rank2 = gsub('^__', '', Rank2)) %>% group_by(Rank2) %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>% ungroup() # plotting ggplot(df_l2fc_s, aes(Rank2, n_incorp_OTUs)) + geom_bar(stat='identity') + labs(x='Phylum', y='Number of incorporators') + theme_bw() + theme( axis.text.x = element_text(angle=45, hjust=1) )
Let's now run HR-SIP on all treatment-control comparisons in the dataset:
# Number of comparisons physeq_S2D2_l %>% length
The function plyr::ldply()
is useful (compared to lapply()
) beccause it can be run in parallel and returns a data.frame object.
# Running in parallel; you may need to change the backend for your environment. # Or you can just set .parallel=FALSE. doParallel::registerDoParallel(ncores) df_l2fc = plyr::ldply(physeq_S2D2_l, HRSIP, design = ~Substrate, padj_cutoff = padj_cutoff, sparsity_threshold = c(0,0.15,0.3), # just using 3 thresholds to reduce run time .parallel=TRUE) df_l2fc %>% head(n=3)
Each specific phyloseq subset (treatment-control comparison) is delimited with the ".id" column.
df_l2fc %>% .$.id %>% unique
For clarity, let's edit these long strings to make them more readable when plotted.
df_l2fc = df_l2fc %>% mutate(.id = gsub(' \\| ', '\n', .id)) df_l2fc %>% .$.id %>% unique
How many incorporators (rejected hypotheses) & which sparsity cutoff was used for each comparison?
Note: you could set one sparsity cutoff for all comparisons by setting the sparsity_cutoff
to a specific value.
df_l2fc %>% filter(padj < padj_cutoff) %>% group_by(.id, sparsity_threshold) %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>% as.data.frame
How about a breakdown of incorporators for each phylum in each comparision.
# summarizing df_l2fc_s = df_l2fc %>% filter(padj < padj_cutoff) %>% mutate(Rank2 = gsub('^__', '', Rank2)) %>% group_by(.id, Rank2) %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>% ungroup() # plotting ggplot(df_l2fc_s, aes(Rank2, n_incorp_OTUs)) + geom_bar(stat='identity') + labs(x='Phylum', y='Number of incorporators') + facet_wrap(~ .id, scales='free') + theme_bw() + theme( axis.text.x = element_text(angle=55, hjust=1) )
MW-HR-SIP is run very similarly to HRSIP, but it uses multiple buoyant density (BD) windows. MW-HR-SIP is performed with the HRSIP()
function, but multiple BD windows are specified.
Let's use 3 buoyant density windows (g/ml):
1.70-1.73, 1.72-1.75, 1.74-1.77
windows = data.frame(density_min=c(1.70, 1.72, 1.74), density_max=c(1.73, 1.75, 1.77)) windows
Running HRSIP with all 3 BD windows. Let's run this in parallel to speed things up.
You can turn off parallel processing by setting the parallel
option to FALSE
. Also, there's 2 different levels that could be parallelized: either the ldply()
or HRSIP()
. Here, I'm running HRSIP in parallel, but it may make sense in other situations (eg., many comparisons but few density windows and/or sparsity cutoffs) to use ldply in parallel only.
doParallel::registerDoParallel(ncores) df_l2fc = plyr::ldply(physeq_S2D2_l, HRSIP, density_windows = windows, design = ~Substrate, padj_cutoff = padj_cutoff, sparsity_threshold = c(0,0.15,0.3), # just using 3 thresholds to reduce run time .parallel = TRUE) df_l2fc %>% head(n=3)
Let's check that we have all treatment-control comparisons.
df_l2fc %>% .$.id %>% unique
How many incorporators (rejected hypotheses) & which sparsity cutoff was used for each comparison?
Note: one sparsity cutoff could be set for all comparisons by setting the sparsity_cutoff
to a specific value.
df_l2fc %>% filter(padj < padj_cutoff) %>% group_by(.id, sparsity_threshold) %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>% as.data.frame
The density windows can vary for each OTU. Let's plot which density windows were used for the OTUs in the dataset.
# summarizing df_l2fc_s = df_l2fc %>% mutate(.id = gsub(' \\| ', '\n', .id)) %>% filter(padj < padj_cutoff) %>% mutate(density_range = paste(density_min, density_max, sep='-')) %>% group_by(.id, sparsity_threshold, density_range) %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length) #plotting ggplot(df_l2fc_s, aes(.id, n_incorp_OTUs, fill=density_range)) + geom_bar(stat='identity', position='fill') + labs(x='Control-treatment comparision', y='Fraction of incorporators') + scale_y_continuous(expand=c(0,0)) + theme_bw() + theme( axis.text.x = element_text(angle=55, hjust=1) )
Different BD windows were used for different treatment-control comparisons because the amount of BD shift likely varied among taxa. For example, if a taxon incorporates 100% 13C isotope, then a very 'heavy' BD window may show a larger l2fc than a less 'heavy' BD window.
Let's look at a breakdown of incorporators for each phylum in each comparision.
# summarizing df_l2fc_s = df_l2fc %>% mutate(.id = gsub(' \\| ', '\n', .id)) %>% filter(padj < padj_cutoff) %>% mutate(Rank2 = gsub('^__', '', Rank2)) %>% group_by(.id, Rank2) %>% summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>% ungroup() # plotting ggplot(df_l2fc_s, aes(Rank2, n_incorp_OTUs)) + geom_bar(stat='identity') + labs(x='Phylum', y='Number of incorporators') + facet_wrap(~ .id, scales='free') + theme_bw() + theme( axis.text.x = element_text(angle=55, hjust=1) )
Note that the MW-HR-SIP method identifies more incorporators than the HR-SIP method (which uses just one BD window).
MW-HR-SIP detects more taxa for 2 main reasons. First, taxa vary in G+C content, so using only 1 BD window likely encompasses BD shifts for taxa of certain G+C contents (eg., ~50% G+C), but may miss other taxa with higher or lower G+C content. Second, taxa can vary in how much isotope was incorporated, which will affect where each taxon's DNA is in the density gradient.
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.