Prior to the development of these HTS-SIP analysis methods, DNA- and RNA-SIP experiments that utilized Sanger or high throughput sequencing were usually analyzed with standard statistical processes (e.g. t-tests), in order to identify incorporators. Previous work suggests that these methods generally have low senstivity and/or high false positive rates when applied to sequence data. Here, these analysis methods will be referred to "heavy-SIP" methods. While the work of Youngblut et al., (https://doi.org/10.1101/138719) suggests that HR-SIP analysis methods (eg., MW-HR-SIP) should be used for processing HTS-SIP datasets, the HTSSIP R package provides heavy-SIP methods so researchers have the option of using these methods and making their own comparisons to HR-SIP methods.

heavy-SIP is performed with the `heavy_SIP()`

function, which consists of multiple possible tests. See `?heavy_SIP`

for more details. This vignette demonstrates the use of `heavy_SIP()`

.

First, let's load some packages including `HTSSIP`

.

library(dplyr) library(ggplot2) library(HTSSIP)

# adjusted P-value cutoff padj_cutoff = 0.1

For unreplicated datasets (no experiment replicates of controls or treatments), the options are limited on how to identify incorporators.

We will be using a dataset that is already parsed. See HTSSIP introduction vignette for a description on why dataset parsing (all treatment-control comparisons) is needed.

physeq_S2D2_l

First, we'll just focus on 1 treatment-control comparison. Let's get the individual phyloseq object.

physeq = physeq_S2D2_l[[1]] physeq

Let's check that the samples belong to either a 13C-treatment or 12C-control.

physeq %>% sample_data %>% .$Substrate %>% table

Since this dataset is an unreplicated comparison between treatment & control, we are just going to use the 'binary' method, which will call incorporators if they are present in the "heavy" gradient fractions of the treatment and not present in the "heavy" fractions of the control. Note that the "heavy" fractions are user-defined.

df_res = heavy_SIP(physeq, ex="Substrate=='12C-Con'", comparison='H-v-H', hypo_test='binary') df_res %>% head(n=3)

Since no real statistical test, the "statistic" is just 0 (not an incorporator) or 1 (an incorporator). Also, the "p" and "padj" columns are thus "NA".

How many "incorporators"?

df_res$statistic %>% table

Experimental replicates allows us to use tradional hypothesis testing (*e.g.,* t-tests) for determining significantly differ OTU abundances between treatment and controls. Note that there is a reason why more suffisticated statistical methods have been developed for assessing differentially abundant features in high throughput sequencing datasets (*e.g.,* DESeq2, EdgeR, or MetagenomeSeq). The traditional methods don't account for many challenging aspects of identifying statistically different abundances in sequence data such as i) a high number of multiple hypotheses ii) zero-inflation iii) compositional data (relative abundances; the sum-to-one constraint).

With that said, let's try out these heavy-SIP methods on a replicated dataset, with 3 experimental replicates of the control and treatment (total gradients = 6)

physeq_rep3

physeq_rep3 %>% sample_data %>% head(n=3)

To compare "heavy" fractions in the treatment versus "heavy" fractions in the control, we will use the "H-v-H" comparison method. See `?heavy_SIP`

for details on other possible comparisons.

df_res = heavy_SIP(physeq_rep3, ex="Treatment=='12C-Con'", comparison='H-v-H', hypo_test='t-test') df_res %>% head(n=3)

"padj" is p-values adjusted with the Benjamini Hochberg method.

How many incorporators?

df_res %>% filter(padj < padj_cutoff) %>% nrow

No incorporators. Obviously, the sensitivity of this method is pretty low. What's the distribution of p-values?

df_res$p %>% summary

Does anything change when we use a nonparametric test? Here, we will use the Mann Whitney U test (a nonparametric t-test).

df_res = heavy_SIP(physeq_rep3, ex="Treatment=='12C-Con'", comparison='H-v-H', hypo_test='wilcox') df_res %>% head(n=3)

What's the p-value and adjusted-pvalue distribution?

df_res$p %>% summary %>% print df_res$padj %>% summary %>% print

Again, no incorporators. The change in abundances must be pretty dramatic for heavy-SIP methods to ID incorporators, especially when there's many multiple hypotheses.

sessionInfo()

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.