As with all statistical models, the assumptions of DivNet
should be verified before interpreting the results. In this vignette, I will show you how to investigate a key assumption of the DivNet
model: that the log-ratio relative abundances are linear (or at least not strongly curved).
You do not need to do this if you are fitting DivNet
with categorical covariates, only if you are fitting DivNet
with continuous covariates.
We are going to use the Lee et al dataset and investigate the effect of temperature on phylum-level Shannon diversity. Recall that this dataset contains samples of microbial communities living on seafloor rocks near a hydrothermal vent.
We first load the data and necessary packages, aggregate counts to the phylum level, and subset to samples with temperature data.
if ("speedyseq" %in% row.names(installed.packages())) { library(speedyseq) # if speedyseq is installed, use it for tax_glom } else { library(phyloseq) # if not, use phyloseq for tax_glom } library(breakaway) library(DivNet) library(magrittr) library(ggplot2) data(Lee) lee_phylum <- Lee %>% tax_glom(taxrank = "Phylum") %>% subset_samples(temp != "<NA>")
Annoyingly, temperature is a categorical variable in this dataset, so we convert it to a continuous variable and run DivNet
, modelling the log-ratios as a linear function of temperature:
lee_phylum %>% sample_data %$% temp temp_continuous <- lee_phylum %>% sample_data %$% temp %>% as.character %>% as.numeric divnet_temperature <- lee_phylum %>% divnet(X = temp_continuous, base=1)
I'm a big fan of plotting the estimated diversity against the covariate before doing any hypothesis testing to make sure that everything looks reasonable.
shannon_table <- divnet_temperature %$% shannon %>% summary %>% cbind(temp_continuous) shannon_table %>% ggplot(aes(x = temp_continuous, y = estimate)) + geom_point() + geom_linerange(aes(x = temp_continuous, ymin = lower, ymax = upper))
Looks good to me! Let's go ahead and see if that upward trend is statistically significant given those wide error bars:
betta(shannon_table$estimate, shannon_table$error, cbind(1, divnet_temperature$X))$table
We reject the null hypothesis of no significant linear increase in Shannon diversity with temperature with $p < 10^{-3}$.
Before taking that result too seriously, let's check for model misspecification. The DivNet
model performs well when there is not significant curvature in the log-ratios. To investigate this, let's pull out the observed log-ratios (observed_yy
) as well as the fitted log-ratios (fitted_yy
).
fitted_yy <- DivNet::to_log_ratios(divnet_temperature$fitted_z, base = 1) observed_yy <- lee_phylum %>% otu_table %>% as.matrix %>% t %>% DivNet::to_log_ratios(base = 1)
Now let's put both the observed log-ratios and the fitted log-ratios against temperature. DivNet
assumes a linear model for all log-ratios, and since there are 20 taxa in this dataset, there are 19 log-ratios. Therefore I'm going to write a function that plots the log-ratios.
draw_log_ratios <- function(taxon) { data.frame(temperature = temp_continuous, log_ratio = fitted_yy[,taxon], log_ratio_observed = observed_yy[,taxon]) %>% ggplot(aes(x = temperature, y = log_ratio_observed)) + geom_point() + xlab("Temperature") + ylab("log-ratios") + geom_point(aes(y = log_ratio), col = "blue") + ggtitle("Observed (black) and fitted (blue) log-ratios against temperature") } draw_log_ratios(taxon=1) draw_log_ratios(taxon=2) draw_log_ratios(taxon=3)
I don't see any significant curvature in any of these plots. Note that it's not important that these scatterplots display a strong linear trend -- just that they are not strongly nonlinear. Note that we found that model is robust to mild nonlinearity, but its performance decreases as the nonlinear trend gets stronger.
In theory I should run the function and inspect those plots, but given that there may be hundreds or thousands of taxa model, I would recommend checking the most abundant taxa, since these are the ones that are most likely to throw off diversity estimation if misspecified.
Happy modelling!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.