knitr::opts_chunk$set(echo = TRUE)
We need to specify some inclusion criteria: host_sample_min
specifies the minimum number of samples a host must have, and a taxon must be present at a count of at least count_threshold
in at least the proportion of per-host samples specified by sample_threshold
to be included (otherwise it is grouped into an "other" category). For example, with count_threshold = 5 and sample_threshold = 0.2 (as below), a taxon must appear at a count of at least 5 in at least 20% of a host's samples to be retained.
The loaded object is a phyloseq object and can be manipulated as such!
suppressWarnings(suppressMessages(library(phyloseq)))
suppressWarnings(suppressMessages(library(ROL)))

tax_level <- "ASV"
data <- load_data(tax_level = tax_level, host_sample_min = 75, count_threshold = 5, sample_threshold = 0.2)
metadata <- sample_data(data)
cat(paste0("There are ", length(unique(metadata$sname)), " unique hosts in this data set and a total of ", phyloseq::nsamples(data), " samples!\n"))
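Since the loaded object is a phyloseq object (as noted above), the usual phyloseq accessors apply to it directly. A minimal illustration using standard phyloseq functions:

# standard phyloseq accessors on the loaded object
ntaxa(data)
rank_names(data)
head(sample_variables(data))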
Stacked bar plots can be rendered per-host (here, for host "Pebble") as:
plot_timecourse(data, host = "PEB", gapped = FALSE, legend = TRUE, selected_samples = NULL, show_plot = TRUE)
The gapped
argument specifies whether or not to visualize gaps in sampling:
plot_timecourse(data, host = "PEB", gapped = TRUE, legend = TRUE, selected_samples = NULL, show_plot = TRUE)
All subsequent computation is pretty costly, so we'll use phyloseq's subset_samples
method to subset to a tiny data set of three individuals: "Wiper" (WIP), "Vapour" (VAP), and "Nike" (NIK).
include_hosts <- c("WIP", "VAP", "NIK") data <- subset_samples(data, sname %in% include_hosts) metadata <- sample_data(data) cat(paste0("There are ",length(unique(metadata$sname))," unique hosts in this data set and a total of ",phyloseq::nsamples(data)," samples!\n"))
We can visualize autocorrelation in two ways: mean autocorrelation only (relatively quick to calculate) or intervals of autocorrelation, which requires repeated resampling.
Mean AC is visualized as:
lagged_ac <- calc_autocorrelation(data, lag_units = "weeks", lag_max = 52*2, resample = FALSE)
plot_mean_autocorrelation(lagged_ac, show_plot = TRUE)
To visualize with an uncertainty interval, we would specify the arguments resample and
resample_rate. Note that this (un-optimized) code can take hours to run!
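As a sketch (not run here), the resampled call might look like the following; the resample_rate value is an arbitrary placeholder, not a recommendation:

# resample = TRUE requests repeated resampling; the resample_rate value is illustrative only
lagged_ac_resampled <- calc_autocorrelation(data, lag_units = "weeks", lag_max = 52*2, resample = TRUE, resample_rate = 0.1)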
Models are fit in the additive logratio (ALR) representation and, very anecdotally, optimization seems more stable if we choose an ALR reference taxon with roughly median log abundance. The formalize_parameters
function calculates this for us and returns the empirical variance, which we can use (but don't by default) to scale the kernel parameters.
params <- formalize_parameters(data)
str(params)
For demo purposes, we'll fit two models for each host: a MAP-only (point estimate) fit and a full posterior fit with just a few posterior samples.
for(host in include_hosts) {
  # fit MAP and full models; the output is saved to designated directories
  fit_GP(data, host = host, tax_level = tax_level, SE_days_to_baseline = 90, alr_ref = params$alr_ref, MAP = TRUE)
  fit_GP(data, host = host, tax_level = tax_level, SE_days_to_baseline = 90, alr_ref = params$alr_ref, n_samples = 10, MAP = FALSE)
}
We'll visualize the posterior predictive plot over an arbitrary taxon (via the predict_coords
argument). This function assumes that a fitted model meeting these criteria (host and taxonomic level) exists in the appropriate directory. Note that this function can also take minutes to run, as prediction over long time intervals is slow.
plot_posterior_predictive(host = "WIP", tax_level = tax_level, predict_coords = c(1), show_plot = TRUE)
And we can visualize the MAP covariance and correlation matrices between taxa in the ALR. (Still to do: allow for visualization in other logratio representations.)
plot_MAP_covariance(host = "WIP", tax_level = tax_level, show_plot = TRUE)
We first need to calculate posterior distances between samples. Though the visualization itself is trivial, we'll do this on the MAP estimates since it will be very fast. Then we calculate the embedding, which returns the top eigenvectors.
calc_posterior_distances(tax_level = tax_level, which_measure = "Sigma", MAP = TRUE)
embed_posteriors(tax_level = tax_level, which_measure = "Sigma", MAP = TRUE)
Finally, plot the embedding.
plot_ordination(tax_level = tax_level, which_measure = "Sigma", annotation = "host", MAP = TRUE, show_plot = TRUE)
Load the MAP estimates of bacterial covariance in the CLR.
logratio <- "clr" Sigmas <- load_MAP_estimates(tax_level = tax_level, logratio = logratio)
Then plot the heatmap, where our 3 hosts are rows and all interactions between taxa are columns. These are hierarchically clustered, such that like interactions (across hosts) should be visible as vertical stripes.
plot_interaction_heatmap(tax_level = tax_level, logratio = logratio, Sigmas = Sigmas, show_plot = TRUE, return_matrix = FALSE)
Get the interaction matrix (note that this can also be returned by plot_interaction_heatmap
above).
interaction_matrix <- get_pairwise_correlations(tax_level = tax_level, logratio = logratio, Sigmas = Sigmas)
str(interaction_matrix)
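Alternatively, as noted above, the same matrix can be obtained from plot_interaction_heatmap itself by setting return_matrix = TRUE; a sketch, assuming show_plot = FALSE simply suppresses the plot:

# assumes the returned object mirrors the output of get_pairwise_correlations
interaction_matrix <- plot_interaction_heatmap(tax_level = tax_level, logratio = logratio, Sigmas = Sigmas, show_plot = FALSE, return_matrix = TRUE)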
Get the (weighted) universality score for (1) a single interaction and (2) a CLR sequence variant (~bacterium).
single_pairwise_score <- calc_universality_score(interaction_matrix$interactions[,1])
cat("The universality score of interaction", interaction_matrix$labels[1], "is", round(single_pairwise_score, 3), "\n")

sv_scores <- calc_microbe_universality_score(tax_level = tax_level, Sigmas = Sigmas)
cat(names(sv_scores)[1], "has universality score:", round(sv_scores[[1]], 3), "\n")