knitr::opts_chunk$set(fig.width=12, fig.height=7, echo=TRUE, warning=FALSE, message=FALSE, tidy=TRUE, collapse = TRUE, comment = "#>")
set.seed(12345) library(panstripe) library(ggplot2) library(patchwork)
We first simulate a pangenome presence/absence matrix and corresponding core genome phylogeny.
sim <- simulate_pan(rate = 1e-3)
We can now estimate the expected number of gene gain and loss events along each branch of the phylogeny and fit a tweedie regression model.
fit <- panstripe(sim$pa, sim$tree, quiet=TRUE)
We can test whether the expected gene gain/loss rate is significantly different from a model consisting of just errors by looking at the p-value of the fit.
fit$summary
We can also generate a plot of the resulting fit with confidence intervals.
plot_pangenome_params(fit)
It is useful to compare this with a plot of the data. Panstripe includes the option to generate a plot similar to that created by TempEst.
plot_pangenome_cumulative(fit)
We can use the patchwork package to combined these plots
plot_pangenome_params(fit) + plot_pangenome_cumulative(fit) + patchwork::plot_layout(nrow = 1)
For simplicity we consider a comparison with a simulated pangenome with very little accessory variation. This could reflect a species with a very stable set of genes such as M. tuberculosis.
sim_slow <- simulate_pan(rate = 1e-4)
Again we fit a Tweedie regression model after first estimating the gene gain/loss events per branch.
fit_slow <- panstripe(sim_slow$pa, sim_slow$tree) fit_slow$summary
We can now test to see whether there is a significant difference between the dynamics of these two pangenomes
comp <- compare_pangenomes(fit, fit_slow, nboot = 100) comp$summary
We can also plot the difference between the fits.
plot_pangenome_params(list( fast=fit, slow=fit_slow)) + plot_pangenome_cumulative(list( fast=fit, slow=fit_slow))
Sometimes we would like to know if it is the size rather than the rate of recombination events that is driving an apparent different in the rate of gene gain/loss. As we are fitting a tweedie regression model it is possible to tease apart the driving force of apparent accessory rate differences.
First lets simulate and fit two pangenomes. One with a larger average recombination size (in number of genes).
sim_small_rec <- simulate_pan(rate = 1e-3, mean_trans_size = 3) sim_large_rec <- simulate_pan(rate = 1e-3, mean_trans_size = 5) res_small_rec <- panstripe(sim_small_rec$pa, sim_small_rec$tree) res_large_rec <- panstripe(sim_large_rec$pa, sim_large_rec$tree)
We can first test whether the overall rate of gene gain/loss is different between the two pangenomes.
comp <- compare_pangenomes(res_small_rec, res_large_rec) comp$summary
As expected given the large difference in the average size of gene gain/loss events there is a significant difference in the total rate between the two pangenomes.
We can now look closer into what is driving this difference.
plot_pangenome_params(list( large=res_large_rec, small=res_small_rec))
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.