library(treeduckenValidation) library(ggplot2) library(khroma)
The following are validation tests of the treeducken
package with accompanying figures.
We validated each level of the three-tree model: species, locus, and gene. Species trees are able to be simulated using two different algorithms: the generalized sampling algorithm (GSA) \cite{Hartmann2010} and the simple sampling algorithm (SSA) \cite{Stadler2010}. To test the former, we simulated under six different birth rates ($\lambda$) with death rates ($\mu$) set such that $\mu / \lambda = 0.25$ using 10000 replicate trees for each rate and 50 tips for each tree. The mean heights of the six different sets of birth and death rates were then compared to the mean heights of trees generated using {\tt TreeSim} \cite{Stadler2010}. To test the latter, we simulated under the same six birth and death rates from above with 10000 replicates for each rate, and a fixed tree height, $t = 2.0$. The average number of tips for each set of birth and death rates were then compared to the theoretical expectation $E(N(t)) = 2e^{(\lambda - \mu) t}$ under this model \citep{mooers2012branch}.
# first decide on parameters sbr <- seq(from= 1, to = 6) sdr <- sbr / 4 num_tips <- 50 reps <- 1000 # run the main function gsa_spt_nds <- treeduckenValidation::gsa_test_data_generation(sbr, sdr, num_tips, reps = reps) plot_df <- format_gsa_test_df(gsa_spt_nds, sbr) ggplot2::ggplot(data = plot_df, ggplot2::aes(x = key, y = value, fill = simulator)) + ggplot2::geom_boxplot() + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Speciation rate") + ggplot2::ylab("Tree depth")
sim_time <- 1.0 ssa_spt_tips <- ssa_test_data_generation(sbr, sdr, time = sim_time, reps = reps) ssa_tidy_df <- format_ssa_test(ssa_spt_tips, sbr) ssa_exp_tidy_df <- get_ssa_expected_values(sbr, sdr, sim_time, ssa_tidy_df) ggplot2::ggplot(data = ssa_tidy_df, ggplot2::aes(x = key, y = value)) + ggplot2::geom_boxplot() + ggplot2::geom_point(data = ssa_exp_tidy_df, ggplot2::aes(x = key, y = value, col = "red", size = 0.5)) + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Speciation rate") + ggplot2::ylab("Number of leaves") + ggplot2::guides(col=FALSE, size=FALSE)
To determine if the locus trees were simulated correctly we tested that our trees had the expected number of tips. This was done using six different gene birth rates ($\delta$) and gene loss rates ($l$) with a fixed species tree depth of 2.0, and $\lambda = 1.0$ and $\mu = 0.5$. The average number of tips for each set of gene birth and loss rates were then compared to the theoretical expectation: $E(N(t)) = \frac{n e^{(\delta - l)t}}{1 - m^{n - 2}}$ with $m = \frac{l (1 - e^{-(\delta - l)t})}{\delta - l e^{-(\delta - l)t}}$ with $n$ being the number of species on the species tree.
num_tips <- 20 gbr <- seq(from = 1, to = 5) / 5 gdr <- gbr / 4 loctr_tips <- locus_tree_test(gbr, gdr, num_tips, reps) loctr_tips_tidy_df <- format_loctree_test(loctr_tips, gbr) loct_expected_tidy_df <- get_loctr_expected_values(gbr, gdr, num_tips, loctr_tips, loctr_tips_tidy_df) ggplot2::ggplot(data = loctr_tips_tidy_df, ggplot2::aes(x = key, y = value)) + ggplot2::geom_boxplot() + ggplot2::geom_point(data = loct_expected_tidy_df, ggplot2::aes(x = key, y = log(value), col = "red", size = 0.5)) + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Gene Birte Rate") + ggplot2::ylab("Number of leaves") + ggplot2::guides(col = FALSE, size = FALSE) # make into dataframe # make into dataframe # name columns # plot
Gene trees can be simulated using two different processes: the multispecies coalescent and the multilocus coalescent. The multispecies coalescent was tested by simulating 10 species trees each with $\lambda = 1.0$ and 25 tips. For each tree we then set population genetic parameters and simulated 100,000 gene trees. We set the generation time to $1\times10^-6$ time units per generation, and $1\times10^-9$ mutations per generation. The generation time value assumes our species tree is in time units of millions of years with 1 generation per year. The mutation rate was chosen to be a biologically realistic value based on the mutation rates of primates given in \citet{Rannala1645}. The effective population size was varied along orders of magnitudes from 10 to one million individuals. We then compared the time to most recent common ancestor (TMRCA) was to the expected TMRCA \citep{Rannala1645} The multilocus coalescent was validated in a similar fashion, using the same parameters, but simulating under the multilocus coalescent. The simulated TMRCAs were then compared with the expectated TMRCA given in \citet{Rasmussen2012}.
gt_reps <- reps tips <- rep(25, times = test_reps) ne <- c(10, 100, 1000, 10000, 100000, 1000000) test_reps <- length(ne) gt_tmrcas <- gene_tree_msc_test(theta = ne, tips, gt_reps = gt_reps, test_reps = test_reps) gt_tidy_df <- format_gene_tree_df(gt_tmrcas, ne) # expected tmrca (might nee other stuff?) ggplot2::ggplot(data = gt_tidy_df, ggplot2::aes(x = key, y = log(value))) + ggplot2::geom_boxplot() + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Population size") + ggplot2::ylab("Time to most recent common ancestor") + ggplot2::guides(col = FALSE, size = FALSE) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
gt_reps <- reps gt_mlc_tmrcas <- gene_tree_mlc_test(sbr = 1.0, popsize = ne, gbr = 0.2, locus_tree_reps = 10, gene_tree_reps = gt_reps, test_reps = test_reps) gt_mlc_tidy_df <- format_gene_tree_df(gt_mlc_tmrcas, theta) # expected tmrca (might nee other stuff?) ggplot2::ggplot(data = gt_mlc_tidy_df, ggplot2::aes(x = key, y = log(value))) + ggplot2::geom_boxplot() + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Population size") + ggplot2::ylab("Time to most recent common ancestor") + ggplot2::guides(col = FALSE, size = FALSE) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
We validated the host tree for the cophylogenetic birth-death model simulation.
The symbiont tree and associations have no direct effect on the outcome of the host tree simulation directly.
Thus, we compared the host tree output with results from the constant rate birth-death process simulated using the SSA.
Specifically, we compared the expected number of tips, $E(N(t))$, of the host tree over a time, $t$, with the theoretical expectation given by \citet{mooers2012branch}: $E(N(t)) = 2e^{(\lambda - \mu) t}$ where $\lambda$ and $\mu$ are the speciation and extinction rates of the constant rate birth-death process.
Here the speciation rate is the sum of the host speciation rate ($\lambda_H$) and cospeciation rate ($\lambda_C$), and the extinction rate is the host extinction rate ($\mu_H$).
We simulated 10000 host-symbiont tree sets for $t = 2.0$ time units under 10 different host speciation rates, $\lambda_H = (0.1, 0.2, \ldots, 1.0)$, and a cospeciation rate of $\lambda_C = 1.0$.
We set extinction rates set such that the turnover (i.e. $\mu / \lambda$) was kept constant between simulation scenarios.
We validated the symbiont tree by comparing results with those found under the gene tree-species tree model. We used nine combinations of three different symbiont speciation rates and three different symbiont extinction rates with no host speciation or extinction, a cospeciation rate of $\lambda_C = 0.5$ and $t = 2.0$. We compared the average number of tips for each set of symbiont speciation and extinction rates to the theoretical expectation for the expected number of leaves of a locus tree: (E(N(t)) = \frac{n e^{\left(\lambda_S - \mu_S\right)t}}{1 - m^{n - 2}},) with $m = \frac{\mu_S (1 - e^{-(\lambda_S - l)t})}{\lambda_S - l e^{-(\lambda_S - l)t}}$ % TAH: this should be formatted better. Also there is a ")" missing in the numerator of m with $n$ being the number of species on the host tree, and $\lambda_S$ and $\mu_S$ are the symbiont speciation and extinction rate respectively \citep{Mallo2016}.
lambda_h <- seq(from = 1, to = 10) / 10 lambda_c <- 1.0 mu_h <- (lambda_h + lambda_c) / 2 ht_test_tips <- host_tree_test(hbr = lambda_h, hdr = mu_h, cosp_rate = lambda_c, reps = reps, time = sim_time) ht_test_tidy_df <- format_host_test_data(ht_test_tips, lambda_h) ht_expected_tips_tidy_df <- get_host_expected_values(lambda_h, lambda_c, mu_h, sim_time, ht_test_tidy_df) ggplot2::ggplot(data = ht_test_tidy_df, ggplot2::aes(x = key, y = value)) + ggplot2::geom_boxplot() + ggplot2::geom_point(data = ht_expected_tips_tidy_df, ggplot2::aes(x = key, y = value, col = "red", size = 0.5)) + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Host speciation rate") + ggplot2::ylab("Number of leaves") + guides(col=FALSE, size=FALSE)
lambda_c <- 0.5 lambda_s <- seq(from = 1, to = 3) / 1.0 mu_s <- lambda_s / 4 symbt_test_tips <- symb_tree_test(cosp_rate = lambda_c, sbr = lambda_s, sdr = mu_s, reps = reps, time = sim_time) symbt_expected_tips <- matrix(nrow = reps, ncol = length(lambda_s)) for(i in seq_len(length(lambda_s))) { for(j in 1:reps) { symbt_expected_tips[j, i] <- treeducken::calculate_expected_leaves_locustree( t = sim_time, dup_rate = lambda_s[i], loss_rate = mu_s[i], num_species = symbt_test_tips[j, i + length(lambda_s)]) if(is.infinite(symbt_expected_tips[j, i])) symbt_expected_tips[j, i] <- NA } } symbt_test_tips_df <- as.data.frame(symbt_test_tips[,1:length(lambda_s)]) symbt_column_namer <- function(lambda_s) { paste("lambda_s=", lambda_s) } colnames(symbt_test_tips_df) <- symbt_column_namer(lambda_s) symbt_test_tips_tidy_df <- tidyr::gather(symbt_test_tips_df) symbt_test_tips_tidy_df$type <- rep("observed", times = nrow(symbt_test_tips_tidy_df)) symbt_expected_tips_df <- as.data.frame(symbt_expected_tips) colnames(symbt_expected_tips_df) <- colnames(symbt_test_tips_df) symbt_expected_tips_df <- tidyr::gather(symbt_expected_tips_df) symbt_expected_tips_df$type <- rep("expected", times = nrow(symbt_expected_tips_df)) symbt_plot_df <- rbind(symbt_test_tips_tidy_df, symbt_expected_tips_df) ggplot2::ggplot(data = symbt_plot_df, ggplot2::aes(x = key, y = value, fill = type)) + ggplot2::geom_boxplot() + ggplot2::theme_bw() + khroma::scale_fill_bright() + ggplot2::xlab("Symbiont speciation rate") + ggplot2::ylab("Number of leaves")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.