knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
This vignette summarizes the workflow for simulating a host and symbiont tree using the
This model has a number of parameters that are input into the
sim_cophyBD function. Let us assume that we just want to simulate to a time of 2.0 units. We can then use
ave_tips_lt to figure out the average number of extant tips that will be on our host and symbiont trees respectively.
library(treeducken) set.seed(42) exp_tips_host <- ave_tips_st(1.0, 0.5, 2.0) # on average we will get about 5 tips # maybe we want something more like 8, so let's decrease the extinction rate exp_tips_host <- ave_tips_st(1.0, 0.3, 2.0) # that looks about right, let's assume there are no host speciations that are # not cospeciations h_lambda <- 0.0 h_mu <- 0.3 c_lambda <- 1.0 # now we need to worry about the symbiont tree since only cospeciations are # occurring # we can use the same math as for the locus tree simulator exp_tips_symb <- ave_tips_lt(t = 2.0, dup_rate = 0.2, loss_rate = 0.1, 8) # a little less than 10 symbiont tips on average s_lambda <- 0.2 s_mu <- 0.1 # let's assume that when symbionts speciate that they always inherit their # ancestor's host repertoire s_her <- 0.0
Now we have all the necessary parameters set with some idea of what they mean.
We will now use the main function,
sim_cophyBD with the parameters we set in the previous section.
Let's simulate ten sets and then we can print out a summary for each set.
# simulate 10 paired trees with our set parameters. host_symb_sets <- sim_cophyBD(hbr = h_lambda, hdr = h_mu, sbr = s_lambda, cosp_rate = c_lambda, sdr = s_mu, host_exp_rate = s_her, time_to_sim = 2.0, numbsim = 10) print(host_symb_sets, details = TRUE)
Notice that some of these are not particularly interesting with only 2 host or symbiont tips. This is expected behavior, and occurs since there is a nonzero probability that no events occur in the time we simulated.
Let's check out one set.
# # plot one or two of them tree_set_of_interest <- host_symb_sets[] print(tree_set_of_interest)
plot(tree_set_of_interest, col = "red", lty = "dotted") add_events(tree_set_of_interest)
Notice that the output of
sim_cophyBD is a list of
cophy object is composed of 4 elements:
symb_tree are both of the
phylo object (from the ape package).
association_mat which is a matrix with number of rows equal to the number of extant tips in the symbiont tree, and number of columns equal to the number of extant tips in the host tree.
event_history is a data frame which shows the full history of the simulation.
It contains which events occur on which symbiont and/or host and at which time.
The following event code is used: "C"- cospeciation, "HG" - host gain (a host speciation), "HL" - host less (host extinction), "SG" - symbiont gain (symbiont speciation), "SL" - symbiont loss (symbiont extinction), "AG" - association gain, and "AL" - association loss.
This bit is a mess at present so please let me know if you think of ways I could trim this down.
We can also perform the parafit test using the results.
host_tree <- host_tree(tree_set_of_interest) symb_tree <- symb_tree(tree_set_of_interest) a <- association_mat(tree_set_of_interest) d <- parafit_stat(host_tr = host_tree, symb_tr = symb_tree, assoc_mat = a) parafit_test(host_tr = host_tree, symb_tr = symb_tree, assoc_mat = a, D = d, reps = 99)
We can also do a full summary of the results with the following function.
This tabulates the number of each event in the
event_history data frame, and then performs the Parafit test of Legendre et al. 2004.
# of course that bit is maybe a little too verbose # we may be interested in a lot of trees # we can instead use cophylo_summary_stat host_symb_summary_df <- summarize_cophy(host_symb_sets) host_symb_summary_df
To model gene tree-species tree discordance on these trees we use the three-tree model (Rasmussen and Kellis 2012). This model has three levels: species trees, locus trees, and gene trees. The species tree models speciation and extinction, the locus tree models gene birth, death and transfers, and the gene tree models incomplete lineage sorting. Transfers can occur randomly to any recipient throughout a tree or to species that are more closely related to the transfer donor species.
We can use either symbiont or host trees to simulate locus trees. First we will use the host tree:
host_tree_locus_trees <- sim_ltBD(host_tree, gbr = 0.4, gdr = 0.2, lgtr = 0.0, num_loci = 10) host_tree_locus_trees
Now we will use the symbiont tree:
symb_tree_locus_trees <- sim_ltBD(symb_tree, gbr = 0.2, gdr = 0.1, lgtr = 0.1, num_loci = 10) str(symb_tree_locus_trees[])
Using these locus trees we can simulate under the multi-locus coalescent process (see Rasmussen and Kellis 2012). We can specify an effective population size and a generation time measured in generations per unit time. For now we will assume that the time of our trees is in units of millions of years. The default for generation time is then 1 time per year (1e-6). And we will randomly draw an effective population size from a Lognormal with mean 14 and standard deviation 0.4. Note that the only reason these values chosen is to mirror the validation test of Mallo et al. (2015) for the SimPhy program.
host_locus_tree <- host_tree_locus_trees[] # randomly choose an effective popsize popsize <- 1e6 host_loci_gene_trees <- sim_mlc(host_locus_tree, effective_pop_size = popsize, num_reps = 100)
And we can calculate some tree summary statistics on all of genes that were simulated on one of the locus trees (in this case the first one).
You can also combine a host tree, symbiont tree, and association matrix into a
cophy object that can be input into functions in
treeducken. The example below reads in the classic gopher and lice dataset from Hafner et al. 1994.
The more common association table format with two columns with hosts in the first and symbionts (or parasites) in the second column is converted into an association matrix.
These are all then converted into
You can then print out a summary of these and then calculate summary statistics on these.
Currently, this only calculates the parafit statistic and conducts the permutation test for that statistic if the
event_history element of
cophy is not set.
If that is set it will count the numbers of each event.
gopher_lice_map <- read.table(system.file("extdata", "gopher_lice_mapping.txt", package = "treeducken"), stringsAsFactors = FALSE, header = TRUE) interaction_mat <- make_mat(gopher_lice_map) gopher_tree <- ape::read.nexus(system.file("extdata", "gophers_bd.tre", package = "treeducken")) lice_tree <- ape::read.nexus(system.file("extdata", "lice_bd.tre", package = "treeducken")) gopher_lice_cophylo <- to_cophy(hostTree = gopher_tree, symbTree = lice_tree, assocMat = interaction_mat) print(gopher_lice_cophylo) summarize_cophy(gopher_lice_cophylo) plot(gopher_lice_cophylo, fsize = 0.5, show_tip_label = FALSE, gap = 1, col = "purple", lty = "dashed")
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.