View source: R/generate_gene_tree_msc_hgt_dl.R
generate_gene_tree_msc_hgt_dl | R Documentation |
Generate a random gene tree within a given species timetree, based on an extension of the multi-species coalescent (MSC) model that includes horizontal gene transfers (HGT, incorporation of non-homologous genes as new loci), gene duplication and gene loss. The simulation consists of two phases. In the first phase a random "locus tree" is generated in forward time, according to random HGT, duplication and loss events. In the 2nd phase, alleles picked randomly from each locus are coalesced in backward time according to the multispecies coalescent, an extension of the Wright-Fisher model to multiple species. This function does not account for hybridization.
generate_gene_tree_msc_hgt_dl( species_tree,
allele_counts = 1,
population_sizes = 1,
generation_times = 1,
mutation_rates = 1,
HGT_rates = 0,
duplication_rates = 0,
loss_rates = 0,
gene_edge_unit = "time",
Nsites = 1,
bottleneck_at_speciation = FALSE,
force_coalescence_at_root = FALSE,
ploidy = 1,
HGT_source_by_locus = FALSE,
HGT_only_to_empty_clades = FALSE,
no_loss_before_time = 0,
max_runtime = NULL,
include_event_times = TRUE)
species_tree |
Rooted timetree of class "phylo". The tree can include multifurcations and monofurcations. The tree need not necessarily be ultrametric, i.e. it may include extinct species. Edge lengths are assumed to be in time units. |
allele_counts |
Integer vector, listing the number of alleles sampled per species and per locus. This can be interpreted as the number if individual organisms surveyed from each species, assuming that all loci are included once from each individual. The number of tips in the generated gene tree will be equal to the sum of allele counts across all species.
|
population_sizes |
Integer vector, listing the effective population size on the edge leading into each tip/node in the species tree. Either |
generation_times |
Numeric vector, listing the generation time along the edge leading into each clade. Either |
mutation_rates |
Numeric vector, listing the probability of mutation per site and per generation along the edge leading into each clade. Either |
HGT_rates |
Numeric vector, listing horizontal gene transfer rates per lineage per time, along the edge leading into each clade. Either |
duplication_rates |
Numeric vector, listing gene duplication rates per locus per lineage per time, along the edge leading into each clade. Either |
loss_rates |
Numeric vector, listing gene loss rates per locus per lineage per time, along the edge leading into each clade. Either |
gene_edge_unit |
Character, either "time", "generations", "mutations_expected" (expected mean number of mutations per site), or "mutations_random" (randomly generated mean number of mutations per site), specifying how edge lengths in the gene tree should be measured. By default, gene-tree edges are measured in time, as is the case for the input species tree. |
Nsites |
Integer, specifying the number of sites (nucleotides) in the gene. Only relevant when generating edge lengths in terms of random mutation counts, i.e. if |
bottleneck_at_speciation |
Logical. If |
force_coalescence_at_root |
Logical. If |
ploidy |
Integer, specifying the assumed genetic ploidy, i.e. number of gene copies per individual. Typically 1 for haploids, or 2 for diploids. |
HGT_source_by_locus |
Logical. If |
HGT_only_to_empty_clades |
Logical, specifying whether HGT transfers only occur into clades with no current loci. |
no_loss_before_time |
Numeric, optional time since the root during which no gene losses shall occur (even if |
max_runtime |
Numeric, optional maximum computation time (in seconds) to allow for the simulation. Use this to avoid occasional explosions of runtimes, for example due to very large generated trees. Aborted simulations will return with the flag |
include_event_times |
Logical, specifying whether the times of HGT, duplication and loss events should be returned as well. If these are not needed, then set |
This function assumes that the species tree is a time tree, i.e. with edge lengths given in actual time units. If species_tree
is ultrametric and gene_edge_unit=="time"
, then the gene tree (but not necessarily the locus tree) will be ultrametric as well. The root of the locus and gene tree coincides with the root of the species tree.
The meaning of gene_edge_unit
is the same as for the function generate_gene_tree_msc
.
A named list with the following elements:
success |
Logical, indicating whether the gene tree was successfully generated. If |
locus_tree |
The generated locus timetree, of class "phylo". The locus tree describes the genealogy of loci due to HGT, duplication and loss events. Each tip and node of the locus tree is embedded within a specific species edge. For example, tips of the locus tree either coincide with tips of the species tree (if the locus persisted until the species went extinct or until the present) or they correspond to gene loss events. In the absence of any HGT, duplication and loss events, the locus tree will resemble the species tree. |
locus_type |
Character vector of length NLtips + NLnodes (where NLtips and NLnodes are the number of tips and nodes in the locus tree, respectively), specifying the type/origin of each tip and node in the locus tree. For nodes, type 'h' corresponds to an HGT event, type 'd' to a duplication event, and type 's' to a speciation event. For tips, type 'l' represents a loss event, and type 't' a terminal locus (i.e., coinciding with a species tip). For example, if the input species tree was an ultrametric tree representing only extant species, then the locus tree tips of type 't' are the loci that could potentially be sampled from those extant species. |
locus2clade |
Integer vector of length NLtips + NLnodes, with values in NStips+NSnodes, specifying for every locus tip or node the correspondng "host" species tip or node. |
HGT_times |
Numeric vector, listing HGT event times (counted since the root) in ascending order. Only included if |
HGT_source_clades |
Integer vector of the same length as |
HGT_target_clades |
Integer vector of the same length as |
duplication_times |
Numeric vector, listing gene duplication event times (counted since the root) in ascending order. Only included if |
duplication_clades |
Integer vector of the same length as |
loss_times |
Numeric vector, listing gene loss event times (counted since the root) in ascending order. Only included if |
loss_clades |
Integer vector of the same length as |
gene_tree |
The generated gene tree, of type "phylo". |
gene_tip2species_tip |
Integer vector of length NGtips (where NGtips is the number of tips in the gene tree) with values in 1,..,Ntips+Nnodes, mapping gene-tree tips to species-tree tips. |
gene_tip2locus_tip |
Integer vector of length NGtips with values in 1,..,NLtips, mapping gene-tree tips to locus-tree tips. |
gene_node2locus_edge |
Integer vector of length NGnodes with values in 1,..,NLedges, mapping gene-tree nodes to locus-tree edges. |
gene_clade_times |
Numeric vector of size NGtips+NGnodes, listing the time (temporal distance from species root) of each tip and node in the gene tree. The units will be the same as the time units of the species tree. Note that this may include negative values, if some gene lineages coalesce at a greater age than the root. |
error |
Character, containing an explanation of the error that occurred. Only included if |
Stilianos Louca
J. H. Degnan, N. A. Rosenberg (2009). Gene tree discordance, phylogenetic inference and the multispecies coalescent. Trends in Ecology & Evolution. 24:332-340.
B. Rannala, Z. Yang (2003). Bayes estimation of species divergence times and ancestral population sizes using DNA sequences from multiple loci. Genetics. 164:1645-1656.
generate_random_tree
,
generate_gene_tree_msc
# Simulate a simple species tree
parameters = list(birth_rate_factor=1)
Nspecies = 10
species_tree = generate_random_tree(parameters,max_tips=Nspecies)$tree
# Simulate a haploid gene tree within the species tree, including HGTs and gene loss
# Assume the same population size and generation time everywhere
# Assume the number of alleles samples per species is poisson-distributed
results = generate_gene_tree_msc_hgt_dl(species_tree,
allele_counts = rpois(Nspecies,3),
population_sizes = 1000,
generation_times = 1,
ploidy = 1,
HGT_rates = 0.1,
loss_rates = 0.05);
if(!results$success){
# simulation failed
cat(sprintf(" ERROR: %s\n",results$error))
}else{
# simulation succeeded
gene_tree = results$gene_tree
cat(sprintf(" Gene tree has %d tips\n",length(gene_tree$tip.label)))
}
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.