View source: R/coalescent.sim.R
coalescent.sim | R Documentation |
This funtion allows the user to simulate a phylogenetic tree, as well as phenotypic and genetic data, including associated and unassociated loci.
coalescent.sim(
n.ind = 100,
n.snps = 10000,
n.subs = 1,
n.snps.assoc = 0,
assoc.prob = 100,
n.phen.subs = 15,
phen = NULL,
plot = TRUE,
heatmap = FALSE,
reconstruct = FALSE,
dist.dna.model = "JC69",
grp.min = 0.25,
row.names = TRUE,
set = 1,
tree = NULL,
coaltree = TRUE,
s = 20,
af = 10,
filename.plot = NULL,
seed = NULL
)
n.ind |
An integer specifying the number of individual genomes to simulate (ie. the number of terminal nodes in the tree). |
n.snps |
An integer specifying the number of genetic loci to simulate. |
n.subs |
Either an integer or a vector (containing a distribution) that is used to determine the number of substitutions to occur on the phylogenetic tree for each genetic locus (see details). |
n.snps.assoc |
An optional integer specifying the number of genetic loci |
assoc.prob |
An optional integer (> 0, <= 100) specifying the strength of the association between the n.snps.assoc loci and the phenotype (see details). |
n.phen.subs |
An integer specifying the expected number of phenotypic substitutions to occur on the phylogenetic tree (through the same process as the n.subs parameter when n.subs is an integer (see details)). |
phen |
An optional vector containing a phenotype for each of the n.ind individuals if no phenotypic simulation is desired. |
plot |
A logical indicating whether to generate a plot of the phylogenetic tree ( |
heatmap |
A logical indicating whether to produce a heatmap of the genetic distance between the simulated genomes of the n.ind individuals. |
reconstruct |
Either a logical indicating whether to attempt to reconstruct a phylogenetic tree using the simulated genetic data, or one of c("UPGMA", "nj", "ml") to specify that tree reconstruction is desired by one of these three methods (Unweighted Pair Group Method with Arithmetic Mean, Neighbour-Joining, Maximum-Likelihood). |
dist.dna.model |
A character string specifying the type of model to use in reconstructing the phylogenetic tree for
calculating the genetic distance between individual genomes, only used if |
grp.min |
An optional number between 0.1 and 0.9 to control the proportional size of the smaller phenotypic group. |
row.names |
An optional vector containing row names for the individuals to be simulated. |
set |
An integer (1, 2, or 3) required to select the method of generating associated loci if |
coaltree |
A logical indicating whether to generate a coalescent tree ( |
s |
If |
af |
If |
filename.plot |
An optional character string denoting the file location for saving any plots produced; else |
seed |
An optional integer to control the pseudo-randomisation process and allow for identical repeat runs of the function;
else |
Homoplasy Distribution
The homoplasy distribution contains the number of substitutions per site.
If the value of the n.subs
parameter is set to an integer, this integer is
used as the parameter of a Poisson distribution from which the number of substitutions to
occur on the phylogenetic tree is drawn for each of the n.snps
simulated genetic loci.
The n.subs
argument can also be used to provide a distribution
to define the number of substitutions per site.
It must be in the form of a named vector (or table), or a vector in which the i'th element
contains the number of loci that have been estimated to undergo i substitutions on the tree.
The vector must be of length max n.subs, and "empty" indices must contain zeros.
For example: the vector n.subs = c(1833, 642, 17, 6, 1, 0, 0, 1)
,
could be used to define the homoplasy distribution for a dataset with 2500 loci,
where the maximum number of substitutions to be undergone on the tree by any locus is 8,
and no loci undergo either 6 or 7 substitutions.
Association Probability
The assoc.prob
parameter is only functional when set
is set to 1.
If so, assoc.prob
controls the strength of association through a process analagous to dilution.
All n.snps.assoc
loci are initially simulated to undergo a substitution
every time the phenotype undergoes a substitution (ie. perfect association).
The assoc.prob parameter then acts like a dilution factor, removing (100 - assoc.prob)%
of the substitutions that occurred during simulation under perfect association.
Caitlin Collins caitiecollins@gmail.com
## Not run:
## load example homoplasy distribution
data(dist_0)
str(dist_0)
## simulate a matrix with 10 associated loci:
dat <- coalescent.sim(n.ind = 100,
n.snps = 1000,
n.subs = dist_0,
n.snps.assoc = 10,
assoc.prob = 90,
n.phen.subs = 15,
phen = NULL,
plot = TRUE,
heatmap = FALSE,
reconstruct = FALSE,
dist.dna.model = "JC69",
grp.min = 0.25,
row.names = NULL,
coaltree = TRUE,
s = NULL,
af = NULL,
filename = NULL,
set = 1,
seed = 1)
## examine output:
str(dat)
## isolate elements of output:
snps <- dat$snps
phen <- dat$phen
snps.assoc <- dat$snps.assoc
tree <- dat$tree
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.