Description Usage Arguments Fields and Methods Author(s) References See Also Examples
PhyloSim is an extensible object-oriented framework for the Monte Carlo simulation
of sequence evolution written in 100 percent R
.
It is built on the top of the R.oo
and ape
packages and uses
Gillespie's direct method to simulate substitutions, insertions and deletions.
Key features offered by the framework:
Simulation of the evolution of a set of discrete characters with arbitrary states evolving by a continuous-time Markov process with an arbitrary rate matrix.
Explicit implementations of the most popular substitution models (for nucleotides, amino acids and codons).
Simulation under the popular models of among-sites rate variation, like the gamma (+G) and invariants plus gamma (+I+G) models.
The possibility to simulate with arbitrarily complex patterns of among-sites rate variation by setting the site specific rates according to any R
expression.
Simulation with one or more separate insertion and/or deletion processes acting on the sequences, which sample the insertion/deletion length from an arbitrary discrete distribution or an R
expression (so all the probability distributions implemented in R
are readily available for this purpose).
Simulation of the effects of variable functional constraints over the sites by site-process-specific insertion and deletion tolerance parameters, which determine the rejection probability of a proposed insertion/deletion.
The possibility of having a different set of processes and site-process-specific parameters for every site, which allow for an arbitrary number of partitions in the simulated data.
Simulation of heterotachy and other cases of non-homogeneous evolution by allowing the user to set "node hook" functions altering the site properties at internal nodes of the phylogeny.
The possibility to export the counts of various events ("branch statistics") as phylo objects (see exportStatTree.PhyloSim
).
General notes:
The Sequence
objects have no "immortal links". The simulation
is aborted if the sequence length shrinks to zero. It is up to the user
to choose sensible indel rates and sequence lengths to prevent that.
The sites near the beginning and end of the sequences have less sites proposing insertion and deletion events around the so the insertion and deletion processes have an "edge effect". The user can simulate realistic flanking sequences to alleviate the edge effect in the simulation settings where it may be an issue.
Notes on performance:
The pure R
implementation offers felxibility, but also comes
with a slower simulation speed. If the PSIM_FAST
object is present in the environment, a "fast & careless"
mode is enabled. In this mode most of the error checking is skipped, increasing the speed.
It is recomended that simulations are only run in fast mode if you are sure that the simulation
settings are free from errors. It is probably a good practice to set up the simulations in normal mode
with short sequences and enable fast mode when running the actual simulation with long sequences.
Please note, that no "branch statistics" are saved in fast mode.
Logging also has a negative impact on performance, so it's not a good idea to run large simulations with the logging enabled.
The time needed to run a simulation depends not only on the number of the sites, but also on the length of the tree.
Constructing Sequence
objects with large number of sites is expensive. Avoid doing
that inside a cycle.
In the case of Sequence
objects with a large number of sites (more than 10 000) the
amount of available memory can be limiting as well.
The examples below demonstrate only some more common simulation settings,
the framework offers much more flexibility. See the package
vignette (vignette("PhyloSim",package="phylosim")
) and the
examples directory (http://github.com/bsipos/phylosim/tree/master/examples/)
for additional examples.
Package:
Class PhyloSim
Object
~~|
~~+--
PSRoot
~~~~~~~|
~~~~~~~+--
PhyloSim
Directly known subclasses:
public static class PhyloSim
extends PSRoot
1 |
phylo |
A rooted phylo object, constructed by the APE package. |
root.seq |
A valid Sequence object with Process objects attached. Used as the starting sequence during simulation. |
name |
The name of the object (a character vector of length one). |
log.file |
Name of the file used for logging. |
log.level |
An integer specifying the verbosity of logging (see |
... |
Not used. |
Methods:
Debug | - | |
Log | - | |
Simulate | - | |
Undocumented | - | |
as.character | - | |
attachHookToNode | - | |
attachSeqToNode | - | |
checkConsistency | - | |
detachHookFromNode | - | |
detachSeqFromNode | - | |
exportStatTree | - | |
getAlignment | - | |
getAlignmentLength | - | |
getBranchEvents | - | |
getEdge | - | |
getEdges | - | |
getId | - | |
getLogFile | - | |
getLogLevel | - | |
getName | - | |
getNedges | - | |
getNodes | - | |
getNtips | - | |
getPhylo | - | |
getRootNode | - | |
getRootSeq | - | |
getSeqFromNode | - | |
getSequences | - | |
getTipLabels | - | |
getTips | - | |
getTreeLength | - | |
is.tip | - | |
plot | - | |
readAlignment | - | |
readTree | - | |
saveAlignment | - | |
scaleTree | - | |
setAlignment | - | |
setBranchEvents | - | |
setEdges | - | |
setId | - | |
setLogFile | - | |
setLogLevel | - | |
setName | - | |
setNedges | - | |
setNodes | - | |
setNtips | - | |
setPhylo | - | |
setRootNode | - | |
setRootSeq | - | |
setSequences | - | |
setTipLabels | - | |
setTips | - | |
setTreeLength | - | |
summary | - | |
Methods inherited from PSRoot:
checkConsistency, enableVirtual, getComments, getMethodsList, globalConsistencyCheck, intersect.list, is, is.na, ll, my.all.equal, plot, setComments, setMethodsList, summary, virtualAssignmentForbidden
Methods inherited from Object:
$, $<-, [[, [[<-, as.character, attach, attachLocally, clearCache, clearLookupCache, clone, detach, equals, extend, finalize, getEnvironment, getFieldModifier, getFieldModifiers, getFields, getInstantiationTime, getStaticInstance, hasField, hashCode, ll, load, names, objectSize, print, save
Botond Sipos, Gregory Jordan
Gillespie, DT (1977) Exact stochastic simulation of coupled chemical reactions - J. Phys. Chem. 81 (25):2340-2361 http://dx.doi.org/10.1021/j100540a008
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 | set.seed(1)
## The following examples demonstrate
## the typical use of the framework.
## See the package vignette and
## \url{http://github.com/bsipos/phylosim/tree/master/examples/}
## The ll() method gives information about the methods defined
## in the immediate class of an object.
## Useful when exploring the framework.
s<-Sequence()
ll(s)
ll(PhyloSim())
ll(GTR())
## Example 1 - A short simulation:
## simulate nucleotide seqeunces and display
## the resulting alignment matrix.
Simulate(
PhyloSim(phy=rcoal(3),
root=NucleotideSequence(string="ATGC", proc=list(list(JC69())) ) )
)$alignment
# Construct a phylo object for the following
# simulations, scale total tree length to 1:
tmp<-PhyloSim(phylo=rcoal(3))
scaleTree(tmp,1/tmp$treeLength)
tmp$treeLength
t<-tmp$phylo
## Example 3 - simulating rate variation,
## insertions and deletions.
## See the examples/example_3_clean.R file
## in the phylosim GitHub repository.
# construct a GTR process object
gtr<-GTR(
name="MyGTR",
rate.params=list(
"a"=1, "b"=2, "c"=3,
"d"=1, "e"=2, "f"=3
),
base.freqs=c(2,2,1,1)/6
)
# get object summary
summary(gtr)
# display a bubble plot
plot(gtr)
# construct root sequence object
s<-NucleotideSequence(length=20)
# attach process via virtual field
s$processes<-list(list(gtr))
# sample states from the equilibrium
# distribution of the attached processes
sampleStates(s)
# create among-site rate variation by sampling
# the "rate.multiplier" site-process-specific parameter
# from a discrete gamma distribution (GTR+G).
plusGamma(s,gtr,shape=0.1)
# make the range 11:12 invariable
setRateMultipliers(s,gtr,0,11:12)
# get the rate multipliers for s and gtr
getRateMultipliers(s,gtr)
# proposing lengths in the range 1:3
d<-DiscreteDeletor(
rate=0.1,
name="MyDel",
sizes=c(1:3),
probs=c(3/6,2/6,1/6)
)
# get object
summary(d)
# plot deletion length distribution
plot(d)
# attach deletion process d to sequence s
attachProcess(s,d)
# create a region rejecting all deletions
setDeletionTolerance(s,d,0,11:12)
# construct an insertion process object
# proposing lengths in the range 1:3
i<-DiscreteInsertor(
rate=0.1,
name="MyDel",
sizes=c(1:2),
probs=c(1/2,1/2),
template.seq=NucleotideSequence(length=1,processes=list(list(JC69())))
)
# states will be sampled from the JC69 equilibrium distribution
# get object
summary(i)
# plot insertion length distribution
plot(i)
# attach insertion process i to sequence s
attachProcess(s,i)
# create a region rejecting all insertions
setInsertionTolerance(s,i,0,11:12)
# plot total site rates
plot(s)
# construct simulation object
sim<-PhyloSim(root.seq=s, phylo=t)
# get object summary
summary(sim)
# plot tree
plot(sim)
# run simulation
Simulate(sim)
# get the list of recorded per-branch event counts
getBranchEvents(sim)
# export the number of substitutions as a phylo object
subst<-exportStatTree(sim,"substitution")
# plot the exported phylo object
plot(subst)
# plot tree and alignment
plot(sim)
# save and display alingment
file<-paste("PhyloSim_dummy_fasta_",Sys.getpid(),".fas",sep="");
saveAlignment(sim,file=file);
# print out the Fasta file
cat(paste(scan(file=file,what=character(),sep="\n"),collapse="\n"));cat("\n");
# delete Fasta file
unlink(file);
# See \url{http://github.com/bsipos/phylosim/tree/master/examples/examples_class.R}
# for the full list of PhyloSim constructor examples.
## See the package vignette and
## the GitHub repository for even more examples.
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.