Description Usage Arguments Details Value Note Author(s) References See Also Examples
Simulate tumor progression including possible restrictions in the order of driver mutations. Optionally add passenger mutations. Simulation is done using the BNB algorithm of Mather et al., 2012.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 | oncoSimulIndiv(fp, model = "Exp",
numPassengers = 0, mu = 1e-6, muEF = NULL,
detectionSize = 1e8, detectionDrivers = 4,
detectionProb = NA,
sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1,
0.025),
initSize = 500, s = 0.1, sh = -1,
K = initSize/(exp(1) - 1), keepEvery = sampleEvery,
minDetectDrvCloneSz = "auto",
extraTime = 0,
finalTime = 0.25 * 25 * 365, onlyCancer = TRUE,
keepPhylog = FALSE,
mutationPropGrowth = ifelse(model == "Bozic",
FALSE, TRUE),
max.memory = 2000, max.wall.time = 200,
max.num.tries = 500,
errorHitWallTime = TRUE,
errorHitMaxTries = TRUE,
verbosity = 0,
initMutant = NULL,
AND_DrvProbExit = FALSE,
fixation = NULL,
seed = NULL)
oncoSimulPop(Nindiv, fp, model = "Exp", numPassengers = 0, mu = 1e-6,
muEF = NULL,
detectionSize = 1e8, detectionDrivers = 4,
detectionProb = NA,
sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1,
0.025),
initSize = 500, s = 0.1, sh = -1,
K = initSize/(exp(1) - 1), keepEvery = sampleEvery,
minDetectDrvCloneSz = "auto",
extraTime = 0,
finalTime = 0.25 * 25 * 365, onlyCancer = TRUE,
keepPhylog = FALSE,
mutationPropGrowth = ifelse(model == "Bozic",
FALSE, TRUE),
max.memory = 2000, max.wall.time = 200,
max.num.tries = 500,
errorHitWallTime = TRUE,
errorHitMaxTries = TRUE,
initMutant = NULL,
AND_DrvProbExit = FALSE,
fixation = NULL,
verbosity = 0,
mc.cores = detectCores(),
seed = "auto")
oncoSimulSample(Nindiv,
fp,
model = "Exp",
numPassengers = 0,
mu = 1e-6,
muEF = NULL,
detectionSize = round(runif(Nindiv, 1e5, 1e8)),
detectionDrivers = {
if(inherits(fp, "fitnessEffects")) {
if(length(fp$drv)) {
nd <- (2: round(0.75 * length(fp$drv)))
} else {
nd <- 9e6
}
} else {
nd <- (2 : round(0.75 * max(fp)))
}
if (length(nd) == 1)
nd <- c(nd, nd)
sample(nd, Nindiv,
replace = TRUE)
},
detectionProb = NA,
sampleEvery = ifelse(model %in% c("Bozic", "Exp"), 1,
0.025),
initSize = 500,
s = 0.1,
sh = -1,
K = initSize/(exp(1) - 1),
minDetectDrvCloneSz = "auto",
extraTime = 0,
finalTime = 0.25 * 25 * 365,
onlyCancer = TRUE, keepPhylog = FALSE,
mutationPropGrowth = ifelse(model == "Bozic",
FALSE, TRUE),
max.memory = 2000,
max.wall.time.total = 600,
max.num.tries.total = 500 * Nindiv,
typeSample = "whole",
thresholdWhole = 0.5,
initMutant = NULL,
AND_DrvProbExit = FALSE,
fixation = NULL,
verbosity = 1,
showProgress = FALSE,
seed = "auto")
|
Nindiv |
Number of individuals or number of different trajectories to simulate. |
fp |
Either a poset that specifies the order restrictions (see
Other arguments below (s, sh, numPassengers) make sense only if you use a poset, as they are included in the fitnessEffects object. |
model |
One of "Bozic", "Exp", "McFarlandLog" (the last one can be abbreviated to "McFL"). The default is "Exp". |
numPassengers |
This has no effect if you use the If you use the specification of v.1., the number of passenger genes. Note that using v.1 the total number of genes (drivers plus passengers) must be smaller than 64. All driver genes should be included in the poset (even if they depend on no one and no one depends on them), and will be numbered from 1 to the total number of driver genes. Thus, passenger genes will be numbered from (number of driver genes + 1):(number of drivers + number of passengers). |
mu |
Mutation rate. Can be a single value or a named vector. If a single value, all genes will have the same mutation rate. If a named vector, the entries in the vector specify the gene-specific mutation rate. If you pass a vector, it must be named, and it must have entries for all the genes in the fitness specification. Passing a vector is only available when using fitnessEffects objects for fitness specification. See also |
muEF |
Mutator effects. A mutatorEffects object as obtained from
All the genes specified in If you use mutator effects you must also use fitnessEffects in
|
detectionSize |
What is the minimal number of cells for cancer
to be detected. For If set to NA, |
detectionDrivers |
The minimal number of drivers (not modules,
drivers, whether or not they are from the same module) present in any
clone for cancer to be detected. For For If set to NA, |
detectionProb |
Vector of arguments for the mechanism where
probability of detection depends on size. If
If you only provide some of the elements (except for the pair This option can not be used with v.1 objects. |
sampleEvery |
How often the whole population is sampled. This is not the same as the
interval between successive samples that are kept or stored (for that,
see For very fast growing clones, you might need to have a small value here to minimize possible numerical problems (such as huge increase in population size between two successive samples that can then lead to problems for random number generators). Likewise, for models with density dependence (such as McF) this value should be very small. |
initSize |
Initial population size. |
K |
Initial population equilibrium size in the McFarland models. |
keepEvery |
Time interval between successive whole population samples that are
actually stored. This must be larger or equal to If you want nice plots, set Setting |
minDetectDrvCloneSz |
A value of 0 or larger than 0 (by default equal to
The reason for this parameter is to ensure that, say, a clone with a certain number of drivers that would cause the simulation to end has not just appeared and is present in only one individual that might then immediately go extinct. This can be relevant in secenarios such as the McFarland model. See also |
extraTime |
A value larger than zero waits those many additional time periods before exiting after having reached the exit condition (population size, number of drivers). The reason for this setting is to prevent the McFL models from always
exiting at a time when one clone is increasing its size quickly (see
|
finalTime |
What is the maximum number of time units that the simulation can run. Set to NA to disable this limit. |
onlyCancer |
Return only simulations that reach cancer? If set to TRUE, only simulations that satisfy the
If |
keepPhylog |
If TRUE, keep track of when and from which clone each clone is
created. See also |
mutationPropGrowth |
If TRUE, make mutation rate proportional to growth rate, so clones that grow faster also mutate faster. Thus, $mutation_rate = mu * birth_rate$. This is a simple way of approximating that mutation happens at cell division (it is not strictly making mutation happen at cell division, since mutation is not strictly coupled with division). Of course, this only makes sense in models where birth rate changes. |
initMutant |
For v.2: a string with the mutations of the initial
mutant, if any. This is the same format as for
|
max.num.tries |
Only applies when |
max.num.tries.total |
Only applies when |
max.wall.time |
Maximum wall time for the simulation of one
individual (over all |
max.wall.time.total |
Maximum wall time for all the simulations (when using
|
errorHitMaxTries |
If TRUE (the default) a simulation that reaches
the maximum number of repetitions allowed is considered not to have
succesfully finished and, thus, an error, and no output from it will
be reported. This is often what you want.
See |
errorHitWallTime |
If TRUE (the default) a simulation that reaches the maximum wall time
is considered not to have succesfully finished and, thus, an error,
and no output from it will be reported. This is often what you
want.
See |
max.memory |
The largest size (in MB) of the matrix of Populations by Time. If it creating it would use more than this amount of memory, it is not created. This prevents you from accidentally passing parameters that will return an enormous object. |
verbosity |
If 0, run silently. Iincreasing values of verbosity provide progressively more information about intermediate steps, possible numerical notes/warnings from the C++ code, etc. Values less than 0 supress some default notes: use with care. |
typeSample |
"singleCell" (or "single") for single cell sampling, where the
probability of sampling a cell (a clone) is directly proportional to
its population size. "wholeTumor" (or "whole") for whole tumor
sampling (i.e., this is similar to a biopsy being the entire
tumor). See |
thresholdWhole |
In whole tumor sampling, whether a gene is detected as mutated depends
on thresholdWhole: a gene is considered mutated if it is altered in at
least thresholdWhole proportion of the cells in that individual. See |
mc.cores |
Number of cores to use when simulating more than one individual (i.e., when calling oncoSimulPop). |
showProgress |
If TRUE, provide information, during exection, of the individual done, and the number of attempts and time used. |
AND_DrvProbExit |
If TRUE, cancer will be considered to be reached
if both the |
fixation |
If non-NULL, a list or a vector, where each element of
is a string with a gene or a gene combination or a genotype (see
below). Simulations will stop as soon as any of the genes or gene
combinations or genotypes are fixed (i.e., reach a minimal
frequency). If you pass gene combinations or genotypes, separate genes
with commas (not '>'); this means order is not (yet?) supported. This
way of specifying gene combinations is the same as the one used for
To differentiate between gene combinations and specific genotypes,
genotypes are specified by prepending them with a "_,". For instance,
In addition to the gene combinations or genotypes themeselves, you can
add to the list or vector the named elements
Using this option with |
s |
Selection coefficient for drivers. Only relevant if using a poset as this is included in the fitnessEffects object. This will eventually be deprecated. |
sh |
Selection coefficient for drivers with restrictions not satisfied. A value of 0 means there are no penalties for a driver appearing in a clone when its restrictions are not satisfied. To specify "sh=Inf" (in Diaz-Uriarte, 2015) use sh = -1. Only relevant if using a poset as this is included in the fitnessEffects object. This will eventually be deprecated. |
seed |
The seed for the C++ PRNG. You can pass a value. If you set
it to NULL, then a seed will be generated in R and passed to C++. If
you set it to "auto", then if you are using v.1, the behavior is the
same as if you set it to NULL (a seed will be generated in R and
passed to C++) but if you are using v.2, a random seed will be
produced in C++.
If you need reproducibility, either pass a value or set it to NULL (setting
it to NULL will make the C++ seed reproducible if you use the same
seed in R via When using oncoSimulPop, if you want reproducibility, you might want
to, in addition to setting |
The basic simulation algorithm implemented is the BNB one of Mather et al., 2012, where I have added modifications to fitness based on the restrictions in the order of mutations.
Full details about the algorithm are provided in Mather et al., 2012. The evolutionary models, including references, and the rest of the parameters are explained in Diaz-Uriarte, 2014, especially in the Supplementary Material. The model called "Bozic" is based on Bozic et al., 2010, and the model called "McFarland" in McFarland et al., 2013.
oncoSimulPop simply calls oncoSimulIndiv multiple times. When run on POSIX systems, it can use multiple cores (via mclapply).
The summary
methods for these classes return some of the return
values (see next) as a one-row (for class oncosimul) or multiple row
(for class oncosimulpop) data frame. The print
methods for
these classes simply print the summary.
Changing options errorHitMaxTries
and errorHitWallTime
can be useful when conducting many simulations, as in the call to
oncoSimulPop
: setting them to TRUE means nothing is recorded
for those simulations where ending conditions are not reached but
setting them to FALSE would allow you to record the output; this would
potentially result in a mixture where some simulations would not have
reached the ending condition, but this might sometimes be what you
want. Note, however, that oncoSimulSample
always has both them
to TRUE, as it could not be otherwise.
GenotypesWDistinctOrderEff
provides the information about
order effects that is missing from Genotypes
. When there are
order effects, the Genotypes
matrix can contain genotypes
that are not distinguishable. Suppose there are two genes, the first
and the second. In the Genotype
output you can get two
columns where there is a 1 in both genes: those two columns
correspond to the two possible orders (first gene mutated first, or
first gene mutated after the
second). GenotypesWDistinctOrderEff
disambiguates this. The
same is done by GenotypesLabels
; this is easier to decode for
a human (a string of gene labels) but a little bit harder to parse
automatically. Note that when you use the default print method for
this object, you get, among others, a two-column display with the
GenotypeLabels
information. When order matters, a genotype
shown as “x > y _ z” means that a mutation in “x” happened
before a mutation in “y”; there is also a mutation in “z” (which
could have happened before or after either of “x” or “y”), but
“z” is a gene for which order does not matter. When order does not
matter, a comma "," separates the identifiers of mutated genes.
Detection of cancer can be a deterministic process, where cancer is
always detected (and, thus, simulation ended) when certain conditions
are met (detectionSize
, detectionDrivers
,
fixation
). Alternatively, it can be stochastic process where
probability of detection depends on size. Every so often (see below)
we assess population size, and detect cancer or not probabilistically
(comparing the probability of detection for that size with a random
uniform number). Probability of detection changes with population size
according to the function
1 - e^{-cPDetect ( (populationsize - PDBaseline)/PDBaseline )}
.
You can pass cPDetect
manually (you will need to set n2
and p2
to NA). However, it might be more intuitive to specify
the pair n2
, p2
, such that the probability of detection
is p2 for population size n2 (and from that pair we solve
for the value of cPDetect
). How often do we check? That is
controlled by checkSizePEvery
, the (minimal) time between
successive checks (from among the sampling times given by
sampleEvery
: the interval between successive assessments will
be the smallest multiple integer of sampleEvery
that is
larger than checkSizePEvery
—see vignette for details).
checkSizePEvery
has, by default, a different (and much larger)
value than sampleEvery
both to allow to examine the effects of
sampling, and to avoid many costly random number generations.
Please note that detectionProb
is NOT available with version 1
objects.
For oncoSimulIndiv
a list, of class "oncosimul", with the
following components:
pops.by.time |
A matrix of the population sizes of the clones, with clones in columns and time in row. Not all clones are shown here, only those that were present in at least on of the keepEvery samples. |
NumClones |
Total number of clones in the above matrix. This is not the total number of distinct clones that have appeared over all simulations (which is likely to be larger or much larger). |
TotalPopSize |
Total population size at the end. |
Genotypes |
A matrix of genotypes. For each of the clones in the pops.by.time matrix, its genotype, with a 0 if the gene is not mutated and a 1 if it is mutated. |
MaxNumDrivers |
The largest number of mutated driver genes ever seen in the simulation in any clone. |
MaxDriversLast |
The largest number of mutated drivers in any clone at the end of the simulation. |
NumDriversLargestPop |
The number of mutated driver genes in the clone with largest population size. |
LargestClone |
Population size of the clone with largest number of population size. |
PropLargestPopLast |
Ratio of LargestClone/TotalPopSize |
FinalTime |
The time (in time units) at the end of the simulation. |
NumIter |
The number of iterations of the BNB algorithm. |
HittedWallTime |
TRUE if we reached the limit of max.wall.time. FALSE otherwise. |
TotalPresentDrivers |
The total number of mutated driver genes,
whether or not in the same clone. The number of elements in
|
CountByDriver |
A vector of length number of drivers, with the count of the number of clones that have that driver mutated. |
OccurringDrivers |
The actual number of drivers mutated. |
PerSampleStats |
A 5 column matrix with a row for each sampling period. The columns are: total population size, population size of the largest clone, the ratio of the two, the largest number of drivers in any clone, and the number of drivers in the clone with the largest population size. |
other |
A list that contains statistics for an estimate of the simulation error when using the McFarland model as well as other statistics. For the McFarland model, the relevant value is errorMF, which is -99 unless in the McFarland model. For the McFarland model it is the largest difference of successive death rates. The entries named minDMratio and minBMratio are the smallest ratio, over all simulations, of death rate to mutation rate and birth rate to mutation rate, respectively. The BNB algorithm thrives when those are large. |
For oncoSimulPop
a list of length Nindiv
, and of class
"oncosimulpop"
, where each element of the list is itself a
list, of class oncosimul
, with components as described above.
In v.2, the output is of both class "oncosimul" and "oncosimul2". The oncoSimulIndiv return object differs in
GenotypesWDistinctOrderEff |
A list
of vectors, where each vector corresponds to a genotype in the
|
GenotypesLabels |
The genotypes, as character vectors with the original labels provided (i.e., not the integer codes). As before, mutated genes, for those where order matters, come first, and are separated by the rest by a "_". See details. |
OccurringDrivers |
This is the same as in v.1, but we use the labels, not the numeric id codes. Of course, if you entered integers as labels for the genes, you will see numbers (however, as a character string). |
Please, note that the meaning of the fitness effects in the McFarland model is not the same as in the original paper; the fitness coefficients are transformed to allow for a simpler fitness function as a product of terms. This differs with respect to v.1. See the vignette for details.
Ramon Diaz-Uriarte
Bozic, I., et al., (2010). Accumulation of driver and passenger mutations during tumor progression. Proceedings of the National Academy of Sciences of the United States of America\/, 107, 18545–18550.
Diaz-Uriarte, R. (2015). Identifying restrictions in the order of accumulation of mutations during tumor progression: effects of passengers, evolutionary models, and sampling http://www.biomedcentral.com/1471-2105/16/41/abstract
Gerstung et al., 2011. The Temporal Order of Genetic and Pathway Alterations in Tumorigenesis. PLoS ONE, 6.
McFarland, C.~D. et al. (2013). Impact of deleterious passenger mutations on cancer progression. Proceedings of the National Academy of Sciences of the United States of America\/, 110(8), 2910–5.
Mather, W.~H., Hasty, J., and Tsimring, L.~S. (2012). Fast stochastic algorithm for simulating evolutionary population dynamics. Bioinformatics (Oxford, England)\/, 28(9), 1230–1238.
plot.oncosimul
, examplePosets
,
samplePop
, allFitnessEffects
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 | #################################
#####
##### Examples using v.1
#####
#################################
## use poset p701
data(examplePosets)
p701 <- examplePosets[["p701"]]
## Exp Model
b1 <- oncoSimulIndiv(p701)
summary(b1)
plot(b1, addtot = TRUE)
## McFarland; use a small sampleEvery, but also a reasonable
## keepEvery.
## We also modify mutation rate to values similar to those in the
## original paper.
## Note that detectionSize will play no role
## finalTime is large, since this is a slower process
## initSize is set to 4000 so the default K is larger and we are likely
## to reach cancer. Alternatively, set K = 2000.
m1 <- oncoSimulIndiv(p701,
model = "McFL",
mu = 5e-7,
initSize = 4000,
sampleEvery = 0.025,
finalTime = 15000,
keepEvery = 10,
onlyCancer = FALSE)
plot(m1, addtot = TRUE, log = "")
## Simulating 4 individual trajectories
## (I set mc.cores = 2 to comply with --as-cran checks, but you
## should either use a reasonable number for your hardware or
## leave it at its default value).
p1 <- oncoSimulPop(4, p701,
keepEvery = 10,
mc.cores = 2)
summary(p1)
samplePop(p1)
p2 <- oncoSimulSample(4, p701)
#########################################
######
###### Examples using v.2:
######
#########################################
#### A model similar to the one in McFarland. We use 2070 genes.
set.seed(456)
nd <- 70
np <- 2000
s <- 0.1
sp <- 1e-3
spp <- -sp/(1 + sp)
mcf1 <- allFitnessEffects(noIntGenes = c(rep(s, nd), rep(spp, np)),
drv = seq.int(nd))
mcf1s <- oncoSimulIndiv(mcf1,
model = "McFL",
mu = 1e-7,
detectionSize = 1e8,
detectionDrivers = 100,
sampleEvery = 0.02,
keepEvery = 2,
initSize = 2000,
finalTime = 1000,
onlyCancer = FALSE)
plot(mcf1s, addtot = TRUE, lwdClone = 0.6, log = "")
summary(mcf1s)
plot(mcf1s)
#### Order effects with modules, and 5 genes without interactions
#### with fitness effects from an exponential distribution
oi <- allFitnessEffects(orderEffects =
c("F > D" = -0.3, "D > F" = 0.4),
noIntGenes = rexp(5, 10),
geneToModule =
c("Root" = "Root",
"F" = "f1, f2, f3",
"D" = "d1, d2") )
oiI1 <- oncoSimulIndiv(oi, model = "Exp")
oiI1$GenotypesLabels
oiI1 ## note the order and separation by "_"
oiP1 <- oncoSimulPop(2, oi,
keepEvery = 10,
mc.cores = 2)
summary(oiP1)
## Even if order exists, this cannot reflect it;
## G1 to G10 are d1, d2, f1..,f3, and the 5 genes without
## interaction
samplePop(oiP1)
oiS1 <- oncoSimulSample(2, oi)
## The output contains only the summary of the runs AND
## the sample:
oiS1
## And their sizes do differ
object.size(oiS1)
object.size(oiP1)
######## Using a poset for pancreatic cancer from Gerstung et al.
### (s and sh are made up for the example; only the structure
### and names come from Gerstung et al.)
pancr <- allFitnessEffects(data.frame(parent = c("Root", rep("KRAS", 4), "SMAD4", "CDNK2A",
"TP53", "TP53", "MLL3"),
child = c("KRAS","SMAD4", "CDNK2A",
"TP53", "MLL3",
rep("PXDN", 3), rep("TGFBR2", 2)),
s = 0.05,
sh = -0.3,
typeDep = "MN"))
plot(pancr)
### Use an exponential growth model
pancr1 <- oncoSimulIndiv(pancr, model = "Exp")
pancr1
summary(pancr1)
plot(pancr1)
pancr1$GenotypesLabels
## Pop and Sample
pancrPop <- oncoSimulPop(4, pancr,
keepEvery = 10,
mc.cores = 2)
summary(pancrPop)
pancrSPop <- samplePop(pancrPop)
pancrSPop
pancrSamp <- oncoSimulSample(2, pancr)
pancrSamp
## Using gene-specific mutation rates
muv <- c("U" = 1e-3, "z" = 1e-7, "e" = 1e-6, "m" = 1e-5, "D" = 1e-4)
ni <- rep(0.01, 5)
names(ni) <- names(muv)
femuv <- allFitnessEffects(noIntGenes = ni)
oncoSimulIndiv(femuv, mu = muv)
## Reinitialize the RNG
set.seed(NULL)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.