Description Usage Arguments Value Author(s) Examples
Generates a data.frame
containing either true occurrence times or time
ranges for each of the desired species using a Poisson process. Allows for the
Poisson rate to be (1) a constant, (2) a function of time, (3) a function of
time and an environmental variable, or (4) a vector of numbers. Also allows
as an optional parameter a distribution representing the expected occurrence
number over a species duration (in which case average rate must be constant).
Allows for further flexibility in (nonage dependent) rates by a shift times
vector and environmental matrix parameters. Optionally takes a vector of time
bins representing geologic periods, so that if the user wishes occurrence
times can be presented as a range instead of true points. Finally, allows for
an optional argument  the maximum of the agedependent distribution  that can
make the simulation faster, and for extra arguments the agedependent
preservation function may take. See sample.species
 absolute
timedependent sampling  and sample.adpp
 agedependent sampling  for
more information.
1 2 3 4 5 6 7 8 9 10 11 12 13 
sim 
A 
rr 
Sampling rate (per species per million years) over time. It can be
a 
tMax 
The maximum simulation time, used by 
S 
A vector of species numbers to be sampled. Could be only a subset of
the species if the user wishes. The default is all species in 
envRR 
A matrix containing time points and values of an environmental
variable, like temperature, for each time point. This will be used to create
a sampling rate, so 
rShifts 
Vector of rate shifts. First element must be the starting
time for the simulation ( Note: using this method for stepfunction rates is currently slower than using

returnTrue 
If set to If 
bins 
A vector of time intervals corresponding to geological time ranges. 
dFun 
A density function representing the agedependent preservation model. It must be a density function, and consequently integrate to 1 (though this condition is not verified by the function). Additionally, the function must have the following properties:

dFunMax 
A function that calculates the maximum (density) value of
Note: if it not provided, it will be approximated numerically, leading to longer running times. 
... 
Additional parameters related to 
A data.frame
containing species names/numbers, whether each
species is extant, and either the true occurrence times of species or a range
of occurrence times based on bins
.
Matheus Januario and Bruno do Rosario Petrucci.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 335 336 337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394  ###
# we can start with a constant case
# simulate a group
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
# in case first simulation is shortlived
while ((sim$TS[1]  ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}
# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] < 0
# this is necessary since the default is to have NA for extant species
# sampling rate
r < 2
# the resolution of the fossil dataset:
bins < seq(from = 10, to = 0,
by = 0.1)
# note that we will provide a very high resolution to test the function
# find the occurrence data frame
dt < sample.clade(sim, r, tMax = 10, bins = bins, returnTrue = FALSE)
# extract species identity
ids < unique(dt$Species)
# approximate sampling time (since it is a range)
mids < (dt$MaxT  dt$MinT) / 2 + dt$MinT
# for each species
for (i in 1:length(ids)) {
# get the species number
sp < unique(as.numeric(gsub("spp_", "", ids[i])))
# check the histogram
hist(mids[dt$Species == ids[i]],
main = paste0("spp = ", sp, "; duration ~ ",
round(sim$TS[sp]  sim$TE[sp], digits = 2), "my"),
xlab = "Time (My)",
xlim = c(sim$TS[i], sim$TE[i]))
}
###
# sampling can be any function of time in the nonage dependent case, of course
# simulate a group
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
# in case first simulation is shortlived
while ((sim$TS[1]  ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}
# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] < 0
# this is necessary since the default is to have NA for extant species
# sampling rate
r < function(t) {
return(3  0.15*t)
}
# the resolution of the fossil dataset:
bins < seq(from = 10, to = 0,
by = 0.1)
# note that we will provide a very high resolution to test the function
# find the occurrence data frame
dt < sample.clade(sim, r, tMax = 10, bins = bins, returnTrue = FALSE)
# extract species identity
ids < unique(dt$Species)
# approximate sampling time (since it is a range)
mids < (dt$MaxT  dt$MinT) / 2 + dt$MinT
# for each species
for (i in 1:length(ids)) {
# get the species number
sp < unique(as.numeric(gsub("spp_", "", ids[i])))
# check the histogram
hist(mids[dt$Species == ids[i]],
main = paste0("spp = ", sp, "; duration ~ ",
round(sim$TS[sp]  sim$TE[sp], digits = 2), "my"),
xlab = "Time (My)",
xlim = c(sim$TS[i], sim$TE[i]))
}
###
# now we can try a step function rate
# simulate a group
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
# in case first simulation is shortlived
while ((sim$TS[1]  ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}
# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] < 0
# this is necessary since the default is to have NA for extant species
# we will use the less efficient method of creating a step function
# one could instead use ifelse()
# rates vector
rList < c(1, 2, 0.5)
# rate shifts vector
rShifts < c(0, 4, 8)
# make it a function so we can plot it
r < make.rate(rList, 10, fShifts=rShifts)
# the resolution of the fossil dataset:
bins < seq(from = 10, to = 0,
by = 0.1)
# note that we will provide a very high resolution to test the function
# find the occurrence data frame
dt < sample.clade(sim, rList, rShifts = rShifts,
tMax = 10, bins = bins, returnTrue = FALSE)
# extract species identity
ids < unique(dt$Species)
# approximate sampling time (since it is a range)
mids < (dt$MaxT  dt$MinT) / 2 + dt$MinT
# for each species
for (i in 1:length(ids)) {
# get the species number
sp < unique(as.numeric(gsub("spp_", "", ids[i])))
# check the histogram
hist(mids[dt$Species == ids[i]],
main = paste0("spp = ", sp, "; duration ~ ",
round(sim$TS[sp]  sim$TE[sp], digits = 2), "my"),
xlab = "Time (My)",
xlim = c(sim$TS[i], sim$TE[i]))
}
###
# finally, sample.clade also accepts an environmental variable
# get temperature data
data(temp)
# simulate a group
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
# in case first simulation is shortlived
while ((sim$TS[1]  ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}
# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] < 0
# this is necessary since the default is to have NA for extant species
# make temperature the environmental dependency of r
envR < temp
# we can then make sampling dependent on the temperature
r < function(t, env) {
return(0.5*env)
}
# make it a function so we can plot it
rr < make.rate(r, envF = envR)
# let us check that r is high enough to see a pattern
plot(1:10, rr(1:10), type = 'l', main = "Sampling rate",
xlab = "My", ylab = "r")
# the resolution of the fossil dataset:
bins < seq(from = 10, to = 0,
by = 0.1)
# note that we will provide a very high resolution to test the function
# find the occurrence data frame
dt < sample.clade(sim, r, tMax = 10, envRR = envR,
bins = bins, returnTrue = FALSE)
# extract species identity
ids < unique(dt$Species)
# approximate sampling time (since it is a range)
mids < (dt$MaxT  dt$MinT) / 2 + dt$MinT
# for each species
for (i in 1:length(ids)) {
# get the species number
sp < unique(as.numeric(gsub("spp_", "", ids[i])))
# check the histogram
hist(mids[dt$Species == ids[i]],
main = paste0("spp = ", sp, "; duration ~ ",
round(sim$TS[sp]  sim$TE[sp], digits = 2), "my"),
xlab = "Time (My)",
xlim = c(sim$TS[i], sim$TE[i]))
}
# we will now do some tests with agedependent rates. For more details,
# check sample.adpp.
###
# simulate a group
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
# in case first simulation is shortlived
while ((sim$TS[1]  ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}
# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] < 0
# this is necessary since the default is to have NA for extant species
# here we will use the PERT function. It is described in:
# Silvestro et al 2014
# preservation function
dPERT < function(t, s, e, sp, a = 3, b = 3, log = FALSE) {
# check if it is a valid PERT
if (e >= s) {
message("There is no PERT with e >= s")
return(rep(NaN, times = length(t)))
}
# find the valid and invalid times
id1 < which(t <= e  t >= s)
id2 < which(!(t <= e  t >= s))
t < t[id2]
# initialize result vector
res < vector()
# if user wants a log function
if (log) {
# invalid times get Inf
res[id1] < Inf
# valid times calculated with log
res[id2] < log(((s  t) ^ 2)*((e + t) ^ 2)/((s  e) ^ 5*beta(a,b)))
}
# otherwise
else{
res[id1] < 0
res[id2] < ((s  t) ^ 2)*((e + t) ^ 2)/((s  e) ^ 5*beta(a,b))
}
return(res)
}
# function to calculate max of the PERT
dPERTmax < function(s, e, sp) {
return(((s  e) / 2) + e)
}
# the resolution of the fossil dataset:
bins < seq(from = 10, to = 0,
by = 0.1)
# note that we will provide a very high resolution to test the function
dt < sample.clade(sim, rr = 3, tMax = 10, bins = bins,
dFun = dPERT, dFunMax = dPERTmax, returnTrue = FALSE)
# extract species identity
ids < unique(dt$Species)
# approximate sampling time (since it is a range)
mids < (dt$MaxT  dt$MinT) / 2 + dt$MinT
# for each species
for (i in 1:length(ids)) {
# get the species number
sp < unique(as.numeric(gsub("spp_", "", ids[i])))
# check the histogram
hist(mids[dt$Species == ids[[i]]],
main = paste0("spp = ", sp, "; duration ~ ",
round(sim$TS[sp]  sim$TE[sp], digits = 2), "my"),
xlab = "Time (My)", probability = TRUE)
# expected curve
curve(dPERT(x, s = sim$TS[sp], e = sim$TE[sp], sp = sp), from = sim$TE[sp],
to = sim$TS[sp], add = TRUE, col = "red", n = 100)
}
# we provide curves for comparison here, but remember the low sample sizes
# (and bins) may affect the quality of the fit. See vignettes for more
# thorough testing
###
# now, a hatshaped increase through the duration of a species dependent on two
# parameters
# simulate a group
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
# in case first simulation is shortlived
while ((sim$TS[1]  ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
sim < bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}
# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] < 0
# this is necessary since the default is to have NA for extant species
# preservation function in respect to age, with the "mode" of the triangle
# being exactly at the last quarter of the duration of EACH lineage.
dTRImod2<function(t, s, e, sp) {
# make sure it is a valid TRI
if (e >= s) {
message("There is no TRI with e >= s")
return(rep(NaN, times = length(t)))
}
# here is the difference from the function in example 3 and 4
md < par[sp] + par1[sp]
# check that md is valid
if (md < e  md > s) {
message("There is no TRI with md outside [s, e] interval")
return(rep(NaN, times = length(t)))
}
id1 < which(t >= e & t < md)
id2 < which(t == md)
id3 < which(t > md & t <= s)
id4 < which(!(1:length(t) %in% c(id1,id2,id3)))
res < vector()
res[id1] < (2*(t[id1]  e)) / ((s  e)*(md  e))
res[id2] < 2 / (s  e)
res[id3] < (2*(s  t[id3])) / ((s  e)*(s  md))
res[id4] < 0
return(res)
#for more details in this function, see example 3 and 4
}
# maximum function
dTRImaxmod2 < function(s, e, sp) {
return(2 / (s  e))
}
# a random point inside each lineage's duration
par < runif (n = length(sim$TE), min = sim$TE, max = sim$TS)
# a distance between "par" and the lineage's duration middle
par1 < (((sim$TS  sim$TE) / 2) + sim$TE)  par
# the resolution of the fossil dataset:
bins < seq(from = 10, to = 0,
by = 0.1)
# note that we will provide a very high resolution to test the function
dt < sample.clade(sim, rr = 4, tMax = 10, bins = bins,
dFun = dTRImod2, dFunMax = dTRImaxmod2, returnTrue = FALSE)
# extract species identity
ids < unique(dt$Species)
# approximate sampling time (since it is a range)
mids < (dt$MaxT  dt$MinT) / 2 + dt$MinT
# for each species
for (i in 1:length(ids)) {
# get the species number
sp < unique(as.numeric(gsub("spp_", "", ids[i])))
# check the histogram
hist(mids[dt$Species == ids[[i]]],
main = paste0("spp = ", sp, "; duration ~ ",
round(sim$TS[sp]  sim$TE[sp], digits = 2), "my"),
xlab = "Time (My)", probability = TRUE)
# expected curve
curve(dTRImod2(x, e=sim$TE[sp], s=sim$TS[sp], sp = sp),from = sim$TE[sp],
to = sim$TS[sp], add=TRUE, col="red", n = 100)
}
# we provide curves for comparison here, but remember the low sample sizes
# (and bins) may affect the quality of the fit. See vignettes for more
# thorough testing

Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.