sample.adpp: Age-dependent Poisson Process species sampling

Description Usage Arguments Value Author(s) Examples

View source: R/sample.adpp.R

Description

Generates a list of occurrence times for each of the desired species using a Poisson process with constant average rate and occurrences distributed based on a model of species age. Allows for any distribution as the model of occurrences during a species life, given certain requirements (see below). Allows for an optional argument - the maximum of the distribution - that can make the simulation faster. Also allows for extra arguments the age-dependent preservation function may take. For time-varying sampling rates without age-dependency, see sample.species. For a function that unites both cases and returns an organize data frame, see sample.clade.

Usage

1
sample.adpp(sim, rr, dFun, S = NULL, dFunMax = NULL, ...)

Arguments

sim

A sim object, usually an output of bd.sim.

rr

A mean sampling rate (equivalent to the lambda of a Poisson) in the Poisson process. Must be a number greater than or equal to zero.

dFun

A density function representing the age-dependent preservation model. It must be a density function, and consequently integrate to 1 (though this condition is not verified by the function). Additionally, the function must have the following properties:

  • Returns a vector of preservation densities for each time in a given vector t in geological time.

  • Be parametrized in absolute geological time (i.e. should be relative to absolute geological time, in Mya, not the lineage's age).

  • Should be limited between s (i.e. the lineage's speciation/origination in geological time) and e (i.e. the lineage's extinction in geological time), with s > e.

  • Include the arguments t, s, e and sp. The argument sp is used to pass species-specific parameters (see examples), allowing for dFun to be species-inhomogeneous.

S

A vector of species numbers to be sampled. Could be only a subset of the species if the user wishes. The default is all species in sim. Species not included in S will not be sampled by the function.

dFunMax

A function that calculates the maximum (density) value of dFun using its arguments, excluding t. It can also be a number representing the maximum density.

Note: if it not provided, it will be approximated numerically, leading to longer running times.

...

Additional parameters related to dFun and dFunMax.

Value

A list of occurrences for that species, expected to be around (s - e)*rr occurrences, with their distribution in species relative time given by the dFun function provided by the user.

Author(s)

Matheus Januario.

Examples

  1
  2
  3
  4
  5
  6
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
###
# we can start with a hat-shaped increase through the duration of a species

# simulate a group
sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)

# in case first simulation is short-lived
while ((sim$TS[1] - ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
  sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}

# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] <- 0
# this is necessary since the default is to have NA for extant species

# here we will use the PERT function. It is described in:
# Silvestro et al 2014

# preservation function
dPERT <- function(t, s, e, sp, a = 3, b = 3, log = FALSE) {
  
  # check if it is a valid PERT
  if (e >= s) {
    message("There is no PERT with e >= s")
    return(rep(NaN, times = length(t)))
  }
  
  # find the valid and invalid times
  id1 <- which(t <= e | t >= s)
  id2 <- which(!(t <= e | t >= s))
  t <- t[id2]
  
  # initialize result vector
  res <- vector()
  
  # if user wants a log function
  if (log) {
    # invalid times get -Inf
    res[id1] <- -Inf
    
    # valid times calculated with log
    res[id2] <- log(((s - t) ^ 2)*((-e + t) ^ 2)/((s - e) ^ 5*beta(a,b)))
  }
  # otherwise
  else{
    res[id1] <- 0
    
    res[id2] <- ((s - t) ^ 2)*((-e + t) ^ 2)/((s - e) ^ 5*beta(a,b))
  }
  
  return(res)
}

# function to calculate max of the PERT
dPERTmax <- function(s, e, sp) {
  return(((s - e) / 2) + e)
}

# find occurrences
occs <- sample.adpp(sim = sim, rr = 1, dFun = dPERT, S = 1, dFunMax = dPERTmax)

# check histogram
hist(unlist(occs), probability = TRUE)

# expected curve - probably will not fit great because of low sample size, see
# vignettes for more rigorous tests
curve(dPERT(x, s = sim$TS[1], e = sim$TE[1]), 20, 0, add = TRUE, col = "red")

###
# now we can test the simpler scenario of uniform sampling probablity
# through the duration of a species (= homogeneous poisson process)

# simulate a group
sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)

# in case first simulation is short-lived
while ((sim$TS[1] - ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
  sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}

# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] <- 0
# this is necessary since the default is to have NA for extant species

# preservation function in respect to age
# occurrences are uniformly distributed
custom.uniform <- function(t, s, e, sp) {
  
  # make sure it is a valid uniform
  if (e >= s) {
    message("There is no uniform function with e >= s")
    return(rep(NaN, times = length(t)))
  }
  
  res <- dunif(x = t, min = e, max = s)
  
  return(res)
}

# we will not give a dFunMax function this time. sample.adpp() will try to find
# the maximum density with a very simple numerical simulation
occs <- sample.adpp(sim = sim, rr = 2, dFun = custom.uniform, S = 1)

# check histogram
hist(unlist(occs[[1]]), probability = TRUE)

# expected curve
curve(dunif (x, min = sim$TE[1], max = sim$TS[1]), 10, 0,
      add = TRUE, col = "red")

###
# now, a hat-shaped increase through the duration of a species with more
# parameters than TS and TE

# simulate a group
sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)

# in case first simulation is short-lived
while ((sim$TS[1] - ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
  sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}

# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] <- 0
# this is necessary since the default is to have NA for extant species

# here we will use the triangular distribution. We have some empirical evidence
# that taxa occurrences might present triangular shape
# see Zliobaite et al 2017

# preservation function
dTRI <- function(t, s, e, sp, md) {
  
  # please note ths function is inverted. The correspondence would be:
  # s = b maximum
  # e = a minimum
  # md = c distribution's mode
  
  # make sure it is a valid TRI
  if (e >= s) {
    message("There is no TRI with e >= s")
    return(rep(NaN, times = length(t)))
  }
  
  # another condition we must check
  if (md < e | md > s) {
    message("There is no TRI with md outside [s, e] interval")
    return(rep(NaN, times = length(t)))
  }
  
  # needed to vectorize the function:
  id1 <- which(t >= e & t < md)
  id2 <- which(t == md)
  id3 <- which(t > md & t <= s)
  id4 <- which( !(1:length(t) %in% c(id1,id2,id3)))
  
  # actually vetorizing function
  res <- vector()
  
  # (t >= e & t < md)
  res[id1] <- (2*(t[id1] - e)) / ((s - e)*(md - e))
  
  #(t == md)
  res[id2] <- 2 / (s - e)
  
  #(md < t & t <= s)
  res[id3] <- (2*(s - t[id3])) / ((s - e)*(s - md))
  
  #outside function's limits
  res[id4] <- 0
  
  return(res)
}

# the dFunMax function must have the same parameters then the dFun function,
# even if they do not use them
dTRImax <- function(s, e, sp, md) {
  
  return(2 / (s - e))
}

# note we are providing the mode for the triangular sampling as an ... argument
occs <- sample.adpp(sim = sim, rr = 2.5, dFun = dTRI,
                   dFunMax = dTRImax, S = 1, md = 8)

# please note in the original parametrization, the "md" parameter (mode) is
# in "absolute time", i.e. a specific number in the absolute scale. This is not,
# directly, an "age-dependent" model. We show it here as we will construct
# models related to age in the following examples (4 and 5). This is also
# the reason we only plot the first lineage in this specific example
# (many lineages might not be alive in the "md" moment in time).

# check histrogram
hist(unlist(occs[[1]]), probability = TRUE)

# expected curve
curve(dTRI(x, e = sim$TE[1], s = sim$TS[1], md = 8), 10, 0, add = TRUE,
      col = "red")

###
# we can also have a hat-shaped increase through the duration of a species
# with more parameters than TS and TE, but with the parameters relate to
# the relative age of each lineage

# simulate a group
sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)

# in case first simulation is short-lived
while ((sim$TS[1] - ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
  sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}

# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] <- 0
# this is necessary since the default is to have NA for extant species

# preservation function, with the "mde" of the triangle being exactly at the
# last quarter of the duration of EACH lineage
dTRImod1 <- function(t, s, e, sp) {
  # note that now we don't have the "md" parameter here,
  # but it is calculated inside the function
  
  # check if it is a valid TRI
  if (e >= s) {
    message("There is no TRI with e >= s")
    return(rep(NaN, times = length(t)))
  }
  
  # calculate md
  md <- ((s - e) / 4) + e
  # md is at the last quarter of the duration of the lineage
  
  # please note that the same logic can be used to sample parameters
  # internally in the function, running for instance:
  # md<-runif (n = 1, min = e, max = s)
  
  # check it is a valid md
  if (md < e | md > s) {
    message("There is no TRI with md outside [s, e] interval")
    return(rep(NaN, times=length(t)))
  }
  
  # needed to vectorize function
  id1 <- which(t >= e & t < md)
  id2 <- which(t == md)
  id3 <- which(t>md & t <= s)
  id4 <- which( !(1:length(t) %in% c(id1,id2,id3)))
  
  # vectorize the function
  res<-vector()
  
  res[id1] <- (2*(t[id1] - e)) / ((s - e)*(md - e))
  res[id2] <- 2 / (s - e)
  res[id3] <- (2*(s - t[id3])) / ((s - e)*(s - md))
  res[id4] <- 0
  
  return(res)
}

# function to calculate max
dTRImaxmod1<-function(s, e, sp) {
  return(2 / (s - e))
}

# find occurrences
occs <- sample.adpp(sim = sim, rr = 5,
                    dFun = dTRImod1, S = 1, dFunMax = dTRImaxmod1)

# we do not have the "md" parameter (see example 3) as it corresponds to the
# last quarter of the duration of each lineage

# check histogram
hist(unlist(occs[[1]]), probability = TRUE)

# md of dTRI
mid <- ((sim$TS[1] - sim$TE[1]) / 4) + sim$TE[1]

# expected curve
curve(dTRImod1(x, e = sim$TE[1], s = sim$TS[1]), 10, 0, add = TRUE, 
      col = "red")

###
# finally, a hat-shaped increase through the duration of a species with more
# parameters than TS and TE, but the parameters relate to each specific lineage.
# This is useful when the user wants to use variable parameters for each species
# but wants to keep track of those parameters after the sampling is over

# simulate a group
sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)

# in case first simulation is short-lived
while ((sim$TS[1] - ifelse(is.na(sim$TE[1]), 0, sim$TE[1])) < 10) {
  sim <- bd.sim(n0 = 1, pp = 0.1, qq = 0.1, tMax = 10)
}

# we will need to get exact durations for some examples, so
sim$TE[sim$EXTANT] <- 0
# this is necessary since the default is to have NA for extant species

# preservation function in respect to age, with the "mode" of the triangle
# being exactly at the last quarter of the duration of EACH lineage.
dTRImod2 <- function(t, s, e, sp) {
  
  # make sure it is a valid TRI
  if (e >= s) {
    message("There is no TRI with e >= s")
    return(rep(NaN, times = length(t)))
  }
  
  # here is the difference from the function in example 3 and 4
  md <- par[sp] + par1[sp]
  
  # check that md is valid
  if (md < e | md > s) {
    message("There is no TRI with md outside [s, e] interval")
    return(rep(NaN, times = length(t)))
  }
  
  id1 <- which(t >= e & t < md)
  id2 <- which(t == md)
  id3 <- which(t > md & t <= s)
  id4 <- which(!(1:length(t) %in% c(id1,id2,id3)))
  
  res <- vector()
  
  res[id1] <- (2*(t[id1] - e)) / ((s - e)*(md - e))
  res[id2] <- 2 / (s - e)
  res[id3] <- (2*(s - t[id3])) / ((s - e)*(s - md))
  res[id4] <- 0
  
  return(res)
  #for more details in this function, see example 3 and 4
}

# maximum function
dTRImaxmod2 <- function(s, e, sp) {
  return(2 / (s - e))
}

# get the par and par1 vectors
par <- runif (n = length(sim$TE), min = sim$TE, max = sim$TS)
par1 <- (((sim$TS - sim$TE)/2) + sim$TE) - par

# find occurrence list
occs <- sample.adpp(sim = sim, rr = 10,
                    dFun = dTRImod2, dFunMax = dTRImaxmod2)

# we do not have the "md" parameter (see example 3) as it corresponds to
# the last quarter of the duration of each lineage

# checking each species
for (sp in 1:length(sim$TE)) {
  # check histogram
  hist(unlist(occs[[sp]]), probability = TRUE,
       main = paste0("spp ", sp, " ; duration ~ ",
                     round(sim$TS[sp] - sim$TE[sp], digits = 2)))
  
  # calculate mid
  mid <- par[sp] + par1[sp]
  
  # expected curve
  curve(dTRImod2(x, e = sim$TE[sp], s = sim$TS[sp], sp), 10, 0, add = TRUE,
        col = "red", n = 100)
  abline(v = mid, col = "red")
}

brpetrucci/paleobuddy documentation built on Aug. 8, 2020, 2:03 a.m.