R/sample.clade.R

Defines functions sample.clade

Documented in sample.clade

#' General rate species sampling
#' 
#' Generates occurrence times or time ranges (as most empirical fossil 
#' occurrences) for each of the desired species using a Poisson process. Allows
#' for the Poisson rate to be (1) a constant, (2) a function of time, (3) a 
#' function of time and a time-series (usually environmental) variable, or (4) 
#' a vector of numbers (rates in a step function). Allows for age-dependent 
#' sampling with a parameter for a distribution representing the expected 
#' occurrence number over a species duration. Allows for further flexibility in
#' rates by a shift times vector and environmental matrix parameters. 
#' Optionally takes a vector of time bins representing geologic periods, if the
#' user wishes occurrence times to be represented as a range instead of true 
#' points.
#' 
#' @param sim A \code{sim} object, containing extinction times, speciation 
#' times, parent, and status information (extant or extinct) for each species
#' in the simulation. See \code{?sim}.
#'
#' @param rho Sampling rate (per species per million years) over time. It can 
#' be a \code{numeric} describing a constant rate, a \code{function(t)} 
#' describing the variation in sampling over time \code{t}, a 
#' \code{function(t, env)} describing the variation in sampling over time 
#' following both time AND a time-series, usually an environmental variable 
#' (see \code{envR}), or a \code{vector} containing rates that correspond to 
#' each rate between sampling rate shift times times (see \code{rShifts}). If 
#' \code{adFun} is supplied, it will be used to find the number of occurrences 
#' during the species duration, and a normalized \code{rho*adFun} will 
#' determine their distribution along the species duration. Note that 
#' \code{rho} should always be greater than or equal to zero.
#' 
#' @param tMax The maximum simulation time, used by \code{rexp.var}. A sampling
#' time greater than \code{tMax} would mean the occurrence is sampled after the
#' present, so for consistency we require this argument. This is also required
#' to ensure time follows the correct direction both in the Poisson process and
#' in the output.
#'
#' @param S A vector of species numbers to be sampled. The default is all 
#' species in \code{sim}. Species not included in \code{S} will not be sampled 
#' by the function.
#' 
#' @param envR A data frame containing time points and values of an 
#' environmental variable, like temperature, for each time point. This will be 
#' used to create a sampling rate, so \code{rho} must be a function of time and
#' said variable if \code{envR} is not \code{NULL}. Note \code{paleobuddy} has 
#' two environmental data frames, \code{temp} and \code{co2}. See \code{RPANDA}
#' for more examples.
#'
#' @param rShifts Vector of rate shifts. First element must be the starting
#' time for the simulation (\code{0} or \code{tMax}). It must have the same 
#' length as \code{lambda}. \code{c(0, x, tMax)} is equivalent to 
#' \code{c(tMax, tMax - x, 0)} for the purposes of \code{make.rate}.
#'
#' @param returnTrue If set to \code{FALSE}, it will contain the occurrence
#' times as ranges. In this way, we simulate the granularity presented by
#' empirical fossil records. If \code{returnTrue} is \code{TRUE}, this is 
#' ignored.
#' 
#' @param returnAll If set to \code{TRUE}, returns both the true sampling time 
#' and age ranges. Default is \code{FALSE}.
#' 
#' @param bins A vector of time intervals corresponding to geological time 
#' ranges. It must be supplied if \code{returnTrue} or \code{returnAll} is
#' \code{TRUE}.
#' 
#' @param adFun A density function representing the age-dependent preservation
#' model. It must be a density function, and consequently integrate to 1 
#' (though this condition is not verified by the function). If not provided, a 
#' uniform distribution will be used by default. The function must also 
#' have the following properties:
#'
#' \itemize{
#'
#' \item Return a vector of preservation densities for each time in a given
#' vector \code{t} in geological time. 
#'
#' \item Be parameterized in the absolute geological time associated to 
#' each moment in age (i.e. age works relative to absolute geological 
#' time, in Mya - in other words, the convention is TS > 0). The function 
#' \emph{does not} directly use the lineage's age (which would mean that
#' TS = 0 for all species whenever they are born). Because of this, it is
#' assumed to go from \code{tMax} to \code{0}, as opposed to most functions 
#' in the package.
#'
#' \item Should be limited between \code{s} (i.e. the lineage's 
#' speciation/birth) and \code{e} (i.e. the lineage's extinction/death), 
#' with \code{s} > \code{e}. It is possible to assign parameters in absolute 
#' geological time (see third example) and relative to age as long as this 
#' follows the convention of age expressed in absolute geological time (see 
#' fourth example).
#'
#' \item Include the arguments \code{t}, \code{s}, \code{e} and \code{sp}. 
#' The argument sp is used to pass species-specific parameters (see examples),
#' allowing for \code{dFun} to be species-inhomogeneous.
#' }
#'
#' @param ... Additional parameters used by \code{adFun}. See examples.
#'
#' @details The age-dependent preservation function assumes that all extant
#' species at the end of the simulations have \code{TE = 0} (i.e., the function
#' assumes all extant species got extinct exaclty when the simulation ended.
#' This might create distortion for some \code{adFun} - especially in the case
#' of bell-shaped functions. As interpretations of what age-dependent
#' preservation mean to species alive at the end of the simulation, we
#' recommend users to implement their own preservation functions for the
#' species that are extant at the end of the simulation.
#' @return A \code{data.frame} containing species names/numbers, whether each 
#' species is extant or extinct, and the true occurrence times of each fossil, 
#' a range of occurrence times based on \code{bins}, or both.
#'
#' @author Matheus Januario and Bruno do Rosario Petrucci.
#'
#' @examples
#' 
#' # vector of times
#' time <- seq(10, 0, -0.1)
#' 
#' ###
#' # we can start with a constant case
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # sampling rate
#' rho <- 2
#' 
#' # bins for fossil ranges
#' bins <- seq(from = 10, to = 0, by = -1)
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho, tMax = 10, 
#'                         bins = bins, returnTrue = FALSE)
#' 
#' # draw simulation with fossil occurrences as ranges
#' draw.sim(sim, fossils = fossils)
#' 
#' ###
#' # sampling can be any function of time 
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # sampling rate
#' rho <- function(t) {
#'   return(2 - 0.15*t)
#' }
#' 
#' # plot sampling function
#' plot(x = time, y = rho(time), type = "l", 
#'      ylab = "Preservation rate", 
#'      xlab = "Time since the start of the simulation (My)")
#' # note for these examples we do not reverse time in the plot
#' # see other functions in the package for examples where we do
#' 
#' # bins for fossil ranges
#' bins <- seq(from = 10, to = 0, by = -1)
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho, tMax = 10, 
#'                         bins = bins, returnTrue = FALSE)
#' 
#' # draw simulation with fossil occurrences as ranges
#' draw.sim(sim, fossils = fossils)
#' 
#' ###
#' # now we can try a step function rate
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # we will use the less efficient method of creating a step function
#' # one could instead use ifelse()
#' 
#' # rates vector
#' rList <- c(2, 5, 0.5)
#' 
#' # rate shifts vector
#' rShifts <- c(0, 4, 8)
#' 
#' # make it a function so we can plot it
#' rho <- make.rate(rList, 10, rateShifts = rShifts)
#' 
#' # plot sampling function
#' plot(x = time, y = rho(time), type = "l", 
#'      ylab = "Preservation rate", 
#'      xlab = "Time since the start of the simulation (My)")
#' 
#' # bins for fossil ranges
#' bins <- seq(from = 10, to = 0, by = -1)
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho = rList, rShifts = rShifts, tMax = 10, 
#'                         bins = bins, returnTrue = FALSE)
#' 
#' # draw simulation with fossil occurrences as ranges
#' draw.sim(sim, fossils = fossils)
#' 
#' ###
#' # finally, sample.clade also accepts an environmental variable
#' 
#' # get temperature data
#' data(temp)
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # rho will be temperature dependent
#' envR <- temp
#' 
#' # function describing environmental dependence
#' r_t <- function(t, env) {
#'   return(((env) / 12) ^ 6)
#' }
#' 
#' # make it a function so we can plot it
#' rho <- make.rate(r_t, tMax = tMax, envRate = envR)
#' 
#' # plot sampling function
#' plot(x = time, y = rho(time), type = "l", 
#'      ylab = "Preservation rate", 
#'      xlab = "Time since the start of the simulation (My)")
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho = r_t, envR = envR, tMax = 10, bins = bins)
#' # now we record the true time of fossil occurrences 
#' 
#' # draw simulation with fossil occurrences as time points
#' draw.sim(sim, fossils = fossils)
#' 
#' # note that any techniques used in examples for ?bd.sim to create more
#' # complex mixed scenarios can be used here as well
#' 
#' ###
#' # sampling can also be age-dependent
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # sampling rate
#' rho <- 3
#' 
#' # here we will use the PERT function. It is described in:
#' # Silvestro et al 2014
#' 
#' # age-dependence distribution
#' # note that a and b define the beta distribution used, and can be modified
#' dPERT <- function(t, s, e, sp, a = 3, b = 3, log = FALSE) {
#'   # check if it is a valid PERT
#'   if (e >= s) {
#'    message("There is no PERT with e >= s")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # find the valid and invalid times
#'   id1 <- which(t <= e | t >= s)
#'   id2 <- which(!(t <= e | t >= s))
#'   t <- t[id2]
#'   
#'   # initialize result vector
#'   res <- vector()
#'   
#'   # if user wants a log function
#'   if (log) {
#'    # invalid times get -Inf
#'    res[id1] <- -Inf
#'   
#'    # valid times calculated with log
#'    res[id2] <- log(((s - t) ^ 2) * ((-e + t) ^ 2) / 
#'                  ((s - e) ^ 5 * beta(a, b)))
#'   }
#'   
#'   # otherwise
#'   else{
#'    res[id1] <- 0
#'   
#'    res[id2] <- ((s - t) ^ 2) * ((-e + t) ^ 2) / ((s - e) ^ 5 * beta(a, b))
#'   }
#'   
#'   return(res)
#' }
#' 
#' # plot it for an example species who lived from 10 to 5 million years ago
#' plot(time, rev(dPERT(t = time, s = 10, e = 5, a = 1)), 
#'      main = "Age-dependence distribution",
#'      xlab = "Species age (My)", ylab = "Density",
#'      xlim = c(0, 5), type = "l")
#' 
#' # bins for fossil ranges
#' bins <- seq(from = 10, to = 0, by = -1)
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho, tMax = 10, adFun = dPERT, bins = bins, 
#'                         returnAll = TRUE)
#' # can use returnAll to get occurrences as both time points and ranges
#' 
#' # draw simulation with fossil occurrences as time points
#' draw.sim(sim, fossils = fossils)
#' # the warning is to let you know the ranges won't be used
#' 
#' # and also as ranges - we take out the column with true time points
#' draw.sim(sim, fossils = fossils[, -3])
#' 
#' ###
#' # we can have more parameters on adFun
#' 
#' # sampling rate
#' rho <- function(t) {
#'  return(1 + 0.5*t)
#' }
#' # since here rho is time-dependent, the function finds the
#' # number of occurrences using rho, and their distribution
#' # using a normalized rho * adFun
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # here we will use the triangular distribution
#' 
#' # age-dependence distribution
#' dTRI <- function(t, s, e, sp, md) {
#'   # make sure it is a valid TRI
#'   if (e >= s) {
#'    message("There is no TRI with e >= s")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # another condition we must check
#'   if (md < e | md > s) {
#'    message("There is no TRI with md outside [s, e] interval")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # needed to vectorize the function:
#'   id1 <- which(t >= e & t < md)
#'   id2 <- which(t == md)
#'   id3 <- which(t > md & t <= s)
#'   id4 <- which( !(1:length(t) %in% c(id1,id2,id3)))
#'   
#'   # actually vetorizing function
#'   res <- vector()
#'   
#'   # (t >= e & t < md)
#'   res[id1] <- (2*(t[id1] - e)) / ((s - e) * (md - e))
#'   
#'   # (t == md)
#'   res[id2] <- 2 / (s - e)
#'   
#'   # (md < t & t <= s)
#'   res[id3] <- (2*(s - t[id3])) / ((s - e) * (s - md))
#'   
#'   # outside function's limits
#'   res[id4] <- 0
#'   
#'   return(res)
#' }
#' 
#' # set mode at 8
#' md <- 8
#'     
#' # plot it for an example species who lived from 10mya to the present
#' plot(time, rev(dTRI(time, 10, 5, 1, md)),
#'      main = "Age-dependence distribution",
#'      xlab = "Species age (My)", ylab = "Density",
#'      xlim = c(0, 5), type = "l")
#' 
#' # bins for fossil ranges
#' bins <- seq(from = 10, to = 0, by = -1)
#' 
#' # simulate fossil occurrences for the first species
#' fossils <- sample.clade(sim, rho, tMax = 10, S = 1, adFun = dTRI, 
#'                         bins = bins, returnTrue = FALSE, md = md)
#' # note we provide the peak for the triangular sampling as an argument
#' # here that peak is assigned in absolute geological, but
#' # it usually makes more sense to express this in terms
#' # of age (a given percentile of the age, for instance) - see below
#' 
#' # draw simulation with fossil occurrences as ranges
#' draw.sim(sim, fossils = fossils)
#' 
#' ###
#' # we can also have a hat-shaped increase through the duration of a species
#' # with more parameters than TS and TE, but with the parameters relating to
#' # the relative age of each lineage
#' 
#' # sampling rate
#' rho <- function(t) {
#'  return(1 + 0.1*t)
#' }
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # age-dependence distribution, with the "mde" of the triangle
#' # being exactly at the last quarter of the duration of EACH lineage
#' dTRImod1 <- function(t, s, e, sp) {
#'   # note that now we don't have the "md" parameter here,
#'   # but it is calculated inside the function
#'   
#'   # check if it is a valid TRI
#'   if (e >= s) {
#'    message("There is no TRI with e >= s")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # calculate md
#'   md <- ((s - e) / 4) + e
#'   # md is at the last quarter of the duration of the lineage
#'   
#'   # please note that the same logic can be used to sample parameters
#'   # internally in the function, running for instance:
#'   # md <- runif (n = 1, min = e, max = s)
#'   
#'   # check it is a valid md
#'   if (md < e | md > s) {
#'    message("There is no TRI with md outside [s, e] interval")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # needed to vectorize function
#'   id1 <- which(t >= e & t < md)
#'   id2 <- which(t == md)
#'   id3 <- which(t>md & t <= s)
#'   id4 <- which( !(1:length(t) %in% c(id1,id2,id3)))
#'   
#'   # vectorize the function
#'   res<-vector()
#'   
#'   res[id1] <- (2 * (t[id1] - e)) / ((s - e) * (md - e))
#'   res[id2] <- 2 / (s - e)
#'   res[id3] <- (2 * (s - t[id3])) / ((s - e) * (s - md))
#'   res[id4] <- 0
#'   
#'   return(res)
#' }
#' 
#' # plot for a species living between 10 and 0 mya
#' plot(time, rev(dTRImod1(time, 10, 0, 1)),
#'     main = "Age-dependence distribution",
#'     xlab = "Species age (My)", ylab = "Density",
#'     xlim = c(0, 10), type = "l")
#' 
#' # sample first two species
#' fossils <- sample.clade(sim = sim, rho = rho, tMax = 10, adFun = dTRImod1)
#' 
#' # draw simulation with fossil occurrences as time points
#' draw.sim(sim, fossils = fossils)
#' 
#' # here, we fix md at the last quarter
#' # of the duration of the lineage
#' 
#' ###
#' # the parameters of adFun can also relate to each specific lineage,
#' # which is useful when using variable parameters for each species
#' # to keep track of those parameters after the sampling is over
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#' 
#' # sampling rate
#' rho <- 3
#' 
#' # get the par and par1 vectors
#' 
#' # get mins vector
#' minsPar <- ifelse(is.na(sim$TE), 0, sim$TE)
#' 
#' # a random time inside each species' duration
#' par <- runif(n = length(sim$TE), min = minsPar, max = sim$TS)
#'              
#' # its complement to the middle of the lineage's age.
#' par1 <- (((sim$TS - minsPar) / 2) + minsPar) - par
#' # note that the interaction between these two parameters creates a
#' # deterministic parameter, but inside the function one of them ("par")
#' # is a random parameter
#' 
#' dTRImod2 <- function(t, s, e, sp) {
#'   # make sure it is a valid TRI
#'   if (e >= s) {
#'    message("There is no TRI with e >= s")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # md depends on parameters
#'   md <- par[sp] + par1[sp]
#'   
#'   # check that md is valid
#'   if (md < e | md > s) {
#'    message("There is no TRI with md outside [s, e] interval")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   id1 <- which(t >= e & t < md)
#'   id2 <- which(t == md)
#'   id3 <- which(t > md & t <= s)
#'   id4 <- which(!(1:length(t) %in% c(id1,id2,id3)))
#'   
#'   res <- vector()
#'   
#'   res[id1] <- (2*(t[id1] - e)) / ((s - e) * (md - e))
#'   res[id2] <- 2 / (s - e)
#'   res[id3] <- (2*(s - t[id3])) / ((s - e) * (s - md))
#'   res[id4] <- 0
#'   
#'   return(res)
#' }
#' 
#' # plot for a species living between 10 and 0 mya
#' plot(time, rev(dTRImod2(time, 10, 0, 1)),
#'     main = "Age-dependence distribution",
#'     xlab = "Species age (My)", ylab = "Density",
#'     xlim = c(0, 10), type = "l")
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho, tMax = 10, adFun = dTRImod2, bins = bins, 
#'                         returnTrue = FALSE)
#' 
#' # draw simulation with fossil occurrences as time ranges
#' draw.sim(sim, fossils = fossils)
#' 
#' ###
#' # we can also have a mix of age-independent and age-dependent
#' # sampling in the same simulation
#' 
#' # set seed
#' set.seed(1)
#' 
#' # simulate a group
#' sim <- bd.sim(n0 = 1, lambda = 0.1, mu = 0.1, tMax = 10)
#'               
#' # sampling rate
#' rho <- 7
#' 
#' # define a uniform to represente age-independence
#' 
#' # age-dependence distribution (a uniform density distribution in age) 
#' # in the format that the function needs
#' custom.uniform <- function(t, s, e, sp) {
#'   # make sure it is a valid uniform
#'   if (e >= s) {
#'    message("There is no uniform function with e >= s")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   res <- dunif(x = t, min = e, max = s)
#'   
#'   return(res)
#' }
#' 
#' # PERT as above
#' dPERT <- function(t, s, e, sp, a = 3, b = 3, log = FALSE) {
#'   # check if it is a valid PERT
#'   if (e >= s) {
#'    message("There is no PERT with e >= s")
#'    return(rep(NaN, times = length(t)))
#'   }
#'   
#'   # find the valid and invalid times
#'   id1 <- which(t <= e | t >= s)
#'   id2 <- which(!(t <= e | t >= s))
#'   t <- t[id2]
#'   
#'   # initialize result vector
#'   res <- vector()
#'   
#'   # if user wants a log function
#'   if (log) {
#'    # invalid times get -Inf
#'    res[id1] <- -Inf
#'   
#'    # valid times calculated with log
#'    res[id2] <- log(((s - t) ^ 2) * ((-e + t) ^ 2) / 
#'                  ((s - e) ^ 5 * beta(a, b)))
#'   }
#'   
#'   # otherwise
#'   else{
#'    res[id1] <- 0
#'   
#'    res[id2] <- ((s - t) ^ 2) * ((-e + t) ^ 2) / ((s - e) ^ 5 * beta(a, b))
#'   }
#'   
#'   return(res)
#' }
#' 
#' # actual age-dependency defined by a mix
#' dPERTAndUniform <- function(t, s, e, sp) {
#'   return(
#'     ifelse(t > 5, custom.uniform(t, s, e, sp),
#'            dPERT(t, s, e, sp))
#'   )
#' }
#' # starts out uniform, then becomes PERT 
#' # after 5my (in absolute geological time)
#' 
#' # plot it for an example species who lived from 10 to 0 million years ago
#' plot(time, rev(dPERTAndUniform(time, 10, 0, 1)),
#'     main = "Age-dependence distribution",
#'     xlab = "Species age (My)", ylab = "Density",
#'     xlim = c(0, 10), type = "l")
#' 
#' # bins for fossil ranges
#' bins <- seq(from = 10, to = 0, by = -1)
#' 
#' # simulate fossil occurrences data frame
#' fossils <- sample.clade(sim, rho, tMax = 10, adFun = dPERTAndUniform,
#'                         bins = bins, returnTrue = FALSE)
#' 
#' # draw simulation with fossil occurrences as ranges
#' draw.sim(sim, fossils = fossils)
#' 
#' # note how occurrences cluster close to the speciation time of
#' # species 1, but not its extinction time, since around 5mya
#' # the PERT becomes the effective age-dependence distribution
#' 
#' @name sample.clade
#' @rdname sample.clade
#' @export

sample.clade <- function(sim, rho, tMax, S = NULL, envR = NULL, rShifts = NULL,
                         returnTrue = TRUE, returnAll = FALSE, bins = NULL, 
                         adFun = NULL, ...) {
  # check that sim is a valid sim object
  if (!is.sim(sim)) {
    stop("Invalid argument, must be a sim object. See ?sim")
  }
  
  # make S all species if it is NULL
  if (is.null(S)) {
    S <- 1:length(sim$TE)
  }
  
  # if returnTrue is false and returnAll is true, the user is probably confused
  if (!returnTrue && returnAll) {
    stop("returnTrue is false and returnAll is true. 
         That behavior is undefined")
  }
  
  # test whether we return ranges
  returnRanges <- !returnTrue || returnAll

  # adjust bins and check it exists if it needs to
  if (!is.null(bins)) {
    # adjust it
    bins <- sort(bins, decreasing = TRUE)
    
    # if it doesn't include tMax
    if (max(bins) < tMax) {
      stop("Bins must include maximum time of simulation")
    }
  } else if (returnRanges) {
    stop("If returnTrue if false or returnAll is true, bins must be supplied")
  } 
  
  # if rho is not a constant, apply make.rate to it
  if (!is.numeric(rho) || length(rho) > 1) {
    rho <- make.rate(rho, tMax, envR, rShifts)
  }
  
  # make TE
  sim$TE[sim$EXTANT] <- 0
  
  # sample using Poisson process
  
  # independent of age (i.e. occurrences uniformly distributed through the 
  # lineage's age)
  if (is.null(adFun)) { 
    # find occurrences times
    pointEstimates <- sample.time(sim, rho, tMax, S)
    
    # which species left no occurrences
    zeroOccs <- which(lapply(pointEstimates, length) == 0)
    
    # tell the user
    message(paste0(length(zeroOccs), " species left no fossil"))
  } 
  
  # dependent of age (i.e. occurrences distributed through the lineage's age 
  # according to adFun)
  else { 
    # find occurrence times
    pointEstimates <- sample.age.time(sim = sim, rho = rho, tMax = tMax, S = S,
                                 adFun = adFun, ...)
  }

  # names res will have regardless of what to return
  baseNames <- c("Species", "Extant")
  
  # rest will depend on the returns
  resNames <- c(baseNames, if (returnTrue) "SampT",
                if (returnRanges) c("MaxT", "MinT"))
  
  # create data frame
  res <- data.frame(matrix(nrow = 0, ncol = length(resNames)))
  
  # name the columns
  colnames(res) <- resNames
  
  # for each species
  for (sp in S) {
    # get the occurrences for this species
    occs <- pointEstimates[[paste0("t", sp)]]
    
    # only matters if there are occurrences for sp
    if (length(occs) > 0) {
      # if we want true occurrence times, only SampT matters
      if (!returnRanges) {
        # create auxiliary data frame
        aux <- data.frame(Species = rep(sp, length(occs)),
                          Extant = NA, 
                          SampT = occs)
        
        # add it to res
        res <- rbind(res, aux)
      } 
      
      # otherwise, we need to bin it
      else {
        # bin it
        binnedOccs <- binner(occs, bins = bins)
        
        # counter for point estimates
        counter <- 1
        
        # for each bin
        for (k in 1:(length(bins) - 1)) {
          # create aux data frame
          aux <- data.frame(matrix(nrow = binnedOccs[k], 
                                   ncol = length(resNames)))
          colnames(aux) <- colnames(res)
          
          # if there are occurrences in that bin
          if (binnedOccs[k] > 0) {
            # indices that matter for aux
            ind <- 
            # add species and extant to aux
            aux$Species <- rep(sp, times = binnedOccs[k])
            
            # if returnTrue is true, add SampT
            if (returnTrue) {
              aux$SampT <- occs[counter:(counter + binnedOccs[k] - 1)]
            }
            
            # add MaxT and MinT
            aux$MaxT <- rep(bins[k], times = binnedOccs[k])
            aux$MinT <- rep(bins[k + 1], times = binnedOccs[k])
            
            # add row to data frame
            res <- rbind(res, aux)
            
            # increase counter
            counter <- counter + binnedOccs[k]
          }
        }
      }
    }
  }
  
  if (nrow(res) > 0) {
    # make the extant column
    res$Extant <- FALSE
    
    # based on the vector in sim
    res$Extant[res$Species %in% which(sim$EXTANT)] <- TRUE
    
    # and the species column
    res$Species <- paste0("t", res$Species)
  }
  
  # return
  return(res)
}
brpetrucci/paleobuddy documentation built on July 26, 2023, 8:15 p.m.