sample.clade.traits: Trait-dependent fossil sampling

View source: R/sample.clade.traits.R

sample.clade.traitsR Documentation

Trait-dependent fossil sampling

Description

Generates occurrence times or time ranges (as most empirical fossil occurrences) for each of the desired species using a Poisson process. Poisson rate should be dependent on some discrete trait, the value of which for each species will be supplied using the parameter traits. Rate can be dependent on observed traits only, or on a combination of observed and hidden traits (in which case the supplied trait data frame traits should have all possible states, observed or hidden, see examples for more details).

Usage

sample.clade.traits(
  sim,
  rho,
  tMax,
  traits,
  nFocus = 1,
  nStates = 2,
  nHidden = 1,
  S = NULL,
  returnTrue = TRUE,
  returnAll = FALSE,
  bins = NULL
)

Arguments

sim

A sim object, containing extinction times, speciation times, parent, and status information (extant or extinct) for each species in the simulation. See ?sim.

rho

Sampling rate (per species per million years) over time. It is a vector of rates, corpointEstimatesponding to the value of the rate for each value of the traits encoded in the traits parameter. It should therefore be of length nStates * nHidden. Note that rho should always be greater than or equal to zero.

tMax

The maximum simulation time, used by rexp.var. A sampling time greater than tMax would mean the occurrence is sampled after the present, so for consistency we require this argument. This is also required to ensure time follows the correct direction both in the Poisson process and in the output.

traits

List of trait data frames, usually one of the returns of bd.sim. traits[[i]][[j]] should corpointEstimatespond to the jth trait data frame for species i. The data frames contain the following columns

value

A vector of trait values the species took at specific intervals of time.

max

A vector of time values corpointEstimatesponding to the upper bound of each interval.

min

A vector of time values corpointEstimatesponding to the lower bound of each interval

nFocus

Trait of focus, i.e. the one that rho depends on. Note that traits can have multiple trait data frames per species, but only one of the simulated traits can affect fossil sampling rates. E.g. if nFocus = 1, then the first trait data frame per species will be used to simulate fossil occurrences.

nStates

Number of possible states for categorical trait. The range of values will be assumed to be (0, nStates - 1).

nHidden

Number of hidden states for categorical trait. Default is 1, in which case there are no added hidden traits. Total number of states is then nStates * nHidden. States will then be set to a value in the range of (0, nStates - 1) to simulate that hidden states are hidden. This is done by setting the value of a state to the remainder of state / nStates. E.g. if nStates = 2 and nHidden = 3, possible states during simulation will be in the range (0, 5), but states (2, 4) (corpointEstimatesponding to (0B, 0C) in the nomenclature of the original HiSSE reference) will be set to 0, and states (3, 5) (corpointEstimatesponding to (1B, 1C)) to 1.

Note that since the traits is supplied as a parameter, the user must ensure that all states from 0 to nStates * nHidden - 1 are reppointEstimatesented in the trait information. See examples for more details on how to properly run hidden-states fossil sampling simulations.

S

A vector of species numbers to be sampled. The default is all species in sim. Species not included in S will not be sampled by the function.

returnTrue

If set to FALSE, it will contain the occurrence times as ranges. In this way, we simulate the granularity presented by empirical fossil records. If returnTrue is TRUE, this is ignored.

returnAll

If set to TRUE, returns both the true sampling time and age ranges. Default is FALSE.

bins

A vector of time intervals corresponding to geological time ranges. It must be supplied if returnTrue or returnAll is TRUE.

Details

Optionally takes a vector of time bins reppointEstimatesenting geologic periods, if the user wishes occurrence times to be reppointEstimatesented as a range instead of true points.

Value

A data.frame containing species names/numbers, whether each species is extant or extinct, and the true occurrence times of each fossil, a range of occurrence times based on bins, or both. Also a list object with the trait data frames describing the trait value for each species at each specified interval. Note that this list will only be different from the supplied traits parameter if nHidden > 1, in which case it will transform hidden traits into observed ones (see details for parameter nHidden).

Author(s)

Bruno do Rosario Petrucci.

Examples


###
# first a simple BiSSE simulation, with 
# binary state-dependent fossil sampling

# initial number of species
n0 <- 1

# maximum simulation time
tMax <- 20

# speciation, higher for state 1
lambda <- c(0.1, 0.2)

# extinction, highe for state 0
mu <- c(0.06, 0.03)

# number of traits and states (1 binary trait)
nTraits <- 1
nStates <- 2

# initial value of the trait
X0 <- 0

# transition matrix, with symmetrical transition rates
Q <- list(matrix(c(0, 0.1,
                   0.1, 0), ncol = 2, nrow = 2))

# set seed
set.seed(1)

# run the simulation
sim <- bd.sim.traits(n0, lambda, mu, tMax = tMax, nTraits = nTraits,
                    nStates = nStates, X0 = X0, Q = Q, nFinal = c(2, Inf))
                    
# now a fossil sampling rate, with higher rate for state 1
rho <- c(0.5, 1)

# run fossil sampling
fossils <- sample.clade.traits(sim$SIM, rho, tMax, sim$TRAITS)

# draw simulation with fossil occurrences as time points
draw.sim(sim$SIM, traits = sim$TRAITS, 
         fossils = fossils$FOSSILS, traitLegendPlacement = "bottomleft")

###
# can also run a HiSSE model, with hidden traits having an effect on rates

# initial number of species
n0 <- 1

# maximum simulation time
tMax <- 20

# speciation, higher for state 1A, highest for 1B
lambda <- c(0.1, 0.2, 0.1, 0.3)

# extinction, lowest for 0B
mu <- c(0.03, 0.03, 0.01, 0.03)

# number of traits and states--in this case, we just run with 4 observed 
# states, so that our traits data frames will include that info for sampling
nTraits <- 1
nStates <- 4

# initial value of the trait
X0 <- 0

# transition matrix, with symmetrical transition rates. Only one transition
# is allowed at a time, i.e. 0A can go to 0B and 1A,
# but not to 1B, and similarly for others
Q <- list(matrix(c(0, 0.1, 0.1, 0,
                   0.1, 0, 0, 0.1,
                   0.1, 0, 0, 0.1,
                   0, 0.1, 0.1, 0), ncol = 4, nrow = 4))

# set seed
set.seed(1)

# run the simulation
sim <- bd.sim.traits(n0, lambda, mu, tMax, nTraits = nTraits,
                    nStates = nStates,
                    X0 = X0, Q = Q, nFinal = c(2, Inf))
# note how sim$TRAITS contains states 2 and 3, even though this
# is a HiSSE model, because we need that information to run hidden states
# on fossil sampling as well (see below)
                    
# now a fossil sampling rate, with higher rate for state 1A, 
# and highest yet for state 1B
rho <- c(0.5, 1, 0.5, 2)

# bins for fossil occurrences
bins <- seq(tMax, 0, -1)

# run fossil sampling, with returnAll = TRUE to include ranges
fossils <- sample.clade.traits(sim$SIM, rho, tMax, sim$TRAITS, 
                               nStates = 2, nHidden = 2, 
                               returnAll = TRUE, bins = bins)
# note how fossils$TRAITS only contains trait values 0 and 1, similar to
# if we had ran bd.sim.traits with nHidden = 2, since sample.clade.traits
# did the job of transforming observed into hidden states

# draw simulation with fossil occurrences as time points AND ranges
draw.sim(sim$SIM, traits = sim$TRAITS, fossils = fossils$FOSSILS,
         fossilsFormat = "all", traitLegendPlacement = "bottomleft")
# note that there are a lot of fossils, so the ranges are difficult to see

# see ?sample.clade for further examples using different return types
# (fossil ranges etc.), and ?bd.sim.traits for more examples using
# higher numbers of states (hidden or observed), though in 
# sample.clade.traits we always have only one trait of focus


brpetrucci/PaleoBuddy documentation built on Feb. 28, 2025, 3:53 p.m.