fitMCMC_bdRho: Bayesian fit of the birth-death rho model on a phylogeny or a...


fitMCMC_bdRho    R Documentation

Bayesian fit of the birth-death rho model on a phylogeny or a set of phylogenies

Description

Fits the birth-death rho model (a constant-time homogeneous birth-death model under Bernoulli sampling) to a rooted ultrametric phylogeny using Bayesian inference. The birth-death process is conditioned on the starting time of the process (tot_time) and on the survival of the process at present time. The inference can be done either by specifying the sampling probability or by integrating over it according to a specified sampling probability distribution (either uniform: unif = TRUE, or beta: beta = TRUE). This function can fit the birth-death rho model on a stem or crown phylogeny, or on a set of phylogenies assuming common or specific diversification rates. It is by default parametrised on the net diversification rate r and the turnover rate ε, but can be reparametrised on the product y*λ and the net diversification rate r. This function is specifically adapted for diversification analyses on phylogenies for which the sampling probability is unknown.

Usage

fitMCMC_bdRho(
  phylo,
  tot_time,
  y = NULL,
  reparam = FALSE,
  common = TRUE,
  beta = FALSE,
  unif = TRUE,
  a = 0,
  b = 1,
  afix = TRUE,
  bfix = TRUE,
  cond = "crown",
  YULE = FALSE,
  dt = 0,
  rel.tol = 1e-10,
  tuned_dichotomy = TRUE,
  brk = 2000,
  savedBayesianSetup = NULL,
  mcmcSettings = NULL,
  prior = NULL,
  parallel = FALSE,
  save_inter = NULL,
  index_saving = NULL
)

Arguments

phylo

Object of class phylo or multiPhylo. A rooted ultrametric phylogeny of class phylo or a set of rooted ultrametric phylogenies of class multiPhylo. The rooted ultrametric phylogenies may contain polytomies (i.e. they do not need to be binary trees).

tot_time

Numeric vector. The stem or crown age (also called MRCA) of the phylogeny or phylogenies, depending on the conditioning of the process specified (see the cond argument) and on the phylogenies used accordingly. The length of the numeric vector equals the number of phylogenies used; it is of length 1 if a single phylogeny is used. The stem age of a phylogeny can be computed using max(TreeSim::getx(phylo))+phylo$root.edge (note that phylo$root.edge needs to be known) and the crown age can be computed using max(TreeSim::getx(phylo)). If multiple phylogenies are used, the stem ages can be calculated with sapply(seq_along(multiPhylo), function(i) max(TreeSim::getx(multiPhylo[[i]]))+multiPhylo[[i]]$root.edge) and the crown ages with sapply(seq_along(multiPhylo), function(i) max(TreeSim::getx(multiPhylo[[i]]))). Note that if multiple phylogenies are used, they do not need to have the same root age but they need to be conditioned the same way (either the stem age for all phylogenies or the crown age for all phylogenies).
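As a cross-check on the TreeSim-based expressions above, the same ages can be sketched with ape alone. This is only an illustration under the assumption that the tree is ultrametric, in which case the largest value returned by ape::branching.times() is the crown age. The toy Newick string and the variable names are made up for the example.

```r
library(ape)

# Toy ultrametric tree with a root edge of length 0.5 (illustrative only).
toy_tree <- read.tree(text = "((A:1,B:1):1,C:2):0.5;")

# Crown age: age of the root node, i.e. the largest branching time.
crown_age <- max(branching.times(toy_tree))

# Stem age: crown age plus the root edge, which must be present in the tree.
stem_age <- crown_age + toy_tree$root.edge

# For a set of trees, apply the same computation to each element.
toy_trees <- c(toy_tree, toy_tree)  # object of class multiPhylo
crown_ages <- vapply(
  seq_along(toy_trees),
  function(i) max(branching.times(toy_trees[[i]])),
  numeric(1)
)
```

Whichever age is used, it has to match the cond argument: crown ages with cond = "crown" and stem ages with cond = "stem".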

y

Numeric vector. The sampling probability (or probabilities), typically calculated as k/N where k is the number of extant sampled tips and N is the global diversity of the clade. The length of the numeric vector equals the number of phylogenies used. If NULL (the default option) and reparam = FALSE, the sampling probabilities are integrated out according to the specified sampling probability distribution (corresponding to the model birth-death∫ρ in Lambert et al. 2022).
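A minimal sketch of the k/N calculation for two phylogenies is given below. The counts are hypothetical and only serve to show that y has one entry per phylogeny.

```r
# Hypothetical counts, one entry per phylogeny.
k <- c(50, 12)    # number of extant sampled tips
N <- c(200, 48)   # estimated global diversity of each clade

# Sampling probabilities, to be passed as the y argument.
y <- k / N

# Each sampling probability should lie in (0, 1].
stopifnot(all(y > 0 & y <= 1))
```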

reparam

Logical. If FALSE (the default option), the log likelihood is calculated using the parameters r and ε. If TRUE and y = NULL, the log likelihood is parametrised using the product y*λ and the net diversification rate r.
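The relation between the parametrisations can be sketched as follows, assuming r = λ - μ and ε = μ/λ. This is consistent with the example on this page, where λ = 0.1 and μ = 0.05 correspond to r = 0.05 and ε = 0.5. The helper functions are hypothetical and not part of the package.

```r
# Hypothetical helpers, not part of the package.
# Assumes r = lambda - mu and epsilon = mu / lambda.
rates_to_r_eps <- function(lambda, mu) {
  c(r = lambda - mu, epsilon = mu / lambda)
}

r_eps_to_rates <- function(r, epsilon) {
  stopifnot(epsilon >= 0, epsilon < 1)
  lambda <- r / (1 - epsilon)
  c(lambda = lambda, mu = epsilon * lambda)
}

# Round trip with the rates used in the example on this page.
p <- rates_to_r_eps(lambda = 0.1, mu = 0.05)
back <- r_eps_to_rates(p[["r"]], p[["epsilon"]])

# Product used by the reparametrised model for a sampling probability y.
y <- 0.5
y_lambda <- y * back[["lambda"]]
```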

common

Logical. This argument is only used when a set of phylogenies are provided in the phylo argument. If TRUE (default option), common diversification rates are inferred for the set of phylogenies used. If FALSE, each phylogeny will have its own specific diversification rates inferred.

beta

Logical. This argument is only used if y = NULL and reparam = FALSE. If TRUE, a beta distribution is assumed on the sampling probabilities. Note that the parameters of the beta distribution can be fixed or inferred.

unif

Logical. This argument is only used if y = NULL and reparam = FALSE. If TRUE (the default option), a uniform distribution is assumed on the sampling probabilities. Note that the parameters of the uniform distribution can be fixed or inferred.

a

Numeric. This argument is only used if y = NULL, reparam = FALSE and afix = TRUE. It corresponds to the value of α (α>0) for the beta distribution, or of a (the lower bound, 0≤a<1) for the uniform distribution on the sampling probabilities.

b

Numeric. This argument is only used if y = NULL, reparam = FALSE and bfix = TRUE. It corresponds to the value of β (β>0) for the beta distribution, or of b (the upper bound, 0<b≤1 and b>a) for the uniform distribution on the sampling probabilities.

afix

Logical. This argument is only used if y = NULL and reparam = FALSE. If TRUE (the default option), the hyperparameter a of the model is fixed. If FALSE, the hyperparameter a of the model is inferred.

bfix

Logical. This argument is only used if y = NULL and reparam = FALSE. If TRUE (the default option), the hyperparameter b of the model is fixed. If FALSE, the hyperparameter b of the model is inferred.

cond

Character. Specifies the conditioning of the birth-death process. Two conditionings are available: cond = "crown" (the default option) if the phylogeny used is a crown phylogeny, or cond = "stem" if the phylogeny used is a stem phylogeny. Note that if a set of phylogenies is used, they will all be conditioned the same way according to this argument.

YULE

Logical. If TRUE, the extinction rate μ, and thus the turnover rate ε, is fixed to 0 and the net diversification rate r equals the speciation rate λ. If FALSE (the default option), the turnover rate ε is not fixed to 0 and is thus inferred. This option is not available if the model is reparametrised (reparam = TRUE).

dt

Numeric. This argument is only used if y = NULL and reparam = FALSE. If dt = 0, the integral over the sampling probabilities is computed using the R stats::integrate function. If dt > 0, the integral over the sampling probabilities is performed manually using a piecewise constant approximation, where dt is the length of the intervals on which the integrated function is assumed to be constant. For manual integration, advised values of dt are between 1e-3 and 1e-5.
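The difference between the two integration modes can be illustrated on a toy integrand. The sketch below is not the package's internal code: the function f is a stand-in for the likelihood as a function of the sampling probability, and evaluating it at the midpoint of each interval is an assumption made for the example.

```r
# Toy integrand standing in for the likelihood as a function of the sampling
# probability y (illustrative only, not the integrand used by the package).
f <- function(y) y^2 * exp(-y)

a <- 0   # lower bound of the uniform distribution
b <- 1   # upper bound of the uniform distribution

# dt = 0 analogue: adaptive quadrature with a requested relative accuracy.
quad <- stats::integrate(f, lower = a, upper = b, rel.tol = 1e-10)$value

# dt > 0 analogue: f is treated as constant on intervals of length dt,
# evaluated here at the midpoint of each interval (an assumption).
dt <- 1e-3
n_int <- round((b - a) / dt)
mids <- a + (seq_len(n_int) - 0.5) * dt
piecewise <- sum(f(mids) * dt)

# Absolute difference between the two approximations.
abs(quad - piecewise)
```

Smaller values of dt reduce the approximation error at the cost of more function evaluations per likelihood call.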

rel.tol

Numeric. This argument is only used if y = NULL, reparam = FALSE and dt = 0. This represents the relative accuracy requested when the integral is performed using the stats::integrate function. Typically .Machine$double.eps^0.25 is used but a value of 1e-10 (the default value) has been tested and performs well.

tuned_dichotomy

Logical. This argument is only used if y = NULL and reparam = FALSE. If TRUE, whenever the log likelihood of the model evaluates to a non-finite value because of numerical approximations, a dichotomy search is performed to find a tuning parameter that yields a finite value of the log likelihood; the log likelihood then takes longer to calculate. If FALSE, no dichotomy search is performed and the log likelihood keeps its non-finite value for the corresponding parameters.

brk

Numeric. This argument is only used if y = NULL, reparam = FALSE and tuned_dichotomy = TRUE. The number of steps used in the dichotomy search. Typically a value of 200 is sufficient to avoid non-finite values. In some cases, if the log likelihood is still non-finite, a brk value of 2000 will be required for more tuning, but larger values are rarely needed.

savedBayesianSetup

BayesianOutput. A BayesianOutput created by fitMCMC_bdRho. If NULL (the default option), no previous MCMC run is continued and the Bayesian inference starts from scratch. If a BayesianOutput is provided, the Bayesian inference continues the previous MCMC run.

mcmcSettings

List. A list of settings for the Bayesian inference using the sampler DEzs of the package BayesianTools. Typically, the number of iterations and the starting values are specified as in the following example: mcmcSettings = list(iterations = 3*nbIter, startValue = startValueMatrix), where 3 is the number of chains, nbIter is the number of iterations and startValueMatrix is a matrix containing the starting values of the parameters for the MCMC chains. In this example the matrix has 3 rows (one for each chain) and as many columns as there are parameters to infer. Check BayesianTools::runMCMC() for more details on the settings available for the sampler DEzs.
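A minimal sketch of such a settings list for a two-parameter model (net diversification rate and turnover rate) is shown below. The ranges of the starting values and the value of nbIter are arbitrary choices for illustration.

```r
set.seed(1)

n_chains <- 3    # one row of starting values per chain
nbIter <- 1000   # illustrative number of iterations per chain

# Column 1: starting values for the net diversification rate.
# Column 2: starting values for the turnover rate.
startValueMatrix <- cbind(
  runif(n_chains, min = 0, max = 0.1),
  runif(n_chains, min = 0, max = 1)
)

mcmcSettings <- list(iterations = 3 * nbIter, startValue = startValueMatrix)
```

If more parameters are inferred (for example with afix = FALSE), one column per additional parameter would be needed.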

prior

Prior or function. Either a prior class (see BayesianTools::createPrior()) or a log prior density function.
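As an illustration of the second option, a log prior density function for two parameters could look like the sketch below. The choice of independent uniform priors on [0, 1] is an assumption made for the example, and the function name is hypothetical.

```r
# Log prior density for two parameters, each with a uniform prior on [0, 1].
# x is the vector of parameter values proposed by the sampler.
log_prior <- function(x) {
  sum(dunif(x, min = 0, max = 1, log = TRUE))
}

log_prior(c(0.05, 0.5))  # 0: both values lie inside the prior bounds
log_prior(c(0.05, 1.5))  # -Inf: the second value lies outside the bounds
```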

parallel

Numeric or logical. If FALSE (the default option), the calculation of the likelihood is not parallelised. If >1, the calculation of the likelihood is parallelised. Note that parallelising the computation is not always faster. This should be checked and depends on the number of cores used for the parallelisation.

save_inter

Numeric vector. A vector specifying the times at which the MCMC chains should be saved for checkpointing. This can be computed as in the following example: c(seq(from = proc.time()[3], to = proc.time()[3]+maxTime, by = freqTime),stopTime). It is particularly useful when launching the inference on a cluster where time restrictions exist.
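The expression above can be unpacked as in the following sketch, which reproduces it with illustrative durations in seconds. How the function interprets the final stopTime entry is not specified here, so it is passed through exactly as in the expression above.

```r
# Illustrative durations, in seconds.
maxTime  <- 60 * 60 * 19   # span covered by the regular checkpoints
freqTime <- 60 * 60 * 3    # interval between two checkpoints
stopTime <- 60 * 60 * 20   # final entry, as in the expression above

# Elapsed time of the current R session, used as the starting point.
start <- unname(proc.time()[3])

# Regular checkpoints followed by the final entry.
checkpoints <- seq(from = start, to = start + maxTime, by = freqTime)
save_inter <- c(checkpoints, stopTime)
```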

index_saving

Factor. A factor specifying the name of the MCMC chains saved during the checkpointing. The MCMC chains will be saved as an RDS file in your working directory, named according to the following syntax: chainMWindex_saving.RDS.

Details

This function will fit different birth-death models depending on the arguments chosen:

  • If a unique phylogeny is used and the corresponding sampling probability is given in y, then the classical birth-death-sampling model is used and the function will infer the net diversification rate and the turnover rate.

  • If a unique phylogeny is used, y = NULL and the model is set for being reparametrised reparam = TRUE, then the reparametrised birth-death-sampling model is used and the function will infer the net diversification rate and the product of the sampling probability and speciation rate y*λ.

  • If a unique phylogeny is used, y = NULL and reparam = FALSE, then the birth-death∫ρ model is used and the function will infer the net diversification rate, the turnover rate, and the hyperparameters a and b of the sampling probability distribution, depending on whether they are set to be fixed or not (see afix and bfix arguments). Make sure the desired sampling probability distribution is chosen (see beta and unif arguments). See phi for more details.

  • If a unique phylogeny is used and YULE = TRUE, then the corresponding model will be used with one parameter less since the turnover rate will be fixed to 0 and thus will not be inferred. Note that this option is not available if the model is reparametrised.

  • If a set of phylogenies is used for fitting the model and common = TRUE, then the birth-death∫ρ_mult model is used and the function will infer the same number of parameters as birth-death∫ρ, depending on whether the hyperparameters are set to be fixed or not (see afix and bfix arguments).

  • If a set of phylogenies is used for fitting the model and common = FALSE, then the birth-death∫ρ_mult_x model is used and the function will infer a specific net diversification rate per phylogeny, a specific turnover rate per phylogeny, and the hyperparameters depending on whether they are set to be fixed or not (see afix and bfix arguments).

Note that the number of parameters inferred varies with the model chosen, so the prior and the mcmcSettings should be adapted to the number of parameters, and the parameters should be ordered as described above. This function is specifically intended to be used on phylogenies with unknown or highly uncertain global diversity estimates (i.e. the sampling probability is not known with accuracy). Note that the sampling probability is never estimated and that this function is not able to evaluate negative rates.
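To size the prior and the starting value matrix, the number of inferred parameters implied by the cases listed above can be sketched with a small helper. The function below is hypothetical and not part of the package. It only encodes the cases described in this section and stops on the reparametrised cases that are not described here (for instance reparam = TRUE with several phylogenies).

```r
# Hypothetical helper, not part of the package: number of inferred parameters
# implied by the model descriptions in this section.
n_inferred <- function(n_trees = 1, y = NULL, reparam = FALSE, common = TRUE,
                       afix = TRUE, bfix = TRUE, YULE = FALSE) {
  if (reparam) {
    # Reparametrised model: r and the product y*lambda; YULE is unavailable.
    stopifnot(is.null(y), !YULE, n_trees == 1)
    return(2)
  }
  # One rate (r) under YULE, otherwise two (r and the turnover rate).
  rates_per_tree <- if (YULE) 1 else 2
  # Specific rates per phylogeny only when common = FALSE.
  n_rate_sets <- if (n_trees > 1 && !common) n_trees else 1
  # Hyperparameters a and b are inferred only when y is integrated out.
  n_hyper <- if (is.null(y)) sum(!afix, !bfix) else 0
  rates_per_tree * n_rate_sets + n_hyper
}

n_inferred()                                            # birth-death∫ρ, a and b fixed
n_inferred(afix = FALSE)                                # a inferred as well
n_inferred(n_trees = 3, common = FALSE, bfix = FALSE)   # specific rates per phylogeny
```

The returned count corresponds to the number of columns expected in the starting value matrix and to the length of the lower and upper bounds of the prior.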

Value

Returns an object of class MCMC_bd. This MCMC_bd object is a list containing the name of the birth-death model performed and an object of class "mcmcSampler" "bayesianOutput" (see the output of BayesianTools::runMCMC()). This second object contains the MCMC chains and the information about the MCMC run. For analysis of the chains, it can be converted to a coda object (BayesianTools::getSample()) or used in line with the appropriate functions e.g. BayesianTools::MAP().

Author(s)

Sophia Lambert

See Also

likelihood_bdRho and fitMCMC_bdK

Examples

# Creating a phylogeny with 0.05 net diversification rate and 0.5 turnover rate.

set.seed(1234)
tree1 <- TESS::tess.sim.age(1, 100, lambda = 0.1, mu = 0.05, MRCA = TRUE, samplingProbability = 0.5)[[1]]
plot(tree1, root.edge = TRUE)

# Creating variables to give to arguments

tot_time <- max(TreeSim::getx(tree1))
Ntips <- ape::Ntip(tree1)
lamb_moments <- log(Ntips)/tot_time
ysim <- 0.5

# Creating setting for MCMC

densityTest4 = function(x) {
  sum(dunif(x[1], min = 0, max = 1, log =TRUE)) + sum(dunif(x[2], min = 0, max = 1, log =TRUE))
}
samplerTest4 = function(n=1){
  s1 = runif(n, min = 0, max = 1)
  s2 = runif(n, min = 0, max = 1)
  return(cbind(s1,s2))
}
priorTest4 <- BayesianTools::createPrior(density = densityTest4, sampler = samplerTest4,
                                         lower = c(0,0), upper = c(1,1), best = NULL)
StartValueDTest4 = c(lamb_moments, runif(2, min = 0, max = 0.1))
StartValueEpsiTest4 = runif(3, min = 0, max = 1)
startValueTest4 = matrix(data = c(StartValueDTest4, StartValueEpsiTest4), nrow = 3, ncol = 2)

# Parameters for the checkpointing

nbIter <- 20000
maxTime <- 60*60*19.3 # just under 20 hours, leaving a margin for processing
stopTime <- 60*60*20 # 20 hours
freqTime <- 60*60*3.21 # save roughly every 3 hours
previousMCMC = NULL

# Fitting the birth-death∫rho model

res_fitMCMC_M1 <- fitMCMC_bdRho(phylo = tree1,
                                tot_time = tot_time, y = NULL,
                                reparam = FALSE, common = FALSE,
                                beta = FALSE, unif = TRUE,
                                a = 0, b = 1,
                                afix = TRUE, bfix =TRUE,
                                cond = "crown", YULE = FALSE,
                                dt = 0, rel.tol = 1e-10,
                                tuned_dichotomy = TRUE,
                                brk = 2000,
                                savedBayesianSetup = previousMCMC,
                                mcmcSettings = list(iterations = 3*nbIter,
                                                    startValue = startValueTest4),
                                prior = priorTest4,
                                parallel = FALSE, save_inter =
                                  c(seq(from = proc.time()[3], to = proc.time()[3] + maxTime, by = freqTime), stopTime),
                                index_saving = as.factor("M1_tree1"))

plot(res_fitMCMC_M1$mcmc)

# Fitting the classical birth-death-sampling model

res_fitMCMC_M5 <- fitMCMC_bdRho(phylo = tree1,
                                tot_time = tot_time, y = ysim,
                                reparam = FALSE, common = FALSE,
                                beta = FALSE, unif = FALSE,
                                a = NULL, b = NULL,
                                afix = NULL, bfix =NULL,
                                cond = "crown", YULE = FALSE,
                                dt = 0, rel.tol = 1e-10,
                                tuned_dichotomy = TRUE,
                                brk = 2000,
                                savedBayesianSetup = previousMCMC,
                                mcmcSettings = list(iterations = 3*nbIter,
                                                    startValue = startValueTest4),
                                prior = priorTest4,
                                parallel = FALSE, save_inter =
                                  c(seq(from = proc.time()[3], to = proc.time()[3] + maxTime, by = freqTime), stopTime),
                                index_saving = as.factor("M5_tree1"))
plot(res_fitMCMC_M5$mcmc)

sophia-lambert/UDivEvo documentation built on Sept. 27, 2022, 11:05 p.m.