likelihood_bdRho: Likelihood birth-death rho function

View source: R/likelihood_bdRho.R

likelihood_bdRhoR Documentation

Likelihood birth-death rho function

Description

Computes the log likelihood of a rooted ultrametric phylogeny under a constant-time homogeneous birth-death model and Bernoulli sampling. The birth-death process is conditioned on the starting time of the process tottime and the survival of the process at present time. The log likelihood can be calculated specifying the sampling probability or integrating over it according to a specified sampling probability distribution (either uniform: unif = TRUE or beta distribution: beta = TRUE). This function can computes the log likelihood on a stem or crown phylogeny or a set of phylogenies assuming common or specific diversification rates. It is by default parametrised on the net diversification rate and the turnover rate but can be reparametrised on the product y*λ and the net diversification rate r. This function is specifically adapted for diversification analysis on phylogenies on which the sampling probability is unknown.

Usage

likelihood_bdRho(
  tottime,
  nbtips,
  tj,
  yj,
  reparam = FALSE,
  beta,
  unif,
  root,
  dt,
  rel.tol,
  tuned_dichotomy,
  brk
)

Arguments

tottime

Numeric vector. The stem or crown age (also called MRCA) of the phylogenie(s) depending on the conditioning of the process specified (see root argument) and the phylogenie(s) used. The length of the numeric vector equals the number of phylogenie(s) used. It is of length 1 if a unique phylogeny is used. The stem age of the phylogeny can be computed using max(TreeSim::getx(phylo))+phylo$root.edge (note that the phylo$root.edge needs to be known) and the crown age of the phylogeny can be computed using max(TreeSim::getx(phylo)). If multiple phylogenies are used the following can be used for calculating the stem age: sapply(seq_along(multiPhylo), function(i) max(TreeSim::getx(multiPhylo[[i]]))+multiPhylo[[i]]$root.edge) and the crown age: sapply(seq_along(multiPhylo), function(i) max(TreeSim::getx(multiPhylo[[i]]))). Note that if multiple phylogenies are used, the phylogenies do not need to have the same root age but they need to be conditioned the same way (for all phylogenies either their stem or crown age).

nbtips

Integer. The number of extant sampled tips in the phylogeny typically noted k.

tj

List of numeric vector(s). A list of atomic numeric vectors where each vector is specifying the node depths for each phylogeny. If only one phylogeny is used, the list is of length 1. The node depths of a phylogeny can be computed using TreeSim::getx(phylo)[!TreeSim::getx(phylo)==tottime] and using lapply(seq_along(multiPhylo), function(i) TreeSim::getx(multiPhylo[[i]])[!TreeSim::getx(multiPhylo[[i]])==tottime[i]]) for an object of class multiPhylo. This code works both for stem and crown phylogenies. Note that the tottime is not contained in node depths here since it is used for conditioning the process. Since there are k-1 internal nodes in a phylogeny and that the MRCA is not included in the node depths when conditioning in crown, then each numeric vector is of length k-2 if the phylogeny is in crown and of length k-1 if it is in stem with k being the number of extant sampled tips in the phylogeny.

yj

Numeric vector. The sampling probabilitie(s) typically calculated as k/N where k is the number of extant sampled tips and N is the global diversity of the clade. The length of the numeric vector equals the number of phylogenie(s) used. If NULL and reparam = FALSE, the sampling probabilitie(s) are integrated according to the specified sampling probability distribution (corresponding to the model birth-death∫ρ in Lambert et al. 2022).

reparam

Logical. If FALSE (the default option), the log likelihood is calculated using the parameters r and ε. If TRUE and yj = NULL, the log likelihood is parametrised using the product y*λ and the net diversification rate r.

beta

Logical. This argument is only used if yj = NULL and reparam = FALSE. If TRUE a beta distribution is assumed on the sampling probabilitie(s). Note that the parameters of the beta distribution can be fixed or inferred and can be specified in the function generated by the present function UDivEvo::likelihood_bdRho using the arguments a for the parameter α (α>0) and b for the parameter β (β>0). Check the Value section for more details.

unif

Logical. This argument is only used if yj = NULL and reparam = FALSE. If TRUE a uniform distribution is assumed on the sampling probabilitie(s). Note that the parameters of the uniform distribution can be fixed or inferred and can be specified in the function generated by the present function UDivEvo::likelihood_bdRho using the arguments a for the lower bound of the uniform distribution (0≤a<1) and b for the higher bound of the uniform distribution (0<b≤1 and b>a). Check the Value section for more details.

root

Integer. Specifying the conditioning of the birth-death process. root takes the value 0 if the phylogeny used is a stem phylogeny to condition the process on the stem age. It takes the value 1 if the phylogeny is a crown phylogeny to condition the process on the crown age.

dt

Numeric. This argument is only used if yj = NULL and reparam = FALSE. If dt = 0, the integral on the sampling probabilitie(s) is computed using the R stats::integrate function. If dt≥0, the integral of the sampling probabilitie(s) is performed manually using a piece-wise constant approximation. dt represents the length of the interval on which the function integrated is assumed to be constant. For manual integral, advised value of dt are 1e-3 to 1e-5.

rel.tol

Numeric. This argument is only used if yj = NULL, reparam = FALSE and dt = 0. This represents the relative accuracy requested when the integral is performed using stats::integrate function. Typically .Machine$double.eps^0.25 is used but a value of 1e-10 has been tested and performs well.

tuned_dichotomy

Logical. This argument is only used if yj = NULL and reparam = FALSE. If TRUE, when the log likelihood is equal to non finite value due to approximations, a dichotomy search is performed to find a tuning parameter that will be used for getting a finite value of the log likelihood. If TRUE, the log likelihood will take longer to calculate. Else if FALSE, no dichotomy search is performed; if the log likelihood is equal to non finite value due to approximations, the function will return this non finite value.

brk

Numeric. This argument is only used if yj = NULL, reparam = FALSE and tuned_dichotomy = TRUE. The number of steps used in the dichotomy search. Typically the value 200 is sufficient to avoid non finite values. In some case if the log likelihood is still equal to non finite value, the brk value 2000 will be required for more tuning but it will rarely take a larger value.

Details

This function is a closure, it takes all of the above as arguments and creates another function with arguments depending on the birth-death model chosen:

  • If a unique phylogeny is used and the corresponding sampling probability is given in yj, then the classical birth-death-sampling model is used and the function created takes div: the net diversification rate and turn: the turnover rate as arguments.

  • If a unique phylogeny is used, yj = NULL and the log likelihood is set for being reparametrised reparam = TRUE, then the reparametrised birth-death-sampling model is used and the function created takes div: the net diversification rate and ylamb: the product of the sampling probability and speciation rate y*λ as arguments.

  • If a unique phylogeny is used, yj = NULL and reparam = FALSE, then the birth-death∫ρ model is used and the function created takes tun.init: the initial tuning parameters values and seqphy.init: the phylogenies number for which to use the corresponding tuning parameter (1 for a unique phylogeny). This former function creates another function that takes as argument div: the net diversification rate, turn: the turnover rate, and hyperparameters of the sampling probabilitie(s) distribution a: the first argument of the sampling probabilitie(s) distribution chosen and b: the second argument of the sampling probabilitie(s) distribution chosen (see phi for more details).

  • If a set of phylogenies are used for computing the log likelihood then all the parameters (except the one relatives to the hyperparameters of the sampling probabilitie(s) distribution described above): div, turn, ylamb, tun.init and seqphy.init have to be of length the number of phylogenies used.

This function is specifically intended to be used on phylogenies with unknown or highly uncertain global diversity estimates (the sampling probability is not known with accuracy). Note that the sampling probability is never estimated and that this function is not able to evaluate negative rates.

Value

Returns an object of class function. This function can take the following arguments (as described above depending on the chosen model) and will return the value of the log likelihood alone or in a list together with the tuning parameters values (for the birth-death∫ρ model):

div

Numeric vector. The net diversification rate(s) also called r. The length of the numeric vector equals the number of phylogenie(s) used.

turn

Numeric vector. The turnover rate(s) also called ε. The length of the numeric vector equals the number of phylogenie(s) used.

ylamb

Numeric vector. The product of the sampling probabilitie(s) and speciation rates y*λ also called c in Stadler 2009. The length of the numeric vector equals the number of phylogenie(s) used.

tun.init

Numeric vector. The initial tuning parameters values. Typically, it will take the value log(1). The length of the numeric vector equals the number of phylogenie(s) used.

seqphy.init

Integer vector. A regular sequence of the number of phylogenie(s) used, i.e. if 10 phylogenies are used seqphy.init = c(1:10)

a

Numeric. The value of α or a respectively for the beta or the uniform distribution. This value cannot be negative for both distributions and cannot exceed b and 1 for the uniform distribution.

b

Numeric. The value of β or b respectively for the beta or the uniform distribution. This value cannot be negative for both distributions, cannot be inferior to a and cannot exceed 1 for the uniform distribution.

Author(s)

Sophia Lambert

References

Stadler, T. (2009). On incomplete sampling under birth–death models and connections to the sampling-based coalescent. Journal of theoretical biology, 261(1), 58-66.

See Also

fitMCMC_bdRho and likelihood_bdK


sophia-lambert/UDivEvo documentation built on Sept. 27, 2022, 11:05 p.m.