calc_loglike_sp_stratified: Calculate log-likelihood with a transition matrix and...


View source: R/BioGeoBEARS_stratified_v1.R


Description

This function is the stratified version of calc_loglike_sp.


Usage

calc_loglike_sp_stratified(tip_condlikes_of_data_on_each_state,
    phy, Qmat = NULL, spPmat = NULL,
    min_branchlength = 1e-21, return_what = "loglike",
    probs_of_states_at_root = NULL, rootedge = TRUE,
    sparse = FALSE, printlevel = 0, use_cpp = TRUE,
    input_is_COO = FALSE, spPmat_inputs = NULL,
    cppSpMethod = 3, cluster_already_open = NULL,
    calc_ancprobs = FALSE, null_range_allowed = TRUE,
    fixnode = NULL, fixlikes = NULL, inputs = inputs,
    allareas = allareas, all_states_list = all_states_list,
    return_condlikes_table = FALSE,
    calc_TTL_loglike_from_condlikes_table = TRUE)



Arguments

tip_condlikes_of_data_on_each_state: A numeric matrix with rows representing tips and columns representing states/geographic ranges. The cells give the likelihood of the observation data under the assumption that the tip has that state; typically this means that the known geographic range gets a 1 and all other states get a 0.
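As a minimal sketch of the layout described above, the following base R code builds such a matrix for three tips and four states. The tip names, state labels, and observed ranges are hypothetical, chosen only for illustration.

```r
# Hypothetical tips and states (assumptions for illustration only)
tip_names <- c("sp1", "sp2", "sp3")
state_names <- c("null", "A", "B", "AB")

# Hypothetical observed range of each tip, given as a state label
observed <- c(sp1 = "A", sp2 = "B", sp3 = "AB")

# Rows = tips, columns = states; start with all zeros
tip_condlikes <- matrix(0, nrow = length(tip_names), ncol = length(state_names),
                        dimnames = list(tip_names, state_names))

# Put a 1 in the column matching each tip's observed range
tip_condlikes[cbind(tip_names, observed[tip_names])] <- 1

print(tip_condlikes)
```

Each row sums to 1 here because every tip has exactly one known range.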


phy: A phylogeny object. The function converts it to pruningwise order.


Qmat: A Q transition matrix representing the along-branch model for the evolution of geographic range, using parameters d (dispersal/range expansion), e (extinction/range contraction/local extirpation), and perhaps others (e.g. distance). This matrix can be input in either dense or sparse (COO) format, as specified by input_is_COO.
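To make the role of d and e concrete, here is a small dense rate matrix for two areas, written as a sketch in base R. The state ordering, the rate values, and the simple DEC-style parameterization are assumptions for illustration; the Q matrix the package builds may be ordered or scaled differently.

```r
d <- 0.1   # hypothetical dispersal (range expansion) rate
e <- 0.05  # hypothetical extinction (range contraction) rate

# Assumed state ordering: null range, area A, area B, both areas
states <- c("null", "A", "B", "AB")
Qmat <- matrix(0, nrow = 4, ncol = 4, dimnames = list(states, states))

# Range expansion: a single-area range gains the other area at rate d
Qmat["A", "AB"] <- d
Qmat["B", "AB"] <- d

# Range contraction: a two-area range loses one area at rate e
Qmat["AB", "A"] <- e
Qmat["AB", "B"] <- e

# Local extirpation from the last occupied area leads to the null range
Qmat["A", "null"] <- e
Qmat["B", "null"] <- e

# Diagonal entries make every row sum to zero
diag(Qmat) <- -rowSums(Qmat)

print(Qmat)
```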


spPmat: Default is NULL; users should usually use spPmat_inputs. spPmat is a numeric matrix representing the probability of each ancestor range -> (Left range, Right range) transition at cladogenesis events. There are different ways to represent this matrix. In the simplest representation, this is just a rectangular matrix with numstates rows (representing the ancestral states) and numstates^2 columns (representing all possible descendant pairs). Use of this type of matrix is specified by cppSpMethod=1. It is calculated from a textual speciation matrix (typically spmat in the code) via symbolic_to_relprob_matrix_sp. However, this matrix gets huge and slow for large numbers of states/ranges. cppSpMethod=2 and cppSpMethod=3 implement successively more efficient and faster representation and processing of this matrix in COO-like formats. See rcpp_calc_anclikes_sp_COOprobs for the cppSpMethod=2 method, and rcpp_calc_anclikes_sp_COOweights_faster for the cppSpMethod=3 method (the fastest).
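The growth of the dense representation can be sketched with a little arithmetic in base R. This assumes every subset of areas, including the null range, is an allowed state, so numstates = 2^(number of areas); analyses that cap the maximum range size would have fewer states.

```r
# Number of areas in the analysis (hypothetical values)
num_areas <- 2:8

# Assumption: all subsets of areas are allowed ranges, null range included
numstates <- 2^num_areas

# Dense cladogenesis matrix: numstates rows by numstates^2 columns
dense_cells <- numstates * numstates^2

print(data.frame(num_areas, numstates, dense_cells))
```

The cell count grows with the cube of the number of states, which is why the COO-like methods matter for larger analyses.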


min_branchlength: Nodes with branches below this branchlength will not be treated as cladogenesis events; instead, they will be treated as if an OTU had been sampled from an anagenetic lineage, i.e. as if you had a direct ancestor. This is useful for putting fossils into the biogeography analysis, when you have fossil species that range through time. (Note: the proper way to obtain such trees, given that most phylogenetic methods force all OTUs to be tips rather than direct ancestors, is another question subject to active research. However, one method might be to just set a branch-length cutoff, and treat any branches sufficiently small as direct ancestors.)
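The cutoff idea can be sketched in base R on a toy edge table. The parent/child matrix and branch lengths below are hypothetical and only imitate the edge and edge.length layout of an ape phylo object; restricting the check to tip branches is an assumption of this sketch, not a statement about the function's internals.

```r
min_branchlength <- 1e-21
ntips <- 3  # tips are numbered 1..ntips, internal nodes above that

# Hypothetical edge table: column 1 = parent node, column 2 = child node
edge <- rbind(c(4, 5), c(5, 1), c(5, 2), c(4, 3))
edge_length <- c(1.0, 0.5, 1e-25, 1.5)

# Tip branches shorter than the cutoff are candidate direct ancestors
is_tip_edge <- edge[, 2] <= ntips
is_short <- edge_length < min_branchlength

direct_ancestor_tips <- edge[is_tip_edge & is_short, 2]
anagenetic_nodes <- edge[is_tip_edge & is_short, 1]

print(direct_ancestor_tips)  # tips treated as sampled from a lineage
print(anagenetic_nodes)      # parent nodes not treated as cladogenesis events
```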


return_what: What should be returned to the user? Options are "loglike" (the log-likelihood of the data under the tree, model, and model parameters), "nodelikes" (the scaled conditional likelihoods at the nodes), "rootprobs" (the relative probability of the geographic ranges/states at the root), or "all" (all of the above in a list). Typically the user will only want to return "loglike" while doing ML optimization, but then return "all" once the ML parameter values have been found.


probs_of_states_at_root: The prior probability of the states/geographic ranges at the root. The default, NULL, effectively means an equal probability for each state. This is also what LAGRANGE assumes, so running with NULL will reproduce exactly the LAGRANGE parameter inferences and log-likelihood.
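As a generic illustration of what an equal root prior implies, the sketch below normalizes a set of root conditional likelihoods into relative probabilities. The likelihood values are hypothetical, and this is ordinary Bayes-rule normalization, not necessarily the exact computation performed inside the function.

```r
# Hypothetical conditional likelihoods of the data for each root state
root_condlikes <- c(null = 0, A = 0.02, B = 0.01, AB = 0.07)

# Equal prior probability for each state (the behaviour described for NULL)
n <- length(root_condlikes)
equal_prior <- rep(1 / n, n)

# Relative probability of each state at the root
weighted <- equal_prior * root_condlikes
rootprobs <- weighted / sum(weighted)

print(rootprobs)
```

With a flat prior the prior terms cancel, so the relative probabilities are just the likelihoods rescaled to sum to 1.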


rootedge: Should the root edge be included in the calculation (i.e., calculate to the bottom of the root), if a root edge is present? Default TRUE, per the Usage above.


sparse: Should sparse matrix exponentiation be performed? This should be faster for very large matrices (> 100-200 states); however, the calculations appear to be less accurate. The function will transform a dense matrix to COO format (see mat2coo) if necessary according to the input_is_COO parameter.


printlevel: If >= 1, various amounts of intermediate output will be printed to screen. Note: Intermediate outputs from C++ and FORTRAN functions have been commented out, to meet CRAN guidelines.


use_cpp: Should the C++ routines from cladoRcpp be used to speed up calculations? Default TRUE.


input_is_COO: Is the input Q matrix a sparse, COO-formatted matrix (TRUE) or a standard dense matrix (FALSE)? Default FALSE.
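For readers unfamiliar with coordinate list storage, the following base R sketch converts a small dense matrix to row/column/value triplets and back. It illustrates the general COO idea only; the column names, ordering, and indexing used by mat2coo may differ, and the matrix values are hypothetical.

```r
# Hypothetical dense rate matrix with mostly zero entries
Q <- matrix(c(-0.2,  0.2, 0,
               0,   -0.1, 0.1,
               0,    0,   0), nrow = 3, byrow = TRUE)

# Coordinates of the non-zero cells (1-based row and column indices)
nz <- which(Q != 0, arr.ind = TRUE)

# COO-style table: one line per non-zero cell
coo <- data.frame(row = nz[, "row"], col = nz[, "col"], value = Q[nz])
print(coo)

# Round trip back to a dense matrix
dense_again <- matrix(0, nrow = nrow(Q), ncol = ncol(Q))
dense_again[cbind(coo$row, coo$col)] <- coo$value
```

Only four of the nine cells are stored, which is the saving that makes sparse handling attractive for large state spaces.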


spPmat_inputs: A list of parameters so that spPmat (the speciation transition probability matrix) can be calculated on-the-fly, according to the method in cppSpMethod. See example.


cppSpMethod: Three C++ methods from cladoRcpp for calculating and using the cladogenesis probability matrix. 1 is slowest but easiest to understand; 3 is fastest. If spPmat_inputs is given, the program will generate the appropriate spPmat on-the-fly, and the user does not have to input the full spPmat manually.


cluster_already_open: If the user wants to distribute the matrix exponentiation calculations from all the branches across a number of processors/nodes on a cluster, specify the cluster here, e.g. cluster_already_open = makeCluster(rep("localhost", num_cores_to_use), type = "SOCK"). Default is NULL, which means running on a single processor. Note: this will work on most platforms, including Macs running R from the command line, but will NOT work on Macs running the R GUI (R.app), because parallel processing functions like makeCluster from e.g. library(parallel) for some reason crash R.app. The program checks for R.app and will just run on 1 node if it is found.


calc_ancprobs: Should ancestral state estimation be performed (adds an uppass at the end)?


null_range_allowed: Does the state space include the null range? Default TRUE.


fixnode: If the state at a particular node is going to be fixed (e.g. for ML marginal ancestral states), give the node number. (Trial implementation for stratified analysis.)


fixlikes: The state likelihoods to be used at the fixed node, i.e. 1 for the fixed state and 0 for the others. (Trial implementation for stratified analysis.)
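A fixed-node specification of this kind can be sketched in base R as follows. The node number, state labels, and chosen state are hypothetical; only the 1/0 pattern comes from the description above.

```r
# Assumed state ordering (hypothetical)
states <- c("null", "A", "B", "AB")

fixnode <- 5          # hypothetical node number to fix
fixed_state <- "AB"   # hypothetical state to fix at that node

# 1 for the fixed state, 0 for all others
fixlikes <- as.numeric(states == fixed_state)
names(fixlikes) <- states

print(fixlikes)
```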


inputs: A list of inputs containing the dispersal matrix for each time period, etc.


allareas: A list of all the areas in the total analysis.


all_states_list: A list of all the states in the total analysis (0-based coding?).


return_condlikes_table: If TRUE, return the table of ALL conditional likelihood results, including at branch subsections (only some should be used in calculating the final log-likelihood of the geographic range data on the tree!).
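The notion of branch subsections can be illustrated by cutting one branch at the boundaries between time periods. This base R sketch uses hypothetical boundary ages and branch ages, and it shows only the bookkeeping idea, not the package's internal procedure.

```r
# Hypothetical older boundaries of three time strata, in time before present
timeperiods <- c(5, 10, 20)

# Hypothetical branch spanning from age 3 (younger end) to age 12 (older end)
branch_top <- 3
branch_bottom <- 12

# Stratum boundaries that fall strictly inside the branch
cuts <- timeperiods[timeperiods > branch_top & timeperiods < branch_bottom]

# Consecutive ages define the subsections
ages <- c(branch_top, cuts, branch_bottom)
subsections <- data.frame(top = head(ages, -1), bottom = tail(ages, -1))
subsections$length <- subsections$bottom - subsections$top

# Which stratum each subsection falls in (1 = youngest)
subsections$stratum <- findInterval(subsections$top, c(0, timeperiods))

print(subsections)
```

Each subsection would then be evaluated under the model settings of its own stratum, which is why the table holds more rows than there are nodes.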


calc_TTL_loglike_from_condlikes_table: If TRUE, force making of the condlikes table, and use it to calculate the log-likelihood (default=TRUE; matches LAGRANGE).


Value

grand_total_likelihood: The total log-likelihood of the data on the tree (default). If return_condlikes_table==TRUE, the function instead returns calc_loglike_sp_stratified_results, a list with items calc_loglike_sp_stratified_results$condlikes_table and calc_loglike_sp_stratified_results$grand_total_likelihood. This can be useful for debugging stratified analyses, which have a lot of extra book-keeping that is easy to mess up.
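Because the return value is either a single number or a list, calling code may want to handle both shapes. The helper below is a hypothetical convenience written for this sketch (get_loglike is not part of the package), and the result objects are mock values rather than real output.

```r
# Hypothetical helper: pull the log-likelihood out of either return shape
get_loglike <- function(res) {
  if (is.list(res)) {
    res$grand_total_likelihood
  } else {
    res
  }
}

# Mock results standing in for real function output (values are made up)
mock_default <- -34.5
mock_with_table <- list(
  condlikes_table = matrix(0, nrow = 2, ncol = 4),
  grand_total_likelihood = -34.5
)

print(get_loglike(mock_default))
print(get_loglike(mock_with_table))
```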



Note

(COO = Coordinate list format for a matrix, see


Author(s)

Nicholas Matzke





See Also

calc_loglike_sp, rcpp_calc_anclikes_sp, rcpp_calc_anclikes_sp_COOprobs, rcpp_calc_anclikes_sp_COOweights_faster, mat2coo



Example output

Loading required package: rexpokit
Loading required package: cladoRcpp
Loading required package: ape
Loading required package: phylobase

Attaching package: 'phylobase'

The following object is masked from 'package:ape':


BioGeoBEARS documentation built on May 29, 2017, 8:36 p.m.