calc_loglike_sp: Calculate log-likelihood with a transition matrix and...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

Description

This is the workhorse function of BioGeoBEARS. It calculates the likelihood of the tip data (the geographic ranges observed at the tips) given a phylogenetic tree, a Q transition matrix specifying the model of range evolution along branches, and a speciation probability matrix specifying the probability of the various possible ancestor–>(Left descendant, Right descendant) range evolution events at phylogenetic nodes/speciation events.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
  calc_loglike_sp(tip_condlikes_of_data_on_each_state, phy,
    Qmat, spPmat = NULL, min_branchlength = 1e-21,
    return_what = "loglike",
    probs_of_states_at_root = NULL, rootedge = FALSE,
    sparse = FALSE, printlevel = 1, use_cpp = TRUE,
    input_is_COO = FALSE, spPmat_inputs = NULL,
    cppSpMethod = 3, cluster_already_open = NULL,
    calc_ancprobs = FALSE, null_range_allowed = TRUE,
    fixnode = NULL, fixlikes = NULL, stratified = FALSE,
    states_allowed_TF = NULL)

Arguments

tip_condlikes_of_data_on_each_state

A numeric matrix with rows representing tips, and columns representing states/geographic ranges. The cells give the likelihood of the observation data under the assumption that the tip has that state; typically this means that the known geographic range gets a '1' and all other states get a 0.

phy

A phylogeny object. The function converts it to pruningwise order.

Qmat

A Q transition matrix representing the along-branch model for the evolution of geographic range, using parameters d (dispersal/range expansion), e (extinction/range contraction/local extirpation), and perhaps others (e.g. distance). This matrix can be input in either dense or sparse (COO) format, as specified by input_is_COO.

spPmat

Default is NULL; users should usually use spPmat_inputs. spPmat is A numeric matrix representing the probability of each ancestor range–>(Left range, Right range) transition at cladogenesis events. There are different ways to represent this matrix. In the simplest representation, this is just a rectangular matrix with numstates rows (representing the ancestral states) and numstates^2 columns (representing all possible descendant pairs). Use of this type of matrix is specified by cppSpMethod=1. It is calculated from a textual speciation matrix (typically spmat in the code) via symbolic_to_relprob_matrix_sp. However, this matrix gets huge and slow for large numbers of states/ranges. cppSpMethod=2 and cppSpMethod=3 implement successively more efficient and faster representation and processing of this matrix in COO-like formats. See rcpp_calc_anclikes_sp_COOprobs for the cppSpMethod=2 method, and rcpp_calc_anclikes_sp_COOweights_faster for the cppSpMethod=3 method (the fastest).

min_branchlength

Nodes with branches below this branchlength will not be treated as cladogenesis events; instead, they will be treated as if an OTU had been sampled from an anagenetic lineage, i.e. as if you had a direct ancestor. This is useful for putting fossils into the biogeography analysis, when you have fossil species that range through time. (Note: the proper way to obtain such trees, given that most phylogenetic methods force all OTUs to be tips rather than direct ancestors, is another question subject to active research. However, one method might be to just set a branch-length cutoff, and treat any branches sufficiently small as direct ancestors.)

return_what

What should be returned to the user? Options are "loglike" (the log-likelihood of the data under the tree, model, and model parameters), "nodelikes" (the scaled conditional likelihoods at the nodes), "rootprobs" (the relative probability of the geographic ranges/states at the root), or "all" (all of the above in a list). Typically the user will only want to return "loglike" while doing ML optimization, but then return "all" once the ML parameter values have been found.

probs_of_states_at_root

The prior probability of the states/geographic ranges at the root. The default, NULL, effectively means an equal probability for each state (this is also what LAGRANGE assumes; and running with NULL will reproduce exactly the LAGRANGE parameter inferences and log-likelihood).

rootedge

Should the root edge be included in the calculation (i.e., calculate to the bottom of the root), if a root edge is present? Default FALSE.

sparse

Should sparse matrix exponentiation be performed? This should be faster for very large matrices (> 100-200 states), however, the calculations appear to be less accurate. The function will transform a dense matrix to COO format (see mat2coo) if necessary according to the input_is_COO parameter.

printlevel

If >= 1, various amounts of intermediate output will be printed to screen. Note: Intermediate outputs from C++ and FORTRAN functions have been commented out, to meet CRAN guidelines.

use_cpp

Should the C++ routines from cladoRcpp be used to speed up calculations? Default TRUE.

input_is_COO

Is the input Q matrix a sparse, COO-formatted matrix (TRUE) or a standard dense matrix (FALSE). Default FALSE.

spPmat_inputs

A list of parameters so that spPmat (the speciation transition probability matrix) can be calculated on-the-fly, according to the method in cppSpMethod. See example.

cppSpMethod

Three C++ methods from cladoRcpp for calculating and using the cladogenesis probability matrix. 1 is slowest but easiest to understand; 3 is fastest. If spPmat_inputs is given, the program will generate the appropriate spPmat on-the-fly, and the user does not have to input the full spPmat manually.

cluster_already_open

If the user wants to distribute the matrix exponentiation calculations from all the branches across a number of processors/nodes on a cluster, specify the cluster here. E.g. cluster_already_open = makeCluster(rep("localhost",num_cores_to_use), type = "SOCK"). Note: this will work on most platforms, including Macs running R from command line, but will NOT work on Macs running the R GUI R.app, because parallel processing functions like MakeCluster from e.g. library(parallel) for some reason crash R.app. The program runs a check for R.app and will just run on 1 node if found.

calc_ancprobs

Should ancestral state estimation be performed (adds an uppass at the end).

null_range_allowed

Does the state space include the null range?#'

fixnode

If the state at a particular node is going to be fixed (e.g. for ML marginal ancestral states), give the node number.

fixlikes

The state likelihoods to be used at the fixed node. I.e. 1 for the fixed state, and 0 for the others.

stratified

Default FALSE. If TRUE, you are running a stratified analysis, in which case uppass probs should be calculated elsewhere.

states_allowed_TF

Default NULL. If user gives a vector of TRUE and FALSE values, these states will be set to 0 likelihood throughout the calculations.

Details

This likelihood calculation will be repeated many hundreds or thousands of times in any ML (maximum likelihood) or Bayesian estimation procedure. Thus, if the calculation of the log-likelihood of the data under one set of parameter values is too slow, inference takes days or becomes impossible. However, by using fast matrix exponentiation (package rexpokit) and fast C++ routines for calculating the probabilities of range inheritance scenarios at cladogenesis (package cladoRcpp), major speed gains can be achieved. Most of the complexity in the input parameters and the code serves these more rapid alternatives.

However, note that due to the explosion of the geographic range state space with more geographic areas (see numstates_from_numareas), any computational method that explicitly calculates the likelihood of all states will eventually become unusable between 8-20 areas, depending on details. An alternative method, which is fast for large numbers of areas, is BayArea, by Landis, Matzke, Moore, and Huelsenbeck; see Landis et al. (2013). However, BayArea does not currently implement cladogenesis models; it only has continuous-time model for evolutionary change along branches. In effect, this means that the cladogenesis model is sympatric speciation with complete range copying with probability 1.

Value

Return whatever is specified by return_what.

Note

Go BEARS!

(COO = Coordinate list format for a matrix, see http://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_.28COO.29

Author(s)

Nicholas Matzke matzke@berkeley.edu

References

Landis_Matzke_etal_2013_BayArea

Matzke_2012_IBS

ReeSmith2008

See Also

calc_loglike_sp, rcpp_calc_anclikes_sp, rcpp_calc_anclikes_sp_COOprobs, rcpp_calc_anclikes_sp_COOweights_faster, mat2coo, rcpp_calc_anclikes_sp_COOweights_faster

Examples

1
testval=1

Example output

Loading required package: rexpokit
Loading required package: cladoRcpp
Loading required package: ape
Loading required package: phylobase

Attaching package: 'phylobase'

The following object is masked from 'package:ape':

    edges

BioGeoBEARS documentation built on May 29, 2017, 8:36 p.m.