calc_loglike_sp_prebyte: Calculate log-likelihood with a transition matrix and...

Description Usage Arguments Details Value Note Author(s) References See Also Examples

View source: R/calc_loglike_sp_v01.R

Description

This function is the pre-byte-compiled version of calc_loglike_sp.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
  calc_loglike_sp_prebyte(tip_condlikes_of_data_on_each_state,
    phy, Qmat, spPmat = NULL, min_branchlength = 1e-21,
    return_what = "loglike",
    probs_of_states_at_root = NULL, rootedge = FALSE,
    sparse = FALSE, printlevel = 1, use_cpp = TRUE,
    input_is_COO = FALSE, spPmat_inputs = NULL,
    cppSpMethod = 3, cluster_already_open = NULL,
    calc_ancprobs = FALSE, null_range_allowed = TRUE,
    fixnode = NULL, fixlikes = NULL, stratified = FALSE,
    states_allowed_TF = NULL)

Arguments

tip_condlikes_of_data_on_each_state

A numeric matrix with rows representing tips, and columns representing states/geographic ranges. The cells give the likelihood of the observation data under the assumption that the tip has that state; typically this means that the known geographic range gets a '1' and all other states get a 0.

phy

A phylogeny object. The function converts it to pruningwise order.

Qmat

A Q transition matrix representing the along-branch model for the evolution of geographic range, using parameters d (dispersal/range expansion), e (extinction/range contraction/local extirpation), and perhaps others (e.g. distance). This matrix can be input in either dense or sparse (COO) format, as specified by input_is_COO.

spPmat

Default is NULL; users should usually use spPmat_inputs. spPmat is A numeric matrix representing the probability of each ancestor range–>(Left range, Right range) transition at cladogenesis events. There are different ways to represent this matrix. In the simplest representation, this is just a rectangular matrix with numstates rows (representing the ancestral states) and numstates^2 columns (representing all possible descendant pairs). Use of this type of matrix is specified by cppSpMethod=1. It is calculated from a textual speciation matrix (typically spmat in the code) via symbolic_to_relprob_matrix_sp. However, this matrix gets huge and slow for large numbers of states/ranges. cppSpMethod=2 and cppSpMethod=3 implement successively more efficient and faster representation and processing of this matrix in COO-like formats. See rcpp_calc_anclikes_sp_COOprobs for the cppSpMethod=2 method, and rcpp_calc_anclikes_sp_COOweights_faster for the cppSpMethod=3 method (the fastest).

min_branchlength

Nodes with branches below this branchlength will not be treated as cladogenesis events; instead, they will be treated as if an OTU had been sampled from an anagenetic lineage, i.e. as if you had a direct ancestor. This is useful for putting fossils into the biogeography analysis, when you have fossil species that range through time. (Note: the proper way to obtain such trees, given that most phylogenetic methods force all OTUs to be tips rather than direct ancestors, is another question subject to active research. However, one method might be to just set a branch-length cutoff, and treat any branches sufficiently small as direct ancestors.)

return_what

What should be returned to the user? Options are "loglike" (the log-likelihood of the data under the tree, model, and model parameters), "nodelikes" (the scaled conditional likelihoods at the nodes), "rootprobs" (the relative probability of the geographic ranges/states at the root), or "all" (all of the above in a list). Typically the user will only want to return "loglike" while doing ML optimization, but then return "all" once the ML parameter values have been found.

probs_of_states_at_root

The prior probability of the states/geographic ranges at the root. The default, NULL, effectively means an equal probability for each state (this is also what LAGRANGE assumes; and running with NULL will reproduce exactly the LAGRANGE parameter inferences and log-likelihood).

rootedge

Should the root edge be included in the calculation (i.e., calculate to the bottom of the root), if a root edge is present? Default FALSE.

sparse

Should sparse matrix exponentiation be performed? This should be faster for very large matrices (> 100-200 states), however, the calculations appear to be less accurate. The function will transform a dense matrix to COO format (see mat2coo) if necessary according to the input_is_COO parameter.

printlevel

If >= 1, various amounts of intermediate output will be printed to screen. Note: Intermediate outputs from C++ and FORTRAN functions have been commented out, to meet CRAN guidelines.

use_cpp

Should the C++ routines from cladoRcpp be used to speed up calculations? Default TRUE.

input_is_COO

Is the input Q matrix a sparse, COO-formatted matrix (TRUE) or a standard dense matrix (FALSE). Default FALSE.

spPmat_inputs

A list of parameters so that spPmat (the speciation transition probability matrix) can be calculated on-the-fly, according to the method in cppSpMethod. See example.

cppSpMethod

Three C++ methods from cladoRcpp for calculating and using the cladogenesis probability matrix. 1 is slowest but easiest to understand; 3 is fastest. If spPmat_inputs is given, the program will generate the appropriate spPmat on-the-fly, and the user does not have to input the full spPmat manually.

cluster_already_open

If the user wants to distribute the matrix exponentiation calculations from all the branches across a number of processors/nodes on a cluster, specify the cluster here. E.g. cluster_already_open = makeCluster(rep("localhost",num_cores_to_use), type = "SOCK"). Note: this will work on most platforms, including Macs running R from command line, but will NOT work on Macs running the R GUI R.app, because parallel processing functions like MakeCluster from e.g. library(parallel) for some reason crash R.app. The program runs a check for R.app and will just run on 1 node if found.

calc_ancprobs

Should ancestral state estimation be performed (adds an uppass at the end).

null_range_allowed

Does the state space include the null range? Default is NULL which means running on a single processor.

fixnode

If the state at a particular node is going to be fixed (e.g. for ML marginal ancestral states), give the node number.

fixlikes

The state likelihoods to be used at the fixed node. I.e. 1 for the fixed state, and 0 for the others.

stratified

Default FALSE. If TRUE, you are running a stratified analysis, in which case uppass probs should be calculated elsewhere.

states_allowed_TF

Default NULL. If user gives a vector of TRUE and FALSE values, these states will be set to 0 likelihood throughout the calculations.

Details

Byte-compiling is supposed to speed up functions; this is an attempt to do this on the rexpokit function expokit_dgpadm_Qmat. It is also possible to byte-compile everything during package installation (via ByteCompile: true in the DESCRIPTION file), which is implemented in BioGeoBEARS, so this may be redundant.

calc_loglike_sp_prebyte gets byte-compiled into calc_loglike_sp.

See http://dirk.eddelbuettel.com/blog/2011/04/12/ for discussion of the compile package.

Value

Return whatever is specified by return_what.

Note

Go BEARS!

(COO = Coordinate list format for a matrix, see http://en.wikipedia.org/wiki/Sparse_matrix#Coordinate_list_.28COO.29

Author(s)

Nicholas Matzke matzke@berkeley.edu

References

Matzke_2012_IBS

ReeSmith2008

Landis_Matzke_etal_2013_BayArea

See Also

calc_loglike_sp, rcpp_calc_anclikes_sp, rcpp_calc_anclikes_sp_COOprobs, rcpp_calc_anclikes_sp_COOweights_faster, mat2coo, rcpp_calc_anclikes_sp_COOweights_faster

Examples

1
testval=1

Example output

Loading required package: rexpokit
Loading required package: cladoRcpp
Loading required package: ape
Loading required package: phylobase

Attaching package: 'phylobase'

The following object is masked from 'package:ape':

    edges

BioGeoBEARS documentation built on May 29, 2017, 8:36 p.m.