parametric.bootstrap: Parametric bootstrap for confidence intervals
In mvSLOUCH: Multivariate Stochastic Linear Ornstein-Uhlenbeck Models for Phylogenetic Comparative Hypotheses

parametric.bootstrap

R Documentation

Parametric bootstrap for confidence intervals

Description

The function performs a parametric bootstrap for confidence intervals for estimates of the evolutionary model. The user may specify what parameters are to have their confidence intervals returned. The user is recommended to install the suggested package PCMBaseCpp which significantly speeds up the calculations (see Details).

Usage

parametric.bootstrap(estimated.model, phyltree, 
values.to.bootstrap = NULL, regimes = NULL, 
root.regime = NULL, M.error = NULL, predictors = NULL, 
kY = NULL, numboot = 100, Atype = NULL, Syytype = NULL, 
diagA = NULL, parameter_signs = NULL, start_point_for_optim = NULL,
parscale = NULL, min_bl = 0.0003, maxiter = c(10,50,100), estimateBmethod="ML")

Arguments

`estimated.model`	An estimated by evolutionary model. It can be e.g. the output of `BrownianMotionModel()`, `ouchModel()`, `mvslouchModel()` or `estimate.evolutionary.model()`. In the last case the model under `BestModel` is analyzed.
`phyltree`	The phylogeny in `phylo` format. The tree can be obtained from e.g. a `nexus` file by the `read.nexus()` function from the ape package. The "standard" ape node indexing is assumed: for a tree with `n` tips, the tips should have indices `1:n` and the root index `n+1`. The `root.edge` field is ignored.
`values.to.bootstrap`	A vector of parameter/composite statistic names that the user is interested in. They are extracted from the bootstrapped elements for easy access.
`regimes`	A vector or list of regimes. If vector then each entry corresponds to each of `phyltree`'s branches, i.e. to each row of `phyltree$edge`. If list then each list entry corresponds to a tip node and is a vector for regimes on that lineage. If `NULL`, then a constant regime is assumed on the whole tree.
`root.regime`	The regime at the root of the tree. If not given, then it is taken as the regime that is present on the root's daughter lineages and is the most frequent one in the `regimes` vector. If more than one regime has the same maximum frequency, then alphabetically first one of the maximum ones is taken.
`M.error`	An optional measurement error covariance structure. The measurement errors between species are assumed independent. The program tries to recognize the structure of the passed matrix and accepts the following possibilities : a single number that is a common measurement error for all tips and species, a m element vector with each value corresponding to a variable, measurement errors are independent between variables and each species is assumed to have the same measurement errors, a m x m ((number of variables) x (number of variables)) matrix, all species will have the same measurement error, a list of length n (number of species), each list element is the covariance structure for the appropriate (numbering according to tree) species, either a single number (each variable has same variance), vector (of length m for each variable), or m x m matrix, the order of the list has to correspond to the order of the nodes in the `phyltree` object, NULL no measurement error. From version 2.0.0 of mvSLOUCH it is impossible to pass a single joint measurement error matrix for all the species and traits.
`predictors`	A vector giving the numbers of the columns from the original data which are to be considered predictor ones, i.e. conditioned on in the program output. If not provided then the "X" variables are treated as predictors, but this only for the OUBM models (for the others in this case none are treated as predictors).
`kY`	Number of "Y" (response) variables, for the OUBM models. The first `kY` columns of `mY` are the "OU" ones, while the rest the "BM" ones. In more detail this value determines the number of columns of the (simulated) data matrix to treat as response variables ("OU" ones). For example, a value of 1 means that only the first column is treated as a response variable, while a value of 3 means the first three columns are treated as response variables. Any predictor variables ("BM" ones) the user is interested in setting for a particular model should therefore be placed in the final columns of the data matrix, allowing for selecting select `kY` columns before this as response variables ("OU" ones). If `NULL` then it is extracted from the provided model parameters in `estimated.model`.
`numboot`	The number of bootstraps to perform.
`Atype`	The class of the `A` matrix. It can take one of the following values: `"SingleValueDiagonal"`, `"Diagonal"`, `"UpperTri"`, `"LowerTri"`, `"SymmetricPositiveDefinite"`, `"Symmetric"`, `"DecomposablePositive"`, `"DecomposableNegative"`, `"DecomposableReal"`, `"Invertible"`, `"Any"`. If `NULL` then it is extracted from the provided model parameters in `estimated.model`.
`Syytype`	The class of the Syy matrix, ignored if `evolmodel` equals `"BM"`. Otherwise it can take one of the following values: `"SingleValueDiagonal"`, `"Diagonal"`, `"UpperTri"`, `"LowerTri"`, `"Symmetric"`, `"Any"`. If `NULL` then it is extracted from the provided model parameters in `estimated.model`.
`diagA`	Should the diagonal of `A` be forced to be positive (`"Positive"`), negative (`"Negative"`) or the sign free to vary (`NULL`). However, setting this to a non-`NULL` value when `evolmodel` is `"mvslouch"` might be (but simulations concerning this are not conclusive) slightly detrimental to the optimization process if `Atype` is `"DecomposablePositive"`, `"DecomposableNegative"`, or `"DecomposableReal"`. In these cases `A` is parametrized by its eigendecomposition. Additional exponentiation of the diagonal, to ensure positivity, could (but this is uncertain) make the exploration of the likelihood surface more difficult. In the case of `Atype` being `"SymmetricPositiveDefinite"`, the diagonal is always guaranteed to be positive. If `NULL` then the function checks if it is not in the provided model parameters in `estimated.model`.
`parameter_signs`	WARNING: ONLY use this option if you understand what you are doing! This option is still in an experimental stage so some setups might not work (please report). A list allowing the user to control whether specific entries for each model parameter should be positive, negative, zero or set to a specific (other) value. The entries of the list have to be named, the admissible names are `"signsA"` (for `A` matrix), `"signsB"` (for `B` matrix), `"signsSyy"` (for `Syy` matrix) and `"signsmPsi"` (for `mPsi` matrix) and `"signsvY0"` (for `vY0` matrix). Any other entry in this list will be ignored. Each entry of the list has to be a matrix of appropriate size, i.e. of the size of the parameter to which it corresponds. Inside this matrix the possible values are `"+"` if the given entry is to be positive, `"-"` if the given entry is to be negative, `x`, where `x` is a number, if the entry is to be set to specified value or `NA` if the entry is to be freely estimated. See `estimate.evolutionary.model`, `ouchModel` and `mvslouchModel` for further details, examples and important warnings!
`start_point_for_optim`	A named list with starting parameters for of the parameters for be optimized by `optim()`, currently only `A` and `Syy` for OUOU and OUBM models, i.e. will not work with BM model. One may provide both or only one of them. Make sure that the parameter is consistent with the other parameter restrictions as no check is done and this can result in undefined behaviour. For example one may provide this as (provided dimensions and other parameter restrictions agree) start_point_for_optim=list(A=rbind(c(2,0),(0,4)), Syy=rbind(c(1,0.5),c(0,2))). This starting point is always jittered in each bootstrap replicate as the employed `"Nelder-Mead"` method in `optim()` is deterministic.
`parscale`	A vector to calculate the `parscale` argument for `optim`. It is a named vector with 3 entries, e.g. `c("parscale_A"=3,"logparscale_A"=5,"logparscale_other"=1)`. The entry `parscale_A` is the scale for entries of the `A` matrix, `logparscale_A` is the scale for entries of the `A` matrix that are optimized over on the logarithmic scale, e.g. if eigenvalues are assumed to be positive, then optimization is done over `log(eigenvalue)` for `A`'s eigendecomposition and `logparscale_other` is the scale for entries other then of `A` that are done on the logarithmic scale (e.g. `Syy`'s diagonal, or other entries indicated as positive via `parameter_signs`). If not provided (or if a name of the vector is misspelled), then made equal to the example value provided above. For other elements, then mentioned above, that are optimized over by `optim()`, `1` is used for `optim()`'s `parscale`. It is advised that the user experiments with a couple of different values and reads `optim`'s man page.
`min_bl`	Value to which PCMBase's `PCMBase.Threshold.Skip.Singular` should be set. It indicates that branches of length shorter than `min_bl` should be skipped in likelihood calculations. Short branches can result in singular covariance matrices for the transition density along a branch. The user should adjust this value if a lot of warnings are raised by PCMBase about singularities during the likelihood calculations. Furthermore, mvSLOUCH sets all branches in the tree shorter than `min_bl` to `min_bl`. However, this does not concern tip branches-these cannot be skipped and hence should be long enough so that numerical issues are not raised.
`maxiter`	The maximum number of iterations for different components of the estimation algorithm. A vector of three integers. The first is the number of iterations for phylogenetic GLS evaluations, i.e. conditional on the other parameters, the regime optima, perhaps `B`, and perhaps initial state are estimated by a phylogenetic GLS procedure. After this the other (except of `B` in OUBM model case) parameters are optimized over by `optim()`. This first entry controls the number of iterations of this procedure. The second is the number of iterations inside the iterated GLS for the OUBM model. In the first step regime optima and `B` (and perhaps initial state) are estimated conditional on the other parameters and current estimate of `B`, then the estimate of `B` is update and the same phylogenetic GLS is repeated (second entry of `maxiter` number of times). Finally, the third is the value of `maxiter` passed to `optim()`, apart from the optimization in the Brownian motion and measurement error case. If the bootstrapped model is a Brownian motion one, then this parameter is ignored, if OUOU, then the second entry is ignored.
`estimateBmethod`	Only relevant for OUBM models, should `B` be estimated by maximum likelihood (default value `"ML"`) or generalized least squares (value `"GLS"`).

Details

The likelihood calculations are done by the PCMBase package. However, there is a C++ backend, PCMBaseCpp. If it is not available, then the likelihood is calculated slower using pure R. However, with the calculations in C++ up to a 100-fold increase in speed is possible (more realistically 10-20 times). The PCMBaseCpp package is available from https://github.com/venelin/PCMBaseCpp.

The setting Atype="Any" means that one assumes the matrix A is eigendecomposable. If the estimation algorithm hits a defective A, then it sets the log-likelihood at the minimum value and will try to get out of this dip.

Value

A list with all the bootstrap simulations is returned. The elements of the list are the following.

`paramatric.bootstrap.estimation.replicates`	A list of length equalling `numboot`. Each element is the result of the bootstrap replicate - the estimation results in the format of the output of mvSLOUCH functions, with an additional field `data`, the simulated data.
`bootstrapped.parameters`	If `values.to.bootstrap` is not `NULL` then a list of length equalling length of `values.to.bootstrap`. Each element corresponds to the respective element of `values.to.bootstrap` and contains a list of the bootstrapped values of this element.

Warning

The estimation can take a long time and hence many bootstrap replicates will take even more time.The code can produce (a lot of) warnings and errors during the search procedure, this is nothing to worry about.

Note

The ouch package implements a parametric bootstrap and reading about it could be helpful.

Author(s)

Krzysztof Bartoszek

References

Bartoszek, K. and Pienaar, J. and Mostad. P. and Andersson, S. and Hansen, T. F. (2012) A phylogenetic comparative method for studying multivariate adaptation. Journal of Theoretical Biology 314:204-215.

Butler, M.A. and A.A. King (2004) Phylogenetic comparative analysis: a modeling approach for adaptive evolution. American Naturalist 164:683-695.

Examples

RNGversion(min(as.character(getRversion()),"3.6.1"))
set.seed(12345, kind = "Mersenne-Twister", normal.kind = "Inversion")
### We will first simulate a small phylogenetic tree using functions from ape. 
### For simulating the tree one could also use alternative functions, e.g. sim.bd.taxa 
### from the TreeSim package
phyltree<-ape::rtree(5)

## The line below is not necessary but advisable for speed
phyltree<-phyltree_paths(phyltree)

BMparameters<-list(vX0=matrix(0,nrow=3,ncol=1),
Sxx=rbind(c(1,0,0),c(0.2,1,0),c(0.3,0.25,1)))

### Now simulate the data.
BMdata<-simulBMProcPhylTree(phyltree,X0=BMparameters$vX0,Sigma=BMparameters$Sxx)
BMdata<-BMdata[phyltree$tip.label,,drop=FALSE]

### Recover the parameters of the Brownian motion.
BMestim<-BrownianMotionModel(phyltree,BMdata)

### And finally obtain bootstrap confidence intervals for some parameters
BMbootstrap<-parametric.bootstrap(estimated.model=BMestim,phyltree=phyltree,
values.to.bootstrap=c("vX0","StS"),M.error=NULL,numboot=2)
RNGversion(as.character(getRversion()))

## Not run: ##It takes too long to run this
### Define a vector of regimes.
regimes<-c("small","small","large","small","small","large","large","large")

### Define SDE parameters to be able to simulate data under the mvOUBM model.
OUBMparameters<-list(vY0=matrix(c(1,-1),ncol=1,nrow=2),A=rbind(c(9,0),c(0,5)),
B=matrix(c(2,-2),ncol=1,nrow=2),mPsi=cbind("small"=c(1,-1),"large"=c(-1,1)),
Syy=rbind(c(1,0.25),c(0,1)),vX0=matrix(0,1,1),Sxx=matrix(1,1,1),
Syx=matrix(0,ncol=1,nrow=2),Sxy=matrix(0,ncol=2,nrow=1))

### Now simulate the data.
OUBMdata<-simulMVSLOUCHProcPhylTree(phyltree,OUBMparameters,regimes,NULL)
OUBMdata<-OUBMdata[phyltree$tip.label,,drop=FALSE]

### Try to recover the parameters of the mvOUBM model.
OUBMestim<-mvslouchModel(phyltree,OUBMdata,2,regimes,Atype="DecomposablePositive",
Syytype="UpperTri",diagA="Positive",maxiter=c(10,50,100))

### And finally bootstrap with particular interest in the evolutionary and optimal
### regressions

OUBMbootstrap<-parametric.bootstrap(estimated.model=OUBMestim,phyltree=phyltree,
values.to.bootstrap=c("evolutionary.regression","optimal.regression"),
regimes=regimes,root.regime="small",M.error=NULL,predictors=c(3),kY=2,
numboot=5,Atype="DecomposablePositive",Syytype="UpperTri",diagA="Positive",
maxiter=c(10,50,100))


### We now demonstrate an alternative setup
### Define SDE parameters to be able to simulate data under the OUOU model.
OUOUparameters<-list(vY0=matrix(c(1,-1,0.5),nrow=3,ncol=1),
A=rbind(c(9,0,0),c(0,5,0),c(0,0,1)),mPsi=cbind("small"=c(1,-1,0.5),"large"=c(-1,1,0.5)),
Syy=rbind(c(1,0.25,0.3),c(0,1,0.2),c(0,0,1)))

### Now simulate the data.
OUOUdata<-simulOUCHProcPhylTree(phyltree,OUOUparameters,regimes,NULL)
OUOUdata<-OUOUdata[phyltree$tip.label,,drop=FALSE]

### Try to recover the parameters of the OUOU model.
estimResults<-estimate.evolutionary.model(phyltree,OUOUdata,regimes=regimes,
root.regime="small",M.error=NULL,repeats=3,model.setups=NULL,predictors=c(3),kY=2,
doPrint=TRUE,pESS=NULL,maxiter=c(10,50,100))

### And finally bootstrap with particular interest in the evolutionary regression
OUOUbootstrap<-parametric.bootstrap(estimated.model=estimResults,phyltree=phyltree,
values.to.bootstrap=c("evolutionary.regression"),
regimes=regimes,root.regime="small",M.error=NULL,predictors=c(3),kY=NULL,
numboot=5,Atype=NULL,Syytype=NULL,diagA=NULL)

## End(Not run)

mvSLOUCH documentation built on Nov. 21, 2023, 1:08 a.m.