specPOUMM: Specifying a POUMM fit

Description Usage Arguments Value Functions

Description

Specification and validation of POUMM/PMM settings.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
specifyPOUMM(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE, validateSpec = TRUE)

specifyPOUMM_ATS(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE, sigmaeFixed = 0)

specifyPOUMM_ATSG0(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE, sigmaeFixed = 0)

specifyPOUMM_ATSSeG0(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE)

specifyPMM(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE)

specifyPMM_SSeG0(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE)

specifyPOUMM_ATH2tMeanSe(z = NULL, tree = NULL, zMin = -10,
  zMean = 0, zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1,
  tMean = 2, tMax = 10, parMapping = NULL, parLower = NULL,
  parUpper = NULL, g0Prior = NULL, parInitML = NULL,
  control = NULL, parPriorMCMC = NULL, parInitMCMC = NULL,
  parScaleMCMC = NULL, nSamplesMCMC = 1e+05,
  nAdaptMCMC = nSamplesMCMC, thinMCMC = 100, accRateMCMC = 0.01,
  gammaMCMC = 0.50001, nChainsMCMC = 3, samplePriorMCMC = TRUE,
  parallelMCMC = FALSE)

specifyPOUMM_ATH2tMeanSeG0(z = NULL, tree = NULL, zMin = -10,
  zMean = 0, zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1,
  tMean = 2, tMax = 10, parMapping = NULL, parLower = NULL,
  parUpper = NULL, g0Prior = NULL, parInitML = NULL,
  control = NULL, parPriorMCMC = NULL, parInitMCMC = NULL,
  parScaleMCMC = NULL, nSamplesMCMC = 1e+05,
  nAdaptMCMC = nSamplesMCMC, thinMCMC = 100, accRateMCMC = 0.01,
  gammaMCMC = 0.50001, nChainsMCMC = 3, samplePriorMCMC = TRUE,
  parallelMCMC = FALSE)

specifyPMM_H2tMeanSe(z = NULL, tree = NULL, zMin = -10, zMean = 0,
  zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1, tMean = 2,
  tMax = 10, parMapping = NULL, parLower = NULL, parUpper = NULL,
  g0Prior = NULL, parInitML = NULL, control = NULL,
  parPriorMCMC = NULL, parInitMCMC = NULL, parScaleMCMC = NULL,
  nSamplesMCMC = 1e+05, nAdaptMCMC = nSamplesMCMC, thinMCMC = 100,
  accRateMCMC = 0.01, gammaMCMC = 0.50001, nChainsMCMC = 3,
  samplePriorMCMC = TRUE, parallelMCMC = FALSE)

specifyPMM_H2tMeanSeG0(z = NULL, tree = NULL, zMin = -10,
  zMean = 0, zMax = 10, zVar = 4, zSD = sqrt(zVar), tMin = 0.1,
  tMean = 2, tMax = 10, parMapping = NULL, parLower = NULL,
  parUpper = NULL, g0Prior = NULL, parInitML = NULL,
  control = NULL, parPriorMCMC = NULL, parInitMCMC = NULL,
  parScaleMCMC = NULL, nSamplesMCMC = 1e+05,
  nAdaptMCMC = nSamplesMCMC, thinMCMC = 100, accRateMCMC = 0.01,
  gammaMCMC = 0.50001, nChainsMCMC = 3, samplePriorMCMC = TRUE,
  parallelMCMC = FALSE)

Arguments

z, tree

a numeric vector and a phylo object on which the fit is to be done. These arguments are used in order to guess meaningful values for the parLower, parUpper and parPriorMCMC arguments. See also, zMin,zMean,...,tMax below.

zMin, zMean, zMax, zVar, zSD, tMin, tMean, tMax

summary statistics of the observed tip-values (z) and root-tip distances (t). Some of these values are used for constructing default parameter values and limits; These arguments are given default values which will most likely be meaningless in your specific use-case. The default values will be overwritten with the corresponding statistics from the z and tree arguments if these were specified. If none of tree and z, nor these parameters are specified, then the arguments parLower, parUpper, parPriorMCMC must be specified explicitly.

parMapping

An R-function that can handle, both, a numeric vector or a numeric matrix as argument. This function should transform the input vector or each row-vector (if the input is matrix) into a (row-)vector of the POUMM parameters alpha, theta, sigma, sigmae, g0. For a vector input the function should return a vector with named elements alpha, theta, sigma, sigmae, g0. For a matrix input the function should return a matrix with the same number of rows and columns alpha, theta, sigma, sigmae, g0. Only finite non-negative values are allowed for alpha, sigma, and sigmae. Returning Inf, -Inf, NA or NaN for any of these parameters will result in an error during likelihood calculation. Only finite numerical values are allowed for theta. The parameter g0 is treated in a special way and can assume either a finite numerical value or one of NA or NaN. If g0 = finite value, this value is used together with the corresponding values of alpha, theta, sigma, and sigmae for likelihood calcuation. If g0 = NA (meaing value Not Avaiable), the value of g0 is calculated analytically during likelihood calculation in order to maximise one of the following:

  1. if a normal prior for g0 was specified (see g0Prior), pdf(z | α, θ, σ, σ_e, g0, tree) x prior(g0).

  2. otherwise, pdf(z | α, θ, σ, σ_e, g0, tree).

If g0 = NaN (meaning Not a Number), then the likelihood is marginalized w.r.t. the g0's prior distribution (see g0Prior), i.e. the likelihood returned is: pdf(z | α, θ, σ, σ_e, tree) = Integral(pdf(z|α,θ,σ,σ_e,g0) x pdf(g0) d g0; g0 from -∞ to +∞) In this case (g0=NaN), if g0Prior is not specified, it is assumed that g0Prior is the stationary OU normal distribution with mean, theta, and variance, varOU(Inf, alpha, sigma).
Examples:

 
 # Default for POUMM: identity for alpha, theta, sigma, sigmae, NA for g0.
 parMapping = function(par) {
   if(is.matrix(par)) {
     atsseg0 <- cbind(par[, 1:4, drop = FALSE], NA) 
     colnames(atsseg0) <- c("alpha", "theta", "sigma", "sigmae", "g0")
   } else {
     atsseg0 <- c(par[1:4], NA) 
     names(atsseg0) <- c("alpha", "theta", "sigma", "sigmae", "g0")
   }
   atsseg0
 }
parLower, parUpper

two named numeric vectors of the same length indicating the boundaries of the search region for the ML-fit. Calling parMapping on parLower and parUpper should result in appropriate values of the POUMM parameters alpha, theta, sigma sigmae and g0. Examples:

# Default for POUMM:
parLower = c(alpha = 0, theta = zMin - 2 * (zMax - zMin), sigma = 0, sigmae = 0)
parUpper = c(alpha = 69.31 / tMean, theta = zMax + 2 * (zMax - zMin), 
             sigma = sigmaOU(H2 = .99, alpha = 69.31 / tMean, sigmae = 2 * zSD,
                                    t = tMean), 
             sigmae = 2 * zSD)
g0Prior

Either NULL or a list with named numeric or character members "mean" and "var". Specifies a prior normal distribution for the parameter g0. If characters, the members "mean" and "var" are evaluated as R-expressions - useful if these are functions of some of other parameters. Note that if g0Prior is not NULL and g0 is not NaN (either a fixed number or NA), then the likelihood maximization takes into account the prior for g0, that is, the optimization is done over the product p(g0) x lik(data|g0, other parameters and tree). This can be helpful to prevent extremely big or low estimates of g0. To avoid this behavior and always maximize the likelihood, use g0Prior = NULL.

parInitML

A named vector (like parLower and parUpper) or a list of such vectors - starting points for optim.

control

List of parameters passed on to optim in the ML-fit, default list(factr=1e9), see ?optim.

parPriorMCMC

A function of a numeric parameter-vector returning the log-prior for this parameter vector. Example:

# Default for POUMM:
 parPriorMCMC = function(par) {
   dexp(par[1], rate = tMean / 6.931, TRUE) + 
     dnorm(par[2], zMean, 10 * zSD, TRUE) +
     dexp(par[3],  rate = sqrt(tMean / (zVar * 0.6931)), TRUE) + 
     dexp(par[4], rate = 2 / zSD, TRUE)
 }
parInitMCMC

a function(chainNo, fitML) returning an initial state of an MCMC as a vector. The argument fitML can be used to specify an initial state, close to a previously found likelihood optimum. Example:

 
 # Default for POUMM:
 parInitMCMC = function(chainNo, fitML) {
   if(!is.null(fitML)) {
     parML <- fitML$par
   } else {
     parML <- NULL
   }
   
   init <- rbind(
     c(alpha = 0, theta = 0, sigma = 1, sigmae = 0),
     parML,
     c(alpha = 0, theta = 0, sigma = 1, sigmae = 1)
   )
   
   init[(chainNo - 1) %% nrow(init) + 1, ]
 }
parScaleMCMC

Numeric matrix indicating the initial jump-distribution matrix for the MCMC fit. Default for POUMM is diag(4);

nSamplesMCMC

Integer indicating the length of each MCMC chain. Defaults to 1e5.

nAdaptMCMC

Logical indicating whether adaptation of the MCMC jump distribution should be done with respect to the target acceptance rate (accRateMCMC) or integer indicating how many initial MCMC iterations should be used for adaptation of the jump-distribution matrix (see details in ?POUMM). Defaults to nSamplesMCMC meaning continuous adaptation throughout the MCMC.

thinMCMC

Integer indicating the thinning interval of the mcmc-chains. Defaults to 100.

accRateMCMC

numeric between 0 and 1 indicating the target acceptance rate of the adaptive Metropolis sampling (see details in ?POUMM). Default 0.01.

gammaMCMC

controls the speed of adaption. Should be in the interval (0.5,1]. A lower gamma leads to faster adaption. Default value is 0.50001.

nChainsMCMC

integer indicating the number of chains to run. Defaults to 3 chains, from which the first one is a sample from the prior distribution (see samplePriorMCMC).

samplePriorMCMC

Logical indicating if sampling from the prior should be done for the first chain (see nChainsMCMC). This is useful to compare mcmc's for an overlap between prior and posterior distributions. Default is TRUE.

parallelMCMC

Logical indicating whether the MCMC chains should be run in parallel. Setting this option to TRUE results in using foreach::foreach() %dopar% { } construct for the MCMC fit. In order for parallel execution to be done, you should create a computing cluster and register it as parallel back-end (see example in package vignette and the web-page https://github.com/tobigithub/R-parallel/wiki/R-parallel-Setups).

validateSpec

Logical indicating whether the passed parameters should be validated. This parameter is used internally and should always be TRUE.

sigmaeFixed

fixed value for the sigmae parameter (used in specifyPOUMM_ATS and specifyPOUMM_ATSG0).

Value

A named list to be passed as a spec argument to POUMM.

Functions


POUMM documentation built on Nov. 16, 2018, 1:17 a.m.