buildLaplace | R Documentation |
Build a Laplace or AGHQ approximation algorithm for a given NIMBLE model.
buildLaplace(
model,
paramNodes,
randomEffectsNodes,
calcNodes,
calcNodesOther,
control = list()
)
buildAGHQ(
model,
nQuad = 1,
paramNodes,
randomEffectsNodes,
calcNodes,
calcNodesOther,
control = list()
)
model |
a NIMBLE model object, such as returned by |
paramNodes |
a character vector of names of parameter nodes in the
model; defaults are provided by |
randomEffectsNodes |
a character vector of names of continuous
unobserved (latent) nodes to marginalize (integrate) over using Laplace
approximation; defaults are provided by |
calcNodes |
a character vector of names of nodes for calculating the
integrand for Laplace approximation; defaults are provided by
|
calcNodesOther |
a character vector of names of nodes for calculating
terms in the log-likelihood that do not depend on any
|
control |
a named list for providing additional settings used in Laplace
approximation. See |
nQuad |
number of quadrature points for AGHQ (in one dimension). Laplace approximation is AGHQ with 'nQuad=1'. Only odd numbers of nodes really make sense. Often only one or a few nodes can achieve high accuracy. A maximum of 35 nodes is supported. Note that for multivariate quadratures, the number of nodes will be (number of dimensions)^nQuad. |
buildLaplace
buildLaplace
creates an object that can run Laplace approximation and
for a given model or part of a model. buildAGHQ
creates an object
that can run adaptive Gauss-Hermite quadrature (AGHQ, sometimes called
"adaptive Gaussian quadrature") for a given model or part of a model.
Laplace approximation is AGHQ with one quadrature point, hence
'buildLaplace' simply calls 'buildAGHQ' with 'nQuad=1'. These methods
approximate the integration over continuous random effects in a
hierarchical model to calculate the (marginal) likelihood.
buildAGHQ
and buildLaplace
will by default (unless changed
manually via 'control$split') determine from the model which random effects
can be integrated over (marginalized) independently. For example, in a GLMM
with a grouping factor and an independent random effect intercept for each
group, the random effects can be marginalized as a set of univariate
approximations rather than one multivariate approximation. On the other hand,
correlated or nested random effects would require multivariate marginalization.
Maximum likelihood estimation is available for Laplace approximation ('nQuad=1') with univariate or multivariate integrations. With 'nQuad > 1', maximum likelihood estimation is available only if all integrations are univariate (e.g., a set of univariate random effects). If there are multivariate integrations, these can be calculated at chosen input parameters but not maximized over parameters. For example, one can find the MLE based on Laplace approximation and then increase 'nQuad' (using the 'updateSettings' method below) to check on accuracy of the marginal log likelihood at the MLE.
Beware that quadrature will use 'nQuad^k' quadrature points, where 'k' is the dimension of each integration. Therefore quadrature for 'k' greater that 2 or 3 can be slow. As just noted, 'buildAGHQ' will determine independent dimensions of quadrature, so it is fine to have a set of univariate random effects, as these will each have k=1. Multivariate quadrature (k>1) is only necessary for nested, correlated, or otherwise dependent random effects.
The recommended way to find the maximum likelihood estimate and associated
outputs is by calling runLaplace
or runAGHQ
. The
input should be the compiled Laplace or AGHQ algorithm object. This would be
produced by running compileNimble
with input that is the result
of buildLaplace
or buildAGHQ
.
For more granular control, see below for methods findMLE
and
summary
. See function summaryLaplace
for an easier way
to call the summary
method and obtain results that include node
names. These steps are all done within runLaplace
and
runAGHQ
.
The NIMBLE User Manual at r-nimble.org also contains an example of Laplace approximation.
buildLaplace
and buildAGHQ
make good tries at deciding what
to do with the input model and any (optional) of the node arguments. However,
random effects (over which approximate integration will be done) can be
written in models in multiple equivalent ways, and customized use cases may
call for integrating over chosen parts of a model. Hence, one can take full
charge of how different parts of the model will be used.
Any of the input node vectors, when provided, will be processed using
nodes <- model$expandNodeNames(nodes)
, where nodes
may be
paramNodes
, randomEffectsNodes
, and so on. This step allows
any of the inputs to include node-name-like syntax that might contain
multiple nodes. For example, paramNodes = 'beta[1:10]'
can be
provided if there are actually 10 scalar parameters, 'beta[1]' through
'beta[10]'. The actual node names in the model will be determined by the
exapndNodeNames
step.
In many (but not all) cases, one only needs to provide a NIMBLE model object
and then the function will construct reasonable defaults necessary for
Laplace approximation to marginalize over all continuous latent states
(aka random effects) in a model. The default values for the four groups of
nodes are obtained by calling setupMargNodes
, whose arguments
match those here (except for a few arguments which are taken from control
list elements here).
setupMargNodes
tries to give sensible defaults from
any combination of paramNodes
, randomEffectsNodes
,
calcNodes
, and calcNodesOther
that are provided. For example,
if you provide only randomEffectsNodes
(perhaps you want to
marginalize over only some of the random effects in your model),
setupMargNodes
will try to determine appropriate choices for the
others.
setupMargNodes
also determines which integration dimensions are
conditionally independent, i.e., which can be done separately from each
other. For example, when possible, 10 univariate random effects will be split
into 10 univariate integration problems rather than one 10-dimensional
integration problem.
The defaults make general assumptions such as that
randomEffectsNodes
have paramNodes
as parents. However, The
steps for determining defaults are not simple, and it is possible that they
will be refined in the future. It is also possible that they simply don't
give what you want for a particular model. One example where they will not
give desired results can occur when random effects have no prior
parameters, such as 'N(0,1)' nodes that will be multiplied by a scale
factor (e.g. sigma) and added to other explanatory terms in a model. Such
nodes look like top-level parameters in terms of model structure, so
you must provide a randomEffectsNodes
argument to indicate which
they are.
It can be helpful to call setupMargNodes
directly to see exactly how
nodes will be arranged for Laplace approximation. For example, you may want
to verify the choice of randomEffectsNodes
or get the order of
parameters it has established to use for making sense of the MLE and
results from the summary
method. One can also call
setupMargNodes
, customize the returned list, and then provide that
to buildLaplace
as paramNodes
. In that case,
setupMargNodes
will not be called (again) by buildLaplace
.
If setupMargNodes
is emitting an unnecessary warning, simply use
control=list(check=FALSE)
.
If any paramNodes
(parameters) or randomEffectsNodes
(random
effects / latent states) have constraints on the range of valid values
(because of the distribution they follow), they will be used on a
transformed scale determined by parameterTransform
. This means the
Laplace approximation itself will be done on the transformed scale for
random effects and finding the MLE will be done on the transformed scale
for parameters. For parameters, prior distributions are not included in
calculations, but they are used to determine valid parameter ranges and
hence to set up any transformations. For example, if sigma
is a
standard deviation, you can declare it with a prior such as sigma ~
dhalfflat()
to indicate that it must be greater than 0.
For default determination of when transformations are needed, all parameters
must have a prior distribution simply to indicate the range of valid
values. For a param p
that has no constraint, a simple choice is
p ~ dflat()
.
Note that there are two numerical optimizations when finding maximum
likelihood estimates with a Laplace or (1D) AGHQ algorithm: (1) maximizing
the joint log-likelihood of random effects and data given a parameter value
to construct the approximation to the marginal log-likelihood at the given
parameter value; (2) maximizing the approximation to the marginal
log-likelihood over the parameters. In what follows, the prefix 'inner'
refers to optimization (1) and 'outer' refers to optimization (2). Currently
both optimizations default to using method "BFGS"
. However, one can
use other optimizers or simply run optimization (2) manually from R; see the
example below. In some problems, choice of inner and/or outer optimizer can
make a big difference for obtaining accurate results, especially for standard
errors. Hence it is worth experimenting if one is in doubt.
control
list argumentsThe control
list allows additional settings to be made using named
elements of the list. Most (or all) of these can be updated later using the
'updateSettings' method. Supported elements include:
split
. If TRUE (default), randomEffectsNodes
will be
split into conditionally independent sets if possible. This
facilitates more efficient Laplace or AGHQ approximation because each
conditionally independent set can be marginalized independently. If
FALSE, randomEffectsNodes
will be handled as one multivariate
block, with one multivariate approximation. If split
is a
numeric vector, randomEffectsNodes
will be split by calling
split
(randomEffectsNodes
, control$split
). The
last option allows arbitrary control over how
randomEffectsNodes
are blocked.
check
. If TRUE (default), a warning is issued if
paramNodes
, randomEffectsNodes
and/or calcNodes
are provided but seem to have missing or unnecessary
elements based on some default inspections of the model. If
unnecessary warnings are emitted, simply set check=FALSE
.
innerOptimControl
. A list (either an R list or a
'optimControlNimbleList') of control parameters for the inner
optimization of Laplace approximation using nimOptim
. See
'Details' of nimOptim
for further information. Default
is 'nimOptimDefaultControl()'.
innerOptimMethod
. Optimization method to be used in
nimOptim
for the inner optimization. See 'Details' of
nimOptim
. Currently nimOptim
in NIMBLE supports:
"Nelder-Mead
", "BFGS
", "CG
", "L-BFGS-B
",
"nlminb
", and user-provided optimizers. By default, method
"BFGS
" is used for both univariate and multivariate cases. For
"nlminb"
or user-provided optimizers, only a subset of
elements of the innerOptimControlList
are supported. (Note
that control over the outer optimization method is available as an
argument to 'findMLE'). Choice of optimizers can be important and so
can be worth exploring.
innerOptimStart
. Method for determining starting values for
the inner optimization. Options are:
"zero"
(default): use all zeros;
"last"
: use the result of the last inner optimization;
"last.best"
: use the result of the best inner
optimization so far for each conditionally independent part of the
approximation;
"constant"
: always use the same values, determined by
innerOptimStartValues
;
"random"
: randomly draw new starting values from the
model (i.e., from the prior);
"model"
: use values for random effects stored in the
model, which are determined from the first call.
Note that "model"
and "zero"
are shorthand for
"constant"
with particular choices of
innerOptimStartValues
. Note that "last"
and
"last.best"
require a choice for the very first values, which will
come from innerOptimStartValues
. The default is
innerOptimStart="zero"
and may change in the future.
innerOptimStartValues
. Values for some of
innerOptimStart
approaches. If a scalar is provided, that
value is used for all elements of random effects for each
conditionally independent set. If a vector is provided, it must be
the length of *all* random effects. If these are named (by node
names), the names will be used to split them correctly among each
conditionally independent set of random effects. If they are not
named, it is not always obvious what the order should be because it
may depend on the conditionally independent sets of random
effects. It should match the order of names returned as part of
'summaryLaplace'.
innerOptimWarning
. If FALSE (default), do not emit warnings
from the inner optimization. Optimization methods may sometimes emit a
warning such as for bad parameter values encountered during the
optimization search. Often, a method can recover and still find the
optimum. In the approximations here, sometimes the inner optimization
search can fail entirely, yet the outer optimization see this as one failed
parameter value and can recover. Hence, it is often desirable to silence
warnings from the inner optimizer, and this is done by default. Set
innerOptimWarning=TRUE
to see all warnings.
useInnerCache
. If TRUE (default), use caching system for
efficiency of inner optimizations. The caching system records one set of
previous parameters and uses the corresponding results if those parameters
are used again (e.g., in a gradient call). This should generally not be
modified.
outerOptimControl
. A list of control parameters for maximizing
the Laplace log-likelihood using nimOptim
. See 'Details' of
nimOptim
for further information.
computeMethod
. There are three approaches available for
internal details of how the approximations, and specifically derivatives
involved in their calculation, are handled. These are labeled simply 1, 2,
and 3, and the default is 2. The relative performance of the methods will
depend on the specific model. Users wanting to explore efficiency can try
switching from method 2 (default) to methods 1 or 3 and comparing
performance. The first Laplace approximation with each method will be
(much) slower than subsequent Laplace approximations. Further details are
not provided at this time.
gridType
(relevant only nQuad>1
). For multivariate AGHQ,
a grid must be constructed based on the Hessian at the inner mode. Options
include "cholesky" (default) and "spectral" (i.e., eigenvectors and
eigenvalues) for the corresponding matrix decompositions on which the grid
can be based.
# end itemize
The object returned by buildLaplace
is a nimbleFunction object with
numerous methods (functions). Here these are described in three tiers of user
relevance.
The most relevant methods to a user are:
calcLogLik(p, trans=FALSE)
. Calculate the approximation to the
marginal log-likelihood function at parameter value p
, which (if
trans
is FALSE) should match the order of paramNodes
. For
any non-scalar nodes in paramNodes
, the order within the node is
column-major. The order of names can be obtained from method
getNodeNamesVec(TRUE)
. Return value is the scalar (approximate,
marginal) log likelihood.
If trans
is TRUE, then p
is the vector of parameters on
the transformed scale, if any, described above. In this case, the
parameters on the original scale (as the model was written) will be
determined by calling the method pInverseTransform(p)
. Note that
the length of the parameter vector on the transformed scale might not
be the same as on the original scale (because some constraints of
non-scalar parameters result in fewer free transformed parameters than
original parameters).
calcLaplace(p, trans)
. This is the same as calcLogLik
but
requires that the approximation be Laplace (i.e nQuad
is 1),
and results in an error otherwise.
findMLE(pStart, method, hessian)
. Find the maximum likelihood
estimates of parameters using the approximated marginal likelihood.
This can be used if nQuad
is 1 (Laplace case) or if
nQuad>1
and all marginalizations involve only univariate
random effects. Arguments include pStart
: initial parameter
values (defaults to parameter values currently in the model);
method
: (outer) optimization method to use in nimOptim
(defaults to "BFGS", although some problems may benefit from other
choices); and hessian
: whether to calculate and return the
Hessian matrix (defaults to TRUE
, which is required for
subsequent use of 'summary' method). Second derivatives in the
Hessian are determined by finite differences of the gradients
obtained by automatic differentiation (AD). Return value is a
nimbleList of type optimResultNimbleList
, similar to what is
returned by R's optim. See help(nimOptim)
. Note that
parameters ('par') are returned for the natural parameters, i.e. how
they are defined in the model. But the 'hessian', if requested, is
computed for the parameters as transformed for optimization if
necessary. Hence one must be careful interpreting 'hessian' if any
parameters have constraints, and the safest next step is to use the
'summary' method or 'summaryLaplace' function.
summary(MLEoutput, originalScale, randomEffectsStdError,
jointCovariance)
. Summarize the maximum likelihood estimation
results, given object MLEoutput
that was returned by
findMLE
. The summary can include a covariance matrix for the
parameters, the random effects, or both), and these can be returned on
the original parameter scale or on the (potentially) transformed
scale(s) used in estimation. It is often preferred instead to call
function (not method) 'summaryLaplace' because this will attach
parameter and random effects names (i.e., node names) to the results.
In more detail, summary
accepts the following optional arguments:
originalScale
. Logical. If TRUE, the function returns
results on the original scale(s) of parameters and random effects;
otherwise, it returns results on the transformed scale(s). If there
are no constraints, the two scales are identical. Defaults to TRUE.
randomEffectsStdError
. Logical. If TRUE, standard
errors of random effects will be calculated.
Defaults to FALSE.
jointCovariance
. Logical. If TRUE, the joint
variance-covariance matrix of the parameters and the random effects
will be returned. If FALSE, the variance-covariance matrix of the
parameters will be returned. Defaults to FALSE.
The object returned by summary
is an AGHQuad_summary
nimbleList with elements:
params
. A nimbleList that contains estimates and
standard errors of parameters (on the original or transformed
scale, as chosen by originalScale
).
randomEffects
. A nimbleList that contains estimates of
random effects and, if requested
(randomEffectsStdError=TRUE
) their standard errors, on
original or transformed scale. Standard errors are calculated
following the generalized delta method of Kass and Steffey (1989).
vcov
. If requested (i.e.
jointCovariance=TRUE
), the joint variance-covariance
matrix of the parameters and random effects, on original or
transformed scale. If jointCovariance=FALSE
, the
covariance matrix of the parameters, on original or transformed
scale.
scale
. "original"
or "transformed"
, the
scale on which results were requested.
Additional methods to access or control more details of the Laplace approximation include:
updateSettings
. This provides a single function through which
many of the settings described above (mostly for the control
list)
can be later changed. Options that can be changed include:
innerOptimMethod
, innerOptimStart
,
innerOptimStartValues
, useInnerCache
, nQuad
,
gridType
, innerOptimControl
, outerOptimControl
, and
computeMethod
. For innerOptimStart
, method "zero" cannot be
specified but can be achieved by choosing method "constant" with
innerOptimStartValues=0
. Only provided options will be modified. The
exceptions are innerOptimControl
, outerOptimControl
, which
are replaced only replace_innerOptimControl=TRUE
or
replace_outerOptimControl=TRUE
, respectively.
getNodeNamesVec(returnParams)
. Return a vector (>1) of names
of parameters/random effects nodes, according to returnParams =
TRUE/FALSE
. Use this if there is more than one node.
getNodeNameSingle(returnParams)
. Return the name of a
single parameter/random effect node, according to returnParams =
TRUE/FALSE
. Use this if there is only one node.
checkInnerConvergence(message)
. Checks whether all internal
optimizers converged. Returns a zero if everything converged and one
otherwise. If message = TRUE
, it will print more details about
convergence for each conditionally independent set.
gr_logLik(p, trans)
. Gradient of the (approximated)
marginal log-likelihood at parameter value p
. Argument trans
is similar to that in calcLaplace
. If there are multiple parameters,
the vector p
is given in the order of parameter names returned by
getNodeNamesVec(returnParams=TRUE)
.
gr_Laplace(p, trans)
. This is the same as gr_logLik
.
otherLogLik(p)
. Calculate the calcNodesOther
nodes, which returns the log-likelihood of the parts of the model that are
not included in the Laplace or AGHQ approximation.
gr_otherLogLik(p)
. Gradient (vector of derivatives with
respect to each parameter) of otherLogLik(p)
. Results should
match gr_otherLogLik_internal(p)
but may be more efficient after
the first call.
Some methods are included for calculating the (approximate) marginal log posterior density by including the prior distribution of the parameters. This is useful for finding the maximum a posteriori probability (MAP) estimate. Currently these are provided for point calculations without estimation methods.
calcPrior_p(p)
. Log density of prior distribution.
calcPrior_pTransformed(pTransform)
. Log density of prior distribution on transformed scale, includes the Jacobian.
calcPostLogDens(p)
. Marginal log posterior density in terms of the parameter p.
calcPostLogDens_pTransformed (pTransform)
. Marginal log posterior density in terms of the transformed
parameter, which includes the Jacobian transformation.
gr_postLogDens_pTransformed(pTransform)
. Graident of marginal log posterior density on the transformed scale.
Other available options that are used in the derivative for more flexible include logDetJacobian(pTransform)
and
gr_logDeJacobian(pTransform)
, as well as gr_prior(p)
.
Finally, methods that are primarily for internal use by other methods include:
gr_logLik_pTransformed
. Gradient of the Laplace
approximation (calcLogLik_pTransformed(pTransform)
) at transformed
(unconstrained) parameter value pTransform
.
pInverseTransform(pTransform)
. Back-transform the transformed
parameter value pTransform
to original scale.
derivs_pInverseTransform(pTransform, order)
. Derivatives of
the back-transformation (i.e. inverse of parameter transformation) with
respect to transformed parameters at pTransform
. Derivative order
is given by order
(any of 0, 1, and/or 2).
reInverseTransform(reTrans)
. Back-transform the transformed
random effects value reTrans
to original scale.
derivs_reInverseTransform(reTrans, order)
. Derivatives of the
back-transformation (i.e. inverse of random effects transformation) with
respect to transformed random effects at reTrans
. Derivative order
is given by order
(any of 0, 1, and/or 2).
optimRandomEffects(pTransform)
. Calculate the optimized
random effects given transformed parameter value pTransform
. The
optimized random effects are the mode of the conditional distribution of
random effects given data at parameters pTransform
, i.e. the
calculation of calcNodes
.
inverse_negHess(p, reTransform)
. Calculate the inverse of the
negative Hessian matrix of the joint (parameters and random effects)
log-likelihood with respect to transformed random effects, evaluated at
parameter value p
and transformed random effects
reTransform
.
hess_logLik_wrt_p_wrt_re(p, reTransform)
. Calculate the
Hessian matrix of the joint log-likelihood with respect to parameters and
transformed random effects, evaluated at parameter value p
and
transformed random effects reTransform
.
one_time_fixes()
. Users never need to run this. Is is called
when necessary internally to fix dimensionality issues if there is only
one parameter in the model.
calcLogLik_pTransformed(pTransform)
. Laplace approximation at
transformed (unconstrained) parameter value pTransform
. To
make maximizing the Laplace likelihood unconstrained, an automated
transformation via parameterTransform
is performed on
any parameters with constraints indicated by their priors (even
though the prior probabilities are not used).
gr_otherLogLik_internal(p)
. Gradient (vector of
derivatives with respect to each parameter) of otherLogLik(p)
.
This is obtained using automatic differentiation (AD) with single-taping.
First call will always be slower than later calls.
cache_outer_logLik(logLikVal)
. Save the marginal log likelihood value
to the inner Laplace mariginlization functions to track the outer maximum internally.
reset_outer_inner_logLik()
. Reset the internal saved maximum marginal log likelihood.
get_inner_cholesky(atOuterMode = integer(0, default = 0))
. Returns the cholesky
of the negative Hessian with respect to the random effects. If atOuterMode = 1
then returns
the value at the overall best marginal likelihood value, otherwise atOuterMode = 0
returns the last.
get_inner_mode(atOuterMode = integer(0, default = 0))
. Returns the mode of the random effects
for either the last call to the innner quadrature functions (atOuterMode = 0
), or the last best
value for the marginal log likelihood, atOuterMode = 1
.
Wei Zhang, Perry de Valpine, Paul van Dam-Bates
Kass, R. and Steffey, D. (1989). Approximate Bayesian inference in conditionally independent hierarchical models (parametric empirical Bayes models). Journal of the American Statistical Association, 84(407), 717-726.
Liu, Q. and Pierce, D. A. (1994). A Note on Gauss-Hermite Quadrature. Biometrika, 81(3) 624-629.
Jackel, P. (2005). A note on multivariate Gauss-Hermite quadrature. London: ABN-Amro. Re.
Skaug, H. and Fournier, D. (2006). Automatic approximation of the marginal likelihood in non-Gaussian hierarchical models. Computational Statistics & Data Analysis, 56, 699-709.
pumpCode <- nimbleCode({
for (i in 1:N){
theta[i] ~ dgamma(alpha, beta)
lambda[i] <- theta[i] * t[i]
x[i] ~ dpois(lambda[i])
}
alpha ~ dexp(1.0)
beta ~ dgamma(0.1, 1.0)
})
pumpConsts <- list(N = 10, t = c(94.3, 15.7, 62.9, 126, 5.24, 31.4, 1.05, 1.05, 2.1, 10.5))
pumpData <- list(x = c(5, 1, 5, 14, 3, 19, 1, 1, 4, 22))
pumpInits <- list(alpha = 0.1, beta = 0.1, theta = rep(0.1, pumpConsts$N))
pump <- nimbleModel(code = pumpCode, name = "pump", constants = pumpConsts,
data = pumpData, inits = pumpInits, buildDerivs = TRUE)
# Build Laplace approximation
pumpLaplace <- buildLaplace(pump)
## Not run:
# Compile the model
Cpump <- compileNimble(pump)
CpumpLaplace <- compileNimble(pumpLaplace, project = pump)
# Calculate MLEs of parameters
MLEres <- CpumpLaplace$findMLE()
# Calculate estimates and standard errors for parameters and random effects on original scale
allres <- CpumpLaplace$summary(MLEres, randomEffectsStdError = TRUE)
# Change the settings and also illustrate runLaplace
CpumpLaplace$updateSettings(innerOptimMethod = "nlminb", outerOptimMethod = "nlminb")
newres <- runLaplace(CpumpLaplace)
# Illustrate use of the component log likelihood and gradient functions to
# run an optimizer manually from R.
# Use nlminb to find MLEs
MLEres.manual <- nlminb(c(0.1, 0.1),
function(x) -CpumpLaplace$calcLogLik(x),
function(x) -CpumpLaplace$gr_Laplace(x))
## End(Not run)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.