optimPibbleCollapsed (R Documentation)
See Details for a description of the model. This function should typically be followed by uncollapsePibble. Notation: N is the number of samples, D is the number of multinomial categories, and Q is the number of covariates.

Usage
optimPibbleCollapsed(
Y,
upsilon,
ThetaX,
KInv,
AInv,
init,
n_samples = 2000L,
calcGradHess = TRUE,
b1 = 0.9,
b2 = 0.99,
step_size = 0.003,
epsilon = 1e-06,
eps_f = 1e-10,
eps_g = 1e-04,
max_iter = 10000L,
verbose = FALSE,
verbose_rate = 10L,
decomp_method = "cholesky",
optim_method = "lbfgs",
eigvalthresh = 0,
jitter = 0,
multDirichletBoot = -1,
useSylv = TRUE,
ncores = -1L,
seed = -1L
)
Arguments

Y: D x N matrix of counts.

upsilon: degrees of freedom for the matrix-t prior on eta (must be > D).

ThetaX: D-1 x N matrix formed by Theta*X (Theta is the prior mean for the regression coefficients).

KInv: D-1 x D-1 precision matrix (inverse of Xi).

AInv: N x N precision matrix, the inverse of A = I_N + X' Gamma X (see Details).

init: D-1 x N matrix of initial guesses for eta used for optimization.
n_samples: number of samples for the Laplace approximation (if 0, very fast as no inversion or decomposition of the Hessian is required).

calcGradHess: if n_samples=0, should the gradient and Hessian still be calculated using closed-form solutions?

b1: (ADAM) 1st moment decay parameter, aka momentum (recommend 0.9).

b2: (ADAM) 2nd moment decay parameter (recommend 0.99 or 0.999).

step_size: (ADAM) step size for descent (recommend 0.001-0.003).

epsilon: (ADAM) parameter to avoid divide by zero.

eps_f: (ADAM) normalized function-improvement stopping criterion.

eps_g: (ADAM) normalized gradient-magnitude stopping criterion.

max_iter: (ADAM) maximum number of iterations before stopping.

verbose: (ADAM) if TRUE, prints stats for the stopping criteria and the iteration number.

verbose_rate: (ADAM) rate at which verbose stats are printed to screen.

decomp_method: decomposition of the Hessian for the Laplace approximation; 'eigen' (slightly more stable, slower) or 'cholesky' (less stable, faster; default).

optim_method: optimization method; "lbfgs" (default) or "adam".

eigvalthresh: threshold for negative eigenvalues in the decomposition of the negative inverse Hessian (should be <= 0).

jitter: (default: 0) if > 0, adds this factor to the diagonal of the Hessian before decomposition (to improve matrix conditioning).

multDirichletBoot: if > 0, overrides the Laplace approximation and instead samples eta efficiently at the MAP estimate from a pseudo Multinomial-Dirichlet posterior.

useSylv: (default: TRUE) if N < D-1, uses the Sylvester determinant identity to speed up calculation of the log-likelihood and gradients.

ncores: (default: -1) number of cores to use; if ncores == -1, the OpenMP default is used (typically all available cores).

seed: random seed for the Laplace approximation (integer).
Details

Notation: Let Z_j denote the j-th row of a matrix Z.

Model:

Y_j \sim Multinomial(\pi_j)

\pi_j = \Phi^{-1}(\eta_j)

\eta \sim T_{D-1, N}(\upsilon, \Theta X, K, A)

where A = I_N + X' \Gamma X, K is a (D-1) x (D-1) covariance matrix, \Gamma is a Q x Q covariance matrix, and \Phi^{-1} is the ALRInv_D transform.
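As a minimal sketch of how the collapsed-model inputs relate to these prior parameters (Theta, X, Gamma, and Xi here are hypothetical objects with the dimensions given above, not values produced by this function):

# Hypothetical dimensions: D categories, N samples, Q covariates
D <- 10; N <- 30; Q <- 2
X     <- matrix(rnorm(Q * N), Q, N)   # Q x N covariate matrix
Theta <- matrix(0, D - 1, Q)          # (D-1) x Q prior mean for regression coefficients
Gamma <- diag(Q)                      # Q x Q covariance over covariates
Xi    <- diag(D - 1)                  # (D-1) x (D-1) covariance (K in the model above)

ThetaX <- Theta %*% X                            # (D-1) x N prior mean for eta
KInv   <- solve(Xi)                              # (D-1) x (D-1) precision matrix
AInv   <- solve(diag(N) + t(X) %*% Gamma %*% X)  # N x N precision, A = I_N + X' Gamma X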
Gradient and Hessian calculations are fast as they are computed using closed-form solutions. That said, the Hessian matrix can be quite large [N(D-1) x N(D-1)], so storage may be an issue.
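For instance, a rough back-of-the-envelope estimate of the storage required for a dense Hessian (assuming 8-byte doubles; the problem size below is made up for illustration) is:

D <- 100; N <- 500                  # hypothetical problem size
hessian_dim   <- N * (D - 1)        # Hessian is [N(D-1) x N(D-1)]
hessian_bytes <- hessian_dim^2 * 8  # 8 bytes per double
hessian_bytes / 1e9                 # roughly 19.6 GB for this example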
Note: Warnings about large negative eigenvalues can signal either that the optimizer did not reach an optimum or (more commonly in my experience) that the prior / degrees of freedom for the covariance (given by the parameters upsilon and KInv) were too specific and at odds with the observed data. If you get this warning, try the following (a sketch combining several of these adjustments appears after this list):

1. Restart the optimization using a different initial guess for eta.

2. Decrease (or even increase) step_size (by increments of 0.001 or 0.002) and increase the max_iter parameter of the optimizer. You can also try increasing b1 to 0.99 and decreasing eps_f by a few orders of magnitude.

3. Relax prior assumptions regarding the covariance matrix (e.g., consider decreasing the parameter upsilon closer to its minimum value of D).

4. Add a small amount of jitter (e.g., set jitter=1e-5) to address potential floating point errors.
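For example, a re-fit combining several of these adjustments might look like the following (a sketch only; the specific values are illustrative rather than recommendations, sim is assumed to come from pibble_sim() as in the Examples, and the ADAM-specific settings only take effect with optim_method="adam"):

fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta %*% sim$X, sim$KInv,
                            sim$AInv, random_pibble_init(sim$Y),
                            optim_method = "adam",   # switch optimizer to ADAM
                            step_size = 0.002, max_iter = 20000L,
                            b1 = 0.99, eps_f = 1e-12,
                            jitter = 1e-5)           # small jitter for conditioning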
Value

List containing (all with respect to the found optimum; a short access sketch follows this list):

LogLik - log likelihood of the collapsed model (up to a proportionality constant)

Gradient - (if calcGradHess=TRUE) gradient of the positive log posterior

Hessian - (if calcGradHess=TRUE) Hessian of the POSITIVE LOG POSTERIOR

Pars - parameter values of eta at the optimum

Samples - (D-1) x N x n_samples array containing posterior samples of eta based on the Laplace approximation (if n_samples > 0)

Timer - vector of execution times

logInvNegHessDet - the log determinant of the covariance of the Laplace approximation, useful for calculating the marginal likelihood

logMarginalLikelihood - a calculation of the log marginal likelihood based on the Laplace approximation
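As a usage sketch (assuming fit is the result of the call shown in the Examples), the returned components are accessed as list elements:

fit$LogLik                 # log likelihood at the optimum (up to a constant)
dim(fit$Pars)              # (D-1) x N MAP estimate of eta
dim(fit$Samples)           # (D-1) x N x n_samples Laplace samples of eta
fit$logMarginalLikelihood  # Laplace estimate of the log marginal likelihood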
References

S. Ruder (2016). An overview of gradient descent optimization algorithms. arXiv:1609.04747.

JD Silverman, K Roche, ZC Holmes, LA David, S Mukherjee (2022). Bayesian Multinomial Logistic Normal Models through Marginally Latent Matrix-T Processes. Journal of Machine Learning Research.
See Also

uncollapsePibble
Examples

sim <- pibble_sim()
# Fit model for eta
fit <- optimPibbleCollapsed(sim$Y, sim$upsilon, sim$Theta %*% sim$X, sim$KInv,
                            sim$AInv, random_pibble_init(sim$Y))
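As noted in the Description, this fit is typically passed to uncollapsePibble to recover the remaining model parameters. A possible continuation is sketched below; the argument order and the availability of sim$Gamma and sim$Xi are assumptions here, so check ?uncollapsePibble in your installed version:

# Sketch: uncollapse the sampled eta to obtain the full posterior
# (argument order assumed; see ?uncollapsePibble for the exact signature)
fit2 <- uncollapsePibble(fit$Samples, sim$X, sim$Theta,
                         sim$Gamma, sim$Xi, sim$upsilon, seed = 2849)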