library(knitr) opts_chunk$set(echo = FALSE, cache = TRUE)
A method to evaluate the likelihood or posterior for a linear mixed model in order to
while being sure that we're not missing the global optimum.
\
Approximately Exact Calculations for Linear Mixed Models, Lavine and Hodges, 2015.
[ y = X \beta + Z u + \epsilon ]
[ \epsilon \sim \textrm{N}(0, R); \quad u \sim \textrm{N}(0, G) ]
- Integrate out the fixed effects, maximize the restricted likelihood ($RL$) to estimate $G$ and $R$.
- Treat $G$, $R$ as known. Plug into ML estimate for $\beta$ and $u$.
\
Main burden of calculation: optimize the restricted likelihood.
lmer, nlme
{.build}
- Nelder-Mead: iterative simplex
- BOBYQA: quadratic approximation
BFGS: quasi-Newton
Pros: works for any $f$, derivative-free, fast
- Cons: may not converge, can miss global optima
Two variance model:
$$ \epsilon \sim \textrm{N}(0, \sigma^2_e \Sigma_e); \quad u \sim \textrm{N}(0, \sigma^2_s \Sigma_s) $$
\
A more tractable objective function:
$$ \log \pi (\sigma^2_e, \sigma^2_s) \propto \sum_j \left[ c_j \log (a_j \sigma^2_s + b_j \sigma^2_e ) + \frac{d_j}{a_j \sigma^2_s + b_j \sigma^2_e} \right] $$
\
$$ {a_j, b_j, c_j, d_j} > 0 $$
$$ c_j \log(\textrm{linear term}) + \frac{d_j}{\textrm{linear term}} $$
x <- 1:6000 minv <- c(30, 300, 3000) maxes <- rep(NA, 3) par(mfrow = c(1, 3)) for(i in 1:3) { fx <- -0.5 * (log(x) + minv[i]/x) plot(x, fx, type = "n", ylab = "f", xlab = "linear term", main = bquote(d[j] ~ "=" ~ .(minv[i]))) lines(x, fx, col = "goldenrod", lwd = 2) abline(v = which.max(fx), col = "cadetblue", lwd = 2) maxes[i] <- max(fx) }
$$ f(\sigma^2_e, \sigma^2_s) \propto \sum_j \left[ c_j \log (a_j \sigma^2_s + b_j \sigma^2_e ) + \frac{d_j}{(a_j \sigma^2_s + b_j \sigma^2_e)} \right] $$
\
$$ \frac{\partial f(\sigma^2_e, \sigma^2_s)}{\partial \sigma^2_e} \propto \sum_j \frac{a_j (a_j c_j \sigma^2_s + b_j c_j \sigma^2_e - d_j)}{(a_j \sigma^2_s + b_j \sigma^2_e)^2} $$
\
\
$$ \begin{aligned} 0 &= a_j c_j \sigma^2_s + b_j c_j \sigma^2_e - d_j \ \ \sigma^2_s &= \frac{d_j}{a_j c_j} - \frac{b_j}{a_j}\sigma^2_e \end{aligned} $$
npix <- 400 a_j <- 12.9 b_j <- 1 c_j <- 1 d_j <- 5 x <- seq(0, 57, length.out = npix) y <- seq(0, 3, length.out = npix) fterm <- function(x, y, a_j, b_j) { linearterm <- a_j * y + b_j * x ifelse(linearterm == 0, NA, -0.5 * (c_j * log(linearterm) + d_j/linearterm)) } z <- outer(x, y, FUN = fterm, a_j = a_j, b_j = b_j) xyzdf <- data.frame(x = rep(x, npix), y = rep(y, each = npix), z = c(z)) # plot surface in original coordinates library(ggplot2); library(RColorBrewer) mypal <- colorRampPalette(brewer.pal(6 , "YlOrRd")) xyzdf$zbinned <- cut(xyzdf$z,breaks = c(-Inf, seq(-2.5, -1.3, length.out = 14)), right = FALSE) p <- ggplot(xyzdf) + aes(x = x, y = y, fill = zbinned) + scale_fill_manual(values = rev(mypal(15)))+ labs(x = expression(sigma[e]^2), y = expression(sigma[s]^2), fill = "f") + theme_bw() q <- p + geom_tile(alpha = 0) + geom_abline(intercept = d_j / (a_j * c_j), slope = -b_j / a_j, col = "cadetblue", lwd = 1.5) q
$a_j = 12.9; \quad b_j = 1; \quad c_j = 1; \quad d_j = 5$
$f$ is maxed along $\sigma^2_s = \frac{d_j}{a_j c_j} - \frac{b_j}{a_j}\sigma^2_e$
q + annotate("point", pch = "+", x = 1, y = .05, size = 13) + annotate("point", pch = "-", x = 7, y = .3, size = 16)
w <- p + geom_tile() + geom_abline(intercept = d_j / (a_j * c_j), slope = -b_j / a_j, col = "cadetblue", lwd = 1.5) w
w + annotate("rect", xmin = 15, xmax = 25, ymin = .65, ymax = 1.2, alpha = .2)
b1 <- w + annotate("rect", xmin = 15, xmax = 25, ymin = .65, ymax = 1.2, alpha = .2) + annotate("point", x = 15, y = .65) + annotate("text", x = 17, y = .59, label = "A") b1
b2 <- b1 + annotate("point", x = 25, y = 1.2) + annotate("text", x = 27, y = 1.14, label = "B") b2
Bounds on $f$ in box $b_a: \left(f(B), f(A) \right)$
b2b <- w + annotate("rect", xmin = 2, xmax = 12, ymin = .1, ymax = .65, alpha = .2) b2b
b2b + annotate("point", x = 2, y = .1) + annotate("text", x = 0, y = .12, label = "A") + annotate("point", x = 12, y = .65) + annotate("text", x = 14, y = .59, label = "B") + annotate("point", x = 2, y = .23) + annotate("text", x = 4, y = .3, label = "C")
Bounds on $f$ in box $b_s: \left(min\left(f(A), f(B)\right), f(C) \right)$
w + annotate("rect", xmin = 15, xmax = 25, ymin = .65, ymax = 1.2, alpha = .2)
b3 <- w + annotate("rect", xmin = 15, xmax = 19.5, ymin = .65, ymax = .925 - 0.0275, alpha = .2) + annotate("rect", xmin = 20.5, xmax = 25, ymin = .65, ymax = .925 - 0.0275, alpha = .2) + annotate("rect", xmin = 15, xmax = 19.5, ymin = .925 + 0.0275, ymax = 1.2, alpha = .2) + annotate("rect", xmin = 20.5, xmax = 25, ymin = .925 + 0.0275, ymax = 1.2, alpha = .2) b3
b4 <- b3 + annotate("point", x = 15, y = .65, col = "cadetblue") + annotate("point", x = 20, y = .65, col = "cadetblue") + annotate("point", x = 15, y = .925, col = "cadetblue") + annotate("point", x = 20, y = .925, col = "cadetblue") b4
b4 + annotate("point", x = 20, y = .925, col = "navajowhite") + annotate("point", x = 25, y = .925, col = "navajowhite") + annotate("point", x = 20, y = 1.2, col = "navajowhite") + annotate("point", x = 25, y = 1.2, col = "navajowhite")
- Each term is maximized along a line in Q1.
- Slope < 0 and intercept > 0.
- Within $b$ the top-right and bottom-left form the bounds.
- above the line, $\left(f(TR), f(BL)\right)$
- below the line, $\left(f(BL), f(TR)\right)$
- straddle the line, $\left(min(f(TR), f(BL)), f(line)\right)$
- For narrower bounds, subdivide $b$ (branch).
w2 <- p + geom_tile(alpha = 0) + annotate("rect", xmin = 15, xmax = 25, ymin = .65, ymax = 1.2, alpha = .2) + annotate("point", x = 25, y = 1.2) + annotate("text", x = 27, y = 1.14, label = "B") + annotate("point", x = 15, y = .65) + annotate("text", x = 17, y = .59, label = "A") w2 + annotate("text", x = 11, y = .03, label = "j = 1", size = 4) + geom_abline(intercept = d_j / (a_j * c_j), slope = -b_j / a_j, col = "cadetblue", lwd = 1.5)
$$ j = 1; \quad \left(f_1(B), f_1(A) \right) $$
w2 + annotate("text", x = 11, y = .03, label = "j = 1", size = 4, col = "grey") + geom_abline(intercept = d_j / (a_j * c_j), slope = -b_j / a_j, col = "lightgrey", lwd = 1.5) + annotate("text", x = 53, y = .6, size = 4, label = "j = 2") + geom_abline(intercept = 1.8, slope = -.018, col = "cadetblue", lwd = 1.5)
$$ j = 1; \quad \left(f_1(B), f_1(A) \right) \ j = 2; \quad \left(f_2(A), f_2(B) \right) $$
w2 + annotate("text", x = 11, y = .03, label = "j = 1", size = 4, col = "grey") + geom_abline(intercept = d_j / (a_j * c_j), slope = -b_j / a_j, col = "lightgrey", lwd = 1.5) + annotate("text", x = 53, y = .6, size = 4, label = "j = 2") + geom_abline(intercept = 1.8, slope = -.018, col = "cadetblue", lwd = 1.5)
$$ \left.\begin{aligned} j &= 1; \quad \left(f_1(B), f_1(A) \right)\ j &= 2; \quad \left(f_2(A), f_2(B) \right) \end{aligned} \right} \qquad \underbrace{f_1(B) + f_2(A)}{L^b}, \underbrace{f_1(A) + f_2(B)}{U^b} $$
Within box $b$, the bounds on $f$ can be formed by
$$ L^b \equiv \sum_j L^b_j \ U^b \equiv \sum_j U^b_j $$
where each term in the sum is $f_j(p)$ evaluated at the appropriate point $p$.
Because $f$ is continuous, $U^b - L^b \rightarrow 0$ as $b$ shrinks in both directions.
To evaluate $f$ arbitrarily well everywhere within active box $B$,
- Specify $\epsilon$.
- With $y, X, Z$ compute ${a_j, b_j, c_j, d_j}$ for all $j$.
- For every active box $b$
- evaluate $f_j(p)$ for all $j$ to get $L^b, U^b$
- if $U^b - L^b < \epsilon$, make $b$ inactive
- else subdivide $b$ into 4 active boxes and repeat (3).
\
For optimization only:
Question: what will be the estimated cost of moving people to HMO plans?
Data on cost of $i = 1, \ldots, 341$ plans in $j = 1, \ldots, 45$ states
$$ \begin{aligned} y_{ij} &= \alpha_j + \epsilon_{ij} \ \alpha_j &= \beta_0 + \beta_1 x_{1j} + \beta_2 x_{2j} + \delta_j \end{aligned} $$
$$ \epsilon \sim \textrm{N}(0, \sigma^2_e \Sigma_e); \quad \delta \sim \textrm{N}(0, \sigma^2_s \Sigma_s) $$
A method to evaluate the likelihood or posterior for a linear mixed model in order to
while being sure that we're not missing global optima.
- Efficiency/Speed
- Utilize structure of $f$
- Pre-allocate memory for (
library(data.table)
)- Sequential runs / alternation.
- Generalization
- Three variance? Four Variance?
www.github.com/andrewpbray/lmmoptim
\
Thanks for having us!
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.