We want to get a sense of the shape of a single term's contribution to the log posterior function.

[ \log \pi(\sigma^2_e, \sigma^2_s | y) = B - \frac{1}{2}\sum_j \bigg[ c_j \log(a_j \sigma^2_s + b_j \sigma^2_e) + \frac{d_j}{(a_j \sigma^2_s + b_j \sigma^2_e)} \bigg] ]

HMO example

To get a sense of the shape that each term takes, we fit a simple random effects model.

library(knitr)
opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
library(ggplot2); library(dplyr); library(devtools);
devtools::install_github("andrewpbray/lmmoptim")
library(lmmoptim)

data(hmo)
data(hmo_states)
hmobig <- left_join(hmo, hmo_states, by = "state")

y <- hmobig$indPrem
n <- length(y)
mod <- lm(indPrem ~ expPerAdm + (region=="NE") + state, data = hmobig, x = TRUE )
x <- mod$x[, 1:3]
z <- mod$x[, -(1:3), drop = FALSE]
SigE <- diag(n)
SigS <- diag(44)

lines <- findlines(x, z, y, SigE, SigS)

In the figure below to the left, we plot only the first term in the sum, with the maximum value achieved along the dotted gray line. The value of function is mapped to color, with lowest values being red and high values being light yellow. The function steeply decends near the origin, so that area is not color mapped so as to not saturate the color palette. The value of that the function evaluates to along the solid blue line is plotted separately below to the right, where it is a function of the linear term, $(a_j \sigma^2_s + b_j \sigma^2_e)$.

plotTerm <- function(lines, which.line = 1, npix = 100) {
  c_j <- lines[which.line, "multiplier.log"]
  d_j <- lines[which.line, "multiplier.inv"]
  x <- seq(0, lines[which.line, "int.sigsqe"], length.out = npix)
  y <- seq(0, lines[which.line, "int.sigsqs"], length.out = npix)
  fterm <- function(x, y, lines, which.line) {
    linearterm <- lines[which.line, "a"] * y + lines[which.line, "b"] * x
    ifelse(linearterm == 0, NA,
           -0.5 * (lines[which.line, "multiplier.log"] * log(linearterm) +
                   lines[which.line, "multiplier.inv"]/linearterm))
  }
  z <- outer(x, y, FUN = fterm, lines = lines, which.line = which.line)

  par(mfrow = c(1, 2))
  # plot surface in original coordinates
  image(x, y, z, zlim = c(.05 * min(as.numeric(z), na.rm = T),
                          max(as.numeric(z), na.rm = T)))
  abline(lines[which.line, "int.sigsqs"],
         -lines[which.line, "int.sigsqs"]/lines[which.line, "int.sigsqe"],
         col = "gray", lwd = 2, lty = 2)
  perpslope <- tan(pi/2 - atan(lines[which.line, "int.sigsqs"]/lines[which.line, "int.sigsqe"]))
  abline(0, 1/perpslope, col = "steelblue", lwd = 2)

  # plot trace of linear component
  linearterm <- lines[which.line, "a"] * y + x
  plot(linearterm, -0.5 * (c_j * log(linearterm) + d_j/linearterm), type = "n",
       ylab = "f", xlab = "linear term")
  lines(linearterm, -0.5 * (c_j * log(linearterm) + d_j/linearterm),
        col = "steelblue", lwd = 2)
  abline(v = linearterm[which.max(-0.5 * (c_j * log(linearterm) + d_j/linearterm))],
         col = "tomato")
}

plotTerm(lines, which.line = 1, npix = 200)

We can see that the function drops off very sharply towards the origin and very gradually away from it. It's maximum is indicated with the vertical red line.

Each term in the sum will have it's own set of line parameters, ${ a_j, b_j}$, and function shape parameters, ${c_j, d_j }$. $c_j$ is one for all terms except that corresponding to the prior, while the values of $d_j$ range the first three orders of magnitude (at least in the HMO model). Below we plot the shape of the function corresponding to three different values of $d_j$ and indicate the max with a vertical red line (note the differing y axes).

x <- 1:6000
minv <- c(30, 300, 3000)
maxes <- rep(NA, 3)
par(mfrow = c(1, 3))
for(i in 1:3) {
  fx <- -0.5 * (log(x) + minv[i]/x)
  plot(x, fx, type = "n", ylab = "f", xlab = "linear term",
       main = bquote(d[j] ~ "=" ~ .(minv[i])))
  lines(x, fx, col = "steelblue", lwd = 2)
  abline(v = which.max(fx), col = "tomato")
  maxes[i] <- max(fx)
}

Some take away lessons from this:

  1. The maximum values of each of these terms are similar, r round(maxes, 2).
  2. At higher $d_j$, the function can achieve very low values, but only over a shrinking domain near the origin. If we avoid the origin on these terms, we we should be good.
  3. At lower $d_j$, the range of the function is more restricted, but it is more variable.

Can we be more efficient by addressing the terms in order by descending $d_j$? Is there any sensible way to rescale each term so that some of these parameters are standardized across the terms, then optimize, then undo the scaling?



andrewpbray/lmmoptim documentation built on May 10, 2019, 11:10 a.m.