In bsaul/geex: An API for M-Estimation

library(geex)
library(lme4)

Setup

A user asked a reasonable question about score contributions summing to zero for a logistic-Normal model.

Here we analyze 500 pairs of 2 units generated from:

[ \Pr(Y_{ij} = 1 | X, b) = \mbox{logit}^{-1}(\beta_0 + \beta_1 x_{ij} + b_i) = h(\beta, x_{ij}, b_i) ]

where $X_{ij} \sim N(0, 1)$ and $b_i \sim f(b_i) = N(0, \sigma^2)$. Let $\theta = {\beta_0, \beta_1, \sigma^2 } = {2, 2, 1}$.

The loglihood for this model is:

[ l(\theta; Y, X) = \sum_{i = 1}^m { \int \prod_{j = 1}^{n_i} h(\beta, x_{ij}, b_i)^y{ij}[1 - h(\beta, x_{ij}, b_i)]^{1 - y{ij}} f(b_i) } ]

Take the derivative of $l(\theta; Y, X)$, set equal to zero, and solve for $\theta$ to find $\hat{\theta}$.

set.seed(1)
#generate data; 500 pairs
n <- 500
m <- 2
beta <- c(2, 2)
id <- rep(1:n, each=m)
x <- rnorm(m*n)
b <- rep(rnorm(n), each=m)
y <- rbinom(m*n, 1, plogis(cbind(1, x) %*% beta + b))
d <- data.frame(y,x,id)

$\theta$ can be estimated from the lme4::glmer function.

#fit logistic-normal random effects model
fit <- lme4::glmer(y~x+(1|id), family=binomial, data=d)
print(summary(fit))

Computing score equations

Note that geex does most of this work internally.

eefun <- function(data, lme4model){
  f <- grab_psiFUN(lme4model, data = data)
  function(theta) f(theta)
}

splitdt <- split(d, f = d$id)

theta_hat <- unlist(lme4::getME(fit, c('beta', 'theta')))

psi <- lapply(splitdt, function(dt){
  eefun(dt, fit)
})

# Evaluate psi functions

psi_eval <- lapply(psi, function(f){
  f(theta_hat)
})

Do these evaluated score equations sum to zero?

colSums(matrix(unlist(psi_eval), ncol = 3, byrow  = TRUE))

No.

As another check, how does the observed loglihood compare to lme4::glmer?

# geex internal functions objFun_glmerMod_binomial and binomial integrand copied
# here since they are not exported
objFun_glmerMod_binomial <- function(parms, response, xmatrix, linkinv)
{
  log(stats::integrate(binomial_integrand, lower = -Inf, upper = Inf,
                parms    = parms,
                response = response,
                xmatrix  = xmatrix,
                linkinv  = linkinv)$value)

}

binomial_integrand <- function(b, response, xmatrix, parms, linkinv){
  if(class(xmatrix) != 'matrix'){
    xmatrix <- as.matrix(xmatrix)
  }
  pr  <- linkinv( drop(outer(xmatrix %*% parms[-length(parms)], b, '+') ) )
  hh  <- stats::dbinom(response, 1, prob = pr)
  hha <- apply(hh, 2, prod)
  hha * stats::dnorm(b, mean = 0, sd = parms[length(parms)])
}

# Compute the per-unit contribution the loglihood
objFun <- lapply(splitdt, function(dt){
  X <- cbind(1, dt$x)
  objFun_glmerMod_binomial(parms = theta_hat, response = dt$y, xmatrix = X, linkinv = plogis )
})

sum(unlist(objFun))
logLik(fit)

Not bad.

What happens if instead of using $\hat{\theta}_{glmer}$ to compute the scores we use estimated parameters obtained from the finding the roots of the score functions as they are defined within the grab_eeFUN function?

NOTE: the evaluation of m_estimate is slow!

# NOT run to speed creation of documentation
attempt <- m_estimate(
  estFUN = eefun,
  data  = d,
  units = 'id',
  root_control = setup_rootFUN(start = theta_hat), # start root finding at lme4's beta hat.
  compute_vcov = FALSE,
  outer_args = list(lme4model = fit)
)

## parameter estimates obtained from geex
attempt

psi_eval2 <- lapply(psi, function(f){
  f(roots(attempt))
})

colSums(matrix(unlist(psi_eval2), ncol = 3, byrow  = TRUE))