R/pool-Wald.R
In lavaan.mi: Fit Structural Equation Models to Multiply Imputed Data

Documented in lavTestWald.mi

### Terrence D. Jorgensen & Yves Rosseel
### Last updated: 30 May 2024
### Pooled Wald test for multiple imputations
### Borrowed source code from lavaan/R/lav_test_Wald.R


##' Wald Test for Multiple Imputations
##'
##' Wald test for testing a linear hypothesis about the parameters of lavaan
##' models fitted to multiple imputed data sets. Statistics for constraining
##' one or more free parameters in a model can be calculated from the pooled
##' point estimates and asymptotic covariance matrix of model parameters
##' using Rubin's (1987) rules, or by pooling the Wald  test statistics
##' across imputed data sets (Li, Meng, Raghunathan, & Rubin, 1991).
##'
##' The constraints are specified using the `"=="` operator.
##' Both the left-hand side and the right-hand side of the equality can contain
##' a linear combination of model parameters, or a constant (like zero).
##' The model parameters must be specified by their user-specified labels from
##' the \code{link[lavaan]{model.syntax}}. Names of defined parameters
##' (using the ":=" operator) can be included too.
##'
##' @aliases lavTestWald.mi
##' @importFrom lavaan parTable lavListInspect
##' @importFrom stats pchisq pf
##'
##' @param object An object of class [lavaan.mi-class].
##' @param constraints A `character` string (typically between single
##'   quotes) containing one or more equality constraints.
##'   See examples for more details
##' @param pool.method `character` indicating which pooling method to use.
##'   `"D1"` or `"Rubin"` (default) indicates Rubin's (1987) rules
##'   will be applied to the point estimates and the asymptotic covariance
##'   matrix of model parameters, and those pooled values will be used to
##'   calculate the Wald test in the usual manner. `"D2"`, `"LMRR"`,
##'   or `"Li.et.al"` indicate that the complete-data Wald test statistic
##'   should be calculated using each imputed data set, which will then be
##'   pooled across imputations, as described in Li, Meng, Raghunathan, & Rubin
##'   (1991) and Enders (2010, chapter 8).
##' @param asymptotic `logical`. If `FALSE` (default), the pooled test
##'   will be returned as an *F*-distributed statistic with numerator
##'   (`df1`) and denominator (`df2`) degrees of freedom.
##'   If `TRUE`, the pooled *F* statistic will be multiplied by its
##'   `df1` on the assumption that its `df2` is sufficiently large
##'   enough that the statistic will be asymptotically \eqn{\chi^2} distributed
##'   with `df1`.
##' @param scale.W `logical`. If `FALSE`, the pooled
##'   asymptotic covariance matrix of model parameters is calculated as the
##'   weighted sum of the within-imputation and between-imputation components.
##'   Otherwise, the pooled asymptotic covariance matrix of model parameters is
##'   calculated by scaling the within-imputation component by the
##'   average relative increase in variance (ARIV; see Enders, 2010, p. 235),
##'   which is *only* consistent when requesting the *F* test (i.e.,
##'   `asymptotic = FALSE`.  Ignored (irrelevant) if `pool.method = "D2"`.
##' @param omit.imps `character` vector specifying criteria for omitting
##'   imputations from pooled results.  Can include any of
##'   `c("no.conv", "no.se", "no.npd")`, the first 2 of which are the
##'   default setting, which excludes any imputations that did not
##'   converge or for which standard errors could not be computed.  The
##'   last option (`"no.npd"`) would exclude any imputations which
##'   yielded a nonpositive definite covariance matrix for observed or
##'   latent variables, which would include any "improper solutions" such
##'   as Heywood cases. Specific imputation numbers can also be included in this
##'   argument, in case users want to  apply their own custom omission criteria
##'   (or simulations can use different numbers of imputations without
##'   redundantly refitting the model).
##' @param verbose `logical`. If `TRUE`, print the restriction
##'   matrix and the estimated restricted values.
##' @param warn `logical`. If `TRUE`, print warnings if they occur.
##'
##' @return
##'   A vector containing the Wald test statistic (either an `F` or
##'   \eqn{\chi^2} statistic, depending on the `asymptotic` argument),
##'   the degrees of freedom (numerator and denominator, if
##'   `asymptotic = FALSE`), and a *p* value. If
##'   `asymptotic = FALSE`, the relative invrease in variance (RIV, or
##'   average for multiparameter tests: ARIV) used to calculate the denominator
##'   *df* is also returned as a missing-data diagnostic, along with the
##'   fraction missing information (FMI = ARIV / (1 + ARIV)).
##'
##' @author
##'   Terrence D. Jorgensen (University of Amsterdam;
##'   \email{TJorgensen314@@gmail.com})
##'
##'   Based on source code for [lavaan::lavTestWald()] by Yves Rosseel
##'
##' @references
##'   Enders, C. K. (2010). *Applied missing data analysis*.
##'   New York, NY: Guilford.
##'
##'   Li, K.-H., Meng, X.-L., Raghunathan, T. E., & Rubin, D. B. (1991).
##'   Significance levels from repeated *p*-values with multiply-imputed
##'   data. *Statistica Sinica, 1*(1), 65--92. Retrieved from
##'   <https://www.jstor.org/stable/24303994>
##'
##'   Rubin, D. B. (1987). *Multiple imputation for nonresponse in surveys*.
##'   New York, NY: Wiley. \doi{10.1002/9780470316696}
##'
##' @seealso [lavaan::lavTestWald()]
##'
##' @examples
##' data(HS20imps) # import a list of 20 imputed data sets
##'
##' ## specify CFA model from lavaan's ?cfa help page
##' HS.model <- '
##'   visual  =~ x1 + b1*x2 + x3
##'   textual =~ x4 + b2*x5 + x6
##'   speed   =~ x7 + b3*x8 + x9
##' '
##'
##' fit <- cfa.mi(HS.model, data = HS20imps)
##'
##' ## Testing whether a single parameter equals zero yields the 'chi-square'
##' ## version of the Wald z statistic from the summary() output, or the
##' ## 'F' version of the t statistic from the summary() output, depending
##' ## whether asymptotic = TRUE or FALSE
##' lavTestWald.mi(fit, constraints = "b1 == 0")             # default D1 statistic
##' lavTestWald.mi(fit, constraints = "b1 == 0", pool.method = "D2") # D2 statistic
##'
##' ## The real advantage is simultaneously testing several equality
##' ## constraints, or testing more complex constraints:
##' con <- '
##'    2*b1 == b3
##'    b2 - b3 == 0
##' '
##' lavTestWald.mi(fit, constraints = con) # default F statistic
##' lavTestWald.mi(fit, constraints = con, asymptotic = TRUE) # chi-squared
##'
##'
##' @export
lavTestWald.mi <- function(object, constraints = NULL, pool.method = c("D1","D2"),
                           asymptotic = FALSE, scale.W = !asymptotic,
                           omit.imps = c("no.conv","no.se"),
                           verbose = FALSE, warn = TRUE) {
  ## this also checks the class:
  useImps <- imps2use(object = object, omit.imps = omit.imps)
  m <- length(useImps)

  pool.method <- tolower(pool.method[1])
  if (pool.method %in% c("d2", "lmrr", "li.et.al")) pool.method <- "D2"
  if (pool.method %in% c("d1", "rubin")) pool.method <- "D1"
  if (!pool.method %in% c("D1","D2")) stop('Invalid choice of "pool.method" argument.')

  message('\nWald test calculated using se = "',
          lavListInspect(object, "options")$se, '"\n')

  if (pool.method == "D2") {

    oldCall <- object@lavListCall

    if (!is.null(oldCall$parallel)) {
      if (oldCall$parallel == "snow") {
        oldCall$parallel <- "no"
        oldCall$ncpus <- 1L
        if (warn) warning("Unable to pass lavaan::lavTestWald() arguments ",
                          "when parallel='snow'. Switching to parallel='no'.",
                          " Unless using Windows, parallel='multicore' works.")
      }
    }

    ## call lavaanList() again to run lavTestWald() on each imputation
    oldCall$FUN <- function(obj) {
      out <- try(lavaan::lavTestWald(object = obj, constraints = constraints,
                                     verbose = FALSE), silent = TRUE)
      if (inherits(out, "try-error")) return(NULL)
      do.call(c, out[1:2])
    }
    FIT <- eval(as.call(oldCall))
    ## check if there are any results
    noStats <- sapply(FIT@funList, is.null)
    if (all(noStats)) stop("No success using lavTestWald() on any imputations.",
                           call. = FALSE)

    ## template to fill in pooled values

    ## at a minimum, pool the total Wald pool.method
    chiList <- sapply(FIT@funList[ intersect(useImps, which(!noStats)) ],
                      "[[", i = 1)
    DF <- FIT@funList[[ intersect(useImps, which(!noStats))[1] ]][[2]]
    out <- calculate.D2(chiList, DF = DF, asymptotic)
    class(out) <- c("lavaan.vector","numeric")
    ## add header
    attr(out, "header") <- paste("Wald statistic pooled using the",
                                 pool.method, "pooling method")
    return(out)
  } # else pool.method == "D1", making 'scale.W=' relevant


  ## "borrowed" lavTestWald()
  if (is.null(constraints) || nchar(constraints) == 0L) stop("constraints are empty")

  # remove == constraints from parTable, save as list
  PT <- parTable(object)
  partable <- as.list(PT[PT$op != "==", ])

  # parse constraints
  FLAT <- lavaan::lavParseModelString( constraints )
  CON <- attr(FLAT, "constraints")
  LIST <- list()
  if (length(CON) > 0L) {
    lhs <- unlist(lapply(CON, "[[", i = "lhs"))
    op <- unlist(lapply(CON, "[[", i = "op"))
    rhs <- unlist(lapply(CON, "[[", i = "rhs"))
    LIST$lhs <- c(LIST$lhs, lhs) # FIXME: why concatenate with NULL?
    LIST$op  <- c(LIST$op,  op)
    LIST$rhs <- c(LIST$rhs, rhs)
  } else stop("no equality constraints found in constraints argument")

  # theta = free parameters only (equality-constrained allowed)
  theta <- coef_lavaan_mi(object, omit.imps = omit.imps) #object@optim$x

  # build constraint function
  ceq.function <- lavaan::lav_partable_constraints_ceq(partable = partable,
                                                       con = LIST, debug = FALSE)
  # compute jacobian restrictions
  JAC <- try(lavaan::lav_func_jacobian_complex(func = ceq.function, x = theta),
             silent = TRUE)
  if (inherits(JAC, "try-error")) { # eg. pnorm()
    JAC <- lavaan::lav_func_jacobian_simple(func = ceq.function, x = theta)
  }
  if (verbose) {cat("Restriction matrix (jacobian):\n"); print(JAC); cat("\n")}

  # linear restriction
  theta.r <- ceq.function( theta )
  if (verbose) {cat("Restricted theta values:\n"); print(theta.r); cat("\n")}

  # get VCOV
  VCOV <- vcov_lavaan_mi(object, scale.W = scale.W, omit.imps = omit.imps)
  # restricted vcov
  info.r  <- JAC %*% VCOV %*% t(JAC)

  # Wald test statistic
  test.stat <- as.numeric(t(theta.r) %*% solve( info.r ) %*% theta.r)

  # number of constraints (k in Enders (2010, p. 235) eqs. 8.23-25)
  DF <- nrow(JAC)

  if (asymptotic) {
    out <- c("chisq" = test.stat, df = DF,
             pvalue = pchisq(test.stat, df = DF, lower.tail = FALSE))
  } else {
    W <- vcov_lavaan_mi(object, type = "within" , omit.imps = omit.imps)
    B <- vcov_lavaan_mi(object, type = "between", omit.imps = omit.imps)
    #FIXME: only valid for linear constraints?
    ## restricted B & W components of VCOV
    W.r  <- JAC %*% W %*% t(JAC)
    B.r  <- JAC %*% B %*% t(JAC)
    ## relative increase in variance due to missing data
    W.inv <- MASS::ginv(W.r)
    ariv <- (1 + 1/m) * sum(diag(B.r %*% W.inv)) / DF
    ## calculate denominator DF for F statistic
    a <- DF*(m - 1)
    if (a > 4) {
      v2 <- 4 + (a - 4) * (1 + (1 - 2/a)*(1 / ariv))^2 # Enders (eq. 8.24)
    } else {
      v2 <- a*(1 + 1/DF) * (1 + 1/ariv)^2 / 2 # Enders (eq. 8.25)
    }
    out <- c("F" = test.stat / DF, df1 = DF, df2 = v2,
             pvalue = pf(test.stat / DF, df1 = DF, df2 = v2, lower.tail = FALSE),
             ariv = ariv, fmi = ariv / (1 + ariv))
  }

  class(out) <- c("lavaan.vector","numeric")
  ## add header
  attr(out, "header") <- paste("Wald statistic pooled using the",
                               pool.method, "pooling method")
  out
}