R/semlbci.R
In semlbci: Likelihood-Based Confidence Interval in Structural Equation Models

Documented in semlbci

#' @title Likelihood-Based Confidence Interval
#'
#' @description Find the likelihood-based confidence intervals (LBCIs) for
#'  selected free parameters in an SEM output.
#'
#' @details [semlbci()] finds the positions of the selected parameters
#'  in the parameter table and then calls [ci_i_one()] once for each
#'  of them. For the technical details, please see [ci_i_one()] and
#'  the functions it calls to find a confidence bound, currently
#'  [ci_bound_wn_i()]. [ci_bound_wn_i()] uses the approach proposed by
#'  Wu and Neale (2012) and illustrated by Pek and Wu (2015).
#'
#' It supports updating an output of [semlbci()] by setting
#' `semlbci_out`. This allows forming LBCIs for some parameters after
#' those for some others have been formed.
#'
#' If possible, parallel processing should be used (see `parallel` and
#' `ncpus`), especially for a model with many parameters.
#'
#' If the search for some of the confidence bounds failed, with `NA` for the
#' bounds, try increasing `try_k_more_times`.
#'
#' The SEM output will first be checked by [check_sem_out()] to see
#' whether the model and the estimation method are supported. To skip this
#' test (e.g., for testing or experimenting with some models and estimators),
#' set `check_fit` to `FALSE`.
#'
#' Examples and technical details can be found at Cheung
#' and Pesigan (2023), the website of the `semlbci`
#' package (https://sfcheung.github.io/semlbci/),
#' and the technical appendices at
#' (https://sfcheung.github.io/semlbci/articles/).
#'
#' It currently supports [lavaan::lavaan-class] outputs only.
#'
#' @return A `semlbci`-class object similar to the parameter table
#'  generated by [lavaan::parameterEstimates()], with the LBCIs for
#'  selected parameters added. Diagnostic information, if requested,
#'  will be included in the attributes. See [print.semlbci()] for options
#'  available.
#'
#' @param sem_out The SEM output. Currently supports
#'  [lavaan::lavaan-class] outputs only.
#'
#' @param pars The positions of the parameters for which the LBCIs are
#'  to be searched. Use the position as appeared on the parameter
#'  tables of the `sem_out`. If `NULL`, the default, then LBCIs for
#'  all free parameters will be searched. Can also be a vector of
#'  strings to indicate the parameters on the parameter table. The
#'  parameters should be specified in [lavaan::lavaan()] syntax. The
#'  vector of strings will be converted by [syntax_to_i()] to
#'  parameter positions. See [syntax_to_i()] on how to specify the parameters.
#'
#' @param include_user_pars Logical. Whether all user-defined parameters
#'  are automatically included when `pars` is not set. Default is `TRUE`.
#'  If `pars` is explicitly set, this argument will be ignored.
#'
#' @param remove_variances Logical. Whether variances and error variances
#'  will be removed. Default is `TRUE`, removing all variances and error
#'  variances even if specified in `pars`.
#'
#' @param remove_intercepts Logical. Whether intercepts will be removed.
#'  Default is `TRUE`, removing all intercepts (parameters with operator `~1`).
#'  Intercepts are not yet supported in standardized solution and so will
#'  always be removed if `standardized = TRUE`.
#'
#' @param ciperc The proportion of coverage for the confidence
#'  interval. Default is .95, requesting a 95 percent confidence
#'  interval.
#'
#' @param standardized If `TRUE`, the LBCI is for the standardized estimates.
#'
#' @param method The method to be used to search for the confidence
#'  bounds. Supported methods are`"wn"` (Wu-Neale-2012), the default,
#' and `"ur"` (root finding by [stats::uniroot()]).
#'
#' @param robust Whether the LBCI based on robust likelihood ratio
#'  test is to be found. Only `"satorra.2000"` in [lavaan::lavTestLRT()]
#'  is supported for now, implemented by the method proposed by Falk
#'  (2018). If `"none"`, the default, then likelihood ratio test based
#'  on maximum likelihood estimation will be used.
#'
#' @param try_k_more_times How many more times to try if failed.
#'                         Default is 2.
#'
#' @param semlbci_out An `semlbci-class` object. If provided, parameters already
#'                    with LBCIs formed will be excluded from `pars`.
#'
#' @param check_fit If `TRUE` (default), the input (`sem_out`) will
#'                    be checked by [check_sem_out()]. If not
#'                    supported, an error will be raised. If `FALSE`,
#'                    the check will be skipped and the LBCIs will be
#'                    searched even for a model or parameter not
#'                    supported. Set to `TRUE` only for testing.
#'
#' @param ... Arguments to be passed to [ci_bound_wn_i()].
#'
#' @param parallel If `TRUE`, will use parallel processing to do the search.
#'
#' @param ncpus The number of workers, if `parallel` is `TRUE`.
#'  Default is 2. This number should not be larger than the number CPU
#'  cores.
#'
#' @param use_pbapply If `TRUE` and `pbapply`
#'  is installed, [pbapply::pbapply()] will be used to display a
#'  progress bar when finding the intervals. Default is `TRUE`.
#'  Ignored if `pbapply` is not installed.
#'
#' @param loadbalancing Whether load
#' balancing is used when `parallel`
#' is `TRUE` and `use_pbapply` is
#' `TRUE`.
#'
#' @author Shu Fai Cheung <https://orcid.org/0000-0002-9871-9448>
#'
#' @references
#'
#' Cheung, S. F., & Pesigan, I. J. A. (2023). *semlbci*:
#' An R package for forming likelihood-based confidence
#' intervals for parameter estimates, correlations,
#' indirect effects, and other derived parameters.
#' *Structural Equation Modeling: A Multidisciplinary Journal*,
#' *30*(6), 985--999.
#' \doi{10.1080/10705511.2023.2183860}
#'
#' Falk, C. F. (2018). Are robust standard errors the best approach
#' for interval estimation with nonnormal data in structural equation
#' modeling? *Structural Equation Modeling: A Multidisciplinary
#' Journal, 25*(2), 244-266.
#' \doi{10.1080/10705511.2017.1367254}
#'
#' Pek, J., & Wu, H. (2015). Profile likelihood-based confidence
#'  intervals and regions for structural equation models.
#'  *Psychometrika, 80*(4), 1123-1145.
#'  \doi{10.1007/s11336-015-9461-1}
#'
#' Wu, H., & Neale, M. C. (2012). Adjusted confidence intervals for a
#'  bounded parameter. *Behavior Genetics, 42*(6), 886-898.
#'  \doi{10.1007/s10519-012-9560-z}
#'
#'
#' Pritikin, J. N., Rappaport, L. M., & Neale, M. C. (2017). Likelihood-based
#' confidence intervals for a parameter with an upper or lower bound.
#' *Structural Equation Modeling: A Multidisciplinary Journal, 24*(3), 395-401.
#' \doi{10.1080/10705511.2016.1275969}
#'
#' @seealso [print.semlbci()], [confint.semlbci()], [ci_i_one()], [ci_bound_wn_i()]
#'
#' @examples
#'
#' library(lavaan)
#' mod <-
#' "
#' m ~ a*x
#' y ~ b*m
#' ab := a * b
#' "
#' fit_med <- sem(mod, simple_med, fixed.x = FALSE)
#' p_table <- parameterTable(fit_med)
#' p_table
#' lbci_med <- semlbci(fit_med,
#'                     pars = c("m ~ x",
#'                              "y ~ m",
#'                              "ab :="))
#' lbci_med
#'
#' @export

semlbci <- function(sem_out,
                    pars = NULL,
                    include_user_pars = TRUE,
                    remove_variances = TRUE,
                    remove_intercepts = TRUE,
                    ciperc = .95,
                    standardized = FALSE,
                    method = c("wn", "ur"),
                    robust = c("none", "satorra.2000"),
                    try_k_more_times = 2,
                    semlbci_out = NULL,
                    check_fit = TRUE,
                    ...,
                    parallel = FALSE,
                    ncpus = 2,
                    use_pbapply = TRUE,
                    loadbalancing = TRUE) {
    if (!inherits(sem_out, "lavaan")) {
        stop("sem_out is not a supported object.")
      }
    if (!is.null(semlbci_out)) {
        if (!inherits(semlbci_out, "semlbci")) {
            stop("semlbci_out is not an output of semlbci().")
          }
      }
    method <- match.arg(method)
    robust <- match.arg(robust)
    sem_out_name <- deparse(substitute(sem_out))

    # Use satorra.2000 automatically if
    # - a scaled test is used and
    # - the method is "ur"

    scaled <- any(names(lavaan::lavInspect(sem_out, "test")) %in%
                        c("satorra.bentler",
                          "yuan.bentler",
                          "yuan.bentler.mplus",
                          "mean.var.adjusted",
                          "scaled.shifted"))
    if (scaled && (method == "ur")) {
        robust <- "satorra.2000"
      }

    # Check sem_out
    if (check_fit) {
        sem_out_check <- check_sem_out(sem_out = sem_out, robust = robust)
        if (sem_out_check > 0) {
            msg <- paste0(paste(attr(sem_out_check, "info"), collapse = "\n"), "\n")
            warning(msg, immediate. = TRUE)
          }
        if (sem_out_check < 0) {
            msg <- paste0(paste(attr(sem_out_check, "info"), collapse = "\n"), "\n")
            stop(msg)
          }
      }

    ptable <- as.data.frame(lavaan::parameterTable(sem_out))
    pars_lbci_yes <- rep(FALSE, nrow(ptable))
    if (!is.null(semlbci_out)) {
        # Check whether semlbci_out and sem_out match
        tmp0 <- ptable[, c("id", "lhs", "op", "rhs", "group", "label")]
        tmp1 <- as.data.frame(semlbci_out)[, c("id", "lhs", "op", "rhs", "group", "label")]
        tmp0 <- tmp0[order(tmp0$id), ]
        tmp1 <- tmp1[order(tmp1$id), ]
        rownames(tmp0) <- NULL
        rownames(tmp1) <- NULL
        if (!identical(tmp0, tmp1)) {
            stop("semlbci_out and the parameter table of sem_out do not match.")
          }
        # Find pars with both lbci_lb and lbci_ub
        pars_lbci_yes <- !sapply(semlbci_out$lbci_lb, is.na) &
                         !sapply(semlbci_out$lbci_ub, is.na)
      }
    i <- ptable$free > 0
    i_id <- ptable$id
    i_id_free <- i_id[i]
    i_id_user <- i_id[(ptable$free == 0) & (ptable$op == ":=")]
    # pars must be the row numbers as in the lavaan parameterTable.
    # Support operators: "~", "~~", "=~", ":="
    pars_tmp <- fix_pars(pars = pars,
                         sem_out = sem_out,
                         standardized = standardized,
                         include_user_pars = include_user_pars,
                         remove_variances = remove_variances,
                         remove_intercepts = remove_intercepts,
                         pars_lbci_yes = pars_lbci_yes,
                         i_id = i_id,
                         i_id_free = i_id_free,
                         i_id_user = i_id_user)
    pars <- pars_tmp$pars
    i_selected <- pars_tmp$i_selected

    if (length(pars) == 0) {
        if (is.null(semlbci_out)) {
            stop("The number of parameters selected is zero.")
          } else {
            warning("No parameter selected. semlbci_out returned.")
            return(semlbci_out)
          }
      }
    npar <- sum(i)
    # KEEP for reference: environment(set_constraint) <- parent.frame()
    if (method == "wn") {
        f_constr <- eval(set_constraint(sem_out = sem_out, ciperc = ciperc),
                        envir = parent.frame())
      } else {
        f_constr <- NULL
      }

    # Compute the scaling factors
    if (robust == "satorra.2000") {
        sf_full_list <- lapply(pars,
                               scaling_factor3,
                               sem_out = sem_out,
                               standardized = standardized,
                               std_method = "internal",
                               sem_out_name = sem_out_name)
      } else {
        sf_full_list <- rep(NA, length(pars))
      }

    if (parallel) {
        # Parallel
        cl <- parallel::makeCluster(ncpus)
        # Rebuild the environment
        pkgs <- .packages()
        pkgs <- rev(pkgs)
        parallel::clusterExport(cl, "pkgs", envir = environment())
        parallel::clusterEvalQ(cl, {sapply(pkgs,
                        function(x) library(x, character.only = TRUE))
                      })
        parallel::clusterExport(cl, ls(envir = parent.frame()),
                                       envir = environment())
        if (requireNamespace("pbapply", quietly = TRUE) &&
            use_pbapply) {
            if (loadbalancing) {
                pboptions_old <- pbapply::pboptions(use_lb = TRUE)
                # Restore pboptions on exit
                on.exit(pbapply::pboptions(pboptions_old))
              }
            # Use pbapply
            args_final <- utils::modifyList(list(...),
                                    list(npar = npar,
                                        sem_out = sem_out,
                                        standardized = standardized,
                                        debug = FALSE,
                                        f_constr = f_constr,
                                        method = method,
                                        ciperc = ciperc,
                                        robust = robust,
                                        try_k_more_times = try_k_more_times,
                                        sem_out_name = sem_out_name))
            pars2 <- rep(pars, each = 2)
            sf_full_list2 <- rep(sf_full_list, each = 2)
            which2 <- rep(c("lbound", "ubound"), times = length(pars))
            plist <- mapply(function(x, y, z) list(i = x, sf_full = y, which = z),
                            x = pars2, y = sf_full_list2, z = which2,
                            SIMPLIFY = FALSE)
            tmpfct <- function(x) {
                force(args_final)
                args_final1 <- utils::modifyList(args_final, x)
                do.call(semlbci::ci_i_one, args_final1)
              }
            out_raw2 <- pbapply::pblapply(plist,
                                          tmpfct,
                                          cl = cl)
          } else {
            # No progress bar
            args_final <- utils::modifyList(list(...),
                                    list(npar = npar,
                                        sem_out = sem_out,
                                        standardized = standardized,
                                        debug = FALSE,
                                        f_constr = f_constr,
                                        method = method,
                                        ciperc = ciperc,
                                        robust = robust,
                                        try_k_more_times = try_k_more_times,
                                        sem_out_name = sem_out_name))
            pars2 <- rep(pars, each = 2)
            sf_full_list2 <- rep(sf_full_list, each = 2)
            which2 <- rep(c("lbound", "ubound"), times = length(pars))
            out_raw2 <- parallel::clusterMap(cl,
                              semlbci::ci_i_one,
                              i = pars2,
                              sf_full = sf_full_list2,
                              which = which2,
                              MoreArgs = args_final,
                              SIMPLIFY = FALSE,
                              .scheduling = ifelse(loadbalancing,
                                                   yes = "dynamic",
                                                   no = "static"))
          }
        parallel::stopCluster(cl)
      } else {
        # Serial
        # Progress bar not support when ran in serial.
        args_final <- utils::modifyList(list(...),
                                list(npar = npar,
                                    sem_out = sem_out,
                                    standardized = standardized,
                                    debug = FALSE,
                                    f_constr = f_constr,
                                    method = method,
                                    ciperc = ciperc,
                                    robust = robust,
                                    try_k_more_times = try_k_more_times,
                                    sem_out_name = sem_out_name))
        pars2 <- rep(pars, each = 2)
        sf_full_list2 <- rep(sf_full_list, each = 2)
        which2 <- rep(c("lbound", "ubound"), times = length(pars))
        if (requireNamespace("pbapply", quietly = TRUE) &&
            use_pbapply) {
              out_raw2 <- pbapply::pbmapply(
                            ci_i_one,
                            i = pars2,
                            sf_full = sf_full_list2,
                            which = which2,
                            MoreArgs = args_final,
                            SIMPLIFY = FALSE)
            } else {
              out_raw2 <- mapply(
                            ci_i_one,
                            i = pars2,
                            sf_full = sf_full_list2,
                            which = which2,
                            MoreArgs = args_final,
                            SIMPLIFY = FALSE)
            }
      }
    tmpfct2 <- function(x, y) {
        out <- mapply(c, x, y, SIMPLIFY = FALSE)
        out$method <- out$method[[1]]
        out$sf_full <- out$sf_full[[1]]
        out
      }
    tmp1 <- seq(1, 2 * length(pars), 2)
    tmp2 <- tmp1 + 1
    out_raw <- mapply(tmpfct2,
                      out_raw2[tmp1],
                      out_raw2[tmp2],
                      SIMPLIFY = FALSE)
    out <- do.call(rbind, lapply(out_raw, function(x) x$bounds))
    if (is.null(semlbci_out)) {
        # Build the output
        out_p <- ptable[, c("id", "lhs", "op", "rhs", "group", "label")]
        if (standardized) {
            pstd <- lavaan::standardizedSolution(sem_out)
            if (lavaan::lavTech(sem_out, "ngroups") == 1) {
                pstd$group <- 1
              }
            pstd$group[pstd$op == ":="] <- 0
            out_p <- merge(out_p, pstd[, c("lhs", "op", "rhs", "group", "est.std")],
                    by = c("lhs", "op", "rhs", "group"), all.x = TRUE, sort = FALSE)
            out_p <- out_p[order(out_p$id), ]
          } else {
            out_p$est <- ptable[, c("est")]
          }
        out_p$lbci_lb <- NA
        out_p$lbci_ub <- NA
      } else {
        # Use existing output
        out_p <- semlbci_out
      }

    out_p[i_selected, "lbci_lb"] <- out[, 1]
    out_p[i_selected, "lbci_ub"] <- out[, 2]

    # Collect diagnostic info
    if (is.null(semlbci_out)) {
        # Create the holder
        q <- length(i_id)
        lb_diag <- as.list(rep(NA, q))
        ub_diag <- as.list(rep(NA, q))
        lb_time <- as.numeric(rep(NA, q))
        ub_time <- as.numeric(rep(NA, q))
        lb_out <- as.list(rep(NA, q))
        ub_out <- as.list(rep(NA, q))
        ci_method <- as.character(rep(NA, q))
        scaling_factor <- as.list(rep(NA, q))
      } else {
        # Retrieve from existing output
        q <- length(i_id)
        lb_diag <- attr(semlbci_out, "lb_diag")
        ub_diag <- attr(semlbci_out, "ub_diag")
        lb_time <- attr(semlbci_out, "lb_time")
        ub_time <- attr(semlbci_out, "ub_time")
        lb_out <- attr(semlbci_out, "lb_out")
        ub_out <- attr(semlbci_out, "ub_out")
        ci_method <- attr(semlbci_out, "ci_method")
        scaling_factor <- attr(semlbci_out, "scaling_factor")
      }
    # Update the output
    lb_diag[pars] <- lapply(out_raw, function(x) x$diags$lb_diag)
    ub_diag[pars] <- lapply(out_raw, function(x) x$diags$ub_diag)
    lb_time[pars] <- sapply(out_raw, function(x) x$times$lb_time)
    ub_time[pars] <- sapply(out_raw, function(x) x$times$ub_time)
    lb_out[pars] <- lapply(out_raw, function(x) x$ci_bound_i_out$lb_out)
    ub_out[pars] <- lapply(out_raw, function(x) x$ci_bound_i_out$ub_out)
    ci_method[pars] <- sapply(out_raw, function(x) x$method)
    scaling_factor[pars] <- lapply(out_raw, function(x) x$sf_full)
    p_names <- sapply(pars, i_to_name, sem_out = sem_out)
    names(lb_diag)[pars] <- p_names
    names(ub_diag)[pars] <- p_names
    names(lb_time)[pars] <- p_names
    names(ub_time)[pars] <- p_names
    names(lb_out)[pars] <- p_names
    names(ub_out)[pars] <- p_names
    names(ci_method)[pars] <- p_names
    names(scaling_factor)[pars] <- p_names

    attr(out_p, "lb_diag") <- lb_diag
    attr(out_p, "ub_diag") <- ub_diag
    attr(out_p, "lb_time") <- lb_time
    attr(out_p, "ub_time") <- ub_time
    attr(out_p, "lb_out") <- lb_out
    attr(out_p, "ub_out") <- ub_out
    attr(out_p, "ci_method") <- ci_method
    attr(out_p, "scaling_factor") <- scaling_factor
    attr(out_p, "call") <- match.call()

    # Append diagnostic info

    out_p[pars, "status_lb"] <- sapply(lb_diag[pars], function(x) x$status)
    out_p[pars, "status_ub"] <- sapply(ub_diag[pars], function(x) x$status)
    out_p[pars, "ci_org_lb"] <- sapply(lb_diag[pars], function(x) x$ci_org_limit)
    out_p[pars, "ci_org_ub"] <- sapply(ub_diag[pars], function(x) x$ci_org_limit)
    out_p[pars, "ratio_lb"] <- sapply(lb_diag[pars], function(x) x$ci_limit_ratio)
    out_p[pars, "ratio_ub"] <- sapply(ub_diag[pars], function(x) x$ci_limit_ratio)
    out_p[pars, "post_check_lb"] <-
                            sapply(lb_diag[pars], function(x) x$fit_post_check)
    out_p[pars, "post_check_ub"] <-
                            sapply(ub_diag[pars], function(x) x$fit_post_check)
    out_p[pars, "time_lb"] <- as.numeric(lb_time[pars])
    out_p[pars, "time_ub"] <- as.numeric(ub_time[pars])
    out_p[pars, "cl_lb"] <- sapply(lb_diag[pars], function(x) x$ciperc_final)
    out_p[pars, "cl_ub"] <- sapply(ub_diag[pars], function(x) x$ciperc_final)
    out_p[pars, "method"] <- ci_method[pars]
    if (robust != "none") {
        out_p[pars, "robust"] <- robust
      }

    class(out_p) <- c("semlbci", class(out_p))
    out_p
  }

#' @noRd
# Process the `pars` argument

fix_pars <- function(pars,
                     sem_out,
                     standardized,
                     include_user_pars,
                     remove_variances,
                     remove_intercepts,
                     pars_lbci_yes,
                     i_id,
                     i_id_free,
                     i_id_user) {
    if (!is.null(pars)) {
        # Parameters supplied
        if (is.character(pars)) {
            pars <- pars_op(pars, sem_out = sem_out)
            # Convert syntax to row numbers.
            pars <- syntax_to_i(pars, sem_out)
          }
        if (standardized) {
            # Always exclude variances fixed in the standardized solution
            pars <- remove_v1(pars, sem_out)
          }
        # Remove parameters already with LBCIs in semlbci_out.
        pars <- pars[!pars %in% i_id[pars_lbci_yes]]
        # Ids in the vector of free parameters
        i_selected <- i_id[pars]
      } else {
        # Start with all free parameters
        pars <- i_id_free
        if (standardized) {
            # Intercepts and means not supported for standardized solution
            remove_intercepts <- TRUE
            # Always exclude variances fixed in the standardized solution
            pars <- remove_v1(pars, sem_out)
            # Include parameters free in the standardized solution,
            # e.g., fixed loadings.
            pars <- sort(unique(c(pars, free_in_std(i_id, sem_out))))
          }
        if (include_user_pars && length(i_id_user) > 0) {
            pars <- c(pars, i_id_user)
          }
        if (remove_variances) {
            pars <- remove_variances(pars, sem_out)
          }
        if (remove_intercepts) {
            pars <- remove_intercepts(pars, sem_out)
          }
        # Remove parameters already with LBCIs in semlbci_out.
        pars <- pars[!pars %in% i_id[pars_lbci_yes]]
        # Ids in the vector of free parameters
        i_selected <- i_id[pars]
      }
    list(pars = pars,
         i_selected = i_selected)
  }