R/semEff-fun.R

Defines functions predEff getMedEffTable getTotEffTable getIndEffTable getDirEffTable getEffTable getAllInd getMedEff getTotEff getIndEff getDirEff getEff summary.semEff print.semEff semEff

Documented in getAllInd getDirEff getDirEffTable getEff getEffTable getIndEff getIndEffTable getMedEff getMedEffTable getTotEff getTotEffTable predEff print.semEff semEff summary.semEff

#' @title SEM Effects
#' @description Automatically calculate direct, indirect, total, and mediator
#'   effects for endogenous (response) variables in a 'piecewise' structural
#'   equation model (SEM).
#' @param sem A piecewise SEM, comprising a list of fitted model objects or of
#'   boot objects (containing bootstrapped model effects). Alternatively, a
#'   `"psem"` object from
#'   [`piecewiseSEM::psem()`](https://rdrr.io/cran/piecewiseSEM/man/psem.html).
#'   If list is unnamed, response variable names will be used.
#' @param predictors,mediators Names of variables for/through which to calculate
#'   effects. If `NULL` (default), all predictors/mediators in the SEM will be
#'   used.
#' @param excl.other.med Logical, whether to exclude other SEM mediators from
#'   calculating indirect effects, i.e., those not specified in the `mediators`
#'   argument. Useful for examining individual effect pathways with only the
#'   specified mediators, rather than including all paths involving them
#'   (default). Ignored if `mediators = NULL`.
#' @param use.raw Logical, whether to use 'raw' (unstandardised) effects for all
#'   calculations (if present in `sem`).
#' @param ci.conf A numeric value specifying the confidence level for confidence
#'   intervals on effects.
#' @param ci.type The type of confidence interval to return (defaults to `"bca"`
#'   – see Details). See [boot::boot.ci()] for further specification details.
#' @param digits The number of decimal places to return for numeric values (for
#'   summary tables).
#' @param bci.arg A named list of any additional arguments to [boot::boot.ci()],
#'   excepting argument `index`.
#' @param ... Arguments to [bootEff()].
#' @details The eponymous function of this package calculates all direct,
#'   indirect, total, and mediator effects for a 'piecewise' structural equation
#'   model (SEM), that is, one where parameter estimation is local rather than
#'   global (Lefcheck, 2016; Shipley, 2000, 2009). The SEM simply takes the form
#'   of a list of fitted models, or bootstrapped estimates from such models,
#'   describing hypothesised causal pathways from predictors to response
#'   ('endogenous') variables. These are either direct, or operate indirectly
#'   via other response variables ('mediators'). This list should represent a
#'   directed ('acyclic') causal model, which should be named exactly for each
#'   response variable and ordered from 'upstream' or 'causal' variables through
#'   to 'downstream' (i.e. those at the end of the pathway). If `sem` is a list
#'   of fitted models, effects will first be bootstrapped using [bootEff()]
#'   (this may take a while!).
#'
#'   Direct effects are calculated as fully standardised model coefficients for
#'   each response variable (see [stdEff()] for details), while indirect effects
#'   are the product of these direct effects operating along causal pathways in
#'   the SEM. The total effects of any given predictor on a response are then
#'   the sum of its direct and (all) its indirect effects. 'Mediator effects'
#'   are also calculated, as the sum of all indirect paths which operate through
#'   each individual mediator – useful to assess the relative importance of
#'   different mediators in affecting the response. All of these effect types
#'   can be calculated automatically for all (default) or for a specified subset
#'   of predictors and/or mediators in the SEM. As indirect, total, and mediator
#'   effects are not directly bootstrapped using the fitted models for response
#'   variables (i.e. via [bootEff()]), their equivalent 'bootstrapped' estimates
#'   are calculated instead using each bootstrapped direct effect.
#'
#'   Confidence intervals for all effects are returned in summary tables for
#'   each response (see [bootCI()]), with BC*a* intervals calculated by default
#'   using the bootstrapped estimates for each effect type (Cheung, 2009; Hayes
#'   & Scharkow, 2013; MacKinnon et al., 2004). Effects for which the confidence
#'   intervals do not contain zero are highlighted with a star (i.e.
#'   'significant' at the `ci.conf` level). Bootstrap standard errors (standard
#'   deviations of the samples) and biases (sample means minus original
#'   estimates) are also included. Correlated errors (and confidence intervals)
#'   are also returned if their bootstrapped values are present in `sem`, or if
#'   they are specified to argument `cor.err` or as part of a `"psem"` object
#'   (see [bootEff()]). These represent residual relationships among response
#'   variables, unaccounted for by the hypothesised SEM paths. Use `summary()`
#'   for effect summary tables and `print()` to return a table of variable names
#'   and associated details.
#'
#'   All calculated effects and bootstrapped effects are also returned in lists
#'   for each response variable, with all except mediator effects also including
#'   the model intercept(s) – required for prediction (these will be zero for
#'   ordinary linear models with fully standardised effects). Effects can be
#'   conveniently extracted with [getEff()] and related functions.
#' @return A list object of class `"semEff"` for which several methods and
#'   extractor functions are available. Contains:
#'   1. Summary tables of effects and confidence intervals
#'   2. All effects
#'   3. All bootstrapped effects
#'   4. All indirect effects (individual, not summed)
#' @references Cheung, M. W. L. (2009). Comparison of methods for constructing
#'   confidence intervals of standardized indirect effects. *Behavior Research
#'   Methods*, *41*(2), 425-438. \doi{10/fnx7xk}
#'
#'   Hayes, A. F., & Scharkow, M. (2013). The Relative Trustworthiness of
#'   Inferential Tests of the Indirect Effect in Statistical Mediation Analysis:
#'   Does Method Really Matter? *Psychological Science*, *24*(10), 1918-1927.
#'   \doi{10/bbhr}
#'
#'   Lefcheck, J. S. (2016). piecewiseSEM: Piecewise structural equation
#'   modelling in `R` for ecology, evolution, and systematics. *Methods in
#'   Ecology and Evolution*, *7*(5), 573-579. \doi{10/f8s8rb}
#'
#'   MacKinnon, D. P., Lockwood, C. M., & Williams, J. (2004). Confidence Limits
#'   for the Indirect Effect: Distribution of the Product and Resampling
#'   Methods. *Multivariate Behavioral Research*, *39*(1), 99. \doi{10/chqcnx}
#'
#'   Shipley, B. (2000). A New Inferential Test for Path Models Based on
#'   Directed Acyclic Graphs. *Structural Equation Modeling: A Multidisciplinary
#'   Journal*, *7*(2), 206-218. \doi{10/cqm32d}
#'
#'   Shipley, B. (2009). Confirmatory path analysis in a generalized multilevel
#'   context. *Ecology*, *90*(2), 363-368. \doi{10/bqd43d}
#' @examples
#' # SEM effects
#' (shipley.sem.eff <- semEff(shipley.sem.boot))
#' summary(shipley.sem.eff)
#'
#' # Effects for selected variables
#' summary(shipley.sem.eff, response = "Live")
#' # summary(semEff(shipley.sem.boot, predictor = "lat"))
#' # summary(semEff(shipley.sem.boot, mediator = "DD"))
#'
#' # Effects calculated using original SEM (models)
#' # (not typically recommended – better to use saved boot objects)
#' # system.time(
#' #  shipley.sem.eff <- semEff(shipley.sem, R = 1000, seed = 13,
#' #                            ran.eff = "site")
#' # )
#' @export
semEff <- function(sem, predictors = NULL, mediators = NULL, 
                   excl.other.med = FALSE, use.raw = FALSE, ci.conf = 0.95, 
                   ci.type = "bca", digits = 3, bci.arg = NULL, ...) {
  
  p <- predictors; m <- mediators
  
  
  # Prep
  
  # Bootstrap SEM (if necessary)
  is.B <- all(sapply(sem, isBoot))
  if (!is.B) sem <- bootEff(sem, ...)
  if (!isList(sem)) sem <- list(sem)
  
  # Use raw (unstandardised) effects (if present)?
  if (use.raw) {
    sem <- lapply(sem, function(i) {
      r <- isRaw(names(i$t0))
      if (any(r)) {
        i$t0[!r] <- i$t0[r]
        i$t[, !r] <- i$t[, r]
      }
      return(i)
    })
  }
  
  # Function to (recursively) replace parts of names/colnames in object/list
  subNam <- function(p, r, x, ...) {
    
    # Function
    subNam <- function(x) {
      if (is.character(x)) {
        x <- gsub(p, r, x, ...)
      } else {
        if (!is.null(names(x))) {
          names(x) <- gsub(p, r, names(x), ...)
        } else if (is.matrix(x)) {
          colnames(x) <- gsub(p, r, colnames(x), ...)
        }
      }
      return(x)
    }
    
    # Apply recursively
    rsubNam <- function(x, sN) {
      x <- sN(x)
      if (isList(x) || isBoot(x)) {
        for (i in 1:length(x)) {
          x[[i]] <- rsubNam(x[[i]], sN)
        }
      }
      return(x)
    }
    rsubNam(x, subNam)
    
  }
  
  # Replace any periods in variable names with underscores
  # (periods are used to separate variable names for indirect effects)
  sem <- subNam("[.]", "_", sem)
  p <- subNam("[.]", "_", p)
  m <- subNam("[.]", "_", m)
  
  # Response names
  ce <- grepl("~~", names(sem))
  r <- names(sem)[!ce]
  
  # Mediator names (default to all endogenous predictors)
  ap <- unique(unlist(lapply(sem[r], function(i) names(i$t0))))
  ap <- ap[!isInt(ap) & !isPhi(ap) & !isR2(ap) & !isRaw(ap)]
  am <- r[r %in% ap]
  m <- if (length(am) > 0) {
    if (is.null(m)) m <- am
    if (!any(m %in% am))
      stop("Mediator(s) not in SEM.")
    am[am %in% m]
  }
  
  # Predictor names (default to all predictors)
  ex <- ap[!ap %in% r]
  ex <- names(sort(sapply(ex, function(i) {
    lengths(regmatches(i, gregexpr(":", i)))
  })))
  ap <- c(ex, am)
  if (is.null(p)) p <- ap
  if (!any(p %in% ap))
    stop("Predictor(s) not in SEM.")
  p <- ap[ap %in% p]
  
  
  # Calculate all direct, total indirect, total, and mediator effects
  
  # Helper function to create data frames without modifying names
  dF <- function(...) {
    data.frame(..., check.names = FALSE, fix.empty.names = FALSE)
  }
  
  # Function to extract direct effects for each predictor
  dirEff <- function(D) {
    sapply(p, function(i) {
      D <- if (is.matrix(D)) D[, colnames(D) == i] else unname(D[names(D) == i])
      if (length(D) > 0) D else 0
    }, simplify = FALSE)
  }
  
  # Function to calculate all indirect effects for each predictor
  indEff <- function(D) {
    
    # Function to extract effect names
    effNam <- function(x) {
      en <- rMapply(function(i) {
        if (is.matrix(i)) colnames(i) else names(i)
      }, x)
      unique(unlist(en))
    }
    
    # Function to multiply effects to calculate indirect effects
    # (for each endogenous predictor on a response, multiply its effect by all
    # direct effects on that predictor)
    multEff <- function(x) {
      
      # Function
      multEff <- function(x) {
        if (is.matrix(x)) {
          x <- x[, colnames(x) %in% am, drop = FALSE]
          Map(function(i, j) {
            eb <- sem[[j]]$t
            i * eb[, colnames(eb) %in% ap, drop = FALSE]
          }, data.frame(x), colnames(x))
        } else {
          x <- x[names(x) %in% am]
          Map(function(i, j) {
            e <- sem[[j]]$t0
            i * e[names(e) %in% ap]
          }, x, names(x))
        }
      }
      
      # Apply recursively
      rMapply(multEff, x, SIMPLIFY = FALSE)
      
    }
    
    # Function to collapse a nested list of vectors/matrices into a single
    # vector/list of vectors
    unlist2 <- function(x) {
      if (all(rapply(x, is.matrix))) {
        x <- rapply(x, function(i) as.list(dF(i)), how = "list")
        lapply(rapply(x, enquote), eval)
      } else unlist(x)
    }
    
    # Calculate indirect effects
    # (start with last response variable and move backwards through others)
    lapply(D, function(i) {
      en <- effNam(i)
      if (any(am %in% en)) {
        I <- list()
        I[[1]] <- multEff(i)
        for (j in 1:length(i)) {
          en <- effNam(I[j])
          I[[j + 1]] <- if (any(am %in% en)) multEff(I[j])
        }
        unlist2(I)
      } else NA
    })
    
  }
  
  # Function to subset indirect effects for specified predictors/mediators
  subVar <- function(I, p, m) {
    pt <- function(p, m) {
      paste(
        sapply(p, function(i) {
          sapply(m, function(j) {
            paste(
              paste0("^", j, "[.]", i, "$")
              , paste0("^", j, "[.].*[.]", i, "$")
              , paste0("[.]", j, "[.]", i, "$")
              , paste0("[.]", j, "[.].*[.]", i, "$")
              , sep = "|"
            )
          })
        })
        , collapse = "|"
      )
    }
    s <- grepl(pt(p, m), names(I))
    om <- am[!am %in% m]
    if (excl.other.med && length(om) > 0) {
      s <- s & !grepl(pt(p, om), names(I))
    }
    I <- I[s]
    return(I)
  }
  
  # Function to sum all indirect effects for each predictor
  totInd <- function(I) {
    sapply(p, function(i) {
      I <- subVar(I, i, m)
      if (length(I) > 0) Reduce("+", I) else 0
    }, simplify = FALSE)
  }
  
  # Function to calculate total effects (direct + total indirect)
  totEff <- function(D, I) {
    rMapply(function(i, j) i + j, D, I, SIMPLIFY = FALSE)
  }
  
  # Function to sum all indirect effects through each mediator
  totIndM <- function(I) {
    if (length(m) > 0) {
      sapply(m, function(i) {
        I <- subVar(I, p, i)
        if (length(I) > 0) Reduce("+", I) else 0
      }, simplify = FALSE)
    } else 0
  }
  
  # Calculate and compile all effects for original estimates
  D <- lapply(sem[r], "[[", 1)
  I <- indEff(D)
  ED <- lapply(D, function(i) list("Direct" = dirEff(i)))
  EI <- lapply(I, function(i) list("Indirect" = totInd(i)))
  ET <- lapply(totEff(ED, EI), function(i) setNames(i, "Total"))
  EM <- lapply(I, function(i) list("Mediators" = totIndM(i)))
  E <- Map(c, ED, EI, ET, EM)
  
  # Calculate and compile all effects for bootstrapped estimates
  DB <- lapply(sem[r], "[[", 2)
  IB <- indEff(DB)
  EDB <- lapply(DB, function(i) list("Direct" = dirEff(i)))
  EIB <- lapply(IB, function(i) list("Indirect" = totInd(i)))
  ETB <- lapply(totEff(EDB, EIB), function(i) setNames(i, "Total"))
  EMB <- lapply(IB, function(i) list("Mediators" = totIndM(i)))
  EB <- Map(c, EDB, EIB, ETB, EMB)
  
  
  # Calculate bootstrapped confidence intervals for all effects
  
  # Function to calculate CIs (& bias/standard errors)
  bootCI2 <- function(e, eb, r) {
    
    # Boot object for response var. (replace estimates)
    B <- sem[[r]]
    B$t0 <- e
    B$t <- matrix(eb)
    
    # Change default CI type for parametric bootstrapping
    if (B$sim == "parametric" && ci.type[1] == "bca") {
      message("Percentile confidence intervals used for parametric bootstrap samples.")
      ci.type <- "perc"
    }
    
    # Calculate CIs (& bias/standard errors)
    if (!is.na(e)) {
      if (e != 0) {
        bi <- mean(eb, na.rm = TRUE) - e
        se <- sd(eb, na.rm = TRUE)
        ci <- suppressWarnings(
          do.call(
            boot::boot.ci,
            c(list(B, ci.conf, ci.type), bci.arg)
          )
        )
        ci <- tail(as.vector(ci[[4]]), 2)
        c(bi, se, ci)
      } else rep(0, 4)
    } else rep(NA, 4)
    
  }
  
  # Calculate CIs and append to effects
  CI <- rMapply(bootCI2, E, EB, r, SIMPLIFY = FALSE)
  ECI <- rMapply(c, E, CI, SIMPLIFY = FALSE)
  
  
  # Compile and output effects
  
  # Helper function to add a top border to a data frame
  tB <- function(d) {
    b <- mapply(function(i, j) {
      n1 <- nchar(j)
      n2 <- max(sapply(i, nchar), n1, 3)
      b <- if (n1 > 1) rep("-", n2) else ""
      paste(b, collapse = "")
    }, d, names(d))
    rbind(b, d)
  }
  
  # Extract all effects/bootstrapped effects
  # (remove zeros, add intercepts)
  extEff <- function(E) {
    sapply(r, function(i) {
      sapply(names(E[[i]]), function(j) {
        e <- E[[i]][[j]]
        is.B <- any(sapply(e, length) > 1)
        e <- if (is.B) {
          e[sapply(e, function(k) length(k) > 1)]
        } else {
          unlist(e)[unlist(e) != 0]
        }
        if (length(e) > 0) {
          if (is.B) e <- as.matrix(dF(e))
          if (j != "Mediators") {
            B <- sem[[i]]
            if (is.B) {
              I <- B$t[, isInt(colnames(B$t)), drop = FALSE]
              eb <- cbind(I, e)
              a <- attributes(B$t)[c("sim", "seed", "n")]
              attributes(eb)[names(a)] <- a
              eb
            } else {
              I <- B$t0[isInt(names(B$t0))]
              c(I, e)
            }
          } else e
        } else NA
      }, simplify = FALSE)
    }, simplify = FALSE)
  }
  e <- extEff(E)
  eb <- extEff(EB)
  
  # Extract all indirect effects (individual)
  ei <- lapply(I, function(i) {
    i <- subVar(i, p, m)
    if (length(i) > 0) i else NA
  })

  # Compile table of effects/CIs
  et <- do.call(rbind, lapply(names(ECI), function(i) {
    do.call(rbind, lapply(names(ECI[[i]]), function(j) {
      do.call(rbind, lapply(names(ECI[[i]][[j]]), function(k) {
        l <- ECI[[i]][[j]][[k]]
        if (l[1] != 0) dF(i, j, k, t(l))
      }))
    }))
  }))
  names(et) <- c("response", "effect_type", "predictor", 
                 "effect", "bias", "std_err", "lower_ci", "upper_ci")
  et$response <- subNam("_", ".", et$response)
  et$predictor <- subNam("_", ".", et$predictor)
  et$effect_type <- tolower(et$effect_type)
  attr(et, "ci.conf") <- ci.conf
  attr(et, "ci.type") <- ci.type
  
  # Compile summary tables of effects/CIs (formatted)
  s <- lapply(ECI, function(i) {
    
    # List of summary tables
    s <- lapply(names(i), function(j) {
      
      # Combine effects and CIs into table
      s <- dF(t(dF(i[[j]])))
      names(s) <- c("Effect", "Bias", "Std. Err.", "Lower CI", "Upper CI")
      s <- s[s[, 1] != 0, ]
      s <- round(s, digits)
      if (nrow(s) < 1) rownames(s) <- s[1, ] <- "-"
      
      # Add significance stars
      stars <- apply(s, 1, function(k) {
        if (is.numeric(k)) {
          k <- c(k[1], tail(k, 2))
          if (all(k > 0) || all(k < 0)) "*" else ""
        } else ""
      })
      s <- dF(s, " " = stars)
      
      # Format table (add title columns, borders, top space)
      s <- format(s, nsmall = digits)
      s <- dF(" " = "", " " = rownames(s), 
              "|", s[1], "|", s[2], "|", s[3], "|", s[4:5], "|", s[6])
      s[1, 1] <- toupper(paste0(j, ":"))
      b <- c("", "", rep(c("|", ""), 4), "", "|", "")
      rbind(b, s)
      
    })
    
    # Combine into one and format (add borders, text alignment, etc.)
    s <- tB(do.call(rbind, s))[-2, ]
    s[, 2] <- subNam("_", ".", s[, 2])
    s[1:2] <- format(s[1:2], justify = "left")
    rownames(s) <- 1:nrow(s)
    
    # Set attributes and output
    class(s) <- c("semEff", class(s))
    attr(s, "ci.conf") <- ci.conf
    attr(s, "ci.type") <- ci.type
    s
    
  })
  
  # Add summary table of correlated errors (formatted)
  if (any(ce)) {
    CE <- bootCI(sem[ce], ci.conf, ci.type, digits, bci.arg)
    if (length(ce) > 1) {
      CE <- c(CE[1], lapply(CE[-1], "[", 2,))
      CE <- do.call(rbind, CE)
      CE[1] <- format(CE[1], justify = "left")
      rownames(CE) <- 1:nrow(CE)
    }
    CE[, 1] <- subNam("_", ".", CE[, 1])
    class(CE) <- c("semEff", class(CE))
    s <- c(s, list("Correlated Errors" = CE))
  }

  # Add summary table of variables (formatted)
  v <- c(ex[ex %in% p], r)
  y <- "x"; n <- "-"
  v <- dF(
    " " = subNam("_", ".", v),
    "|",
    "Category" = ifelse(v %in% ex, "Exog.", "Endog."),
    "|",
    "Predictor" = ifelse(v %in% p, y, n),
    "Mediator" = ifelse(v %in% m, y, n),
    "Response" = sapply(v, function(i) {
      if (sum(E[[i]]$Total != 0)) y else n
    }),
    "|",
    "Dir. Eff." = sapply(v, function(i) {
      if (!i %in% ex) sum(E[[i]]$Direct != 0) else "-"
    }),
    "Ind. Eff." = sapply(v, function(i) {
      if (!i %in% ex) length(na.omit(ei[[i]])) else "-"
    }),
    "|"
  )
  if (any(ce)) {
    v <- dF(
      v,
      "Cor. Err." = sapply(v[, 1], function(i) {
        if (!i %in% ex) {
          cv <- unlist(lapply(CE[, 1], function(j) {
            gsub(" ", "", unlist(strsplit(j, "~~")))
          }))
          sum(cv == i)
        } else "-"
      }),
      "|"
    )
  }
  v <- tB(v)
  v[c(1:3)] <- format(v[c(1:3)], justify = "left")
  v[c(5:7)] <- format(v[c(5:7)], justify = "centre")
  rownames(v) <- 1:nrow(v)
  class(v) <- c("semEff", class(v))
  s <- c(list("Variables" = v), s)
  
  # Reinstate periods to variable names
  e <- subNam("_", ".", e)
  eb <- subNam("_", ".", eb)
  names(ei) <- subNam("_", ".", names(ei))
  names(s) <- subNam("_", ".", names(s))
  
  # Output effects
  e <- list(
    "Summary" = s, 
    "Effects" = list(
      "Table" = et, 
      "Original" = e, 
      "Bootstrapped" = eb, 
      "All Indirect" = ei
    )
  )
  class(e) <- c("semEff", class(e))
  e
  
  
}


#' @title Print `"semEff"` Objects
#' @description A [print()] method for an object of class `"semEff"`.
#' @param x An object of class `"semEff"`.
#' @param ... Further arguments passed to or from other methods. Not currently
#'   used.
#' @details This print method returns a summary table for the SEM variables,
#'   giving their status as exogenous or endogenous and as predictor, mediator
#'   and/or response. It also gives the number of direct vs. indirect paths
#'   leading to each variable, and the number of correlated errors (if
#'   applicable).
#' @return A summary table for the SEM variables (data frame).
# S3 method for class 'semEff'
#' @export
print.semEff <- function(x, ...) {

  if ("list" %in% class(x)) {

    # SEM variable details
    v <- x$Summary$Variables
    ct <- v$Category
    di <- v[c("Dir. Eff.", "Ind. Eff.")]
    di <- suppressWarnings(
      na.omit(apply(di, 2, as.numeric))
    )
    n1 <- paste(sum(grepl("^Ex", ct)))
    n2 <- paste(sum(grepl("^En", ct)))
    n3 <- paste(sum(di[, 1]))
    n4 <- paste(sum(di[, 2]))

    # Correlated errors?
    ce <- "Cor. Err."
    ce <- if (ce %in% names(v)) {
      n5 <- suppressWarnings(
        sum(na.omit(as.numeric(v[, ce]))) / 2
      )
      paste0("  * ", n5, " correlated error(s)\n")
    }

    # Print variable table
    message("\nPiecewise SEM with:\n  * ", n1, " exogenous vs. ", n2,
            " endogenous variable(s)\n  * ", n3, " direct vs. ", n4,
            " indirect effect(s)\n", ce, "\nVariables:\n")
    print.data.frame(v, row.names = FALSE)
    message("\nUse summary() for effects and confidence intervals for endogenous variables.\nSee ?getEff() for extracting (unformatted) effects.\n")

  }
  else print.data.frame(x, row.names = FALSE)

}


#' @title Summarise SEM Effects
#' @description A [summary()] method for an object of class `"semEff"`.
#' @param object An object of class `"semEff"`.
#' @param responses An optional character vector, the names of one or more SEM
#'   response variables for which to return summaries (and/or `"Correlated
#'   Errors"`, where applicable). Can also be a numeric vector of indices of
#'   `object`. If `NULL` (default), all summaries are returned.
#' @param ... Further arguments passed to or from other methods. Not currently
#'   used.
#' @details This summary method prints tables of effects and confidence
#'   intervals for SEM endogenous (response) variables.
#' @return A summary table or tables of effects for the endogenous variables
#'   (data frames).
# S3 method for class 'semEff'
#' @export
summary.semEff <- function(object, responses = NULL, ...) {

  # SEM response names
  s <- object$Summary[-1]
  r <- names(s)
  r2 <- responses
  if (is.null(r2)) r2 <- r
  if (is.numeric(r2)) r2 <- r[r2]
  if (!any(r2 %in% r))
    stop("Response(s) not in SEM.")

  # Print summary tables
  ce <- "Correlated Errors"
  if (length(r2[r2 != ce]) > 0) {
    message("\nSEM direct, summed indirect, total, and mediator effects:")
  }
  r <- r[r != ce]
  for (i in r2) {
    n <- which(r == i)
    ii <- if (length(n) > 0) {
      paste0("Response '", i, "' (", n, "/", length(r), ")")
    } else i
    message("\n", ii, ":\n")
    print(s[[i]])
    cat("\n")
  }

}


#' @title Get SEM Effects
#' @description Extract SEM effects from an object of class `"semEff"`.
#' @param eff An object of class `"semEff"`.
#' @param responses An optional character vector, the names of one or more SEM
#'   response variables for which to return effects. Can also be a numeric
#'   vector of indices of `eff`. If `NULL` (default), all effects are returned.
#' @param type The type of effects to return. Can be `"orig"` (original
#'   estimates - default) or `"boot"` (for bootstrapped).
#' @param ... Arguments (above) to be passed to `getEff()` from the other
#'   extractor functions. `type = "boot"` is not available for `getAllInd()` or
#'   `getEffTable()` (and derivatives).
#' @details These are simple extractor functions for effects calculated using
#'   [semEff()], intended for convenience (e.g. for use with [predEff()]).
#' @return A list containing the original or bootstrapped effects for each
#'   response variable as numeric vectors or matrices (respectively), or a table
#'   of (unformatted) effects and confidence intervals (for `getEffTable()`).
#' @name getEff
NULL
#' @describeIn getEff Get effects.
#' @export
getEff <- function(eff, responses = NULL, type = c("orig", "boot")) {

  e <- eff; r <- responses; type <- match.arg(type)
  if (class(e)[1] != "semEff")
    stop("Object is not of class 'semEff'")

  # Extract effects
  e <- e$Effects
  e <- if (type == "boot") e$Bootstrapped else e$Original
  
  # Subset responses and return effects
  if (!is.null(r)) {
    is.df <- is.data.frame(e)
    en <- if (is.df) unique(e$response) else names(e)
    if (is.numeric(r)) r <- en[r]
    if (!any(r %in% en))
      stop("Response(s) not in SEM.")
    if (is.df) e[e$response %in% r, ] else e[en %in% r]
  } else e
  
}
#' @describeIn getEff Get direct effects.
#' @export
getDirEff <- function(...) {
  e <- lapply(getEff(...), function(i) i$Direct)
  if (length(e) < 2) e[[1]] else e
}
#' @describeIn getEff Get indirect effects.
#' @export
getIndEff <- function(...) {
  e <- lapply(getEff(...), function(i) i$Indirect)
  if (length(e) < 2) e[[1]] else e
}
#' @describeIn getEff Get total effects.
#' @export
getTotEff <- function(...) {
  e <- lapply(getEff(...), function(i) i$Total)
  if (length(e) < 2) e[[1]] else e
}
#' @describeIn getEff Get mediator effects.
#' @export
getMedEff <- function(...) {
  e <- lapply(getEff(...), function(i) i$Mediators)
  if (length(e) < 2) e[[1]] else e
}
#' @describeIn getEff Get all indirect effects.
#' @export
getAllInd <- function(eff, ...) {
  eff$Effects$Original <- eff$Effects$`All Indirect`
  e <- getEff(eff, type = "orig", ...)
  if (length(e) < 2) e[[1]] else e
}
#' @describeIn getEff Get effects table.
#' @export
getEffTable <- function(eff, ...) {
  eff$Effects$Original <- eff$Effects$Table
  getEff(eff, type = "orig", ...)
}
#' @describeIn getEff Get direct effects table.
#' @export
getDirEffTable <- function(...) {
  e <- getEffTable(...)
  e[e$effect_type == "direct", ]
}
#' @describeIn getEff Get indirect effects table.
#' @export
getIndEffTable <- function(...) {
  e <- getEffTable(...)
  e[e$effect_type == "indirect", ]
}
#' @describeIn getEff Get total effects table.
#' @export
getTotEffTable <- function(...) {
  e <- getEffTable(...)
  e[e$effect_type == "total", ]
}
#' @describeIn getEff Get mediator effects table.
#' @export
getMedEffTable <- function(...) {
  e <- getEffTable(...)
  e[e$effect_type == "mediators", ]
}


#' @title Predict Effects
#' @description Generate predicted values for SEM direct, indirect, or total
#'   effects.
#' @param mod A fitted model object, or a list or nested list of such objects.
#' @param newdata An optional data frame of new values to predict, which should
#'   contain all the variables named in `effects` or all those used to fit
#'   `mod`.
#' @param effects A numeric vector of effects to predict, or a list or nested
#'   list of such vectors. These will typically have been calculated using
#'   [semEff()], [bootEff()], or [stdEff()]. Alternatively, a boot object
#'   produced by [bootEff()] can be supplied.
#' @param eff.boot A matrix of bootstrapped effects used to calculate confidence
#'   intervals for predictions, or a list or nested list of such matrices. These
#'   will have been calculated using [semEff()] or [bootEff()].
#' @param re.form For mixed models of class `"merMod"`, the formula for random
#'   effects to condition on when predicting effects. Defaults to `NA`, meaning
#'   random effects are averaged over. See
#'   [`lme4:::predict.merMod()`](https://rdrr.io/cran/lme4/man/predict.merMod.html)
#'   for further specification details.
#' @param type The type of prediction to return (for GLMs). Can be either
#'   `"link"` (default) or `"response"`.
#' @param interaction An optional name of an interactive effect, for which to
#'   return standardised effects for a 'main' continuous variable across
#'   different values or levels of interacting variables (see Details).
#' @param use.raw Logical, whether to use raw (unstandardised) effects for all
#'   calculations (if present).
#' @param ci.conf A numeric value specifying the confidence level for confidence
#'   intervals on predictions (and any interactive effects).
#' @param ci.type The type of confidence interval to return (defaults to `"bca"`
#'   – see Details). See [boot::boot.ci()] for further specification details.
#' @param digits The number of significant digits to return for interactive
#'   effects.
#' @param bci.arg A named list of any additional arguments to [boot::boot.ci()],
#'   excepting argument `index`.
#' @param parallel The type of parallel processing to use for calculating
#'   confidence intervals on predictions. Can be one of `"snow"`, `"multicore"`,
#'   or `"no"` (for none – the default).
#' @param ncpus Number of system cores to use for parallel processing. If `NULL`
#'   (default), all available cores are used.
#' @param cl Optional cluster to use if `parallel = "snow"`. If `NULL`
#'   (default), a local cluster is created using the specified number of cores.
#' @param ... Arguments to [stdEff()].
#' @details Generate predicted values for SEM direct, indirect, or total effects
#'   on a response variable, which should be supplied to `effects`. These are
#'   used in place of model coefficients in the standard prediction formula,
#'   with values for predictors drawn either from the data used to fit the
#'   original model(s) (`mod`) or from `newdata`. It is assumed that effects are
#'   fully standardised; however, if this is not the case, then the same
#'   centring and scaling options originally specified to [stdEff()] should be
#'   re-specified – which will then be used to standardise the data. If no
#'   effects are supplied, standardised (direct) effects will be calculated from
#'   the model and used to generate predictions. These predictions will equal
#'   the model(s) fitted values if `newdata = NULL`, `unique.eff = FALSE`, and
#'   `re.form = NULL` (where applicable).
#'
#'   Model-averaged predictions can be generated if averaged `effects` are
#'   supplied to the model in `mod`, or, alternatively, if `weights` are
#'   specified (passed to [stdEff()]) and `mod` is a list of candidate models
#'   (`effects` can also be passed using this latter method). For mixed model
#'   predictions where random effects are included (e.g. `re.form = NULL`), the
#'   latter approach should be used, otherwise the contribution of random
#'   effects will be taken from the single model instead of (correctly) being
#'   averaged over a candidate set.
#'
#'   If bootstrapped effects are supplied to `eff.boot` (or to `effects`, as
#'   part of a boot object), bootstrapped predictions are calculated by
#'   predicting from each effect. Confidence intervals can then be returned via
#'   [bootCI()], for which the `type` should be appropriate for the original
#'   form of bootstrap sampling (defaults to `"bca"`). If the number of
#'   observations to predict is very large, parallel processing (via
#'   [pSapply()]) may speed up the calculation of intervals.
#'
#'   Predictions are always returned in the original (typically unstandardised)
#'   units of the (link-transformed) response variable. For GLMs, they can be
#'   returned in the response scale if `type = "response"`.
#'
#'   Additionally, if the name of an interactive effect is supplied to
#'   `interaction`, standardised effects (and confidence intervals) can be
#'   returned for effects of a continuous 'main' variable across different
#'   values or levels of interacting variable(s). The name should be of the form
#'   `"x1:x2..."`, containing all the variables involved and matching the name
#'   of an interactive effect in the model(s) terms or in `effects`. The values
#'   for all variables should be supplied in `newdata`, with the main continuous
#'   variable being automatically identified as that having the most unique
#'   values.
#' @return A numeric vector of the predictions, or, if bootstrapped effects are
#'   supplied, a list containing the predictions and the upper and lower
#'   confidence intervals. Optional interactive effects may also be appended.
#'   Predictions may also be returned in a list or nested list, depending on the
#'   structure of `mod` (and other arguments).
#' @seealso [predict()]
#' @examples
#' # Predict effects (direct, total)
#' m <- shipley.sem
#' e <- shipley.sem.eff
#' dir <- getDirEff(e)
#' tot <- getTotEff(e)
#' f.dir <- predEff(m, effects = dir, type = "response")
#' f.tot <- predEff(m, effects = tot, type = "response")
#' f.dir$Live[1:10]
#' f.tot$Live[1:10]
#'
#' # Using new data for predictors
#' d <- na.omit(shipley)
#' xn <- c("lat", "DD", "Date", "Growth")
#' seq100 <- function(x) seq(min(x), max(x), length = 100)
#' nd <- data.frame(sapply(d[xn], seq100))
#' f.dir <- predEff(m, nd, dir, type = "response")
#' f.tot <- predEff(m, nd, tot, type = "response")
#' f.dir$Live[1:10]
#' f.tot$Live[1:10]
#' # Add CIs
#' # dir.b <- getDirEff(e, "boot")
#' # tot.b <- getTotEff(e, "boot")
#' # f.dir <- predEff(m, nd, dir, dir.b, type = "response")
#' # f.tot <- predEff(m, nd, tot, tot.b, type = "response")
#'
#' # Predict an interactive effect (e.g. Live ~ Growth * DD)
#' xn <- c("Growth", "DD")
#' d[xn] <- scale(d[xn])  # scale predictors (improves fit)
#' m <- lme4::glmer(Live ~ Growth * DD + (1 | site) + (1 | tree),
#'                  family = binomial, data = d)
#' nd <- with(d, expand.grid(
#'   Growth = seq100(Growth),
#'   DD = mean(DD) + c(-sd(DD), sd(DD))  # two levels for DD
#' ))
#' f <- predEff(m, nd, type = "response", interaction = "Growth:DD")
#' f$fit[1:10]
#' f$interaction
#' # Add CIs (need to bootstrap model...)
#' # system.time(B <- bootEff(m, R = 1000, ran.eff = "site"))
#' # f <- predEff(m, nd, B, type = "response", interaction = "Growth:DD")
#'
#' # Model-averaged predictions (several approaches)
#' m <- shipley.growth  # candidate models (list)
#' w <- runif(length(m), 0, 1)  # weights
#' e <- stdEff(m, w)  # averaged effects
#' f1 <- predEff(m[[1]], effects = e)  # pass avg. effects
#' f2 <- predEff(m, weights = w)  # pass weights argument
#' f3 <- avgEst(predEff(m), w)  # use avgEst function
#' stopifnot(all.equal(f1, f2))
#' stopifnot(all.equal(f2, f3))
#'
#' # Compare model fitted values: predEff() vs. fitted()
#' m <- shipley.sem$Live
#' f1 <- predEff(m, unique.eff = FALSE, re.form = NULL, type = "response")
#' f2 <- fitted(m)
#' stopifnot(all.equal(f1, f2))
#'
#' # Compare predictions using standardised vs. raw effects (same)
#' f1 <- predEff(m)
#' f2 <- predEff(m, use.raw = TRUE)
#' stopifnot(all.equal(f1, f2))
#' @export
predEff <- function(mod, newdata = NULL, effects = NULL, eff.boot = NULL,
                    re.form = NA, type = c("link", "response"),
                    interaction = NULL, use.raw = FALSE, ci.conf = 0.95,
                    ci.type = "bca", digits = 3, bci.arg = NULL,
                    parallel = "no", ncpus = NULL, cl = NULL, ...) {

  m <- mod; nd <- newdata; e <- effects; eb <- eff.boot; rf <- re.form;
  type <- match.arg(type); ix <- interaction; nc <- ncpus

  # Arguments to stdEff()
  a <- list(...)

  # Weights (for model averaging)
  w <- a$weights; a$weights <- NULL
  if (isList(m) && (isList(w) || is.null(w))) {
    n <- function(x) {NULL}
    N <- rMapply(n, m, SIMPLIFY = FALSE)
    if (is.null(w)) w <- N else N <- lapply(m, n)
    if (is.null(e)) e <- N
    if (is.null(eb)) eb <- N
  }

  # Refit model(s) with any supplied data
  d <- a$data; a$data <- NULL
  if (!is.null(d)) {
    upd <- function(m) eval(update(m, data = d, evaluate = FALSE))
    m <- rMapply(upd, m, SIMPLIFY = FALSE)
  }

  # Environment to look for model data
  env <- a$env; a$env <- NULL
  if (!is.null(d)) env <- environment()

  # Effect standardisation options (for back-transforming predictions)
  a$cen.x <- !(isFALSE(a$cen.x) || use.raw)
  a$cen.y <- !(isFALSE(a$cen.y) || use.raw)
  a$std.x <- !(isFALSE(a$std.x) || use.raw)
  a$std.y <- !(isFALSE(a$std.y) || use.raw)
  a$incl.raw <- FALSE

  # Function
  predEff <- function(m, w, e, eb) {

    # Effects
    if (is.null(e)) e <- do.call(stdEff, c(list(m, w, env = env), a))
    if (isBoot(e)) {eb <- e$t; e <- e$t0}
    en <- names(e)

    # Use raw (unstandardised) effects (if present)?
    r <- isRaw(en)
    if (any(r) & use.raw) {
      e[!r] <- e[r]
      if (!is.null(eb)) eb[, !r] <- eb[, r]
    }

    # Subset only relevant parameters
    e <- na.omit(e[!isPhi(en) & !isR2(en) & !r])
    en <- names(e)
    EN <- sapply(en, function(i) {
      unlist(strsplit(i, "(?<!:):(?!:)", perl = TRUE))
    })

    # Data used to fit model(s)
    d <- getData(m, subset = TRUE, merge = TRUE, env = env)
    obs <- rownames(d)

    # Extract the first model, if list
    # (model type and specification should be consistent)
    m1 <- if (isList(m)) m[[1]] else m

    # Model error family/link functions
    f <- getFamily(m1)
    lF <- f$linkfun
    lI <- f$linkinv

    # Random effects
    is.re <- isMer(m1) && !identical(rf, NA) && !identical(rf, ~ 0)
    re <- if (is.re) {
      pRE <- function(x) {
        predict(x, nd, re.form = rf, random.only = TRUE)
      }
      re <- rMapply(pRE, m, SIMPLIFY = FALSE)
      if (isList(re)) re <- avgEst(re, w)
      if (is.null(nd)) re[obs] else re
    } else 0

    # Model weights
    w <- weights(m1)
    if (is.null(w)) w <- rep(1, nobs(m1))
    w <- w[w > 0 & !is.na(w)]

    # Model offset(s)
    o <- if (!is.null(nd)) {
      nd <- data.frame(nd)
      tt <- terms(m1)
      on <- attr(tt, "offset")
      if (!is.null(on)) {
        tn <- attr(tt, "variables")
        on <- sapply(on, function(i) tn[[i + 1]])
      }
      on <- c(on, getCall(m1)$offset)
      if (!is.null(on)) rowSums(sapply(on, eval, nd))
    } else {
      mf <- model.frame(m1, data = d)
      model.offset(mf)
    }
    if (is.null(o)) o <- 0

    # Predictors
    x <- getX(en, d, as.df = TRUE)
    xc <- getX(en, d, centre = TRUE, as.df = TRUE)
    x[isInx(names(x))] <- xc[isInx(names(xc))]

    # Predictor means/SDs
    xm <- sapply(x, function(i) if (a$cen.x) mean(i) else 0)
    xmw <- sapply(x, function(i) if (a$cen.x) weighted.mean(i, w) else 0)
    xs <- sapply(x, function(i) if (a$std.x) sdW(i, w) else 1)

    # Response mean/SD (link scale)
    ym <- if (a$cen.y) lF(weighted.mean(getY(m1, env = env), w)) else 0
    ys <- if (a$std.y) sdW(getY(m1, link = TRUE, env = env), w) else 1

    # Data to predict (standardise using original means/SDs)
    if (!is.null(nd)) {
      d <- nd
      obs <- rownames(d)
      x <- getX(en, d, as.df = TRUE)
      xc <- getX(en, d, centre = xm, as.df = TRUE)
      x[isInx(names(x))] <- xc[isInx(names(xc))]
    }
    x <- sweep(x, 2, xmw)
    x <- sweep(x, 2, xs, "/")
    x[isInt(en)] <- 1

    # Predictions
    f <- colSums(e * t(x))
    f <- f * ys + ym + re + o
    if (type == "response") f <- lI(f)
    names(f) <- obs

    # Add CIs (& bias/SE)
    if (!is.null(eb)) {

      # Bootstrap attributes
      sim <- attr(eb, "sim")
      seed <- attr(eb, "seed")
      n <- attr(eb, "n")
      R <- nrow(eb)

      # Change default CI type for parametric bootstrapping
      if (sim == "parametric" && ci.type[1] == "bca") {
        message("Percentile confidence intervals used for parametric bootstrap samples.")
        ci.type <- "perc"
      }

      # Bootstrapped predictions
      eb <- eb[, en, drop = FALSE]
      fb <- t(sapply(1:R, function(i) {
        ei <- na.omit(eb[i, ])
        xi <- x[names(ei)]
        fi <- colSums(ei * t(xi))
        fi * ys + ym + re + o
      }))
      if (nrow(fb) != R) fb <- t(fb)
      if (type == "response") fb <- lI(fb)

      # Bootstrap bias/standard errors
      bi <- colMeans(fb, na.rm = TRUE) - f
      se <- apply(fb, 2, sd, na.rm = TRUE)
      se[is.na(f)] <- NA

      # Create dummy boot object (for CIs)
      set.seed(seed)
      dd <- data.frame(rep(1, n))  # dummy data
      B <- list(t0 = f, t = fb, R = R, data = dd, seed = .Random.seed,
                sim = sim, stype = "i", strata = dd[, 1])
      class(B) <- "boot"
      attr(B, "boot_type") <- "boot"

      # Calculate and add CIs
      ci <- pSapply(1:length(f), function(i) {
        ci <- do.call(
          boot::boot.ci,
          c(list(B, ci.conf, ci.type, i), bci.arg)
        )
        tail(as.vector(ci[[4]]), 2)
      }, parallel, nc, cl)
      ci <- as.matrix(ci)
      colnames(ci) <- obs
      f <- list(fit = f, bias = bi, se.fit = se, ci.lower = ci[1, ],
                ci.upper = ci[2, ])

    }

    # Add interactive effects
    if (isTRUE(isInx(ix) && ix %in% en && !is.null(nd))) {

      # Names of variables involved in interaction
      # (ab = all, a = main, b = interacting, a.b = interaction(s))
      ab <- EN[[ix]]
      xi <- getX(ab, d, as.df = TRUE)
      a <- ab[which.max(sapply(xi, function(i) length(unique(i))))]
      b <- ab[!ab %in% a]
      n <- length(ab)
      a.b <- if (n > 2) {
        a.b <- unlist(lapply(2:n, function(i) {
          combn(ab, i, paste, collapse = ":")
        }))
        a.b[sapply(a.b, function(i) a %in% EN[[i]])]
      } else ix

      # Values for interacting variable(s)
      xb <- sweep(unique(xi[b]), 2, xm[b])
      xb <- lapply(EN[a.b], function(i) {
        apply(xb[i[i %in% b]], 1, prod)
      })

      # Effects
      e <- e * ys / xs
      e <- e[a] + rowSums(mapply("*", e[a.b], xb))
      e <- e * xs[a] / ys
      names(e) <- paste(ix, 1:length(e), sep = "_")

      # Add CIs
      f <- if (!is.null(eb)) {

        # Bootstrapped effects
        eb <- t(sapply(1:R, function(i) {
          ei <- eb[i, ] * ys / xs
          ei <- ei[a] + rowSums(mapply("*", ei[a.b], xb))
          ei * xs[a] / ys
        }))
        if (nrow(eb) != R) eb <- t(eb)

        # CIs
        B$t0 <- e
        B$t <- eb
        e <- bootCI(B, ci.conf, ci.type, digits, bci.arg)
        c(f, list(interactions = e))

      } else {
        list(fit = f, interactions = round(e, digits))
      }

    }

    # Output
    if (!is.null(eb)) {
      set.seed(NULL)
      attr(f, "ci.conf") <- ci.conf
      attr(f, "ci.type") <- ci.type
    }
    f

  }

  # Apply recursively
  rMapply(predEff, m, w, e, eb, SIMPLIFY = FALSE)

}
murphymv/semEff documentation built on Sept. 18, 2024, 7:40 p.m.