R/sim_data.R
In power4mome: Power Analysis for Moderation and Mediation

Documented in pool_sim_data print.sim_data sim_data

#' @title Simulate Datasets Based on a Model
#'
#' @description Get a model matrix and
#' effect size specification and
#' simulate a number of datasets,
#' along with other information.
#'
#' @details
#'
#' The function [sim_data()] generates
#' a list of datasets based on a population
#' model.
#'
#' # The role of `sim_data()`
#'
#' The function [sim_data()] is used by
#' the all-in-one function
#' [power4test()]. Users usually do not
#' call this function directly, though
#' developers can use this function to
#' develop other functions for power
#' analysis, or to build their own
#' workflows to do the power analysis.
#'
#' # Workflow
#'
#' The function [sim_data()] does two tasks:
#'
#' - Determine the actual population
#'  model with population values based
#'  on:
#'
#'    - A model syntax for the observed
#'      variables (for a path model)
#'      or the latent factors (for a
#'      latent variable model).
#'
#'    - A textual specification of the
#'      effect sizes of parameters.
#'
#'    - The number of indicators for
#'      each latent factor if the model
#'      is a latent variable model.
#'
#'    - The reliability of each latent
#'      factor as measured by the
#'      indicators if the model is a
#'      latent factor model.
#'
#' - Generate *nrep* simulated datasets
#'  from the population model.
#'
#' The simulated datasets can then be
#' used to fit a model, test
#' parameters, and estimate power.
#'
#' The output is usually used by
#' [fit_model()] to fit a target model,
#' by default the population model, to each
#' of the dataset.
#'
#' # Set 'number_of_indicators' and 'reliability'
#'
#' The arguments `number_of_indicators`
#' and `reliability` are used to
#' specify the number of indicators
#' (e.g., items) for each factor,
#' and the population reliability
#' coefficient of each factor,
#' if the variables in the model
#' syntax are latent variables.
#'
#' ## Single-Group Model
#'
#' If a variable in the model is to be
#' replaced by indicators in the generated
#' data, set
#' `number_of_indicators` to a named
#' numeric vector. The names are the
#' variables of variables with
#' indicators, as appeared in the
#' `model` syntax. The value of each
#' name is the number of indicators.
#'
#' The
#' argument `reliability` should then be
#' set a named numeric vector (or list,
#' see the section on multigroup models)
#' to specify the population reliability
#' coefficient ("omega") of each set of
#' indicators. The population standardized factor
#' loadings are then computed to ensure
#' that the population reliability
#' coefficient is of the target value.
#'
#' These are examples for a single group
#' model:
#'
#' \preformatted{number of indicator = c(m = 3, x = 4, y = 5)}
#'
#' The numbers of indicators for `m`,
#' `x`, and `y` are 3, 4, and 5,
#' respectively.
#'
#' \preformatted{reliability = c(m = .90, x = .80, y = .70)}
#'
#' The population reliability
#' coefficients of `m`, `x`, and `y` are
#' .90, .80, and .70, respectively.
#'
#' ## Multigroup Models
#'
#' Multigroup models are supported.
#' The number of groups is inferred
#' from `pop_es` (see the help page
#' of [ptable_pop()]), or directly
#' from `ptable`.
#'
#' For a multigroup model, the number
#' of indicators for each variable
#' must be the same across groups.
#'
#' However, the population reliability
#' coefficients can be different
#' across groups. For a multigroup model
#' of *k* groups,
#' with one or more population reliability
#' coefficients differ across groups,
#' the argument `reliability` should be
#' set to a named list. The names are
#' the variables to which the population
#' reliability coefficients are to be
#' set. The element for each name is
#' either a single value for the common
#' reliability coefficient, or a
#' numeric vector of the reliability
#' coefficient of each group.
#'
#' This is an example of `reliability`
#' for a model with 2 groups:
#'
#' \preformatted{reliability = list(x = .80, m = c(.70, .80))}
#'
#' The reliability coefficients of `x` are
#' .80 in all groups, while the
#' reliability coefficients of `m` are
#' .70 in one group and .80 in another.
#'
#' ## Equal Numbers of Indicators and/or Reliability Coefficients
#'
#' If all variables in the model has
#' the same number of indicators,
#' `number_of_indicators` can be set
#' to one single value.
#'
#' Similarly, if all sets of indicators
#' have the same population reliability
#' in all groups, `reliability` can also
#' be set to one single value.
#'
#' # Specify The Distributions of Exogenous Variables Or Error Terms Using 'x_fun'
#'
#' By default, variables and error
#' terms are generated
#' from a multivariate normal distribution.
#' If desired, users can supply the
#' function used to generate an exogenous
#' variable and error term by setting `x_fun` to
#' a named list.
#'
#' The names of the list are the variables
#' for which a user function will be used
#' to generate the data.
#'
#' Each element of the list must also
#' be a list. The first element of this
#' list, can be unnamed, is the
#' function to be used. If other arguments
#' need to be supplied, they should be
#' included as named elements of this list.
#'
#' For example:
#'
#' \preformatted{x_fun = list(x = list(power4mome::rexp_rs),
#'              w = list(power4mome::rbinary_rs,
#'                       p1 = .70)))}
#'
#' The variables `x` and `w` will be
#' generated by user-supplied functions.
#'
#' For `x`, the function is
#' `power4mome::rexp_rs`. No additional
#' argument when calling this function.
#'
#' For `w`, the function is
#' `power4mome::rbinary_rx`. The argument
#' `p1 = .70` will be passed to this
#' function when generating the values
#' of `w`.
#'
#' If a variable is an endogenous
#' variable (e.g., being predicted by
#' another variable in a model), then
#' `x_fun` is used to generate its
#' *error term*. Its implied population
#' distribution may still be different
#' from that generate by `x_fun` because
#' the distribution also depends on the
#' distribution of other variables
#' predicting it.
#'
#' These are requirements for the
#' user-functions:
#'
#' - They must return a numeric vector.
#'
#' - They mush has an argument `n` for
#'   the number of values.
#'
#' - The *population* mean and standard
#'   deviation of the generated values
#'   must be 0 and 1, respectively.
#'
#' The package `power4mome` has
#' helper functions for generating
#' values from some common nonnormal
#' distributions and then scaling them
#' to have population mean and standard
#' deviation equal to 0 and 1 (by default), respectively.
#' These are some of them:
#'
#' - [rbinary_rs()].
#'
#' - [rexp_rs()].
#'
#' - [rbeta_rs()].
#'
#' - [rlnorm_rs()].
#'
#' - [rpgnorm_rs()].
#'
#' To use `x_fun`, the variables must
#' have zero covariances with other
#' variables in the population. It is
#' possible to generate nonnormal
#' multivariate data but we believe this
#' is rarely needed when estimating
#' power *before* having the data.
#'
#' @inheritSection ptable_pop Specify the Population Model by 'model'
#'
#' @inheritSection ptable_pop Specify 'pop_es' Using Named Vectors
#'
#' @inheritSection ptable_pop Specify 'pop_es' Using a Multiline String
#'
#' @inheritSection ptable_pop Set the Values for Effect Size Labels ('es1' and 'es2')
#'
#' @seealso [power4test()]
#'
#'
#' @param nrep The number of replications
#' to generate the simulated datasets.
#' Default is 10.
#'
#' @param ptable The output of
#' [ptable_pop()], which is a
#' `ptable_pop` object, representing the
#' population model. If `NULL`, the
#' default, [ptable_pop()] will be
#' called to generate the `ptable_pop`
#' object, using arguments such as
#' `model` and `pop_es`.
#'
#' @param model The `lavaan` model
#' syntax of the population model.
#' Ignored if `ptable` is
#' specified. See [ptable_pop] on
#' how to specify this argument.
#'
#' @param pop_es The character to
#' specify population effect sizes.
#' See [ptable_pop] on
#' how to specify this argument.
#' Ignored if `ptable` is
#' specified.
#'
#' @param ... For [sim_data], parameters
#' to be passed to [ptable_pop()]. For
#' [print.sim_data()], these arguments
#' are ignored.
#'
#' @param n The sample size for each
#' dataset. Default is 100.
#'
#' @param iseed The seed for the random
#' number generator. Default is `NULL`
#' and the seed is not changed.
#'
#' @param number_of_indicators A named
#' vector to specify the number of
#' indicators for each factors. See
#' the help page on how to set this
#' argument. Default is `NULL` and all
#' variables in the model syntax are
#' observed variables.
#' See the help page on how
#' to use this argument.
#'
#' @param reliability A named vector
#' (for a single-group model) or a
#' named list of named vectors
#' (for a multigroup model)
#' to set the reliability coefficient
#' of each set of indicators. Default
#' is `NULL`.
#' See the help page on how
#' to use this argument.
#'
#' @param x_fun The function(s) used to
#' generate the exogenous variables or
#' error terms. If
#' not supplied, or set to `list()`, the
#' default, the variables are generated
#' from a multivariate normal
#' distribution. See the help page on how
#' to use this argument.
#'
#' @param e_fun The function(s) used to
#' generate the error terms of indicators,
#' if any. If
#' not supplied, or set to `list()`, the
#' default, the error terms of indicators
#' are generated
#' from a multivariate normal
#' distribution. Specify in the same
#' way as `x_fun`. Refer to the help
#' page on `x_fun` on how to use this
#' argument.
#'
#' @param process_data If not `NULL`, it
#' must be a named list with these
#' elements: `fun` (required), the function
#' to further processing the simulated
#' data, such as generating missing data using
#' functions such as [mice::ampute()]; `args` (optional), a
#' named list of arguments to be passed
#' to `fun`, except the one for the
#' source data; `sim_data_name` (required) the
#' name of the argument to receive the
#' simulated data (e.g., `data` for
#' [mice::ampute()]); `processed_data_name`
#' (optional), the name of the data frame
#' after being processed by `fun`,
#' such as the data frame
#' with missing data in the output of
#' `fun` (e.g., `"amp"` for [mice::ampute()]),
#' if omitted, the output of `fun` should
#' be the data frame with missing data.
#'
#' @param parallel If `TRUE`, parallel
#' processing will be used to simulate
#' the datasets. Default is `FALSE`.
#'
#' @param progress If `TRUE`, the progress
#' of data simulation will be displayed.
#' Default is `FALSE.
#'
#' @param ncores The number of CPU
#' cores to use if parallel processing
#' is used.
#'
#' @return
#' The function [sim_out()] returns
#' a list of the class `sim_data`,
#' with length `nrep`. Each element
#' is a `sim_data_i` object, with
#' the following major elements:
#'
#' - `ptable`: A `lavaan` parameter
#'  table of the model, with population
#'  values set in the column `start`.
#'  (It is the output of the
#'  function [ptable_pop()].)
#'
#' - `mm_out`: The population model
#'  represented by model matrices
#'  as in `lavaan`. (It is the output
#'  of the function
#'  [model_matrices_pop()].)
#'
#' - `mm_lm_out`: A list of regression
#'  model formula, one for each
#'  endogenous variable. (It is the
#'  output of the internal function
#'  `mm_lm()`.)
#'
#' - `mm_lm_dat_out`: A simulated dataset
#'  generated from the population model.
#'  (It is the output of the internal
#'  function `mm_lm_data()`).
#'
#' - `model_original`: The original model
#'  syntax (i.e., the argument `model`).
#'
#' - `model_final`: A modified model
#'  syntax if the model is a latent
#'  variable model. Indicators are added
#'  to the syntax.
#'
#' - `fit0`: The output of [lavaan::sem()]
#'  with `ptable` as the model and
#'  `do.fit` set to `FALSE`. Used for the
#'  easy retrieval of information
#'  about the model.
#'
#' @examples
#'
#' # Specify the model
#'
#' mod <-
#' "m ~ x
#'  y ~ m + x"
#'
#' # Specify the population values
#'
#' es <-
#' "
#' y ~ m: m
#' m ~ x: m
#' y ~ x: n
#' "
#'
#' # Generate the simulated datasets
#'
#' data_all <- sim_data(nrep = 5,
#'                      model = mod,
#'                      pop_es = es,
#'                      n = 100,
#'                      iseed = 1234)
#'
#' data_all
#'
#' @export
sim_data <- function(nrep = 10,
                     ptable = NULL,
                     model = NULL,
                     pop_es = NULL,
                     ...,
                     n = 100,
                     iseed = NULL,
                     number_of_indicators = NULL,
                     reliability = NULL,
                     x_fun = list(),
                     e_fun = list(),
                     process_data = NULL,
                     parallel = FALSE,
                     progress = FALSE,
                     ncores = max(1, parallel::detectCores(logical = FALSE) - 1)) {

  if (is.null(ptable)) {
    if (is.null(model) || is.null(pop_es)) {
      stop("Both model and pop_es must be set if ptable is not set.")
    }
    ptable <- ptable_pop(model = model,
                         pop_es = pop_es,
                         ...)
  }
  mm_out <- model_matrices_pop(ptable,
                               drop_list_single_group = FALSE)
  mm_lm_out <- mm_lm(mm_out,
                     drop_list_single_group = FALSE)

  fit_tmp <- try(lavaan::sem(
                    ptable,
                    do.fit = FALSE
                  ),
                silent = TRUE)
  if (inherits(fit_tmp, "try-error")) {
    all_paths <- list()
  } else {
    all_paths <- manymome::all_indirect_paths(
                      fit_tmp
                    )
  }

  out <- do_FUN(X = seq_len(nrep),
                FUN = sim_data_i,
                ptable = ptable,
                model = model,
                mm_out = mm_out,
                mm_lm_out = mm_lm_out,
                n = n,
                number_of_indicators = number_of_indicators,
                reliability = reliability,
                x_fun = x_fun,
                e_fun = e_fun,
                process_data = process_data,
                fit_external = list(all_paths = all_paths),
                iseed = iseed,
                parallel = parallel,
                progress = progress,
                ncores = ncores)
  class(out) <- c("sim_data", class(out))
  return(out)
}

#' @param digits The numbers of digits
#' displayed after the decimal.
#'
#' @param digits_descriptive The
#' number of digits displayed after
#' the decimal for the descriptive
#' statistics table.
#'
#' @param x The `sim_data` object
#' to be printed.
#'
#' @param data_long If `TRUE`, detailed
#' information will be printed.
#'
#' @param fit_to_all_args A named list
#' of arguments to be passed to
#' [lavaan::sem()] when the model is
#' fitted to a sample combined from
#' all samples stored.
#'
#' @param est_type The type of estimates
#' to be printed. Can be a character
#' vector of one to two elements. If
#' only `"standardized"`, then the
#' standardized estimates are printed.
#' If only `"unstandardized"`, then the
#' unstandardized estimates are printed.
#' If a vector like
#' `c("standardized", "unstandardized")`,
#' then both unstandardized and
#' standardized estimates are printed.
#'
#' @param variances Logical. Whether
#' variances and error variances are printed.
#' Default depends on `est_type`. If
#' `"unstandardized"` is in `est_type`,
#' then default is `TRUE` If
#' only `"standardized"` is in `est_type`,
#' then default is `FALSE`.
#'
#' @param pure_x,pure_y When Logical. When
#' printing indirect effects, whether
#' only "pure" x-variables (variables
#' not predicted by another other variables)
#' and/or "pure" y-variables (variables
#' that do not predict any other variables
#' other than indicators) will be included
#' in enumerating the paths.
#'
#' @return
#' The `print` method of `sim_data`
#' returns `x` invisibly. It is called for
#' its side effect.
#'
#' @rdname sim_data
#' @export
print.sim_data <- function(x,
                           digits = 3,
                           digits_descriptive = 2,
                           data_long = TRUE,
                           fit_to_all_args = list(),
                           est_type = "standardized",
                           variances = NULL,
                           pure_x = TRUE,
                           pure_y = TRUE,
                           ...) {

  est_type <- match.arg(est_type,
                        choices = c("standardized", "unstandardized"),
                        several.ok = TRUE)

  if (is.null(variances)) {
    if (identical("standardized", est_type)) {
      variances <- FALSE
    } else {
      variances <- TRUE
    }
  }

  # This line needed for printing boot_out and mc_out
  requireNamespace("manymome", quietly = TRUE)

  x_i <- x[[1]]
  ptable <- x_i$ptable
  fit0 <- x_i$fit0

  group_labels <- x_i$group_labels
  ngroups <- length(group_labels)
  if (is.null(ngroups)) ngroups <- 1

  ptable0 <- lavaan::parameterTable(fit0)
  ptable <- set_user_pop(ptable,
                         fit = fit0)
  ptable1 <- ptable[, c("lhs",
                        "op",
                        "rhs",
                        "group",
                        "block",
                        "exo",
                        "label",
                        "start")]
  colnames(ptable1)[which(colnames(ptable1) == "start")] <- "Population"
  if (max(ptable1$group) == 1) {
    ptable1$group <- NULL
  }

  class(ptable1) <- c("lavaan.parameterEstimates", class(ptable1))
  if (ngroups > 1) {
    attr(ptable1, "group.label") <- group_labels
  }

  model_original <- x_i$model_original

  cat(header_str("Model Information",
                 prefix = "\n",
                 suffix = "\n"))

  cat(header_str("Model on Factors/Variables",
                 hw = .4,
                 prefix = "\n",
                 suffix = "\n"))
  cat(model_original)

  model_final <- x_i$model_final
  cat(header_str("Model on Variables/Indicators",
                 hw = .4,
                 prefix = "\n",
                 suffix = "\n"))
  cat(model_final, sep = "\n")

  cat(header_str("Population Values",
                 hw = .4,
                 prefix = "",
                 suffix = "\n"))
  print(ptable1,
        digits = digits)

  # Multigroup models automatically supported
  has_w <- FALSE
  all_ind <- tryCatch(pop_indirect(x,
                                   pure_x = pure_x,
                                   pure_y = pure_y,
                                   progress = TRUE),
                       warning = function(w) w,
                       error = function(e) e)
  # if (inherits(all_ind, "warning")) {
  #   if (grepl("moderator", all_ind$message)) {
  #     has_w <- TRUE
  #   }
  #   all_ind <- suppressWarnings(pop_indirect(
  #                                 x,
  #                                 pure_x = pure_x,
  #                                 pure_y = pure_y,
  #                                 progress = TRUE))
  # }
  if (!inherits(all_ind, "error")) {
    if (length(all_ind) > 0) {
      cat(header_str("Population Conditional/Indirect Effect(s)",
                    hw = .4,
                    prefix = "",
                    suffix = "\n"))
      for (all_ind_i in all_ind) {
        print(all_ind_i,
              digits = digits)
      }
      cat("\n")
    }
  }

  k0 <- x_i$number_of_indicators

  rel0 <- x_i$reliability

  if (!is.null(rel0[[1]])) {
    rel <- do.call(rbind,
                  rel0)
    rel <- as.data.frame(rel)
    rel_n <- nrow(rel)
    rownames(rel) <- paste0("Group ", seq_len(rel_n))
    cat(header_str("Population Reliability",
                  hw = .4,
                  prefix = "",
                  suffix = "\n\n"))
    print(rel,
          row.names = isTRUE(rel_n > 1),
          digits = digits)
  }

  if (!is.null(k0[[1]])) {
    lambda0 <- mapply(function(xx, yy) {
                        out <- mapply(lambda_from_reliability,
                                      p = xx,
                                      omega = yy)
                        out
                      },
                      xx = k0,
                      yy = rel0,
                      SIMPLIFY = FALSE)
    lambda <- do.call(rbind,
                      lambda0)
    lambda <- as.data.frame(lambda)
    lambda_n <- nrow(lambda)
    rownames(lambda) <- paste0("Group ", seq_len(lambda_n))
    cat(header_str("Population Standardized Loadings",
                  hw = .4,
                  prefix = "\n",
                  suffix = "\n\n"))
    print(lambda,
          row.names = isTRUE(lambda_n > 1),
          digits = digits)
  }

  # Summarize data

    nrep <- length(x)
    n <- nrow(x_i$mm_lm_dat_out)

    cat(header_str("Data Information",
                  prefix = "",
                  suffix = "\n\n"))

    cat("Number of Replications: ", nrep, "\n")
    cat("Sample Sizes: ", paste0(n, collapse = ", "), "\n")

  if (data_long) {
    all_data <- pool_sim_data(x)

    fit_to_all_args0 <- list(model = model_final,
                             data = all_data,
                             se = "none",
                             test = "none",
                             group = x_i$group_name,
                             fixed.x = FALSE)
    fit_to_all_args1 <- utils::modifyList(fit_to_all_args0,
                                          fit_to_all_args)
    fit_all <- do.call(lavaan::sem,
                       fit_to_all_args1)

    if (isTRUE(identical(est_type, "standardized"))) {
      # Standardized Only
      est_all <- lavaan::standardizedSolution(fit_all,
                                              se = FALSE,
                                              pvalue = FALSE,
                                              ci = FALSE,
                                              output = "text")
      if (!variances) {
        i <- est_all$lhs == est_all$rhs
        est_all <- est_all[!i, ]
      }
      tmp_est_hdr <- "Standardized Estimates"
    }

    if (isTRUE(identical(est_type, "unstandardized"))) {
      # Unstandardized Only
      est_all <- lavaan::parameterEstimates(fit_all,
                                            se = FALSE,
                                            pvalue = FALSE,
                                            ci = FALSE,
                                            standardized = FALSE,
                                            output = "text")
      if (!variances) {
        i <- est_all$lhs == est_all$rhs
        est_all <- est_all[!i, ]
      }
      tmp_est_hdr <- "Unstandardized Estimates"
    }

    if (isTRUE(all(c("unstandardized", "standardized") %in% est_type))) {
      # Both unstandardized and standardized
      est_all <- lavaan::parameterEstimates(fit_all,
                                            se = FALSE,
                                            pvalue = FALSE,
                                            ci = FALSE,
                                            standardized = TRUE,
                                            output = "text")
      if (!variances) {
        i <- est_all$lhs == est_all$rhs
        est_all <- est_all[!i, ]
      }
      tmp_est_hdr <- "Unstandardized and Standardized Estimates"
    }

    cat(header_str("Descriptive Statistics",
                  hw = .4,
                  prefix = "\n",
                  suffix = "\n\n"))

    print(psych::describe(all_data,
                          range = FALSE),
          digits = digits_descriptive)

    # Print missing data pattern

    mp <- tryCatch(miss_pattern(all_data),
                   error = function(e) e)
    if (!inherits(mp, "error")) {
      if (nrow(mp) != 1) {
        cat(header_str("Missing Data Pattern",
                          hw = .4,
                          prefix = "\n",
                          suffix = "\n\n"))
        cat("Missing data is present\n\n")
        print_miss_pattern(mp,
                           digits = max(0, digits = digits - 1))
      }
    }

    tmp <- paste("Parameter Estimates Based on All",
                nrep,
                "Samples Combined")
    cat(header_str(tmp,
                  prefix = "\n",
                  suffix = "\n\n"))

    cat("Total Sample Size:", n * nrep, "\n")

    cat(header_str(tmp_est_hdr,
                  hw = .4,
                  prefix = "\n",
                  suffix = "\n\n"))

    if (!variances) {
      cat("Variances and error variances omitted.\n")
    }

    print(est_all,
          nd = digits)
  } else {
    cat("\nCall print with 'data_long = TRUE' for further information.\n")
  }

  invisible(x)
}

#' @description
#' The function
#'
#' @param object Either a `sim_data`
#' object or a `power4test` object.
#' It extracts the simulated data
#' and return them, combined to one
#' single data frame or, if `as_list`
#' is `TRUE`, as a list of data
#' frames.
#'
#' @param as_list Logical. If `TRUE`,
#' the simulated datasets is returned as one
#' single data frame. If `FALSE`, they
#' are returned as a list of data
#' frames.
#'
#' @return
#' The function `pool_sim_data()` returns
#' either one data frame or a list
#' of data frames, depending on the
#' argument `as_list`
#'
#' @rdname sim_data
#' @export
pool_sim_data <- function(object,
                          as_list = FALSE) {
  if (!inherits(object, "power4test") &&
      !inherits(object, "sim_data")) {
    stop("pool_sim_data() only supports 'power4test' or 'sim_data' objects.")
  }
  if (inherits(object, "power4test")) {
    object <- object$sim_all
  }
  all_data <- lapply(object,
                     function(x) x$mm_lm_dat_out)
  if (!as_list) {
    all_data <- do.call(rbind,
                        all_data)
  }
  all_data
}

#' @noRd
set_user_pop <- function(ptable,
                         fit) {
  if (!(":=" %in% ptable$op)) {
    return(ptable)
  }
  user_fun <- fit@Model@def.function
  i <- ptable$free
  i <- i[i > 0]
  j <- match(i, ptable$free)
  pop <- ptable$start[j]
  user_pop <- user_fun(pop)
  k <- match(names(user_pop), ptable$lhs)
  ptable[k, "start"] <- user_pop
  ptable
}


#' @noRd
sim_data_i <- function(repid = 1,
                       n = 100,
                       model = NULL,
                       pop_es = NULL,
                       ptable = NULL,
                       mm_out = NULL,
                       mm_lm_out = NULL,
                       number_of_indicators = NULL,
                       reliability = NULL,
                       x_fun = list(),
                       e_fun = list(),
                       process_data = NULL,
                       fit_external = NULL,
                       seed = NULL,
                       drop_list_single_group = TRUE,
                       merge_groups = TRUE) {
  if (!is.null(seed)) set.seed(seed)
  if (is.null(ptable)) {
    ptable <- ptable_pop(model = model,
                        pop_es = pop_es,
                        standardized = TRUE)
  } else {
    model <- attr(ptable, "model")
  }
  if (is.null(mm_out)) {
    mm_out <- model_matrices_pop(ptable,
                                drop_list_single_group = FALSE)
  }
  if (is.null(mm_lm_out)) {
  mm_lm_out <- mm_lm(mm_out,
                     drop_list_single_group = FALSE)
  }

  ngroups <- max(ptable$group)
  if (length(n) == 1) {
    n <- rep(n, ngroups)
  }

  vnames <- lavaan::lavNames(ptable,
                             type = "ov")
  p <- length(vnames)

  if ((length(number_of_indicators) == 1) &&
      is.numeric(number_of_indicators)) {
    number_of_indicators <- rep(number_of_indicators,
                                p)
    names(number_of_indicators) <- vnames
  }
  if (!is.list(number_of_indicators)) {
    number_of_indicators <- rep(list(number_of_indicators),
                                ngroups)
  }

  if ((length(reliability) == 1) &&
      is.numeric(reliability)) {
    reliability <- rep(reliability,
                                p)
    names(reliability) <- vnames
  }
  if (!is.list(reliability)) {
    reliability <- rep(list(reliability),
                            ngroups)
  } else {
    reliability <- split_par_es(reliability)
  }
  mm_lm_dat_out <- mapply(mm_lm_data,
                          object = mm_lm_out,
                          n = n,
                          number_of_indicators = number_of_indicators,
                          reliability = reliability,
                          MoreArgs = list(keep_f_scores = FALSE,
                                          x_fun = x_fun,
                                          e_fun = e_fun,
                                          process_data = process_data),
                          SIMPLIFY = FALSE)

  model_original <- model
  # add_indicator_syntax() already supports
  # a model syntax with "x:z ~~ y:w"
  model <- add_indicator_syntax(model,
                                number_of_indicators = number_of_indicators[[1]],
                                reliability = reliability[[1]])
  if (!is.null(attr(ptable, "model_fixed"))) {
    if (utils::packageVersion("lavaan") <= "0.6.19") {
      # lavaan 0.6.20+ should support "x:w ~~ y:z"
      # TODO:
      # - Decide which ptable to store the variables are
      #   latent variables.
      attr(model, "ptable") <- ptable
    }
  }
  tmp <- ptable
  tmp$est <- tmp$start
  # fixed.x set to FALSE such that covariances are also displayed
  fit0 <- lavaan::sem(tmp,
              do.fit = FALSE,
              fixed.x = FALSE)
  if (ngroups > 1) {
    group_labels <- names(mm_out)
    group_name <- "group"
  } else {
    group_labels <- NULL
    group_name <- NULL
  }
  if (drop_list_single_group && (ngroups == 1)) {
    mm_out <- mm_out[[1]]
    mm_lm_out <- mm_lm_out[[1]]
    mm_lm_dat_out <- mm_lm_dat_out[[1]]
  }
  if (ngroups > 1) {
    for (i in group_labels) {
      mm_lm_dat_out[[i]]$group <- i
    }
  }
  if (merge_groups && (ngroups > 1)) {
    mm_lm_dat_out <- do.call(rbind,
                             mm_lm_dat_out)
    rownames(mm_lm_dat_out) <- NULL
  }
  out <- list(ptable = ptable,
              mm_out = mm_out,
              mm_lm_out = mm_lm_out,
              mm_lm_dat_out = mm_lm_dat_out,
              model_original = model_original,
              model_final = model,
              fit0 = fit0,
              group_name = group_name,
              group_labels = group_labels,
              number_of_indicators = number_of_indicators,
              reliability = reliability,
              fit_external = fit_external)
  class(out) <- c("sim_data_i", class(out))
  out
}

#' @noRd
pop_indirect <- function(x,
                         progress = TRUE,
                         pure_x = TRUE,
                         pure_y = TRUE) {
  x_i <- x[[1]]
  ptable0 <- lavaan::parameterTable(x_i$fit0)
  # Need to select a subset because
  # many_indirect_effects can be very slow
  # for a large sample
  dat_tmp <- x_i$mm_lm_dat_out
  n <- nrow(dat_tmp)
  i <- sample.int(n, 1000, replace = TRUE)
  dat_tmp <- dat_tmp[i, ]
  fit_to_all_args0 <- list(model = x_i$model_final,
                           data = dat_tmp,
                           se = "none",
                           test = "none",
                           group = x_i$group_name,
                           check.post = FALSE,
                           fixed.x = FALSE)
  fit_all <- suppressWarnings(do.call(lavaan::sem,
                      fit_to_all_args0))

  # TODO:
  # - Need a better way to find product terms
  p_terms <- lavaan::lavNames(ptable0, "ov.interaction")

  if (length(p_terms) == 0) {
    p_terms <- NULL
  }

  x_terms <- NULL
  y_terms <- NULL
  if (pure_x) {
    x_terms <- pure_x(x[[1]]$fit0)
  } else {
    x_terms <- all_x(x[[1]]$fit0)
  }
  if (pure_y) {
    y_terms <- pure_y(x[[1]]$fit0)
  } else {
    y_terms <- all_y(x[[1]]$fit0)
  }

  # Multigroup models automatically supported
  out <- list()
  if ((length(x_terms) > 0) &&
      (length(y_terms) > 0)) {
    all_paths <- manymome::all_indirect_paths(x[[1]]$fit0,
                                              exclude = p_terms,
                                              x = x_terms,
                                              y = y_terms)
    has_indirect <- isTRUE(length(all_paths) > 0)
    all_direct_paths <- get_direct(
                          x = x_terms,
                          y = y_terms,
                          ptable = ptable0
                        )
    ngroups <- lavaan::lavTech(x[[1]]$fit0, "ngroups")
    if (ngroups > 1) {

      # ==== Multigroup Models =====

      if (has_indirect) {
        # TODO:
        # - Within-group moderation not yet supported in
        #   multigroup model.
        all_ind <- manymome::many_indirect_effects(all_paths,
                                                  fit = fit_all,
                                                  est = ptable0)
        out <- c(out, list(all_ind))
      }
    } else {

      # ==== Single-group Models =====

      all_paths_combined <- c(all_paths, all_direct_paths)
      all_w <- get_w_for_paths(all_paths_combined,
                              fit = fit_all)
      no_w_i <- sapply(all_w, function(xx) length(xx) == 0)
      has_w_i <- !no_w_i
      if (any(no_w_i)) {
        if (progress) {
          cat(paste("(Computing indirect effects for",
                    sum(no_w_i),
                    "paths ...)\n\n"))
        }
        all_ind <- manymome::many_indirect_effects(all_paths_combined[no_w_i],
                                                  fit = fit_all,
                                                  est = ptable0)
        out <- c(out, list(all_ind))
      }
      if (any(has_w_i)) {
        i0 <- which(has_w_i)
        if (progress) {
          cat(paste("(Computing conditional effects for",
                    sum(has_w_i),
                    "paths ...)\n\n"))
        }
        for (i in i0) {

          # Exclude paths moderated by mediators
          if (!all(all_w[[i]] %in% x_terms)) next

          fit_tmp <- fix_w_sd_and_mean(x_i,
                                       all_w[[i]])
          cond <- do.call(manymome::cond_indirect_effects,
                        c(all_paths_combined[[i]],
                          list(wlevels = all_w[[i]],
                                fit = fit_tmp,
                                est = ptable0)))
          out <- c(out, list(cond))
        }
      }
    }

  }
  out
}

#' @noRd
fix_w_sd_and_mean <- function(x_i,
                              w) {
  # Can only be used for single-group model
  dat_tmp <- x_i$mm_lm_dat_out
  n <- nrow(dat_tmp)
  i <- sample.int(n, 1000, replace = TRUE)
  dat_tmp <- dat_tmp[i, ]
  ptable0 <- lavaan::parameterTable(x_i$fit0)
  for (w_i in w) {
    w_tmp <- scale(dat_tmp[, w_i])[, 1]
    w_var <- ptable0[(ptable0$lhs == w_i) &
                   (ptable0$rhs == w_i) &
                   (ptable0$op == "~~"), "est"]
    i1 <- (ptable0$lhs == w_i) &
          (ptable0$op == "~1")
    if (any(i1)) {
      w_m <- ptable0[i1, "est"]
    } else {
      w_m <- 0
    }
    w_tmp <- w_tmp * sqrt(w_var) + w_m
    dat_tmp[, w_i] <- w_tmp
  }
  v_ind <- grepl(":", colnames(dat_tmp), fixed = TRUE)
  dat_tmp <- dat_tmp[, !v_ind]
  fit_to_all_args0 <- list(model = x_i$model_final,
                          data = dat_tmp,
                          se = "none",
                          test = "none",
                          group = x_i$group_name,
                          check.post = FALSE,
                          fixed.x = FALSE)
  fit_all <- suppressWarnings(do.call(lavaan::sem,
                      fit_to_all_args0))
  fit_all
}