R/power_curve.R
In power4mome: Power Analysis for Moderation and Mediation

Documented in power_curve print.power_curve

#' @title Power Curve
#'
#' @description Estimate the relation
#' between power and a characteristic,
#' such as sample size or population
#' effect size (population value of
#' a model parameter).
#'
#' @details
#' The function [power_curve()]
#' retrieves the information
#' from the output of
#' [power4test_by_n()] or
#' [power4test_by_es()], and
#' estimate the power curve: the
#' relation between the characteristic
#' varied, sample size for
#' [power4test_by_n()] and the
#' population effect size for
#' [power4test_by_es()], and the
#' rejection rate of the test conducted
#' by [power4test_by_n()] or
#' [power4test_by_es()]. This
#' rejection rate is the power when the
#' null hypothesis is false (e.g., the
#' population value of the effect size
#' being tested is nonzero).
#'
#' The model fitted is *not* intended to
#' be a precise model for the relation
#' across a wide range. It is only a
#' crude estimate based on the limited
#' number of values of the
#' characteristic (e.g., sample size)
#' examined, which can be as small as
#' four or even smaller. The model is
#' intended to be
#' used for only for the range covered,
#' and for estimating the probable
#' sample size or effect size with a
#' desirable level of power. This value
#' should then be studied by higher
#' precision through simulation
#' using functions such as
#' [power4test()].
#'
#' These are the models to be tried,
#' in the following order:
#'
#' - One or nonlinear models, to be
#'   fitted by [stats::nls()]. If
#'   several models are specified, all
#'   will be fitted and the one with the
#'   smallest deviance will be used.
#'
#' - If all the nonlinear models failed,
#'   for whatever reason, a logistic
#'   regression will be fitted by
#'   [stats::glm()] to predict the
#'   binary significant test results.
#'
#' - If the logistic model also failed,
#'   for whatever reason, a simple
#'   linear regression model will be
#'   fitted. Although the power curve is
#'   nonlinear across a wide range of,
#'   say, sample size, a linear model
#'   can still be a good enough
#'   approximation for a narrow range of
#'   the predictor.
#'
#' The output can then be plotted to
#' visualize the power curve, using
#' the `plot` method ([plot.power_curve()])
#' for the output
#' of [power_curve()].
#'
#' This function can be used directly,
#' but is also used internally by
#' functions such as [x_from_power()].
#'
#' @return
#' It returns a list which is a
#' `power_curve` object, with the
#' following elements:
#'
#' - `fit`: The model fitted, which is the
#'    output of [stats::nls()],
#'    [stats::glm()], or [stats::lm()].
#'
#' - `reject_df`: The table of reject
#'  rates and other characteristics,
#'  which is generated by
#'  [rejection_rates()].
#'
#'  - `predictor`: The predictor or the
#'  power curve, ether `"n"` (sample
#'  size) or `"es"` (population effect
#'  size).
#'
#'  - `call`: The call used to run this
#'  function.
#'
#' @param object An object of the class
#' `power4test_by_n` or `power4test_by_es`,
#' which is the output of [power4test_by_n()]
#' or [power4test_by_es()].
#'
#' @param formula A formula of the model
#' for [stats::nls()]. It can also be
#' a list of formulas, and the models
#' will be fitted successively by
#' [stats::nls()], with the first
#' model fitted successfully adopted.
#' The response variable in the formula must be named
#' `reject`, and the predictor named
#' `x`. Whether `x` represents `n` or
#' `es` depends on the class of `object`.
#' If `NULL`, the default, it will be
#' determined internally based on the type of
#' `object`.
#'
#' @param start Either a named vector
#' of the start value(s) of parameter(s)
#' in `formula`, or a list of named
#' vectors of the starting value(s)
#' of the list of formula(s). If `NULL`,
#' the default, they will be determined
#' internally.
#'
#' @param lower_bound Either a named vector
#' of the lower bound(s) of parameter(s)
#' in `formula`, or a list of named
#' vectors of the lower bound(s)
#' for the list of formula(s). They will
#' be passed to `lower` of [stats::nls()].
#' If `NULL`, the default, it will be
#' determined internally based on the type of
#' `object`.
#'
#' @param upper_bound Either a named vector
#' of the upper bound(s) of parameter(s)
#' in `formula`, or a list of named
#' vectors of the upper bound(s)
#' for the list of formula(s). They will
#' be passed to `upper` of [stats::nls()].
#' If `NULL`, the default, it will be
#' determined internally based on the type of
#' `object`.
#'
#' @param nls_args A named list of
#' arguments to be used when calling
#' [stats::nls()]. Used to override
#' internal default, such as the
#' algorithm (default is `"port"`).
#' Use this argument with cautions.
#'
#' @param nls_control A named list of
#' arguments to be passed the `control`
#' argument of [stats::nls()]. The values will
#' override internal default values,
#' and also override `nls_args`.
#' Use this argument with cautions.
#'
#' @param verbose Logical. Whether
#' messages will be printed when
#' trying different models.
#'
#' @param models Models to try. Support
#' `"nls"` (fitted by [nls()]),
#' `"logistic"` (fitted by [glm()]), and
#' `"lm"` (fitted by [lm()]). By default,
#' all three models will be attempted,
#' in this order.
#'
#' @seealso [power4test_by_n()] and [power4test_by_es()]
#' for the output supported by
#' [power_curve()], [plot.power_curve()]
#' for the `plot` method and
#' [predict.power_curve()]
#' for the `predict` method of the output
#' of [power_curve()].
#'
#' @examples
#'
#' # Specify the population model
#'
#' model_simple_med <-
#' "
#' m ~ x
#' y ~ m + x
#' "
#'
#' # Specify the effect sizes (population parameter values)
#'
#' model_simple_med_es <-
#' "
#' y ~ m: l
#' m ~ x: m
#' y ~ x: s
#' "
#'
#' # Simulate datasets to check the model
#' # Set `parallel` to TRUE for faster, usually much faster, analysis
#' # Set `progress` to TRUE to display the progress of the analysis
#'
#' sim_only <- power4test(nrep = 10,
#'                        model = model_simple_med,
#'                        pop_es = model_simple_med_es,
#'                        n = 50,
#'                        fit_model_args = list(fit_function = "lm"),
#'                        do_the_test = FALSE,
#'                        iseed = 1234,
#'                        parallel = FALSE,
#'                        progress = FALSE)
#'
#' # By n: Do a test for different sample sizes
#'
#' out1 <- power4test_by_n(sim_only,
#'                         nrep = 10,
#'                         test_fun = test_parameters,
#'                         test_args = list(par = "y~x"),
#'                         n = c(25, 50, 100),
#'                         by_seed = 1234,
#'                         parallel = FALSE,
#'                         progress = FALSE)
#'
#' pout1 <- power_curve(out1)
#' pout1
#' plot(pout1)
#'
#' # By pop_es: Do a test for different population values of a model parameter
#'
#' out2 <- power4test_by_es(sim_only,
#'                          nrep = 10,
#'                          test_fun = test_parameters,
#'                          test_args = list(par = "y~x"),
#'                          pop_es_name = "y ~ x",
#'                          pop_es_values = c(0, .3, .5),
#'                          by_seed = 1234,
#'                          parallel = FALSE,
#'                          progress = FALSE)
#'
#' pout2 <- power_curve(out2)
#' pout2
#' plot(pout2)
#'
#' @export
power_curve <- function(object,
                        formula = NULL,
                        start = NULL,
                        lower_bound = NULL,
                        upper_bound = NULL,
                        nls_args = list(),
                        nls_control = list(),
                        verbose = FALSE,
                        models = c("nls", "logistic", "lm")) {

  models <- match.arg(models,
                      several.ok = TRUE)

  # reject ~ I((x - c0)^e) / (b + I((x - c0)^e))
  # The formula used depends on the nature of the predictors
  # The predictor should always be named 'x' in the formula.
  # The outcome is always 'reject'
  # `formula` can be a list of possible NLS models.

  class0 <- class(object)[1]
  if (!(class0 %in% c("power4test_by_n",
                      "power4test_by_es"))) {
    stop("'object' is neither a power4test_by_n or power4test_by_n object.")
  }

  if (class0 == "power4test_by_n") {
    if (is.null(formula)) {
      formula <- reject ~ I((x - c0)^e) / (b + I((x - c0)^e))
    }
    if (is.null(start)) {
      start <- c(b = 2, c0 = 100, e = 1)
    }
    if (is.null(lower_bound)) {
      lower_bound <- c(b = 0, c0 = 0, e = 1)
    }
    if (is.null(upper_bound)) {
      upper_bound <- c(b = Inf, c0 = Inf, e = Inf)
    }
  }

  if (class0 == "power4test_by_es") {
    if (is.null(formula)) {
      formula <- list(reject ~ 1 - 1 / I((1 + (x / d)^a)^b),
                      reject ~ 1 - exp(x / a) / I((1  + exp(x / a))^b),
                      reject ~ 1 - 2 / (exp(x / d) + exp(-x / d)),
                      reject ~ 1 / (1 + a * exp(-b * x)))
    }
    if (is.null(start)) {
      start <- list(c(a = 2, b = 4, d = 4),
                    c(a = 1, b = 2),
                    c(d = 1),
                    c(a = 1, b = 1))
    }
    if (is.null(lower_bound)) {
      lower_bound <- list(NULL, NULL, NULL, NULL)
    }
    if (is.null(upper_bound)) {
      upper_bound <- list(NULL, NULL, NULL, NULL)
    }
  }

  reject0 <- rejection_rates(object,
                             all_columns = TRUE)

  predictor <- switch(class0,
                      power4test_by_n = "n",
                      power4test_by_es = "es")

  #reject0$power <- reject0$reject
  reject0$x <- reject0[[predictor]]

  model_found <- FALSE

  fit <- NA

  if ((nrow(reject0) >= 4) &&
      ("nls" %in% models)) {

    # === nls ===
    nls_args_fixed <- fix_nls_args(formula = formula,
                                   start = start,
                                   lower_bound = lower_bound,
                                   upper_bound = upper_bound,
                                   nls_args = nls_args,
                                   nls_control = nls_control)
    # Try each formula
    fit_deviance <- Inf
    for (i in seq_along(nls_args_fixed$formula)) {
      nls_args1 <- utils::modifyList(nls_args_fixed$nls_args,
                                     list(data = reject0,
                                          control = nls_args_fixed$nls_contorl1,
                                          nrep = reject0$nrep))
      nls_args_fixed_i <- list(formula = nls_args_fixed$formula[[i]],
                               start = nls_args_fixed$start[[i]],
                               lower = nls_args_fixed$lower_bound[[i]],
                               upper = nls_args_fixed$upper_bound[[i]])
      for (xx in names(nls_args_fixed_i)) {
        if (is.null(nls_args_fixed_i[[xx]])) {
          nls_args_fixed_i[[xx]] <- NULL
        }
      }
      nls_args1 <- utils::modifyList(nls_args1,
                                     nls_args_fixed_i)
      # Do nls0
      fit_i <- tryCatch(suppressWarnings(do.call(do_nls,
                                                 nls_args1)),
                        error = function(e) e)
      if (inherits(fit_i, "nls")) {
        fit_i_d <- stats::deviance(fit_i)
        if (fit_i_d < fit_deviance) {
          fit <- fit_i
          fit_deviance <- fit_i_d
        }
      }
    }

    if (inherits(fit, "nls")) {
      model_found <- TRUE
    } else {
      if (verbose) {
        cat("- 'nls()' estimation failed. Switch to logistic regression.\n")
      }
    }

  } else if ("nls" %in% models) {
    if (verbose) {
      cat("- 'nls()' estimation skipped when less than 4 values of predictor examined.\n")
    }
  }

  # === Logistic ===

  if (!model_found ||
      ("logistic" %in% models)) {
    # Do logistic
    fit <- do_logistic(reject_df = reject0)

    if (inherits(fit, "glm")) {
      model_found <- TRUE
    } else {
      if (verbose) {
        cat("- Logistic regression failed. Switch to linear regression.\n")
      }
    }
  }

  # === OLS Regression ===

  if (!model_found ||
      ("ls" %in% models)) {
  # Last resort: OLS regression
    fit <- do_lm(reject_df = reject0,
                weights = reject0$nrep)

    if (inherits(fit, "lm")) {
      model_found <- TRUE
    } else {
      if (verbose) {
        cat("- OLS regression failed. No power curve estimated.\n")
      }
    }
  }

  # TODO:
  # - Consider using `splinefun()` as a last resort.

  out <- list(fit = fit,
              reject_df = reject0,
              predictor = predictor,
              call = match.call())
  class(out) <- c("power_curve", class(out))
  return(out)
}

#' @rdname power_curve
#'
#' @param x A `power_curve` object.
#'
#' @param data_used Logical. Whether
#' the rejection rates data frame
#' used to fit the model is printed.
#'
#' @param digits,right,row.names
#' Arguments of the same names used
#' by the `print` method of a
#' `data.frame` object. Used when `data_used`
#' is `TRUE` and the rejection rates
#' data frame is printed.
#'
#' @param ... For the `print` method of
#' `power_curve` objects, they are optional
#' arguments to be passed to
#' [print.data.frame()] when printing
#' the rejection rates data frame.
#'
#' @return
#' The `print` method of `power_curve`
#' object returns `x` invisibly. Called
#' for its side-effect.
#'
#' @export
print.power_curve <- function(x,
                              data_used = FALSE,
                              digits = 3,
                              right = FALSE,
                              row.names = FALSE,
                              ...) {
  cat("Call:\n")
  print(x$call)
  cat("\nPredictor: ",
      x$predictor,
      " (",
      switch(x$predictor,
             n = "Sample Size",
             es = "Effect Size"),
      ")\n",
      sep = "")
  cat("\nModel:\n")
  tmp <- x$fit
  tmp$data <- "(Omitted)"
  print(tmp)
  if (data_used) {
    cat("\nData Used:\n")
    print(x$reject_df,
          digits = digits,
          right = right,
          row.names = row.names,
          ...)
  }
  invisible(x)
}

#' @noRd
do_nls <- function(...,
                   nrep = NULL) {
  args <- list(...)
  # Try weights
  if (!is.null(nrep)) {
    args1 <- utils::modifyList(args,
                              list(weights = nrep))
    fit <- tryCatch(suppressWarnings(do.call(stats::nls,
                                     args1)),
                   error = function(e) e)
    if (inherits(fit, "nls")) {
      return(fit)
    }
  }

  # Do not use weights
  fit <- tryCatch(suppressWarnings(do.call(stats::nls,
                                   args)),
                  error = function(e) e)
  if (inherits(fit, "nls")) {
    return(fit)
  }
  # fit is an error. Return it
  return(fit)
}

#' @noRd
do_logistic <- function(reject_df) {
  # The predictor is always 'x'.

  # Expand to one row per replication
  reject1 <- reject_df[, c("x", "reject", "nrep")]
  reject1$sig <- round(reject1$reject * reject1$nrep)
  reject1$ns <- reject1$nrep - reject1$sig
  tmp <- mapply(function(x, y) {
                  c(rep(1, x), rep(0, y - x))
                },
                x = reject1$sig,
                y = reject1$nrep,
                SIMPLIFY = FALSE)
  tmp <- unlist(tmp)
  reject1 <- data.frame(x = rep(reject1$x,
                                times = reject1$nrep),
                        sig = tmp)
  # Rename sig to reject,
  # to be consistent with other mdoels
  reject1$reject <- reject1$sig

  # Do logistic regression
  fit <- tryCatch(stats::glm(reject ~ x,
                             data = reject1,
                             family = "binomial"),
                  error = function(e) e,
                  warning = function(w) w)
  # Also catch warning such as
  # - "fitted probabilities numerically 0 or 1 occurred>"

  # Always return the fit
  # Let the calling function to handle error
  return(fit)
}

#' @noRd
do_lm <- function(reject_df,
                  weights) {

  # Use weights
  fit <- tryCatch(stats::lm(reject ~ x,
                            data = reject_df,
                            weights = weights),
                  error = function(e) e)
  if (inherits(fit, "lm")) {
    return(fit)
  }
  # Do not use weights
  fit <- tryCatch(stats::lm(reject ~ x,
                            data = reject_df),
                  error = function(e) e)

  # Return fit, error or not
  return(fit)
}

#' @noRd

fix_nls_args <- function(formula,
                         start,
                         lower_bound,
                         upper_bound,
                         nls_args,
                         nls_control) {

  # Set up the per-formula arguments
  # - If not a list, coerce to a list:
  #   one element for each formula.

  if (!is.list(formula)) {
    formula <- list(formula)
  }
  if (!is.list(start)) {
    start <- list(start)
  }
  if (!is.list(lower_bound)) {
    lower_bound <- list(lower_bound)
  }
  if (!is.list(upper_bound)) {
    upper_bound <- list(upper_bound)
  }
  # Default argument values
  nls_contorl0 <- list(maxiter = 1000)
  nls_contorl1 <- utils::modifyList(nls_contorl0,
                                    nls_control)

  # Use "port" because zero residual cases are possible
  # But can be overridden
  nls_args0 <- list(algorithm = "port")
  nls_args1 <- utils::modifyList(nls_args0,
                                 nls_args)

  out <- list(formula = formula,
              start = start,
              lower_bound = lower_bound,
              upper_bound = upper_bound,
              nls_args = nls_args1,
              nls_control = nls_control)

  return(out)

}