SimSolve: Optimized target quantities in simulation experiments...

View source: R/SimSolve.R

SimSolveR Documentation

Optimized target quantities in simulation experiments (ProBABLI)

Description

This function provides a stochastic optimization approach to solving specific quantities in simulation experiments (e.g., solving for a specific sample size to meet a target power rate) using the Probablistic Bisection Algorithm with Bolstering and Interpolations (ProBABLI; Chalmers, in review). The structure follows the steps outlined in runSimulation, however portions of the design input are taken as variables to be estimated rather than fixed, and the constant b is required in order to solve the root equation f(x) - b = 0.

Usage

SimSolve(
  design,
  interval,
  b,
  generate,
  analyse,
  summarise,
  replications = list(burnin = 15L, burnin.reps = 100L, max.reps = 500L, increase.by =
    10L),
  integer = TRUE,
  formula = y ~ poly(x, 2),
  family = "binomial",
  parallel = FALSE,
  cl = NULL,
  save = TRUE,
  ncores = parallel::detectCores() - 1L,
  type = ifelse(.Platform$OS.type == "windows", "PSOCK", "FORK"),
  maxiter = 100L,
  verbose = TRUE,
  control = list(),
  ...
)

## S3 method for class 'SimSolve'
summary(object, tab.only = FALSE, reps.cutoff = 300, ...)

## S3 method for class 'SimSolve'
plot(x, y, ...)

Arguments

design

a tibble or data.frame object containing the Monte Carlo simulation conditions to be studied, where each row represents a unique condition and each column a factor to be varied (see also createDesign). However, exactly one column of this object must be specified with NA placeholders to indicate that the missing value should be solved via the stochastic optimizer

interval

a vector of length two, or matrix with nrow(design) and two columns, containing the end-points of the interval to be searched. If a vector then the interval will be used for all rows in the supplied design object

b

a single constant used to solve the root equation f(x) - b = 0

generate

generate function. See runSimulation

analyse

analysis function. See runSimulation

summarise

summary function that returns a single number corresponding to a function evaluation f(x) in the equation f(x) = b to be solved as a root f(x) - b = 0. Unlike in the standard runSimulation() definitions this input is required. For further information on this function specification, see runSimulation

replications

a named list or vector indicating the number of replication to use for each design condition per PBA iteration. By default the input is a list with the arguments burnin = 15L, specifying the number of burn-in iterations to used, burnin.reps = 100L to indicate how many replications to use in each burn-in iteration, max.reps = 500L to prevent the replications from increasing higher than this number, and increase.by = 10L to indicate how many replications to increase after the burn-in stage. Unless otherwise specified these defaults will be used, but can be overwritten by explicit definition (e.g., replications = list(increase.by = 25L))

Vector inputs can specify the exact replications for each iterations. As a general rule, early iterations should be relatively low for initial searches to avoid unnecessary computations for locating the approximate root, though the number of replications should gradually increase to reduce the sampling variability as the PBA approaches the root.

integer

logical; should the values of the root be considered integer or numeric? If TRUE then bolstered directional decisions will be made in the pba function based on the collected sampling history throughout the search

formula

regression formula to use when interpolate = TRUE. Default fits an orthogonal polynomial of degree 2

family

family argument passed to glm. By default the 'binomial' family is used, as this function defaults to power analysis setups where isolated results passed to summarise will return 0/1s, however other families should be used had summarise returned something else (e.g., if solving for a particular standard error then a 'gaussian' family would be more appropriate).

Note that if individual results from the analyse steps should not be used (i.e., only the aggregate from summarise is meaningful) then set control = list(summarise.reg_data = TRUE) to override the default behaviour, thereby using only the aggregate information and weights

parallel

for parallel computing for slower simulation experiments (see runSimulation for details)

cl

see runSimulation

save

logical; store temporary file in case of crashes. If detected in the working directory will automatically be loaded to resume (see runSimulation for similar behavior)

ncores

see runSimulation

type

type of cluster object to define. If type used in plot then can be 'density' to plot the density of the iteration history after the burn-in stage, 'iterations' for a bubble plot with inverse replication weights. If not specified then the default PBA plots are provided (see PBA)

maxiter

the maximum number of iterations (default 100)

verbose

logical; print information to the console?

control

a list of the algorithm control parameters. If not specified, the defaults described below are used.

tol

tolerance criteria for early termination (.1 for integer = TRUE searches; .00025 for non-integer searches

rel.tol

relative tolerance criteria for early termination (default .0001)

k.success

number of consecutive tolerance success given rel.tol and tol criteria. Consecutive failures add -1 to the counter (default is 3)

bolster

logical; should the PBA evaluations use bolstering based on previous evaluations? Default is TRUE, though only applicable when integer = TRUE

single_step.iter

when integer = TRUE, do not take steps larger than size 1 (default is 40). This prevents wide oscillations around the probable root for uncertain solutions

interpolate.R

number of replications to collect prior to performing the interpolation step (default is 3000 after accounting for data exclusion from burnin). Setting this to 0 will disable any interpolation computations

include_reps

logical; include a column in the condition elements to indicate how many replications are currently being evaluated? Mainly useful when further precision tuning within each ProBABLI iteration is desirable (e.g., for bootstrapping). Default is FALSE

summarise.reg_data

logical; should the aggregate results from Summarise (along with its associated weights) be used for the interpolation steps, or the raw data from the Analyse step? Set this to TRUE when the individual results from Analyse give less meaningful information

...

additional arguments to be pasted to PBA

object

object of class 'SimSolve'

tab.only

logical; print only the (reduce) table of estimates?

reps.cutoff

integer indicating the rows to omit from output if the number of replications do no reach this value

x

object of class 'SimSolve'

y

design row to plot. If omitted defaults to 1

Details

Optimization is performed using the probabilistic bisection algorithm (PBA) to find the associated root given the noisy simulation objective function evaluations. Information is also collected throughout the iterations in order to make more aggressive predictions about the associated root via interpolation and extrapolation.

Value

the filled-in design object containing the associated lower and upper interval estimates from the stochastic optimization

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. \Sexpr[results=rd]{tools:::Rd_expr_doi("10.20982/tqmp.16.4.p248")}

Examples


## Not run: 

# TASK: Find specific sample size in each group for independent t-test
# corresponding to a power rate of .8
#
# For ease of the setup, assume the groups are the same size, and the mean
# difference corresponds to Cohen's d values of .2, .5, and .8
# This example can be solved numerically using the pwr package (see below),
# though the following simulation setup is far more general and can be
# used for any generate-analyse combination of interest

# SimFunctions(SimSolve=TRUE)

#### Step 1 --- Define your conditions under study and create design data.frame.
#### However, use NA placeholder for sample size as it must be solved,
#### and add desired power rate to object

Design <- createDesign(N = NA,
                       d = c(.2, .5, .8),
                       sig.level = .05)
Design    # solve for NA's

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions

Generate <- function(condition, fixed_objects = NULL) {
    Attach(condition)
    group1 <- rnorm(N)
    group2 <- rnorm(N, mean=d)
    dat <- data.frame(group = gl(2, N, labels=c('G1', 'G2')),
                      DV = c(group1, group2))
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    p <- t.test(DV ~ group, dat, var.equal=TRUE)$p.value
    p
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    # Must return a single number corresponding to f(x) in the
    # root equation f(x) = b

    ret <- EDR(results, alpha = condition$sig.level)
    ret
}

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize N over the rows in design
# Initial search between N = [10,500] for each row using the default
# integer solver (integer = TRUE)

# In this example, b = target power
solved <- SimSolve(design=Design, b=.8, interval=c(10, 500),
                generate=Generate, analyse=Analyse,
                summarise=Summarise)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# also can plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(d=.2, power = .8, sig.level = .05)
pwr.t.test(d=.5, power = .8, sig.level = .05)
pwr.t.test(d=.8, power = .8, sig.level = .05)

# use estimated N results to see how close power was
N <- solved$N
pwr.t.test(d=.2, n=N[1], sig.level = .05)
pwr.t.test(d=.5, n=N[2], sig.level = .05)
pwr.t.test(d=.8, n=N[3], sig.level = .05)

# with rounding
N <- ceiling(solved$N)
pwr.t.test(d=.2, n=N[1], sig.level = .05)
pwr.t.test(d=.5, n=N[2], sig.level = .05)
pwr.t.test(d=.8, n=N[3], sig.level = .05)

# failing analytic formula, confirm results with more precise
#  simulation via runSimulation()
csolved <- solved
csolved$N <- ceiling(solved$N)
confirm <- runSimulation(design=csolved, replications=10000, parallel=TRUE,
                         generate=Generate, analyse=Analyse,
                         summarise=Summarise)
confirm


############
# Similar setup as above, however goal is now to solve d given sample
# size and power inputs (inputs for root no longer required to be an integer)

Design <- createDesign(N = c(100, 50, 25),
                       d = NA,
                       sig.level = .05)
Design    # solve for NA's

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 2 --- Define generate, analyse, and summarise functions (same as above)

#~~~~~~~~~~~~~~~~~~~~~~~~
#### Step 3 --- Optimize d over the rows in design
# search between d = [.1, 2] for each row

# In this example, b = target power
# note that integer = FALSE to allow smooth updates of d
solved <- SimSolve(design=Design, b = .8, interval=c(.1, 2),
                generate=Generate, analyse=Analyse,
                summarise=Summarise, integer=FALSE)
solved
summary(solved)
plot(solved, 1)
plot(solved, 2)
plot(solved, 3)

# plot median history and estimate precision
plot(solved, 1, type = 'history')
plot(solved, 1, type = 'density')
plot(solved, 1, type = 'iterations')

# verify with true power from pwr package
library(pwr)
pwr.t.test(n=100, power = .8, sig.level = .05)
pwr.t.test(n=50, power = .8, sig.level = .05)
pwr.t.test(n=25, power = .8, sig.level = .05)

# use estimated d results to see how close power was
pwr.t.test(n=100, d = solved$d[1], sig.level = .05)
pwr.t.test(n=50, d = solved$d[2], sig.level = .05)
pwr.t.test(n=25, d = solved$d[3], sig.level = .05)

# failing analytic formula, confirm results with more precise
#  simulation via runSimulation()
confirm <- runSimulation(design=solved, replications=10000, parallel=TRUE,
                         generate=Generate, analyse=Analyse,
                         summarise=Summarise)
confirm


## End(Not run)

SimDesign documentation built on Oct. 17, 2023, 5:07 p.m.