aggregate_simulations: Collapse separate simulation files into a single result

aggregate_simulations    R Documentation

Description

This function aggregates the results from SimDesign's runSimulation into a single object suitable for post-analyses, or combines all the saved results directories into one. This is useful when results are run piecewise on one node (e.g., 500 replications in one batch, 500 again at a later date) or run independently across different nodes/computers that are not on the same network.

Usage

aggregate_simulations(
  files = NULL,
  filename = NULL,
  dirs = NULL,
  results_dirname = "SimDesign_aggregate_results",
  select = NULL,
  check.only = FALSE,
  target.reps = NULL
)

Arguments

files

a character vector containing the names of the simulation's final .rds files

filename

(optional) name of the .rds file in which to save the aggregated simulation results. If not specified, the results will only be returned in the R console

dirs

a character vector containing the names of the save_results directories to be aggregated. A new folder will be created and placed in the results_dirname output folder

results_dirname

the new directory in which to place the aggregated results files

select

a character vector indicating which columns (variables) to select from the SimExtract(what='results') information. This is mainly useful when RAM is an issue given simulations with many stored estimates. By default the results objects are included in their entirety; to omit all internally stored simulation results, pass the character 'NONE'

check.only

logical; for larger sets of simulation files, such as those generated by runArraySimulation, return the design conditions that do not satisfy the target.reps

target.reps

(optional) number of replications to check against to evaluate whether the simulation files returned the desired number of replications. If missing, the highest detected value from the collected set of replication information will be used
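
The interaction between check.only and target.reps can be pictured with a small base R sketch. The helper flag_short_conditions() and its input data are hypothetical; they only mimic the behaviour described above (the default target is the largest replication count found), and the package's actual implementation may differ.

```r
# Hypothetical helper (not part of SimDesign): return the conditions whose
# replication count falls short of the target. When no target is supplied,
# the largest observed count is used, as described for target.reps.
flag_short_conditions <- function(conditions, target.reps = NULL) {
  if (is.null(target.reps))
    target.reps <- max(conditions$REPLICATIONS)
  conditions[conditions$REPLICATIONS < target.reps, , drop = FALSE]
}

# Made-up replication counts for four design conditions
conditions <- data.frame(N = c(30, 60, 30, 60),
                         mu = c(0, 0, 5, 5),
                         REPLICATIONS = c(1000, 1000, 750, 1000))

# Default target is max(REPLICATIONS) = 1000, so only the third row is flagged
flag_short_conditions(conditions)

# With an explicit target of 2000 every condition is flagged
flag_short_conditions(conditions, target.reps = 2000)
```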

Value

if files is used, the function returns a data.frame/tibble with the (weighted) average of the simulation results. Otherwise, if dirs is used, the function returns NULL
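
To illustrate what a replication-weighted average looks like, the following base R sketch pools two hypothetical batches of summary statistics by hand. The data frames, the bias column, and the replication counts are made up for illustration, and weighting each batch by its number of replications is an assumption about how the "(weighted) average" is formed; this is not the package's internal code.

```r
# Hypothetical summaries of the same two conditions from two separate runs
batch1 <- data.frame(N = c(30, 60), bias = c(0.10, 0.04), REPLICATIONS = 500)
batch2 <- data.frame(N = c(30, 60), bias = c(0.16, 0.01), REPLICATIONS = 250)

stacked <- rbind(batch1, batch2)

# Pool each condition, weighting every batch by its replication count
pooled <- do.call(rbind, lapply(split(stacked, stacked$N), function(d) {
  data.frame(N = d$N[1],
             bias = weighted.mean(d$bias, w = d$REPLICATIONS),
             REPLICATIONS = sum(d$REPLICATIONS))
}))
rownames(pooled) <- NULL
pooled
```

In this sketch the batch with 500 replications contributes twice as much to each pooled estimate as the batch with 250, and the pooled replication count is their sum.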

Author(s)

Phil Chalmers rphilip.chalmers@gmail.com

References

Chalmers, R. P., & Adkins, M. C. (2020). Writing Effective and Reliable Monte Carlo Simulations with the SimDesign Package. The Quantitative Methods for Psychology, 16(4), 248-280. doi:10.20982/tqmp.16.4.p248

Sigal, M. J., & Chalmers, R. P. (2016). Play it again: Teaching statistics with Monte Carlo simulation. Journal of Statistics Education, 24(3), 136-156. doi:10.1080/10691898.2016.1246953

See Also

runSimulation

Examples

## Not run: 

setwd('my_working_directory')

## run simulations to save the .rds files (or move them to the working directory)
# ret1 <- runSimulation(..., filename='file1')
# ret2 <- runSimulation(..., filename='file2')

# aggregates and stores in workspace (supply filename to also save to the hard-drive)
final <- aggregate_simulations(files = c('file1.rds', 'file2.rds'))
final

# If filename not included, can be extracted from results
# files <- c(SimExtract(ret1, 'filename'), SimExtract(ret2, 'filename'))
# final <- aggregate_simulations(files = files)

# aggregate saved results for .rds files and results directories
# runSimulation(..., save_results = TRUE, save_details = list(save_results_dirname = 'dir1'))
# runSimulation(..., save_results = TRUE, save_details = list(save_results_dirname = 'dir2'))

# place new saved results in 'SimDesign_aggregate_results/' by default
aggregate_simulations(files = c('file1.rds', 'file2.rds'),
                      filename='aggregated_sim.rds',
                      dirs = c('dir1', 'dir2'))

# If dirnames not included, can be extracted from results
# dirs <- c(SimExtract(ret1, 'save_results_dirname'),
#           SimExtract(ret2, 'save_results_dirname'))
# aggregate_simulations(dirs = dirs)

#################################################
# Example where each row condition is repeated, evaluated independently,
# and later collapsed into a single analysis object

# Each condition repeated four times (hence, replications
# should be set to desired.reps/4)
Design <- createDesign(N  = c(30, 60),
                       mu = c(0,5))
Design

Design4 <- expandDesign(Design, 4)
Design4

#-------------------------------------------------------------------

Generate <- function(condition, fixed_objects = NULL) {
    dat <- with(condition, rnorm(N, mean=mu))
    dat
}

Analyse <- function(condition, dat, fixed_objects = NULL) {
    ret <- c(mean=mean(dat), SD=sd(dat))
    ret
}

Summarise <- function(condition, results, fixed_objects = NULL) {
    ret <- colMeans(results)
    ret
}

#-------------------------------------------------------------------

# Generate fixed seeds to be distributed
set.seed(1234)
seeds <- gen_seeds(Design)
seeds

# replications vector (constant is fine if the same across conditions;
# below is vectorized to demonstrate that this could change)
replications <- rep(250, nrow(Design))

# create directory to store all final simulation files
dir.create('sim_files/')

# distribute jobs independently (explicitly parallelize here on cluster,
# which is more elegantly managed via runArraySimulation)
sapply(1:nrow(Design), \(i) {
  runSimulation(design=Design[i, ], replications=replications[i],
                generate=Generate, analyse=Analyse, summarise=Summarise,
                filename=paste0('sim_files/job-', i)) |> invisible()
})

# check that all replications satisfy target
files <- paste0('sim_files/job-', 1:nrow(Design), ".rds")
aggregate_simulations(files = files, check.only = TRUE)

# this is what would be returned if target.reps were 1000
aggregate_simulations(files = files, check.only = TRUE, target.reps=1000)

# aggregate into single object
sim <- aggregate_simulations(files = paste0('sim_files/job-',
                                     1:nrow(Design), ".rds"))
sim


## End(Not run)

philchalmers/SimDesign documentation built on April 14, 2024, 6:38 p.m.