```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", eval = FALSE)
library(MplusAutomation)
```
Many modern SEM applications (e.g., BSEM with MCMC, multilevel SEM with many random effects, ML estimation with multidimensional numerical integration) can require tens of minutes to many hours per model. When you need to estimate hundreds or thousands of models, such as in Monte Carlo studies or large screening pipelines, a high-performance computing cluster (HPCC) is the right tool. `MplusAutomation::submitModels()` streamlines creating, batching, submitting, and tracking Mplus jobs on HPCC schedulers (SLURM or Torque), so projects that would take weeks locally can finish in hours on a cluster.
The full signature of `submitModels()`:

```r
submitModels(
  target = getwd(),
  recursive = FALSE,
  filefilter = NULL,
  replaceOutfile = "modifiedDate",
  scheduler = c("slurm", "torque"),
  sched_args = NULL,
  cores_per_model = 1L,
  memgb_per_model = 8L,
  time_per_model = "1:00:00",
  combine_jobs = TRUE,
  max_time_per_job = "24:00:00",
  combine_memgb_tolerance = 1,
  combine_cores_tolerance = 2,
  batch_outdir = NULL
)
```
Key arguments:

- `target` is a directory of `.inp` files; optionally recurse into subdirectories (`recursive = TRUE`) and/or use `filefilter` (a regular expression) to narrow which files are submitted.
- `replaceOutfile = "modifiedDate"` resubmits a model only when the `.inp` file is newer than an existing `.out` file.
- `scheduler` selects the job scheduler (`"slurm"` or `"torque"`).
- `combine_jobs = TRUE` groups similar models into a single batch job capped by `max_time_per_job`; "similarity" is controlled by the memory and core tolerances (`combine_memgb_tolerance`, `combine_cores_tolerance`).

Submit all `.inp` files in a directory (not its subdirectories) to SLURM:
```r
track <- submitModels(
  target = "/proj/my_mplus_models",
  scheduler = "slurm",
  cores_per_model = 1L,
  memgb_per_model = 8L,
  time_per_model = "01:00:00",
  combine_jobs = TRUE,
  max_time_per_job = "24:00:00"
)
```
Filter by regex and search subfolders:
```r
track <- submitModels(
  target = "/proj/my_mplus_models",
  recursive = TRUE,
  filefilter = ".*12hour_forecast.*",
  replaceOutfile = "modifiedDate",
  scheduler = "slurm",
  cores_per_model = 2L,
  memgb_per_model = 16L,
  time_per_model = "02:00:00"
)
```
Torque/PBS users:
```r
track <- submitModels(
  target = "path/to/models",
  scheduler = "torque",
  cores_per_model = 4L,
  memgb_per_model = 24L,
  time_per_model = "0-06:00:00"  # d-hh:mm:ss accepted by Torque
)
```
You can override global `submitModels()` arguments by embedding comment-line directives in the Mplus input (`.inp`) file. These are read and translated into scheduler flags at submission time:
```
! memgb 16
! processors 2
! time 0:30:00
! #SBATCH --mail-type=FAIL
! #PBS -m ae
! pre Rscript --vanilla pre_run.R
! post Rscript --vanilla post_run.R
```
- `memgb`, `processors`, and `time` set per-model resource requests.
- `! #SBATCH ...` or `! #PBS ...` lines are passed through verbatim to SLURM or Torque.
- `pre`/`post` run scripts around the Mplus call (e.g., bookkeeping, or post-run parsing with `readModels()`).

An example `.inp` header:
```
! memgb 16
! processors 2
! time 0:30:00
! #SBATCH --mail-type=FAIL
! pre Rscript --vanilla pre_example.R
! post Rscript --vanilla post_example.R

TITLE: Example regression
DATA: FILE IS ex3.1.dat;
VARIABLE: NAMES ARE y1 x1 x3;
MODEL: y1 ON x1 x3;
```
A simple “post” script might parse the output to RDS:
```r
# post_example.R
mplusdir <- Sys.getenv("MPLUSDIR")
mplusinp <- Sys.getenv("MPLUSINP")

library(MplusAutomation)

m <- readModels(file.path(mplusdir, sub("\\.inp$", ".out", mplusinp)))
saveRDS(m, file.path(mplusdir, sub("\\.inp$", ".rds", mplusinp)))
```
Submitting thousands of tiny jobs can annoy schedulers and slow throughput. With `combine_jobs = TRUE`, `submitModels()` groups models with similar resource needs (within `combine_memgb_tolerance` GB of memory and `combine_cores_tolerance` cores of each other) into a batch whose total time does not exceed `max_time_per_job`. This reduces queue overhead and improves cluster utilization.
Example strategy:
```r
track <- submitModels(
  target = "/proj/mplus_runs",
  scheduler = "slurm",
  combine_jobs = TRUE,
  max_time_per_job = "06:00:00",
  combine_memgb_tolerance = 1,
  combine_cores_tolerance = 2
)
```
`submitModels()` returns a data frame that records job metadata (IDs, file paths, requested resources). Use `checkSubmission()` (or `summary(track)`) to query the scheduler for live status:
```r
checkSubmission(track)
# Submission status as of: 2024-10-10 08:16:53
# -------
#    jobid      file   status
# 50531540 ex3.3.inp   queued
# 50531541 ex3.1.inp   queued

Sys.sleep(45)

checkSubmission(track)
#    jobid      file   status
# 50531540 ex3.3.inp complete
# 50531541 ex3.1.inp complete
```
This makes it easy to poll progress and kick off downstream steps once batches are done.
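For instance, a downstream step could wait until every job reports complete before parsing outputs. This is only a sketch: it assumes the status data frame returned by `checkSubmission()` has a `status` column with values as shown above, and the model directory path is hypothetical.

```r
# Sketch: poll until all submitted jobs report "complete", then parse outputs.
repeat {
  status <- checkSubmission(track)
  if (all(status$status == "complete")) break
  Sys.sleep(60)  # wait a minute between scheduler queries to avoid hammering it
}

# Parse all finished models in the target directory into a list
results <- readModels("/proj/my_mplus_models")
```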
A few practical tips:

- Match time strings to your scheduler's expected format (SLURM accepts `hh:mm:ss` or `d-hh:mm:ss`; Torque often prefers `d-hh:mm:ss`).
- Use `replaceOutfile = "modifiedDate"` to avoid resubmitting completed models unless the `.inp` file has changed.
- Use `pre`/`post` hooks to encapsulate pre/post-processing, logging, and artifact capture.