knitr::opts_chunk$set( collapse = TRUE, cache = TRUE, eval = FALSE, comment = "#>" )
If you have access to a distributed computing platform (e.g. a computer cluster provided by your university or a cloud computing service like AWS), then you can easily move your simulatr simulation from your laptop to this platform.
simulatr specifier object
The same simulatr specifier object you created on your laptop can be used on your distributed computing platform! Please read the "simulatr on your laptop" article (linked above) if you have not done so already. Once you have a simulatr specifier object you are happy with, save it to disk using a command like the following:
saveRDS(object = simulatr_spec, file = simulatr_spec_filename)
Running simulatr on a distributed computing platform is facilitated by a Nextflow pipeline, Katsevich-Lab/simulatr-pipeline. This pipeline takes as inputs the simulatr specifier file created in steps 1-3, as well as the desired limits on the memory and time usage of each individual process. It then adaptively parallelizes the simulation tasks accordingly, splitting the replicates for each combination of method and parameter setting into one or more processes.
Below are instructions for running this pipeline on your distributed computing platform.
Installing Nextflow is easy; follow the instructions here. Next, you must configure Nextflow to work with your specific distributed computing platform. To get familiar with Nextflow configuration, you can read this tutorial, see this list of configuration files at other institutions, read 5 Nextflow tips for HPC users and 5 more Nextflow tips for HPC users, and finally, consult the Nextflow documentation.
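As a starting point, the snippet below writes a minimal nextflow.config to the directory from which you launch Nextflow. It is only a sketch that assumes a SLURM cluster; the queue name my_queue and the CONFIG_FILE variable are placeholders introduced here, and your platform may need a different executor or additional settings.

```bash
# Sketch only: assumes a SLURM scheduler; "my_queue" is a placeholder queue name.
# Nextflow reads nextflow.config from the launch directory, so we write it there.
# Note: this overwrites any existing file at that path.
config_file="${CONFIG_FILE:-nextflow.config}"

cat > "$config_file" <<'EOF'
process {
    executor = 'slurm'   // submit each process as a SLURM job
    queue = 'my_queue'   // placeholder; replace with a queue on your cluster
}
EOF

echo "Wrote $config_file"
```

If your platform uses a different scheduler, change the executor value accordingly and consult the Nextflow documentation for the settings it requires.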
simulatr pipeline
To make sure you have the latest version of the simulatr pipeline, download it using the shell command
nextflow pull katsevich-lab/simulatr-pipeline
simulatr pipeline
You can run the simulatr pipeline using a command like the following:
nextflow run katsevich-lab/simulatr-pipeline \
  --simulatr_specifier_fp /path/to/simspec/obj \
  --result_dir directory/for/results \
  --result_file_name "simulatr_result.rds" \
  --B_check 5 \
  --B 100 \
  --max_gb 8 \
  --max_hours 4
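Before launching the full simulation, it can help to try the pipeline with a small number of replicates by passing a small value to --B. The sketch below wraps such a trial command in a helper that prints the command instead of running it unless DRY_RUN=0 is set. The maybe_run helper, the DRY_RUN and SPEC_FP variables, the trial_results directory, the output file name, and the value 10 are all illustrative choices made here, not part of the pipeline.

```bash
# Sketch of a quick trial run. DRY_RUN, the default specifier path, the
# output names, and --B 10 are illustrative assumptions, not pipeline defaults.
spec_fp="${SPEC_FP:-/path/to/simspec/obj}"

# Print the command when DRY_RUN is 1 (the default here); execute it otherwise.
maybe_run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

maybe_run nextflow run katsevich-lab/simulatr-pipeline \
  --simulatr_specifier_fp "$spec_fp" \
  --result_dir trial_results \
  --result_file_name "simulatr_trial_result.rds" \
  --B 10
```

Once the printed command looks right, setting DRY_RUN=0 in the environment runs it for real.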
The command-line arguments are described below. Note that only the first (simulatr_specifier_fp) is required; the rest have sensible defaults, included in the following descriptions.
simulatr_specifier_fp: (Required) The path to the simulatr specifier object saved at the end of steps 1-3 above.
result_dir: (Optional) The directory in which to write the output file. Defaults to the current working directory.
result_file_name: (Optional) The name of the output file. Defaults to "simulatr_result.rds".
B_check: (Optional) The number of initial simulation replicates to run for each combination of method and parameter setting in order to benchmark the memory and time required for adaptive parallelization. Defaults to 5.
B: (Optional) The number of simulation replicates to run for the full simulation. Defaults to the value in the simulatr specifier object. You may want to set B to a small number during an initial trial run.
max_gb: (Optional) The maximum number of GB each process has available. This number is used in the adaptive parallelization scheme. Defaults to 8.
max_hours: (Optional) The maximum number of hours each process can run for. This number is used in the adaptive parallelization scheme. Defaults to 4.
time_fudge_factor: (Optional) A positive real number indicating how liberal or conservative the pipeline should be in estimating the number of processors to use for a given combination of method and parameter grid row. A number in the interval $(0, 1)$ indicates that it should be conservative (i.e., request more processors than is probably necessary), while a number in the interval $(1, \infty)$ indicates that it should be liberal (i.e., request fewer processors than is probably necessary). Defaults to 0.8.
mem_fudge_factor: (Optional) Similar to time_fudge_factor, but for memory.
Upon successful completion of the pipeline, the results will be written to disk in RDS format at the location requested. The results are in the same format as returned by check_simulatr_specifier_object(); see "simulatr on your laptop".
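To build intuition for how max_hours and time_fudge_factor interact, the following back-of-the-envelope calculation may help. It is purely illustrative: the formula (divide the estimated total run time by max_hours times the fudge factor, then round up), the n_processes helper, and the benchmark figure of 0.25 hours per replicate are assumptions made here, not the pipeline's actual implementation.

```bash
# Illustrative arithmetic only; NOT the pipeline's actual formula.
# Assumed: processes = ceil(B * hours_per_rep / (max_hours * time_fudge_factor)).
n_processes() {
  # $1 = B, $2 = hours per replicate (hypothetical benchmark),
  # $3 = max_hours, $4 = time_fudge_factor
  awk -v b="$1" -v t="$2" -v m="$3" -v f="$4" 'BEGIN {
    x = (b * t) / (m * f)
    n = int(x)
    if (n < x) n++      # round up to a whole number of processes
    if (n < 1) n = 1    # always use at least one process
    print n
  }'
}

n_processes 100 0.25 4 0.8   # conservative factor: prints 8
n_processes 100 0.25 4 1.0   # no adjustment:       prints 7
n_processes 100 0.25 4 2.0   # liberal factor:      prints 4
```

Under this assumed formula, a factor below 1 shrinks the effective time budget per process and so spreads the replicates over more processes, which matches the conservative behavior described for time_fudge_factor.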
Read the results from disk using readRDS(). The rest is the same as in "simulatr on your laptop".