In a given directory, writes the argument grid given from grid_apply(.f, ..., .eval=FALSE), an R script to run .f on one set of arguments, a submission script to run .f on every combination of arguments, and directories to store results and job log files.
setup(object, ...)

## S3 method for class 'gapply'
setup(object, .dir = getwd(), .reps = 1, .seed = NULL,
      .mc.cores = 1, .verbose = 1, .queue = "long",
      .script.name = "doone.R", .job.name = "distributr",
      .out.dir = "SGE_Output", .R.version = "3.2.5", .email.options = "a",
      .email.addr = NULL, .shell = "bash", ...)
object | object from
... | arguments to methods
.dir | directory name relative to the current working directory (no trailing backslash)
.reps | total number of replications for each condition
.seed | An integer or
.mc.cores | number of cores used to run replications in parallel (can be a range)
.verbose | verbose level:
.queue | name of queue
.script.name | name of script (default
.job.name | name of job
.out.dir | name of directory in which to put SGE output files
.R.version | name of R version. Possible values include any in
.email.options | one or more characters from "bea" meaning email when "job Begins", "job Ends", and "job Aborts". Default is "a".
.email.addr | email address
.shell | shell to use. Default is "bash".
Long-running grid_apply computations can be easily run in parallel on SGE using array tasks. Each row in the argument grid given by grid_apply(f, ...) is mapped to a unique task id, which is run on a separate node. setup() makes this easy by writing, in a given directory, the argument grid (arg_grid.Rdata), an R script to run one combination of arguments, a submission script assigning each row to a unique task id, seeds (if specified), and folders to store results. Jobs are submitted to the scheduler by running qsub submit at the prompt, or by running submit() within R.
The argument grid (arg_grid) is saved to .dir as arg_grid.Rdata. It contains the columns of expand.grid(...) from grid_apply(.f, ...). A column $.sge_id is appended that assigns each row a unique job id.
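As a rough illustration of the layout described above, the following base R sketch builds a small grid with expand.grid and appends an id column. The argument names n and sd are hypothetical, and numbering the rows 1..N is an assumption about how .sge_id is assigned, not a confirmed detail of the package.

```r
# Hypothetical arguments; stand-ins for whatever is passed to grid_apply(.f, ...).
arg_grid <- expand.grid(n = c(10, 100), sd = c(1, 2, 5))

# Assumption: the job id is simply the row number 1..N.
arg_grid$.sge_id <- seq_len(nrow(arg_grid))

print(arg_grid)
```

Each row of this data frame would correspond to one array task.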
A simple R script (doone.R) is provided that runs .f on one row of arg_grid. Running doone.R at the command line exactly replicates how the script will be run on each node.
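The idea of running .f on a single row can be sketched in base R as follows. This is not the generated doone.R; run_one, the toy .f, and the toy grid are all hypothetical names used only for illustration.

```r
# Hypothetical stand-ins for .f and the saved argument grid.
.f <- function(a, b) a * b
arg_grid <- expand.grid(a = 1:2, b = c(10, 20))
arg_grid$.sge_id <- seq_len(nrow(arg_grid))

# Look up the row for one task id and call .f with that row's arguments.
run_one <- function(task_id, arg_grid, .f) {
  row <- arg_grid[arg_grid$.sge_id == task_id, , drop = FALSE]
  # Drop the bookkeeping column so only the arguments of .f remain.
  args <- as.list(row[setdiff(names(row), ".sge_id")])
  do.call(.f, args)
}

run_one(3, arg_grid, .f)
```

On a cluster node the task id would presumably come from the SGE_TASK_ID environment variable rather than being passed by hand.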
A file (submit) is also written, which specifies a task array for qsub for all jobs in arg_grid. It can be submitted to the queue by running qsub submit at the command line. Job status can be monitored with qstat.
Various email
Results are stored in results/, as $SGE_TASK_ID.Rdata, where SGE_TASK_ID is the array task corresponding to a unique row in arg_grid. It is sometimes convenient to access this variable within .f, which can be done by Sys.getenv("SGE_TASK_ID"). This might be used to cache intermediate results.
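One way the task id might be used for caching is sketched below. The function names, the cache location under tempdir(), and the fallback id "local" (used when the variable is not set, for example when running outside SGE) are all assumptions made for the example.

```r
# Build a cache path keyed on the task id; "local" is a hypothetical fallback.
cache_path <- function() {
  task_id <- Sys.getenv("SGE_TASK_ID", unset = "local")
  file.path(tempdir(), sprintf("cache_%s.rds", task_id))
}

# Hypothetical expensive step: reuse the cached value if one exists.
cached_sum <- function(n) {
  path <- cache_path()
  if (file.exists(path)) {
    return(readRDS(path))
  }
  res <- sum(seq_len(n))  # placeholder for a long computation
  saveRDS(res, path)
  res
}

cached_sum(10)
```

Because each task id maps to one row of the grid, keying the cache on the id alone is enough in this sketch; a shared directory would be needed in place of tempdir() for the cache to survive across jobs.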
If .seed is given, a list of seeds is generated in seeds.Rdata using L'Ecuyer-CMRG streams for reproducible random number generation. A unique seed is generated for each independent job in the argument grid. Subsequent calls to setup using the same .seed generate the same seeds and reproducible results. See parallel::nextRNGStream for more details.
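The stream mechanism can be sketched with the parallel package as follows. make_seeds is a hypothetical helper written for this example and may differ from how setup builds seeds.Rdata.

```r
# Generate one L'Ecuyer-CMRG stream seed per job from a single integer seed.
make_seeds <- function(seed, n_jobs) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")        # switch generator, remember the old one
  on.exit(RNGkind(old_kind[1]), add = TRUE)   # restore the previous generator kind
  set.seed(seed)
  s <- get(".Random.seed", envir = globalenv())
  seeds <- vector("list", n_jobs)
  for (i in seq_len(n_jobs)) {
    seeds[[i]] <- s
    s <- parallel::nextRNGStream(s)           # advance to the next independent stream
  }
  seeds
}

seeds <- make_seeds(42, 3)
length(seeds)
```

A job would then install its own stream by assigning the corresponding element to .Random.seed in the global environment, with the RNG kind set to "L'Ecuyer-CMRG", before drawing random numbers.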
The function .f can be run multiple times for every row in arg_grid by setting .reps > 1. These replications can be run in parallel using mclapply by setting .mc.cores > 1. To decrease waiting times in the queue, .mc.cores can be given a range (e.g. .mc.cores = c(1, 8)), and the job will be submitted when a number of cores in that range is available. To access the number of cores given to each job, use Sys.getenv("NSLOTS").
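A minimal sketch of running replications over the cores granted to a job is given below. run_reps is a hypothetical helper, and falling back to one core when NSLOTS is missing or invalid is an assumption made so the example also runs outside SGE.

```r
# Run .f `reps` times with the same arguments, using the cores SGE granted.
run_reps <- function(.f, args, reps = 1) {
  n_cores <- suppressWarnings(as.integer(Sys.getenv("NSLOTS", unset = "1")))
  if (is.na(n_cores) || n_cores < 1L) {
    n_cores <- 1L  # assumption: default to serial execution
  }
  parallel::mclapply(seq_len(reps), function(i) do.call(.f, args),
                     mc.cores = n_cores)
}

# Hypothetical .f and arguments.
res <- run_reps(function(a) a + 1, list(a = 1), reps = 4)
length(res)
```

Note that mclapply relies on forking, so more than one core is only usable on Unix-like systems.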
It is easy to corrupt arg_grid.Rdata by running setup on different sets of arguments, making future merges of results with arguments based on .sge_id invalid. If arg_grid.Rdata already exists, setup prompts the user for verification that an overwrite is intended, or stops with an error if not run interactively.
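The guard described above could look roughly like the following. confirm_overwrite, its messages, and the accepted answers are hypothetical; only the behaviour (prompt when interactive, error otherwise) is taken from the text.

```r
# Refuse to silently overwrite an existing file.
confirm_overwrite <- function(path) {
  if (!file.exists(path)) {
    return(invisible(TRUE))  # nothing to overwrite
  }
  if (!interactive()) {
    stop(path, " already exists; refusing to overwrite in a non-interactive session")
  }
  ans <- readline(sprintf("%s exists. Overwrite? [y/n] ", path))
  if (!tolower(ans) %in% c("y", "yes")) {
    stop("overwrite not confirmed")
  }
  invisible(TRUE)
}

# A path that does not exist yet passes the check without prompting.
confirm_overwrite(tempfile(fileext = ".Rdata"))
```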
Invisibly, the original object with argument grid modified to append a column $.sge_id assigning each row to a unique job id.
As side effects, the function writes the following objects to .dir:
arg_grid.Rdata | Data frame containing the argument grid, appended with a column
doone.R | Script to run one job, or one row from
submit | Submission script specifying a task array over the grid of parameters in (all rows of) arg_grid.Rdata
seeds.Rdata | If
results/ | Folder to store results. Each file is
SGE_Output/ | Folder for output from SGE
gapply: Setup sge files from gapply, grid_apply
grid_apply to define the grid, jobs to see the grid,
collect to collect completed results, and tidy to merge
completed results with the argument grid.
test_job runs a job with a given id on the head node.
filter_jobs writes a submission script for jobs matching conditions, as in dplyr::filter.
sge_env can be used to access environment variables.