render_batch: Call 'rmarkdown::render()' as a new slurm job

View source: R/slurm-utils.R

render_batch    R Documentation

Description

Use slurm_call to perform a single function evaluation on the Slurm cluster.

Usage

render_batch(
  ...,
  devtools_pkgs = devtools::dev_packages(),
  global_objects = NULL,
  pkgs = rev(.packages()),
  slurm_options = list()
)

render(use_sbatch = FALSE, ...)

Arguments

...

Arguments passed on to rmarkdown::render

input

The input file to be rendered. This can be an R script (.R), an R Markdown document (.Rmd), or a plain markdown document.

output_format

The R Markdown output format to convert to. The option "all" will render all formats defined within the file. The option can be the name of a format (e.g. "html_document") and that will render the document to that single format. One can also use a vector of format names to render to multiple formats. Alternatively, you can pass an output format object (e.g. html_document()). If using NULL then the output format is the first one defined in the YAML frontmatter in the input file (this defaults to HTML if no format is specified there). If you pass an output format object to output_format, the options specified in the YAML header or _output.yml will be ignored and you must explicitly set all the options you want when you construct the object. If you pass a string, the output format will use the output parameters in the YAML header or _output.yml.

output_file

The name of the output file. If NULL, the output filename is based on the filename of the input file. If a filename is provided, a path to the output file can also be provided. Note that the output_dir option also allows the output file path to be specified; however, if the path is given here, the directory must already exist. If output_file is specified but does not have a file extension, an extension will be automatically added according to the output format. To avoid the automatic file extension, put the output_file value in I(), e.g., I('my-output').

output_dir

The output directory for the rendered output_file. This allows for a choice of an alternate directory to which the output file should be written (the default output directory is that of the input file). If a path is provided with a filename in output_file, the directory specified here will take precedence. Note that any necessary directories in the provided path are created if they do not exist.

output_options

List of output options that can override the options specified in metadata (e.g. could be used to force self_contained or mathjax = "local"). Note that this is only valid when the output format is read from metadata (i.e. not a custom format object passed to output_format).

output_yaml

Paths to YAML files specifying output formats and their configurations. The first existing one is used. If none are found, then the function searches YAML files specified to the output_yaml top-level parameter in the YAML front matter, _output.yml or _output.yaml, and then uses the first existing one.

intermediates_dir

Intermediate files directory. If a path is specified then intermediate files will be written to that path. If NULL, intermediate files are written to the same directory as the input file.

knit_root_dir

The working directory in which to knit the document; uses knitr's root.dir knit option. If NULL then the behavior will follow the knitr default, which is to use the parent directory of the document.

runtime

The runtime target for rendering. The static option produces output intended for static files; shiny produces output suitable for use in a Shiny document (see run). The default, auto, allows the runtime target specified in the YAML metadata to take precedence, and renders for a static runtime target otherwise.

clean

Using TRUE will clean intermediate files that are created during rendering.

params

A list of named parameters that override custom params specified within the YAML front-matter (e.g. specifying a dataset to read or a date range to confine output to). Pass "ask" to start an application that helps guide parameter configuration.

knit_meta

(This option is reserved for expert use.) Metadata generated by knitr.

envir

The environment in which the code chunks are to be evaluated during knitting (can use new.env() to guarantee an empty new environment).

run_pandoc

An option for whether to run pandoc to convert Markdown output.

quiet

An option to suppress printing during rendering from knitr, the pandoc command line and others. To suppress only the final "Output created: " message, set rmarkdown.render.message to FALSE.

encoding

Ignored. The encoding is always assumed to be UTF-8.

devtools_pkgs

A character vector containing the names of packages that are currently loaded with devtools (and should be loaded on the Slurm instance). The paths to these packages are passed on to the job.

global_objects

A character vector containing the names of R objects to be saved in a .RData file and loaded on each cluster node prior to calling f.

pkgs

A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when slurm_call is called.

slurm_options

A named list of options recognized by sbatch; see Details below for more information.

use_sbatch

Logical. If TRUE, use render_batch(); if FALSE, call rmarkdown::render() directly.

Details

This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job.

The names of any other R objects (besides params) that f needs to access should be listed in the global_objects argument.

Use slurm_options to set any option recognized by sbatch, e.g. slurm_options = list(time = "1:00:00", share = TRUE). See http://slurm.schedmd.com/sbatch.html for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "job-name", "ntasks" and "output" options are already determined by slurm_call and should not be manually set.
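The naming rules in this paragraph can be checked before a job is submitted. The following is a minimal sketch in base R; check_slurm_options() is a hypothetical helper, not part of Genesee or rslurm, and it encodes only the rules stated above (full option names, and no manual "job-name", "ntasks" or "output").

```r
# Hypothetical helper (not part of Genesee or rslurm): validates a
# slurm_options list against the rules described in the Details text.
check_slurm_options <- function(slurm_options) {
  stopifnot(is.list(slurm_options))
  if (length(slurm_options) == 0) return(invisible(slurm_options))
  opt_names <- names(slurm_options)
  if (is.null(opt_names) || any(!nzchar(opt_names))) {
    stop("every entry in slurm_options must be named")
  }
  # Full option names are required, e.g. "time" rather than "t".
  short <- opt_names[nchar(opt_names) == 1]
  if (length(short) > 0) {
    stop("use full sbatch option names, not: ", paste(short, collapse = ", "))
  }
  # These options are determined by slurm_call and should not be set manually.
  reserved <- intersect(opt_names, c("job-name", "ntasks", "output"))
  if (length(reserved) > 0) {
    stop("options set automatically: ", paste(reserved, collapse = ", "))
  }
  invisible(slurm_options)
}

# Flags such as "share" are given as TRUE; hyphenated names need quotes.
opts <- list(time = "1:00:00", "mem-per-cpu" = "16gb", share = TRUE)
check_slurm_options(opts)
```

A list that passes the check can then be supplied as the slurm_options argument of render_batch().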

When processing the computation job, the Slurm cluster will output two files in the temporary folder: one with the return value of the function ("results_0.RDS") and one containing any console or error output produced by R ("slurm_[node_id].out").
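As an illustration of the file layout just described, the result file could be read back with readRDS(). This is a sketch under assumptions: read_job_result() and the job name "myjob" are hypothetical, and the finished job is simulated in a temporary directory so the example runs without a cluster.

```r
# Hypothetical helper: builds the path to "results_0.RDS" inside the
# "_rslurm_[jobname]" folder and reads it if it exists.
read_job_result <- function(jobname, base_dir = ".") {
  job_dir <- file.path(base_dir, paste0("_rslurm_", jobname))
  result_file <- file.path(job_dir, "results_0.RDS")
  if (!file.exists(result_file)) {
    return(NULL)  # job not finished yet, or the folder was already removed
  }
  readRDS(result_file)
}

# Simulate a finished job; the stored value is a placeholder, not real output.
demo_dir <- tempfile("demo")
dir.create(file.path(demo_dir, "_rslurm_myjob"), recursive = TRUE)
saveRDS("reports/01qc_nofilter.html",
        file.path(demo_dir, "_rslurm_myjob", "results_0.RDS"))

res <- read_job_result("myjob", base_dir = demo_dir)
missing_res <- read_job_result("nojob", base_dir = demo_dir)
```

In practice get_slurm_out (see See Also) is the documented way to retrieve output; reading the file directly is mainly useful for debugging.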

If submit = TRUE, the job is sent to the cluster and a confirmation message (or error) is output to the console. If submit = FALSE, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command sbatch submit.sh from that directory.

After sending the job to the Slurm cluster, slurm_call returns a slurm_job object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details.
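The follow-up steps mentioned above might look like the sketch below. It assumes that the rslurm package is installed, that a Slurm cluster is reachable, and that job is the slurm_job object returned by render_batch(); inspect_and_clean() is a hypothetical wrapper and is only defined here, not called.

```r
# Hypothetical wrapper around rslurm's job-management functions.
# Calling it requires rslurm and access to a Slurm cluster.
inspect_and_clean <- function(job) {
  # Query the current state of the job.
  status <- rslurm::get_job_status(job)
  # Wait for the job to finish, then collect its return value.
  result <- rslurm::get_slurm_out(job, outtype = "raw", wait = TRUE)
  # Delete the temporary "_rslurm_[jobname]" folder.
  rslurm::cleanup_files(job)
  list(status = status, result = result)
}
```

A running job can be stopped with rslurm::cancel_slurm(job) instead of waiting for its output.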

Value

A slurm_job object containing the jobname and the number of nodes effectively used.

Functions

  • render: call rmarkdown::render() either through sbatch or directly; in the direct case, sbatch-specific arguments are ignored.
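The dispatch behaviour of render() can be pictured with the sketch below. It is an illustration only and the package's actual implementation may differ: render_sketch(), render_fun and batch_fun are hypothetical stand-ins that record how they were called, so the example runs without rmarkdown or Slurm.

```r
# Illustrative sketch only; not the package's implementation.
# render_fun and batch_fun are stand-ins for rmarkdown::render() and
# render_batch() that simply report the arguments they received.
render_sketch <- function(use_sbatch = FALSE, ...,
                          render_fun = function(...) list(how = "local", args = list(...)),
                          batch_fun = function(...) list(how = "sbatch", args = list(...))) {
  args <- list(...)
  if (use_sbatch) {
    return(do.call(batch_fun, args))
  }
  # Drop the arguments that only make sense for a Slurm submission.
  sbatch_only <- c("devtools_pkgs", "global_objects", "pkgs", "slurm_options")
  arg_names <- names(args)
  if (is.null(arg_names)) arg_names <- rep("", length(args))
  args <- args[!(arg_names %in% sbatch_only)]
  do.call(render_fun, args)
}

# The same call works in both modes; only use_sbatch changes.
local_call <- render_sketch(input = "01qc.Rmd",
                            slurm_options = list(time = "1:00:00"))
batch_call <- render_sketch(use_sbatch = TRUE, input = "01qc.Rmd",
                            slurm_options = list(time = "1:00:00"))
```

The point of this design is that a script can switch between interactive rendering and cluster submission by flipping one flag, without editing the rest of the call.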

See Also

slurm_apply to parallelize a function over a parameter set.

cancel_slurm, cleanup_files, get_slurm_out and get_job_status which use the output of this function.

Examples

## Not run: 
create_exampleproject(skeleton_args = list(
  authors = 'you and me', project_type = 'scRNA', investigator = 'alligator',
  project_title = 'schit', navigate_rawdata = FALSE))
render_batch(
  slurm_options = list(time = "1:00:00", "mem-per-cpu" = "16gb",
                       partition = "amcdavid", "cpus-per-task" = 1),
  input = "01qc.Rmd",
  params = list(tenx_root = NULL, tenx_h5 = 'scratch/AGG1/raw_feature_bc_matrix.h5',
                auto_filter = FALSE, output_root = 'refined/01qc_nofilter',
                batch_var = 'tissue_source', citeseq_str = '_TotalA'),
  output_file = '01qc_nofilter', output_dir = 'reports', quiet = TRUE)

## End(Not run)

amcdavid/Genesee documentation built on April 14, 2022, 5:16 a.m.