render_batch: Call 'rmarkdown::render()' as a new slurm job
In amcdavid/Genesee: Core functions for genomics project templating

render_batch

R Documentation

Call `rmarkdown::render()` as a new slurm job

Description

Use slurm_call to perform a single function evaluation (here, `rmarkdown::render()`) on the Slurm cluster.

Usage

render_batch(
  ...,
  devtools_pkgs = devtools::dev_packages(),
  global_objects = NULL,
  pkgs = rev(.packages()),
  slurm_options = list()
)

render(use_sbatch = FALSE, ...)

Arguments

`...`	Arguments passed on to `rmarkdown::render`:
  `input`	The input file to be rendered. This can be an R script (.R), an R Markdown document (.Rmd), or a plain markdown document.
  `output_format`	The R Markdown output format to convert to. The option `"all"` will render all formats defined within the file. The option can be the name of a format (e.g. `"html_document"`) and that will render the document to that single format. One can also use a vector of format names to render to multiple formats. Alternatively, you can pass an output format object (e.g. `html_document()`). If using `NULL` then the output format is the first one defined in the YAML frontmatter in the input file (this defaults to HTML if no format is specified there). If you pass an output format object to `output_format`, the options specified in the YAML header or `_output.yml` will be ignored and you must explicitly set all the options you want when you construct the object. If you pass a string, the output format will use the output parameters in the YAML header or `_output.yml`.
  `output_file`	The name of the output file. If using `NULL` then the output filename will be based on the filename of the input file. If a filename is provided, a path to the output file can also be provided. Note that the `output_dir` option allows for specifying the output file path as well; however, if also specifying the path, the directory must exist. If `output_file` is specified but does not have a file extension, an extension will be automatically added according to the output format. To avoid the automatic file extension, put the `output_file` value in `I()`, e.g., `I('my-output')`.
  `output_dir`	The output directory for the rendered `output_file`. This allows for a choice of an alternate directory to which the output file should be written (the default output directory is that of the input file). If a path is provided with a filename in `output_file`, the directory specified here will take precedence. Please note that any directory path provided will create any necessary directories if they do not exist.
  `output_options`	List of output options that can override the options specified in metadata (e.g. could be used to force `self_contained` or `mathjax = "local"`). Note that this is only valid when the output format is read from metadata (i.e. not a custom format object passed to `output_format`).
  `output_yaml`	Paths to YAML files specifying output formats and their configurations. The first existing one is used. If none are found, then the function searches YAML files specified to the `output_yaml` top-level parameter in the YAML front matter, _output.yml or _output.yaml, and then uses the first existing one.
  `intermediates_dir`	Intermediate files directory. If a path is specified then intermediate files will be written to that path. If `NULL`, intermediate files are written to the same directory as the input file.
  `knit_root_dir`	The working directory in which to knit the document; uses knitr's `root.dir` knit option. If `NULL` then the behavior will follow the knitr default, which is to use the parent directory of the document.
  `runtime`	The runtime target for rendering. The `static` option produces output intended for static files; `shiny` produces output suitable for use in a Shiny document (see `run`). The default, `auto`, allows the `runtime` target specified in the YAML metadata to take precedence, and renders for a `static` runtime target otherwise.
  `clean`	Using `TRUE` will clean intermediate files that are created during rendering.
  `params`	A list of named parameters that override custom params specified within the YAML front-matter (e.g. specifying a dataset to read or a date range to confine output to). Pass `"ask"` to start an application that helps guide parameter configuration.
  `knit_meta`	(This option is reserved for expert use.) Metadata generated by knitr.
  `envir`	The environment in which the code chunks are to be evaluated during knitting (can use `new.env()` to guarantee an empty new environment).
  `run_pandoc`	An option for whether to run pandoc to convert Markdown output.
  `quiet`	An option to suppress printing during rendering from knitr, pandoc command line and others. To only suppress printing of the last "Output created: " message, you can set `rmarkdown.render.message` to `FALSE`.
  `encoding`	Ignored. The encoding is always assumed to be UTF-8.
`devtools_pkgs`	`character` names of packages that are currently loaded with devtools (and should be loaded on slurm instance). The paths to these packages will be passed on down.
`global_objects`	A character vector containing the name of R objects to be saved in a .RData file and loaded on each cluster node prior to calling `f`.
`pkgs`	A character vector containing the names of packages that must be loaded on each cluster node. By default, it includes all packages loaded by the user when `slurm_call` is called.
`slurm_options`	A named list of options recognized by `sbatch`; see Details below for more information.
`use_sbatch`	`logical` whether to use `render_batch()` or `rmarkdown::render()`

Details

This function creates a temporary folder ("_rslurm_[jobname]") in the current directory, holding .RData and .RDS data files, the R script to run and the Bash submission script generated for the Slurm job.

The names of any other R objects (besides params) that f needs to access should be listed in the global_objects argument.

Use slurm_options to set any option recognized by sbatch, e.g. slurm_options = list(time = "1:00:00", share = TRUE). See http://slurm.schedmd.com/sbatch.html for details on possible options. Note that full names must be used (e.g. "time" rather than "t") and that flags (such as "share") must be specified as TRUE. The "job-name", "ntasks" and "output" options are already determined by slurm_call and should not be manually set.
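The naming rules above can be checked before submitting a job. The following is a minimal sketch in base R; `check_slurm_options()` is a hypothetical helper written for illustration and is not part of Genesee or rslurm.

```r
# Hypothetical helper (not part of Genesee or rslurm): checks a slurm_options
# list against the rules stated in the Details section.
check_slurm_options <- function(slurm_options) {
  nms <- names(slurm_options)
  # Every option must be named, e.g. list(time = "1:00:00").
  if (length(slurm_options) > 0 && (is.null(nms) || any(nms == ""))) {
    stop("every entry in slurm_options must be named")
  }
  # These options are already determined by slurm_call.
  reserved <- intersect(nms, c("job-name", "ntasks", "output"))
  if (length(reserved) > 0) {
    stop("set automatically, do not pass: ", paste(reserved, collapse = ", "))
  }
  # Full names are required ("time" rather than "t").
  short <- nms[nchar(nms) == 1]
  if (length(short) > 0) {
    stop("use full option names, not: ", paste(short, collapse = ", "))
  }
  invisible(slurm_options)
}

# Flags such as "share" are given as TRUE; names with dashes need quotes.
opts <- list(time = "1:00:00", "mem-per-cpu" = "16gb", share = TRUE)
check_slurm_options(opts)
```

A list that passes this check can then be supplied as the `slurm_options` argument. The check only covers the rules listed here; whether `sbatch` on a given cluster accepts each option is a separate question.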

When processing the computation job, the Slurm cluster will output two files in the temporary folder: one with the return value of the function ("results_0.RDS") and one containing any console or error output produced by R ("slurm_[node_id].out").
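Once the job has finished, the return value can be read back from the temporary folder. The sketch below assumes the folder and file names described above ("_rslurm_[jobname]" and "results_0.RDS"); `read_job_result()` is a hypothetical helper, not a function provided by Genesee or rslurm.

```r
# Hypothetical helper: reads the saved return value of a finished job.
# The folder and file names follow the description in the Details section.
read_job_result <- function(jobname, dir = ".") {
  path <- file.path(dir, paste0("_rslurm_", jobname), "results_0.RDS")
  if (!file.exists(path)) {
    # The job may not have finished yet, or the jobname may be wrong.
    return(NULL)
  }
  readRDS(path)
}
```

For example, `read_job_result("myjob")` looks for "_rslurm_myjob/results_0.RDS" under the current directory, where "myjob" is a placeholder job name.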

If submit = TRUE, the job is sent to the cluster and a confirmation message (or error) is output to the console. If submit = FALSE, a message indicates the location of the saved data and script files; the job can be submitted manually by running the shell command sbatch submit.sh from that directory.

After sending the job to the Slurm cluster, slurm_call returns a slurm_job object which can be used to cancel the job, get the job status or output, and delete the temporary files associated with it. See the description of the related functions for more details.

Value

A slurm_job object containing the jobname and the number of nodes effectively used.

Functions

render: Call `rmarkdown::render()` either through sbatch or directly, depending on `use_sbatch`; sbatch-specific arguments are ignored.
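One way such a switch could work is sketched below. This is an illustration of the dispatch pattern only, not Genesee's implementation: `render_sketch()`, `render_direct()` and `render_sbatch()` are hypothetical stand-ins so that the example runs without a cluster, and the list of batch-only argument names is taken from the Usage section.

```r
# Hypothetical stand-ins for rmarkdown::render() and render_batch();
# they only record how they were called.
render_direct <- function(...) {
  list(mode = "direct", args = list(...))
}
render_sbatch <- function(..., slurm_options = list()) {
  list(mode = "sbatch", args = list(...), slurm_options = slurm_options)
}

# Sketch of the dispatch: submit as a batch job, or render directly after
# dropping the arguments that only apply to a batch submission.
render_sketch <- function(use_sbatch = FALSE, ...) {
  args <- list(...)
  if (use_sbatch) {
    return(do.call(render_sbatch, args))
  }
  batch_only <- c("devtools_pkgs", "global_objects", "pkgs", "slurm_options")
  nms <- names(args)
  if (is.null(nms)) nms <- rep("", length(args))
  do.call(render_direct, args[!(nms %in% batch_only)])
}

local_run <- render_sketch(use_sbatch = FALSE, input = "01qc.Rmd",
                           slurm_options = list(time = "1:00:00"))
batch_run <- render_sketch(use_sbatch = TRUE, input = "01qc.Rmd",
                           slurm_options = list(time = "1:00:00"))
```

With this pattern the same call can be used interactively and on the cluster by changing only `use_sbatch`.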

Examples

## Not run: 
# Create an example project skeleton
create_exampleproject(skeleton_args = list(authors = 'you and me', project_type = 'scRNA', investigator = 'alligator', project_title = 'schit', navigate_rawdata = FALSE))
# Render 01qc.Rmd as a Slurm job, writing the report to reports/01qc_nofilter
render_batch(
  slurm_options = list(time = "1:00:00", "mem-per-cpu" = "16gb", partition = "amcdavid", "cpus-per-task" = 1),
  input = "01qc.Rmd",
  params = list(tenx_root = NULL, tenx_h5 = 'scratch/AGG1/raw_feature_bc_matrix.h5', auto_filter = FALSE,
                output_root = 'refined/01qc_nofilter', batch_var = 'tissue_source', citeseq_str = '_TotalA'),
  output_file = '01qc_nofilter', output_dir = 'reports', quiet = TRUE
)

## End(Not run)

amcdavid/Genesee documentation built on April 14, 2022, 5:16 a.m.