clusterRun: Submit command-line tools to cluster

View source: R/utilities.R

Description

Submits non-R command-line software to the queueing/scheduling systems of compute clusters using run specifications defined by functions similar to runCommandline. clusterRun can be used with most queueing systems since it is based on utilities from the batchtools package, which supports template files (*.tmpl) for defining the run parameters of the different schedulers. The path to the *.tmpl file needs to be specified in a conf file provided under the conffile argument.

Usage

clusterRun(args, 
            FUN = runCommandline, 
            more.args = list(args = args, make_bam = TRUE), 
            conffile = ".batchtools.conf.R", 
            template = "batchtools.slurm.tmpl", 
            Njobs, 
            runid = "01", 
            resourceList)

Arguments

args

Object of class SYSargs or SYSargs2.

FUN

Accepts functions such as runCommandline(args, ...) where the args argument is mandatory and needs to be of class SYSargs or SYSargs2.

more.args

Object of class list providing the arguments passed on to the function specified under FUN.

conffile

Path to conf file (default location ./.batchtools.conf.R). This file contains, in its simplest form, just one command, such as this line for the Slurm scheduler: cluster.functions <- makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl"). For more detailed information, visit this page: https://mllg.github.io/batchtools/index.html
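
A minimal conf file for Slurm could therefore contain just the following (a sketch; adjust the template path to your environment):

## Minimal .batchtools.conf.R (sketch); the template file is expected in the working directory
cluster.functions <- batchtools::makeClusterFunctionsSlurm(template="batchtools.slurm.tmpl")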

template

Template files for specific queueing/scheduling systems can be downloaded from here: https://github.com/mllg/batchtools/tree/master/inst/templates. Slurm, PBS/Torque, and Sun Grid Engine (SGE) templates are provided.
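
As an illustration, a template bundled with the local batchtools installation could be copied into the working directory as sketched below; the file name slurm-simple.tmpl is an assumption, so list the directory first to see which templates your batchtools version ships:

## List the templates bundled with batchtools and copy one (file name is an assumption)
tmpl_dir <- system.file("templates", package="batchtools")
list.files(tmpl_dir)
file.copy(file.path(tmpl_dir, "slurm-simple.tmpl"), "batchtools.slurm.tmpl")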

Njobs

Integer defining the number of cluster jobs. For instance, if args contains 18 command-line jobs and Njobs=9, then the function will distribute them across 9 cluster jobs, each running 2 command-line jobs. To increase the number of CPU cores used by each process, one can do so under the corresponding argument of the command-line tool, e.g. the -p argument for Tophat.
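
Conceptually, this grouping corresponds to splitting the command-line jobs into Njobs chunks, e.g.:

## Illustration only: 18 command-line jobs distributed across 9 cluster jobs
split(1:18, rep(1:9, each=2))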

runid

Run identifier used for log file to track system call commands. Default is "01".

resourceList

List reserving for each cluster job sufficient computing resources, including memory (in Megabytes), number of nodes, CPU cores, walltime (in minutes), etc. For more details, consult the template file of the corresponding queueing/scheduling system.
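
For instance, a resource list for the Slurm template used in the examples below could look like this (field names must match those expected by the chosen template file):

## Example resource list for Slurm (field names depend on the template file)
resources <- list(walltime=120, # walltime in minutes
                  ntasks=1,     # tasks per cluster job
                  ncpus=4,      # CPU cores per cluster job
                  memory=1024)  # memory in Megabytes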

Value

Object of class Registry, as well as files and directories created by the executed command-line tools.

Author(s)

Daniela Cassol and Thomas Girke

References

For more details on batchtools, please consult the following page: https://github.com/mllg/batchtools/

See Also

clusterRun replaces the older functions getQsubargs and qsubRun.

Examples

##################################
## Examples with SYSargs object ##
##################################
## Construct SYSargs object from param and targets files 
param <- system.file("extdata", "hisat2.param", package="systemPipeR")
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
args <- systemArgs(sysma=param, mytargets=targets)
args
names(args); modules(args); cores(args); outpaths(args); sysargs(args)

## Not run: 
## Execute SYSargs on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers. 
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=cores(args), memory=1024) 
reg <- clusterRun(args, FUN = runCommandline, 
                    more.args = list(args = args, make_bam = TRUE), 
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl", 
                    Njobs=18, runid="01", 
                    resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)
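## Optionally block until all submitted jobs have finished before checking
## the output files below (waitForJobs is part of the batchtools package)
batchtools::waitForJobs(reg=reg)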
file.exists(outpaths(args))

## End(Not run)

###################################
## Examples with SYSargs2 object ##
###################################
## Construct SYSargs2 object from CWL param, CWL input, and targets files 
targets <- system.file("extdata", "targets.txt", package="systemPipeR")
dir_path <- system.file("extdata/cwl", package="systemPipeR")
WF <- loadWorkflow(targets=targets, wf_file="hisat2/hisat2-mapping-se.cwl", 
                  input_file="hisat2/hisat2-mapping-se.yml", dir_path=dir_path)
WF <- renderWF(WF, inputvars=c(FileName="_FASTQ_PATH1_", SampleName="_SampleName_"))
WF
names(WF); modules(WF); targets(WF)[1]; cmdlist(WF)[1:2]; output(WF)

## Not run: 
## Execute SYSargs2 on multiple machines of a compute cluster. The following
## example uses the conf and template files for the Slurm scheduler. Please 
## read the instructions on how to obtain the corresponding files for other schedulers.  
file.copy(system.file("extdata", ".batchtools.conf.R", package="systemPipeR"), ".")
file.copy(system.file("extdata", "batchtools.slurm.tmpl", package="systemPipeR"), ".")
resources <- list(walltime=120, ntasks=1, ncpus=4, memory=1024) 
reg <- clusterRun(WF, FUN = runCommandline, 
                    more.args = list(args = WF, make_bam = TRUE),
                    conffile=".batchtools.conf.R", 
                    template="batchtools.slurm.tmpl",
                    Njobs=18, runid="01", resourceList=resources)

## Monitor progress of submitted jobs
getStatus(reg=reg)

## Update the output paths stored in the object output(WF)
WF <- output_update(WF, dir=FALSE, replace=TRUE, extension=c(".sam", ".bam"))

## Alignment stats (read the targets file into a data.frame before merging)
read_statsDF <- alignStats(WF)
targetsDF <- read.delim(targets, comment.char="#")
read_statsDF <- cbind(read_statsDF[targetsDF$FileName,], targetsDF)
write.table(read_statsDF, "results/alignStats.xls", row.names=FALSE, 
                quote=FALSE, sep="\t")

## End(Not run)
