superApply: Efficient parallel lapply using a SLURM cluster


View source: R/superApply.R

Description

An easy-to-use analogue of lapply that parallelizes execution by submitting jobs to a SLURM cluster.

Usage

superApply(x, FUN, ..., tasks = 1, workingDir = getwd(), packages = NULL,
  sources = NULL, extraBashLines = NULL, extraScriptLines = "",
  clean = T, partition = NULL, time = NULL, mem = NULL, proc = NULL,
  totalProc = NULL, nodes = NULL, email = NULL)

Arguments

x

vector/list - FUN will be applied to each element of this object

FUN

function - function to be applied to each element of x

...

further arguments passed to FUN

tasks

integer - number of individual parallel jobs to execute

workingDir

string - path to folder that will contain all the temporary files needed for submission, execution, and compilation of individual jobs

packages

character vector - package names to be loaded in individual tasks

sources

character vector - paths to R code to be loaded in individual tasks

extraBashLines

character vector - each element will be added as a line to the individual task execution bash script before R is executed. For instance, use this to load R if it is not available on your system by default

extraScriptLines

character vector - each element will be added as a line to the individual task execution R script before starting lapply

clean

logical - if TRUE, all files created in workingDir will be deleted

partition

character - Partition to use. Equivalent to --partition of SLURM sbatch (see the sketch after this argument list for how the SLURM options map onto an sbatch header)

time

character - Time requested for job execution, one accepted format is "HH:MM:SS". Equivalent to --time of SLURM sbatch

mem

character - Memory requested for job execution, one accepted format is "xG" or "xMB". Equivalent to --mem of SLURM sbatch

proc

integer - Number of processors requested per task. Equivalent to --cpus-per-task of SLURM sbatch

totalProc

integer - Number of tasks requested for job. Equivalent to --ntasks of SLURM sbatch

nodes

integer - Number of nodes requested for job. Equivalent to --nodes of SLURM sbatch
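
The SLURM-related arguments above map directly onto sbatch options. As a rough illustration, here is a minimal sketch of how the supplied options could be turned into an sbatch header (the helper slurmHeader and the partition name "normal" are hypothetical and not part of rSubmitter):

slurmHeader <- function(partition = NULL, time = NULL, mem = NULL,
                        proc = NULL, totalProc = NULL, nodes = NULL) {
    # c() drops NULL entries, so only the options that were actually
    # supplied end up in the header
    opts <- c(partition = partition, time = time, mem = mem,
              `cpus-per-task` = proc, ntasks = totalProc, nodes = nodes)
    paste0("#SBATCH --", names(opts), "=", opts)
}

slurmHeader(partition = "normal", time = "01:00:00", mem = "4G", proc = 1)
# "#SBATCH --partition=normal" "#SBATCH --time=01:00:00"
# "#SBATCH --mem=4G"           "#SBATCH --cpus-per-task=1"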

Details

Mimics the functionality of lapply, but implemented so that iterations can be submitted as one or more individual jobs to a SLURM cluster. The batch, err, out, and script files for each job are stored in a temporary folder. Once all jobs have been submitted, the function waits for them to finish; when they are done executing, the results from the individual jobs are compiled into a single list.
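
As a conceptual sketch (assumed behavior for illustration, not the package's actual implementation), the batching and compilation steps are roughly equivalent to:

# Split 100 elements into 4 batches, one per SLURM job
x <- 1:100
tasks <- 4
batches <- split(x, cut(seq_along(x), breaks = tasks, labels = FALSE))

# Each job would effectively run lapply(batch, FUN, ...) on the cluster;
# compiling the per-job outputs back into a single list amounts to:
results <- do.call(c, lapply(batches, function(batch) {
    lapply(batch, function(i) rep(i, 3))
}))
length(results)  # 100, in the same order as x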

Value

list - results of FUN applied to each element in x

Examples

## Not run: 
#------------------------
# Parallel execution of 100 function calls using 4 parallel tasks
myFun <- function(x) {
    #Sys.sleep(10)
    return(rep(x, 3))
}

dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", time = "60", mem = "1G")


#------------------------
# Parallel execution of 100 function calls using 100 parallel tasks
sapOut <- superApply(1:100, FUN = myFun, tasks = 100, workingDir = "~/testSap", time = "60", mem = "1G")


#------------------------
# Parallel execution where a package is required in function calls
myFun <- function(x) {
    return(ggplot(data.frame(x = 1:100, y = (1:100) * x), aes(x = x, y = y)) + geom_point() + ylim(0, 1e4))
}

dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", packages = "ggplot2",  time = "60", mem = "1G")


#------------------------
# Parallel execution where R has to be loaded in the system (e.g. in bash `module load R`)
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, workingDir = "~/testSap", time = "60", mem = "1G", extraBashLines = "module load R")


#------------------------
# Parallel execution where a source file is required in function calls
# Content of ./customRep.R
customRep <- function(x) {
    return(paste("customFunction", rep(x, 3)))
}
# superApply execution
myFun <- function(x) {
    return(customRep(x))
}

dir.create("~/testSap")
sapOut <- superApply(1:100, FUN = myFun, tasks = 4, sources = "./customRep.R", workingDir = "~/testSap", time = "60", mem = "1G")


## End(Not run)
