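rSubmitter is a package that allows simple communication between R and a SLURM cluster to achieve three main tasks: submitting and managing individual jobs, submitting many jobs at once as SLURM job arrays, and parallelizing lapply calls across the cluster.
This section demonstrates the simplest way to implement these tasks.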
Currently rSubmitter can only be installed via the R devtools package:
library("devtools")
install_github("pablo-gar/rSubmitter")
Individual jobs are created, managed, and submitted through the Job class.
The first step is to create an object of the Job class with the Job$new() constructor method. The only required argument is a character vector in which each element is an independent shell command.
You can then submit the job to the SLURM cluster with the $submit() method of the Job object, and use the $wait() method to block until the job has completed or failed.
In its simplest form, the submission of a job looks like this:
library("rSubmitter")
myJob <- Job$new(commandVector = c("echo hola world!", "sleep 30"))
myJob$submit()
myJob$wait()
# 2018-04-23 18:22:35 --- Cluster Status | PENDING = 1 |
# 2018-04-23 18:22:40 --- Cluster Status | RUNNING = 1 |
# 2018-04-23 18:23:30 --- Cluster Status | COMPLETED = 1 |
STDERR and STDOUT are streamed to files of the form rSubmitter_job_[randomNumber].[err|out] in the current working directory.
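Because the random number is not known in advance, one way to locate these files after a job finishes is to match the naming pattern with base R. The following is a minimal sketch: find_job_logs is a hypothetical helper, not part of rSubmitter, and it assumes that [randomNumber] consists of digits only.

```r
# Hypothetical helper (not part of rSubmitter): list the files in `dir` whose
# names match rSubmitter_job_[randomNumber].[err|out].
# Assumption: [randomNumber] is made of digits only.
find_job_logs <- function(dir = ".") {
  list.files(
    dir,
    pattern = "^rSubmitter_job_[0-9]+\\.(err|out)$",
    full.names = TRUE
  )
}

# Print the contents of every matching STDOUT file in the working directory
for (f in grep("\\.out$", find_job_logs(), value = TRUE)) {
  cat("==>", f, "<==\n")
  cat(readLines(f), sep = "\n")
}
```

If the working directory holds logs from several jobs, all of them are listed; the sketch makes no attempt to tell which job produced which file.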
Refer to the advanced section if you wish to select the destination and names of these files, as well as to set the system requirements of memory, time, cpus, nodes, etc.
You can also access the full documentation of the Job class from R:
?Job
To submit many jobs at once efficiently, you can use the JobArray class, which implements SLURM job arrays. As with the Job class, you first create a JobArray object with the JobArray$new() constructor method. The difference is that you pass a list of character vectors, where each element of the list is a vector of commands that will be submitted as an independent job. All jobs in the array have the same system requirements (memory, cpus, etc.).
As with the Job class, you submit the array with the $submit() method, and you can use the $wait() method to block until all jobs have completed or at least one has failed.
library("rSubmitter")
commands <- list(
  c( # First Job
    "echo hola",
    "sleep 20"
  ),
  c( # Second Job
    "echo adios",
    "sleep 60"
  )
)
jobArray <- JobArray$new(commandList = commands)
jobArray$submit()
jobArray$wait()
# 2018-04-25 17:49:30 --- Cluster Status | PENDING = 2 |
# 2018-04-25 17:49:45 --- Cluster Status | RUNNING = 2 |
# 2018-04-25 17:50:50 --- Cluster Status | COMPLETED = 2 |
STDERR and STDOUT are streamed to files of the form rSubmitter_job_[randomNumber]_[jobNumber].[err|out] in the current working directory.
Refer to the advanced section if you wish to select the destination and names of these files, as well as to set the system requirements of memory, time, cpus, nodes, etc.
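For arrays with more than a handful of jobs, writing the list by hand becomes tedious. As a sketch, the list of command vectors can be generated with base R. The sample names and shell commands below are made up for illustration, and the JobArray$new() call is left as a comment because it needs rSubmitter and access to a SLURM cluster.

```r
# Hypothetical sample names, used only for illustration
samples <- c("sampleA", "sampleB", "sampleC")

# Build one character vector of shell commands per job
commands <- lapply(samples, function(s) {
  c(paste("echo processing", s), "sleep 10")
})

# Inspect the structure: a list with one character vector per job
str(commands)

# With rSubmitter installed and a SLURM cluster available, the list could
# then be passed to the constructor in the same way as a hand-written one:
# jobArray <- JobArray$new(commandList = commands)
```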
You can also access the full documentation of the JobArray class from R:
?JobArray
lapply is a base R function that applies a function FUN to each element of a list or vector (similar to map() in Python). It is one of the idiomatic ways of looping in R.
For example, this:
x <- lapply(1:4, as.character)
is equivalent to:
x <- list()
for (i in 1:4)
  x[[i]] <- as.character(i)
rSubmitter comes with an implementation of lapply, superApply, that parallelizes lapply function calls using a SLURM cluster. Under the hood, it uses the JobArray class (described above) along with some file management to partition one lapply call into many independent ones that are executed in parallel. Once they are all done, the results of the individual calls are gathered and returned to the user.
The number of parallel tasks is set with the tasks argument of superApply. In its simplest form, a parallel lapply call looks like this:
myFun <- function(x) {
  return(rep(x, 3))
}

library("rSubmitter")
x <- superApply(1:100, myFun, tasks = 4)
# 2018-05-04 15:29:34 Partitioning function calls
# 2018-05-04 15:29:35 Submmiting parallel Jobs
# 2018-05-04 15:29:40 --- Cluster Status | PENDING = 4 |
# 2018-05-04 15:29:45 --- Cluster Status | COMPLETED = 4 |
# 2018-05-04 15:29:45 Merging parellel results
# 2018-05-04 15:29:45 Merge done
# 2018-05-04 15:29:46 Cleaning partitioned data
# 2018-05-04 15:29:47 Cleaning done
This partitions the lapply call into 4 separate calls, each receiving an equal share of the elements of 1:100. The partitions are submitted as independent jobs, and the results are compiled and returned without any further action on your part.
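The partition-and-merge idea can be illustrated locally with base R alone. This is only a conceptual sketch and not rSubmitter's actual implementation: the chunks run one after another instead of as cluster jobs, and the use of contiguous chunks is an assumption, since how superApply assigns elements to partitions is not described here.

```r
myFun <- function(x) {
  return(rep(x, 3))
}

x <- 1:100
tasks <- 4

# Assign each element to one of `tasks` contiguous chunks of (near) equal size
# (assumption: contiguous chunks; superApply may partition differently)
chunk_id <- rep(seq_len(tasks),
                each = ceiling(length(x) / tasks),
                length.out = length(x))
partitions <- split(x, chunk_id)

# Each partition gets its own lapply call; on a cluster these calls would run
# as separate jobs, here they simply run sequentially
partial_results <- lapply(partitions, function(part) lapply(part, myFun))

# Concatenate the partial results into one list, keeping the input order
merged <- do.call(c, unname(partial_results))

# Check that the merged result matches a plain lapply call
stopifnot(identical(merged, lapply(x, myFun)))
```

The final check is the property that matters to the caller: the result should be indistinguishable from what a single lapply call returns.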
Not all systems come with an out-of-the-box R executable, and since superApply performs independent R calls on the SLURM cluster, you have to make sure that these R calls can be executed. If you need to run any commands in your shell environment before you can open the R interpreter, include those commands in the extraBashLines argument of superApply.
For example, if you usually do the following to run R on your system:
$ module load R
$ R
Then you have to execute superApply like this:
x <- superApply(1:100, myFun, tasks = 4, extraBashLines = "module load R")
You can see the full documentation of superApply from R:
?superApply