Requirements:
```r
## for the latest released version (from CRAN)
install.packages("flowr", repos = "http://cran.rstudio.com")

## OR the latest development version
devtools::install_github("sahilseth/flowr", ref = "master")
```
After installation, run setup(); this will copy flowr's helper script to ~/bin. Please make sure that this folder is in your $PATH variable.
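Not sure whether ~/bin is already on your PATH? Here is a quick, flowr-independent shell check (the PATH_DEMO value below is a made-up example; replace it with "$PATH" to test your own shell):

```shell
# Check whether $HOME/bin appears in a PATH-style string.
# PATH_DEMO is an illustrative value, not your real PATH.
PATH_DEMO="/usr/local/bin:$HOME/bin:/usr/bin"
case ":$PATH_DEMO:" in
  *":$HOME/bin:"*) echo "~/bin is in PATH" ;;
  *)               echo "~/bin is NOT in PATH; add: export PATH=\$HOME/bin:\$PATH" ;;
esac
```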
```r
library(flowr)
setup()
```
Running flowr from the terminal should now show the following:
```
Usage: flowr function [arguments]

status       Detailed status of a flow(s).
rerun        Rerun a previously failed flow.
kill         Kill the flow, upon providing the working directory.
fetch_pipes  Check which modules and pipelines are available; flowr fetch_pipes

Please use 'flowr -h function' to obtain further information about the usage of a specific function.
```
If you are interested, visit funr's github page for more details.
From this step on, one has the option of typing commands in an R console OR a bash shell (command line). For brevity, we will show examples using the shell.
Test a small pipeline on the cluster
This will run a three step pipeline, testing several different relationships between jobs. Initially, we can test this locally, and later on a specific HPCC platform.
```
## This may take about a minute or so.
flowr run x=sleep_pipe platform=local execute=TRUE

## corresponding R command:
run(x='sleep_pipe', platform='local', execute=TRUE)
```
If this completes successfully, we can try it on a computing cluster, where it would submit a few interconnected jobs. Several platforms are supported out of the box (torque, moab, sge, slurm and lsf); you may use the platform variable to switch between them.
```
flowr run x=sleep_pipe platform=lsf execute=TRUE
## other options for platform: torque, moab, sge, slurm, lsf
## this shows the folder being used as the working directory for this flow.
```
Once the submission is complete, we can check the status using status(), supplying it the full path recovered from the previous step.
```
flowr status x=~/flowr/runs/sleep_pipe-samp1-20150923-10-37-17-4WBiLgCm

## we expect to see a table like this when it completes successfully:
|               | total| started| completed| exit_status|status    |
|:--------------|-----:|-------:|---------:|-----------:|:---------|
|001.sleep      |     3|       3|         3|           0|completed |
|002.create_tmp |     3|       3|         3|           0|completed |
|003.merge      |     1|       1|         1|           0|completed |
|004.size       |     1|       1|         1|           0|completed |

## Also, we expect a few files to be created:
ls ~/flowr/runs/sleep_pipe-samp1-20150923-10-37-17-4WBiLgCm/tmp
samp1_merged  samp1_tmp_1  samp1_tmp_2  samp1_tmp_3

## If both these checks are fine, we are all set!
```
There are a few places where things may go wrong; you may follow the advanced configuration guide for more details. Feel free to post questions on the github issues page.
Support for several popular cluster platforms is built in. There is a template for each platform, which should work out of the box. Further, one may copy and edit them (and save to ~/flowr/conf) in case some changes are required. Templates from this folder (~/flowr/conf) would override the defaults.
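The override behaviour can be pictured with a small shell sketch. This is an illustration of the lookup idea only, not flowr's actual code (a temporary directory stands in for ~/flowr/conf):

```shell
# Illustration: prefer a user template in the conf dir, else fall back to the packaged default.
conf_dir=$(mktemp -d)   # stand-in for ~/flowr/conf
platform=lsf
if [ -f "$conf_dir/$platform.sh" ]; then
  echo "using user template: $conf_dir/$platform.sh"
else
  echo "using packaged default template for $platform"
fi
```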
Here are links to latest templates on github:
Not sure what platform you have?
You may check the version by running ONE of the following commands:
```
## for MOAB:
msub --version
## Version: moab client 8.1.1

## for LSF:
man bsub
## Submits a job to LSF by running the specified ...

## for torque/SGE:
qsub --help
```
Here are some helpful guides and details on the platforms:
Comparison_of_cluster_software
This needs expansion
flowr has a configuration file with parameters regarding default paths, verbosity, etc. flowr loads this default configuration from the package installation. In addition, to customize the parameters, simply create a tab-delimited file called ~/.flowr.
An example of this file is available here.
Additional files are loaded if available:

- ~/flowr/conf/flowr.conf
- ~/.flowr
3. Use a custom flowdef
We can copy an example flow definition and customize it to suit our needs. This is a tab-delimited text file, so make sure that the format is correct after you make any changes.
```
cd ~/flowr/pipelines
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/pipelines/sleep_pipe.def

## check the format
flowr as.flowdef x=~/flowr/pipelines/sleep_pipe.def
```
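As a quick, flowr-independent sanity check after editing, you can confirm that every row of the file has the same number of tab-separated fields. The two-column file below is a toy example, not a complete flowdef:

```shell
# Toy example: write a small tab-delimited file, then verify all rows have equal field counts.
printf 'jobname\tsub_type\nsleep\tscatter\ncreate_tmp\tscatter\n' > /tmp/mini.def
awk -F'\t' 'NR==1 {n = NF} NF != n {bad = 1} END {print (bad ? "inconsistent columns" : "format looks ok")}' /tmp/mini.def
```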
Run the test with a custom flowdef:
```
flowr run x=sleep_pipe execute=TRUE def=~/flowr/pipelines/sleep_pipe.def
## platform=lsf [optional, picked up from the flowdef]
```
4. Use a custom submission template
If you need to customize the HPCC submission template, copy the file for your platform and make your desired changes.
For example, the MOAB-based cluster at our institution does not accept the queue argument, so we need to comment it out.
Download the template for a specific HPCC platform into ~/flowr/conf:
```
cd ~/flowr/conf
## flowr automatically picks up a template from this folder.

## for MOAB (msub)
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/conf/moab.sh
## for Torque (qsub)
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/conf/torque.sh
## for IBM LSF (bsub)
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/conf/lsf.sh
## for SGE (qsub)
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/conf/sge.sh
## for SLURM (sbatch) [untested]
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/conf/slurm.sh
```
Make the desired changes using your favourite editor and submit again.
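For instance, commenting out a queue directive could be done with sed. The #PBS line below is a made-up stand-in for whatever your copied template actually contains, not the real moab.sh:

```shell
# Made-up template fragment; real templates live in ~/flowr/conf after the wget above.
printf '#PBS -q {{{QUEUE}}}\n#PBS -l nodes=1\n' > /tmp/moab_demo.sh
# Prefix the queue line with an extra '#' so the scheduler ignores it.
sed -i 's/^#PBS -q/##PBS -q/' /tmp/moab_demo.sh
cat /tmp/moab_demo.sh
```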
1. Parsing job ids
Flowr parses job IDs to keep a log of all submitted jobs, and also to pass them along as dependencies to subsequent jobs. This is taken care of by the parse_jobids() function. Each job scheduler shows the job ID when you submit a job, but it may show it in a slightly different fashion. To accommodate this, one can use regular expressions as described in the relevant section of the flowr config.
For example LSF may show a string such as:
```
Job <335508> is submitted to queue <transfer>.
```

```r
## test if it parses correctly
jobid = "Job <335508> is submitted to queue <transfer>."
set_opts(flow_parse_lsf = ".*(\\<[0-9]*\\>).*")
parse_jobids(jobid, platform = "lsf")
## [1] "335508"
```
In this case, 335508 was the job id and the regex worked well!
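The same extraction can be reproduced outside R, which is handy when experimenting with a pattern. This uses plain sed, not flowr's code:

```shell
# Extract the numeric job id from a captured LSF submission message.
jobid_line="Job <335508> is submitted to queue <transfer>."
echo "$jobid_line" | sed 's/.*<\([0-9][0-9]*\)>.*/\1/'
# prints: 335508
```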
Once you identify the correct regex for your platform, you may update the configuration file with it.
```
cd ~/flowr/conf
wget https://raw.githubusercontent.com/sahilseth/flowr/master/inst/conf/flowr.conf
## flowr automatically reads from this location; if you prefer to put it elsewhere, use load_opts("flowr.conf")
## visit sahilseth.github.io/params for more details.
```
Update the regex pattern and submit again.
2. Check dependency string
After collecting job ids from previous jobs, flowr renders them as a dependency for subsequent jobs. This is handled by render_dependency.PLATFORM functions.
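As an illustration of what such rendering does, here is a sketch that joins collected job ids into an LSF-style `-w` dependency expression. The ids and the exact formatting are made up; see the render_dependency.lsf function for the real logic:

```shell
# Sketch: build an LSF dependency expression from a list of parent job ids.
ids="335508 335509 335510"
dep=""
for id in $ids; do
  # append " && " between terms, but not before the first one
  dep="${dep:+$dep && }done($id)"
done
echo "-w \"$dep\""
# prints: -w "done(335508) && done(335509) && done(335510)"
```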
Confirm that the dependency parameter is specified correctly in the submission scripts:
```
## path to the most recent submission
wd=~/flowr/runs/sleep_pipe-samp1-20150923-11-20-39-dfvhp5CK
cat $wd/002.create_tmp/create_tmp_cmd_1.sh
```
There are several verbose levels available (0, 1, 2, 3, ...). One can change the verbose level in the configuration file (~/flowr/conf/flowr.conf); check the verbosity section in the help pages for more details.
The resource requirement columns of the flow definition are passed along to the final (cluster) submission script. For example, values in the cpu_reserved column would be populated as {{{CPU}}} in the submission template.
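A one-line sketch of that substitution, as a templating step would perform it (the #BSUB line is a made-up LSF-style example, and sed stands in for the actual templating engine):

```shell
# Sketch: fill a {{{CPU}}} placeholder with a flowdef value.
template='#BSUB -n {{{CPU}}}'
cpu_reserved=4
echo "$template" | sed "s/{{{CPU}}}/$cpu_reserved/"
# prints: #BSUB -n 4
```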
The following table provides a mapping between the flow definition columns and variables in the submission templates:
```r
# extdata = file.path(system.file(package = "flowr"), "extdata")
mat = read_sheet("files/flow_def_columns.txt")
kable(mat, col.names = c("flowdef variable", "submission template variable"))
```
* These are generated on the fly. ** This is gathered from the flow mat.
Adding a new platform involves a few steps; briefly, we need to consider the following areas where changes would be necessary.
parsing job ids: flowr keeps a log of all submitted jobs, and also passes their ids along as dependencies to subsequent jobs. This is taken care of by the parse_jobids() function. Each job scheduler shows the job id when you submit a job, but each shows it in a slightly different pattern. To accommodate this, one can use regular expressions as described in the relevant section of the flowr config.
render dependency: After collecting job ids from previous jobs, flowr renders them as a dependency for subsequent jobs. This is handled by render_dependency.PLATFORM functions.
Essentially this requires us to add a new line like: setClass("torque", contains = "job").
There are several job scheduling systems available and we try to support the major players. Adding support is quite easy if we have access to them. Is your favourite not in the list? Re-open this issue, with details on the platform: adding platforms.
```
## outfiles end with .out, and are placed in a folder like 00X.<jobname>/
## here is one example:
cat $wd/002.create_tmp/create_tmp_cmd_1.out

## final script:
cat $wd/002.create_tmp/create_tmp_cmd_1.sh
```
devtools::install_github("sahilseth/flowr") fails with: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed
Solution:
This is basically an issue with httr (link). Try this:
```r
install.packages("RCurl")
devtools::install_github("sahilseth/flowr")
```

If that does not work, then try this:

```r
install.packages("httr")
library(httr)
set_config(config(ssl.verifypeer = 0L))
devtools::install_github("sahilseth/flowr")
```