module: Class to implement modules, which are standalone pieces of...

module  R Documentation

Class to implement modules, which are standalone pieces of code that expect standard inputs and outputs

Description

The Module constructor takes as input the path to a directory containing either a .module file or a hydrant.deploy file (for compatibility with Firehose). The module file is scraped for a single line beginning with "command:" that specifies the code to be run, with placeholders for inputs. That code should write data only to <relative paths> at or below the current working directory.

Module('path.to.module.directory.with.module.or.hydrant.deploy.file')

The input placeholders are specified using the following syntax: $t Argument_Name. These inputs will eventually be attached to entity annotations.
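The scraping step can be illustrated with a minimal base R sketch. It is not Flow's own parser: parse_module_command is a hypothetical helper, and it assumes placeholders appear literally as "$t Name", the way they are written on this page.

```r
# Hypothetical helper (not part of Flow): find the single "command:" line
# among the lines of a module file and list the placeholder names it uses.
parse_module_command <- function(lines) {
  hits <- grep("^command:", lines, value = TRUE)
  if (length(hits) != 1) {
    stop("expected exactly one line beginning with 'command:'")
  }
  command <- trimws(sub("^command:", "", hits))
  # Assumption: a placeholder is "$t" followed by whitespace and a name.
  pattern <- "\\$t[[:space:]]+[A-Za-z_][A-Za-z0-9_.]*"
  matches <- regmatches(command, gregexpr(pattern, command))[[1]]
  list(command = command, args = sub("^\\$t[[:space:]]+", "", matches))
}

# Toy module file contents, made up for illustration.
module_lines <- c(
  "# toy module file",
  "command: sh <libdir>run.sh -n $t TumorName -s $t SegFile"
)
parsed <- parse_module_command(module_lines)
parsed$args
```

Here parsed$args holds the argument names that a task would later bind to entity annotations.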

There is also a placeholder, <libdir>, that specifies the file path of the <module directory>. Since the code will eventually be executed in a Job-specific output directory (hence the need for the module to write only to relative paths), there needs to be a way for the code to refer to other files in the module directory. This is what <libdir> provides a handle to.
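As a rough illustration of what resolving <libdir> amounts to, the sketch below substitutes a module directory path into a command string. expand_libdir is a hypothetical helper, not a Flow function, and the trailing slash is an assumption inferred from commands that write "<libdir>run.sh" with no separator.

```r
# Hypothetical helper (not part of Flow): replace every <libdir> token
# in a command string with the path of the module directory.
expand_libdir <- function(command, module_dir) {
  # Assumption: <libdir> expands with a trailing slash, because commands
  # are written as "<libdir>run.sh" rather than "<libdir>/run.sh".
  libdir <- paste0(sub("/+$", "", module_dir), "/")
  gsub("<libdir>", libdir, command, fixed = TRUE)
}

# The directory path here is made up for illustration.
expanded <- expand_libdir("sh <libdir>run.sh -l <libdir>", "/path/to/modules/JabbA")
expanded
```

After expansion the command no longer depends on where the Job runs, so it can execute from the Job output directory while still reaching scripts stored alongside the module file.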

For example, the module file for JabbA has a single line:

command: sh <libdir>run.sh <libdir>run.jabba.R -l <libdir> -n $t TumorName -s $t SegFile -a $t CovFile -r $t RAfile -g $t SubSample -b $t NormalSegFile -k $t SlackPenaltyPerLooseEndCopy --iterate $t NumIterations --tfield $t TierFieldName --hets $t OptionalHetPileupOutput

In most cases run.sh will be a fairly generic top-level shell script that sets up the environment (e.g., loading additional libraries, setting environment variables) and passes the remaining arguments to an R, Python, or Perl script. In other cases it can be a script doing more of the "heavy lifting" in the task. This is up to the user. However, the .module file itself should be a one-liner as above.

Regarding outputs, the module definition does not currently specify its output files (this may change). These are currently specified at the task level, where the task author (who presumably knows the module) specifies regular expressions to scrape specific files out of the Job output directory and attach their paths to the respective entity.
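The idea of scraping outputs by regular expression can be sketched in base R as follows. This is not how Flow tasks are declared: scrape_outputs, the annotation names, and the file names are all hypothetical, chosen only to show patterns being matched against files in an output directory.

```r
# Hypothetical helper (not part of Flow): for each named regexp, return
# the paths of files under outdir whose file name matches it.
scrape_outputs <- function(outdir, patterns) {
  files <- list.files(outdir, recursive = TRUE, full.names = TRUE)
  lapply(patterns, function(p) files[grepl(p, basename(files))])
}

# Build a throwaway directory that stands in for a Job output directory.
outdir <- file.path(tempdir(), "job_output_demo")
dir.create(outdir, showWarnings = FALSE)
invisible(file.create(file.path(outdir, c("jabba.rds", "run.log"))))

# Names on the left are made-up annotation names; values are regexps.
found <- scrape_outputs(outdir, c(jabba_rds = "jabba\\.rds$", log = "\\.log$"))
basename(found$jabba_rds)
```

Each element of found holds the matching paths for one annotation name, which is the information a task needs in order to attach output paths to an entity.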

Usage

Module(...)

## S4 method for signature 'Job'
module(.Object)

Arguments

path

character; path to the module directory containing a .module or hydrant.deploy file

Author(s)

Marcin Imielinski


mskilab/Flow documentation built on Jan. 12, 2023, 8:31 a.m.