generate_slurm_indexes: Generate all indexes for the abundance quantification step

View source: R/runCallsSlurm.R

generate_slurm_indexes    R Documentation

Generate all indexes for the abundance quantification step

Description

Checks all unique lines of the input file to determine which indexes have to be generated before running the abundance quantifications. This function is meant to be used on a cluster where the Slurm queuing system is installed. Run this step before the quantification; otherwise each abundance quantification job will create its own indexes, which slows down the quantification and can cause errors when different nodes write the same file at the same time. This function also generates the tx2gene and gene2biotype mapping files.

Usage

generate_slurm_indexes(
  kallistoMetadata = new("KallistoMetadata"),
  bgeeMetadata = new("BgeeMetadata"),
  userMetadata = new("UserMetadata"),
  userFile,
  submit_sh_template = NULL,
  slurm_options = NULL,
  rscript_path = NULL,
  modules = NULL,
  submit = TRUE,
  nodes = 10
)

Arguments

kallistoMetadata

A Reference Class KallistoMetadata object (optional) used to tune the gene abundance quantification analysis. If no object is provided, a new one is created with default values.

bgeeMetadata

A Reference Class BgeeMetadata object (optional) used to choose the version of the reference intergenic sequences. If no object is provided, a new one is created with default values.

userMetadata

A Class UserMetadata object (optional). If no object is provided, a new one is created with default values.

userFile

Path to the file where each line corresponds to one abundance quantification to be run. The structure of the file is the same as that of the 'userFile' used as input of the 'generate_calls_workflow' function. A template of this file can be loaded with the command: `inputFile <- read.table(system.file("userMetadataTemplate.tsv", package = "BgeeCall"), header = TRUE)`. It is important to keep the same column names.

submit_sh_template

A template of the bash script used to submit the jobs. By default, the submission script provided by rslurm is used. Modify it only if module dependencies (such as kallisto or R) have to be added.

slurm_options

A named list of options recognized by sbatch. See the documentation of the rslurm::slurm_apply function for more details.

rscript_path

The location of the Rscript command. If not specified, defaults to the location of Rscript within the R installation being run.

modules

A list of modules to load in the environment. Leave it empty unless you need to load R and/or kallisto (e.g. module add R).

submit

Whether or not to submit the job to the cluster with sbatch. Default value is TRUE

nodes

The (maximum) number of cluster nodes to spread the calculation over. slurm_apply automatically divides params into chunks of approximately equal size to send to each node. Fewer nodes are allocated if the parameter set is too small to use all CPUs on the requested nodes. By default this number is 10.
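
Because the userFile argument must keep the column names of the template, it can help to compare headers before submitting anything. The following is a minimal sketch in base R; compare_header is a hypothetical helper that is not part of BgeeCall, and the column names in the demonstration are made up for illustration only.

```r
# Hypothetical helper (not part of BgeeCall): compare the header line of a
# user file with the header line of a template file.
compare_header <- function(user_path, template_path, sep = "\t") {
  # Read only the first line and split it on the separator.
  read_header <- function(path) {
    strsplit(readLines(path, n = 1), sep, fixed = TRUE)[[1]]
  }
  user_cols <- read_header(user_path)
  template_cols <- read_header(template_path)
  list(
    missing = setdiff(template_cols, user_cols),     # in template, absent from user file
    unexpected = setdiff(user_cols, template_cols)   # in user file, absent from template
  )
}

# Self-contained demonstration with two temporary files.
# The column names below are placeholders, not the real template columns.
template_path <- tempfile(fileext = ".tsv")
user_path <- tempfile(fileext = ".tsv")
writeLines("col_a\tcol_b\tcol_c", template_path)
writeLines(c("col_a\tcol_B\tcol_c", "1\t2\t3"), user_path)

header_diff <- compare_header(user_path, template_path)
print(header_diff)
```

With BgeeCall installed, template_path could instead be set to system.file("userMetadataTemplate.tsv", package = "BgeeCall"), the template location given in the userFile description. Both missing and unexpected should then be empty for a valid user file.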

Value

Generates the index files.

Examples

## Not run: 
# use function with all default values
userFile <- "/path/to/userList.tsv"
sjobs <- generate_slurm_indexes(userFile = userFile)

## End(Not run)
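
As a further illustration, the sketch below prepares the index generation with custom Slurm options and without submitting it (submit = FALSE). The option values, the module commands, and the choice of nodes = 2 are placeholders to adapt to your cluster. The format of the modules elements is an assumption based on the "module add R" example in the argument description. The call is guarded so that the snippet does nothing when BgeeCall or the input file is unavailable.

```r
# Named list of sbatch options; names are the long option names without the
# leading dashes. The values are placeholders for your cluster.
slurm_options <- list(time = "02:00:00", mem = "16G")

# Assumption: each element is one shell command, following the
# "module add R" example in the 'modules' argument description.
modules <- c("module add R", "module add kallisto")

userFile <- "/path/to/userList.tsv"

# Only attempt the call when BgeeCall is installed and the input file exists.
can_run <- requireNamespace("BgeeCall", quietly = TRUE) && file.exists(userFile)

if (can_run) {
  sjobs <- BgeeCall::generate_slurm_indexes(
    userFile = userFile,
    slurm_options = slurm_options,
    modules = modules,
    submit = FALSE,  # prepare the job without calling sbatch
    nodes = 2
  )
}
```

Setting submit = FALSE makes it possible to inspect what was prepared before anything reaches the queue. Switch it back to TRUE (the default) once the options look right.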

BgeeDB/BgeeCall documentation built on Nov. 10, 2023, 5:40 a.m.