```{css, echo=FALSE}
pre code {
  white-space: pre !important;
  overflow-x: scroll !important;
  word-break: keep-all !important;
  word-wrap: initial !important;
}
```

<!---
- Compile from command-line
Rscript -e "rmarkdown::render('systemPipeRdata.Rmd', c('BiocStyle::html_document'), clean=F); knitr::knit('systemPipeRdata.Rmd', tangle=TRUE)"

-->

```r
BiocStyle::markdown()
options(width=60, max.print=1000)
knitr::opts_chunk$set(
    eval=as.logical(Sys.getenv("KNITR_EVAL", "TRUE")),
    cache=as.logical(Sys.getenv("KNITR_CACHE", "TRUE")),
    tidy.opts=list(width.cutoff=60), tidy=TRUE)
suppressPackageStartupMessages({
    library(systemPipeRdata)
})
```

Note: the most recent version of this vignette can be found here.

Note: if you use systemPipeR and systemPipeRdata in published research, please cite:

Backman, T.W.H. and Girke, T. (2016). systemPipeR: Workflow and Report Generation Environment. BMC Bioinformatics, 17: 388. doi:10.1186/s12859-016-1241-0.

# Introduction

systemPipeRdata is a helper package that generates, with a single command, workflow templates intended for use with its parent package systemPipeR [@H_Backman2016-bt]. The systemPipeR project provides a suite of R/Bioconductor packages for designing, building and running end-to-end analysis workflows on local machines, HPC clusters and cloud systems, while at the same time generating publication-quality analysis reports.

To test workflows quickly or to design new ones from existing templates, users can generate, with a single command, workflow instances fully populated with the sample data and parameter files required to run a chosen workflow. The pre-configured directory structure of the workflow environment and the sample data used by systemPipeRdata are described here.

The systemPipeRdata package provides the demo FASTQ files used in the workflow reporting vignettes. The chosen data set, SRP010938, contains 18 paired-end (PE) read sets from Arabidopsis thaliana [@Howard2013-fq]. To minimize processing time during testing, each FASTQ file has been subsetted to 90,000-100,000 randomly sampled PE reads that map to the first 100,000 nucleotides of each chromosome of the A. thaliana genome. The corresponding reference genome sequence (FASTA) and its GFF annotation file (provided in the same download) have been truncated accordingly. As a result, the entire test data set requires less than 200 MB of disk space. A PE read set was chosen for flexibility, because it can be used to test analysis routines that require either single-end (SE) or PE reads.
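The size figure above can be checked on a local installation by summing file sizes with base R. The following is a minimal sketch: `dir_size_mb` is a hypothetical helper (not part of systemPipeRdata), and the code assumes the sample data live under the package's `extdata` directory.

```r
# Hypothetical helper (not part of systemPipeRdata):
# total size of all regular files below a directory, in MB (1 MB = 1024^2 bytes)
dir_size_mb <- function(path) {
    files <- list.files(path, recursive = TRUE, full.names = TRUE)
    # file.info()$size is in bytes; NA sizes (unreadable files) are ignored
    sum(file.info(files)$size, na.rm = TRUE) / 1024^2
}

# Assumption: the sample data are installed under the package's extdata directory
extdata <- system.file("extdata", package = "systemPipeRdata")
if (nzchar(extdata)) dir_size_mb(extdata)
```

`system.file()` returns an empty string when the package is not installed, so the last line only runs when systemPipeRdata is available.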

# Getting started

## Installation

The systemPipeRdata package is available at Bioconductor and can be installed from within R as follows:

```r
if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("systemPipeRdata")
```

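The development version can also be installed from Bioconductor. Note that `version = "devel"` makes BiocManager use the Bioconductor development branch for the whole library, not only for this package.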

```r
# Installs the devel version from Bioconductor
BiocManager::install("systemPipeRdata", version = "devel",
                     build_vignettes = TRUE, dependencies = TRUE)
```

## Loading package and documentation

```r
library("systemPipeRdata") # Loads the package
library(help="systemPipeRdata") # Lists package info
vignette("systemPipeRdata") # Opens vignette
```

## Starting with pre-configured workflow templates

Load one of the available workflows into your current working directory. The following does this for the `rnaseq` workflow template. The name of the resulting workflow directory can be specified with the `mydirname` argument; the default, `NULL`, uses the name of the chosen workflow. An error is issued if a directory with the same name already exists at that path.

```r
genWorkenvir(workflow="systemPipeR/SPrnaseq", mydirname="rnaseq")
setwd("rnaseq")
```
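Because `genWorkenvir` stops with an error when the target directory already exists, scripts that may be re-run can test for the directory first. This is a hedged sketch: `workenvir_dir` and `gen_if_absent` are hypothetical helpers, and the default-name rule (the last component of `workflow`, e.g. `SPrnaseq` for `systemPipeR/SPrnaseq`) is an assumption based on the examples in this vignette.

```r
# Hypothetical helper: directory name genWorkenvir() is expected to use.
# Assumption: with mydirname = NULL, the name is the last path component
# of `workflow` (e.g. "SPrnaseq" for "systemPipeR/SPrnaseq").
workenvir_dir <- function(workflow, mydirname = NULL) {
    if (is.null(mydirname)) basename(workflow) else mydirname
}

# Hypothetical wrapper: reuse an existing directory instead of letting
# genWorkenvir() stop with an error; returns the directory name invisibly.
gen_if_absent <- function(workflow, mydirname = NULL) {
    target <- workenvir_dir(workflow, mydirname)
    if (dir.exists(target)) {
        message("Reusing existing directory: ", target)
    } else {
        systemPipeRdata::genWorkenvir(workflow = workflow, mydirname = mydirname)
    }
    invisible(target)
}
```

For example, `setwd(gen_if_absent("systemPipeR/SPrnaseq", mydirname = "rnaseq"))` generates the environment on the first run and reuses it afterwards.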

On Linux and macOS systems, the same can be achieved from a terminal with the following command:

```{bash generate_workenvir_from_shell, eval=FALSE}
Rscript -e "systemPipeRdata::genWorkenvir(workflow='systemPipeR/SPrnaseq', mydirname='rnaseq')"
```

## Build, run and visualize the workflow template

- Build the workflow from the R Markdown file

This template provides common steps of an `RNAseq` workflow. Workflow steps can be added, removed or modified by operating on the `sal` object.

```r
library(systemPipeR)  # provides SPRproject(), importWF(), runWF(), plotWF()
sal <- SPRproject()
sal <- importWF(sal, file_path = "systemPipeRNAseq.Rmd", verbose = FALSE)
```

Next, we can run the entire workflow from R with one command:

```r
sal <- runWF(sal)
```

systemPipeR workflow instances can be visualized with the `plotWF` function.

```r
plotWF(sal)
```

systemPipeR compiles all workflow execution logs in one central location, making it easier to inspect the standard output (stdout) and standard error (stderr) of any command-line tool used in the workflow, as well as the stdout of R code steps. The logs are compiled with the `renderLogs` function.

```r
sal <- renderLogs(sal)
```

The technical report can be generated with the `renderReport` function.

```r
sal <- renderReport(sal)
```

## Workflow templates collection

A collection of workflow templates is available; the templates currently available can be listed as follows:

```r
availableWF(github = TRUE)
```

This function returns the workflow templates available within the package and in the systemPipeR organization on GitHub. Each listed template can be generated as described above.

A workflow template chosen from GitHub is installed as an R package, and `genWorkenvir` also creates the environment with all the settings and files needed to run the demo analysis.

```r
genWorkenvir(workflow="systemPipeR/SPrnaseq", mydirname=NULL)
setwd("SPrnaseq")
```

Different versions of a workflow template, defined as branches of its GitHub repository, can also be selected. By default, the `master` branch is used; a different branch can be chosen with the `ref` argument.

```r
genWorkenvir(workflow="systemPipeR/SPrnaseq", ref = "singleMachine")
setwd("SPrnaseq")
```

### Download a specific R Markdown file

A specific workflow script can also be downloaded for your analysis. The URL is specified with the `url` argument and the name of the saved R Markdown file with the `urlname` argument. The default, `NULL`, copies the current version available in the chosen template.

```r
genWorkenvir(workflow="systemPipeR/SPrnaseq",
             url="https://raw.githubusercontent.com/systemPipeR/systemPipeRNAseq/cluster/vignettes/systemPipeRNAseq.Rmd",
             urlname="rnaseq_V-cluster.Rmd")
setwd("SPrnaseq")
```
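The URL in this example follows GitHub's raw-file pattern, `https://raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>`, so a script from another branch can be addressed by changing only the branch component. A small sketch, where `raw_github_url` is a hypothetical helper rather than part of systemPipeRdata:

```r
# Hypothetical helper: build a raw.githubusercontent.com URL for a file
# stored on a given branch of a GitHub repository
raw_github_url <- function(owner, repo, branch, path) {
    paste("https://raw.githubusercontent.com", owner, repo, branch, path, sep = "/")
}

# Rebuilds the URL used above for the "cluster" branch
rmd_url <- raw_github_url("systemPipeR", "systemPipeRNAseq", "cluster",
                          "vignettes/systemPipeRNAseq.Rmd")
```

The result can then be passed on, e.g. `genWorkenvir(workflow = "systemPipeR/SPrnaseq", url = rmd_url, urlname = "rnaseq_V-cluster.Rmd")`.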

## Dynamic generation of workflow template

A new workflow structure can also be created from the RStudio menu: File -> New File -> R Markdown -> From Template -> systemPipeR New WorkFlow. This interactive option creates the same environment as demonstrated above.

Figure 1: Selecting workflow template within RStudio.

## Directory Structure

The workflow templates generated by `genWorkenvir` contain the following pre-configured directory structure:

Note: Directory names are indicated in green. Users can change this structure as needed, but need to adjust the code in their workflows accordingly.

Figure 2: systemPipeR's preconfigured directory structure.

## Return paths to sample data

The paths to the sample data provided by systemPipeRdata are returned as a list by `pathList()`; the following shows the first two entries.

```r
pathList()[1:2]
```

# Version information

```r
sessionInfo()
```

# Funding

This project was supported by funds from the National Institutes of Health (NIH) and the National Science Foundation (NSF).

# References



tgirke/systemPipeRdata documentation built on Oct. 19, 2023, 9:20 p.m.