In tgirke/systemPipeRdata: systemPipeRdata: Workflow templates and sample data

Outline

Introduction
Motivation
Design
Templates
Getting started

Introduction

systemPipeR is an R package for building end-to-end analysis pipelines with automated report generation for next-generation sequencing (NGS) applications [@Girke2014-oy].
Important features:
- Support for R and command-line software, such as NGS aligners, peak callers, variant callers, etc.
- Runs on single machines and compute clusters with schedulers
- Uniform sample handling and annotation

Motivation

Many NGS applications share several analysis routines, such as:
- Read QC and preprocessing
- Alignments
- Quantification
- Feature annotations
- Enrichment analysis
Thus, a common workflow environment has many advantages for improving efficiency, standardization and reproducibility.

Advantages of `systemPipeR`

Facilitates design of complex NGS workflows involving multiple R/Bioconductor packages [@Huber2015-ag].
Makes NGS analysis with Bioconductor utilities more accessible to new users
Simplifies usage of command-line software from within R
Reduces complexity of using compute clusters for R and command-line software
Accelerates runtime of workflows via parallelization on computer systems with multiple CPU cores and/or multiple compute nodes
Automates generation of analysis reports to improve reproducibility

Workflow design in `systemPipeR` {.flexbox .vcenter .smaller}

Workflow steps with input/output file operations are controlled by SYSargs objects.
Each SYSargs instance is constructed from a targets file and a param file.
The only input provided by the user is the initial targets file. Subsequent targets instances are created automatically.
Any number of predefined or custom workflow steps are supported.
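
The chaining of steps described above can be illustrated with a minimal sketch in base R. It is not how systemPipeR implements SYSargs. Plain data frames stand in for targets instances, and the helper `next_targets()` and the file names are hypothetical. The point is only that the output files of one step can serve as the input targets of the next.

```r
# Illustrative only: plain data frames stand in for targets instances.
targets <- data.frame(
  FileName   = c("A.fastq", "B.fastq"),
  SampleName = c("A", "B"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: derive the targets table of the next step by
# swapping the file extension of each input file.
next_targets <- function(targets, new_ext) {
  out <- targets
  out$FileName <- sub("\\.[^.]+$", paste0(".", new_ext), targets$FileName)
  out
}

aligned <- next_targets(targets, "bam")  # output of step 1, input of step 2
called  <- next_targets(aligned, "vcf")  # output of step 2, input of step 3
called
```

Only the first table is written by hand here. The later ones are derived, which mirrors the idea that the user supplies just the initial targets file.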

`systemPipeRdata`: template workflows

Helper package that generates NGS workflow templates for systemPipeR with a single command.
Includes sample data for testing.
Users can create new workflows or change and extend existing ones.

RNA-Seq workflow template

Read preprocessing
- Quality filtering (trimming)
- FASTQ quality report
Alignments: rsubread, Bowtie2/Tophat2
- Available soon: alignment-free approaches with Kallisto via artemis or sleuth
Alignment statistics
Read counting per annotation
Sample-wise correlation analysis
DEG analysis with edgeR or DESeq2
Enrichment analysis of GO terms or other annotation types
Gene-wise cluster analysis

VAR-Seq workflow template

Read preprocessing
- Quality filtering (trimming)
- FASTQ quality report
Alignments: gsnap, bwa
Alignment statistics
Variant calling: VariantTools, GATK, BCFtools
Variant filtering: VariantTools and VariantAnnotation
Variant annotation: VariantAnnotation
Combine results from many samples
Summary statistics of samples

ChIP-Seq workflow template

Read preprocessing
- Quality filtering and/or trimming
- FASTQ quality report
Alignments: rsubread, Bowtie2
Alignment statistics
Genome-wide coverage statistics
Peak calling: MACS2, BayesPeak
Peak annotation with genomic context
Differential binding analysis
Enrichment analysis of GO terms or other annotation types
Motif analysis

Ribo-Seq workflow template {.smaller}

Read preprocessing
- Adaptor trimming and quality filtering
- FASTQ quality report
Alignments: Tophat2 (or any other RNA-Seq aligner)
Alignment stats
Compute read distribution across genomic features
Adding custom features to workflow (e.g. uORFs)
Genomic read coverage along transcripts
Read counting
Sample-wise correlation analysis
Analysis of differentially expressed genes (DEGs)
GO term enrichment analysis
Gene-wise clustering
Differential ribosome binding (translational efficiency)

Coming soon

Workflow templates for:

miRNA-Seq
BS-Seq

Install and load packages {.smaller}

Install required packages

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("systemPipeR") # Install systemPipeR from Bioconductor
BiocManager::install("tgirke/systemPipeRdata", build_vignettes=TRUE, dependencies=TRUE) # From github

Load packages

library("systemPipeR")
library("systemPipeRdata")

Access help

library(help="systemPipeR")
vignette("systemPipeR")

`Targets` file organizes samples {.smaller}

Structure of targets file for single-end (SE) library

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:5]

Structure of targets file for paired-end (PE) library

targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:4]
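
For readers without the package installed, the following sketch shows the general shape of such a file using base R only. The column names, sample names, and file paths are assumptions chosen for illustration and may differ from the targets files shipped with systemPipeR. The sketch relies only on what the calls above show: the file is tab-delimited and lines starting with `#` are skipped on import.

```r
# Write a small single-end targets file to a temporary location.
# Column names and paths are assumed for illustration.
targets_file <- tempfile(fileext = ".txt")
writeLines(c(
  "# Project: demo",
  "FileName\tSampleName\tFactor",
  "./data/S1.fastq.gz\tS1\tcontrol",
  "./data/S2.fastq.gz\tS2\ttreated"
), targets_file)

# Import it the same way as above; the '#' line is dropped.
targets <- read.delim(targets_file, comment.char = "#", stringsAsFactors = FALSE)
targets

# Basic sanity checks before starting a workflow
stopifnot(!anyDuplicated(targets$SampleName))                  # sample names are unique
missing_files <- targets$FileName[!file.exists(targets$FileName)]  # inputs not found on disk
```

Checking for duplicated sample names and missing input files early tends to be cheaper than discovering them in the middle of an alignment run.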

`SYSargs`: `targets` & `param` {.smaller}

SYSargs instances are constructed from a targets file and a param file. The param file contains the settings for running command-line software.

parampath <- system.file("extdata", "tophat.param", package="systemPipeR")
(args <- suppressWarnings(systemArgs(sysma=parampath, mytargets=targetspath)))

Slots and accessor functions have the same names

names(args)[c(5,8,13)]

Return command-line arguments for given software, here Tophat2 for 1st sample.

sysargs(args)[1]

## tophat -p 4 -o SRR446027_1.fastq.tophat tair10.fasta SRR446027_1.fastq .SRR446027_2.fastq
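
To make the idea of per-sample command-line arguments concrete, here is a minimal sketch in base R. It is not the systemPipeR implementation. The helper `build_commands()`, its arguments, and the sample file names are hypothetical, and the command layout is only loosely modeled on the Tophat2 call shown above.

```r
# Hypothetical helper, not part of systemPipeR: fill a command template
# with per-sample values taken from a targets table.
build_commands <- function(targets, software, options, reference) {
  cmds <- sprintf("%s %s -o %s.%s %s %s",
                  software, options,
                  targets$FileName, software,   # output directory name
                  reference, targets$FileName)  # reference and input file
  names(cmds) <- targets$SampleName
  cmds
}

targets <- data.frame(FileName   = c("S1.fastq", "S2.fastq"),
                      SampleName = c("S1", "S2"),
                      stringsAsFactors = FALSE)

cmds <- build_commands(targets, software = "tophat", options = "-p 4",
                       reference = "tair10.fasta")
cmds[1]
```

Keeping the software settings separate from the sample table is the same division of labor as the param file and the targets file: one describes how to run the tool, the other what to run it on.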

Run on single machines or clusters

Run a command-line tool, here Tophat2, on a single machine. The tool must be installed for this to work.

runCommandline(args)

Submit command-line or R processes to a computer cluster with a queueing system.

clusterRun(args, ...)

The last step requires additional resource allocation arguments. For details, see the main systemPipeR manual.

Workflow templates

Generate workflow template, e.g. "rnaseq", "varseq" or "chipseq"

genWorkenvir(workflow="varseq", mydirname=NULL)
setwd("varseq")

Command-line alternative for generating workflow environments

$ echo 'library(systemPipeRdata); genWorkenvir(workflow="varseq", mydirname=NULL)' | R --slave

Workflow template structure

The workflow templates generated by `genWorkenvir` contain the following preconfigured directory structure:

workflow_name/            # *.Rnw/*.Rmd scripts, targets file, etc.
                param/    # parameter files for command-line software
                data/     # inputs e.g. FASTQ, reference, annotations
                results/  # analysis result files

The above structure can be customized as needed, but for first-time users it is easier to keep changes to a minimum.
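
When customizing a workflow directory, a quick check that the expected subdirectories are still in place can help. The sketch below uses base R only. The helper `check_workenvir()` is hypothetical and not part of systemPipeRdata, and the mock directory created here merely stands in for one generated by `genWorkenvir`.

```r
# Hypothetical helper: report which of the expected subdirectories exist.
check_workenvir <- function(path, expected = c("param", "data", "results")) {
  present <- dir.exists(file.path(path, expected))
  names(present) <- expected
  present
}

# Mock directory tree for demonstration; 'results' is left out on purpose.
mock <- file.path(tempdir(), "workflow_name")
for (d in c("param", "data")) {
  dir.create(file.path(mock, d), recursive = TRUE, showWarnings = FALSE)
}

status <- check_workenvir(mock)
status
```

A named logical vector makes it easy to see at a glance which part of the layout is missing before any workflow step is started.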

Run workflows

Next, run the chosen sample workflow from within R by executing the code provided in the corresponding *.Rnw template file (or *.Rmd or *.R versions).
Alternatively, one can run an entire workflow from start to finish with a single command by executing from the command line:

$ make -B

Analysis reports in PDF or HTML format are autogenerated when running a workflow using standard R resources for scientific report generation including knitr and rmarkdown, respectively.
Integration of ReportingTools is also straightforward.

Continue here

Overview Vignette

Future development

Workflow templates with support for both PDF (.Rnw) and HTML (.Rmd) reports
Workflow templates for additional NGS applications (see here)
docopt support for generating .param files
Additional visualization functions
Streamline support of very complex experimental designs

References {.smaller}

tgirke/systemPipeRdata documentation built on Oct. 24, 2024, 9:49 p.m.
