Outline

Introduction

Motivation

Advantages of systemPipeR

Workflow design in systemPipeR


systemPipeRdata: template workflows

RNA-Seq workflow template

  1. Read preprocessing
    • Quality filtering (trimming)
    • FASTQ quality report
  2. Alignments: Rsubread, Bowtie2/Tophat2
  3. Alignment statistics
  4. Read counting per annotation
  5. Sample-wise correlation analysis
  6. DEG analysis with edgeR or DESeq2
  7. Enrichment analysis of GO terms or other annotation types
  8. Gene-wise cluster analysis

VAR-Seq workflow template

  1. Read preprocessing
    • Quality filtering (trimming)
    • FASTQ quality report
  2. Alignments: gsnap, bwa
  3. Alignment statistics
  4. Variant calling: VariantTools, GATK, BCFtools
  5. Variant filtering: VariantTools and VariantAnnotation
  6. Variant annotation: VariantAnnotation
  7. Combine results from many samples
  8. Summary statistics of samples

ChIP-Seq workflow template

  1. Read preprocessing
    • Quality filtering and/or trimming
    • FASTQ quality report
  2. Alignments: Rsubread, Bowtie2
  3. Alignment statistics
  4. Genome-wide coverage statistics
  5. Peak calling: MACS2, BayesPeak
  6. Peak annotation with genomic context
  7. Differential binding analysis
  8. Enrichment analysis of GO terms or other annotation types
  9. Motif analysis

Ribo-Seq workflow template

  1. Read preprocessing
    • Adaptor trimming and quality filtering
    • FASTQ quality report
  2. Alignments: Tophat2 (or any other RNA-Seq aligner)
  3. Alignment statistics
  4. Compute read distribution across genomic features
  5. Adding custom features to workflow (e.g. uORFs)
  6. Genomic read coverage along transcripts
  7. Read counting
  8. Sample-wise correlation analysis
  9. Analysis of differentially expressed genes (DEGs)
  10. GO term enrichment analysis
  11. Gene-wise clustering
  12. Differential ribosome binding (translational efficiency)

Coming soon

Workflow templates for:

Install and load packages

Install required packages

if (!requireNamespace("BiocManager", quietly=TRUE))
    install.packages("BiocManager")
BiocManager::install("systemPipeR") # Install systemPipeR from Bioconductor
BiocManager::install("tgirke/systemPipeRdata", build_vignettes=TRUE, dependencies=TRUE) # From github

Load packages and access help

library("systemPipeR")
library("systemPipeRdata")

Access help

library(help="systemPipeR")
vignette("systemPipeR")

Targets file organizes samples

Structure of targets file for single-end (SE) library

targetspath <- system.file("extdata", "targets.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:5]

Structure of targets file for paired-end (PE) library

targetspath <- system.file("extdata", "targetsPE.txt", package="systemPipeR")
read.delim(targetspath, comment.char = "#")[1:3,1:4]
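
To make the layout concrete, the sketch below writes a small single-end targets table to a temporary file and reads it back in the same way as above. The column names (`FileName`, `SampleName`, `Factor`) and the `#` comment line are assumptions modelled on the packaged example, and the file paths and sample names are made up; the packaged `targets.txt` remains the authoritative reference.

```r
# Hypothetical single-end targets table; paths and sample names are illustrative only.
targets <- data.frame(
  FileName   = c("./data/sampleA.fastq.gz", "./data/sampleB.fastq.gz"),
  SampleName = c("A1", "B1"),
  Factor     = c("A", "B"),
  stringsAsFactors = FALSE
)

# Assemble the file: one comment line, a tab-delimited header, then one row per sample.
targets_file <- tempfile(fileext = ".txt")
file_lines <- c(
  "# Project ID: demo (illustrative comment line)",
  paste(names(targets), collapse = "\t"),
  do.call(paste, c(targets, sep = "\t"))
)
writeLines(file_lines, targets_file)

# Lines starting with '#' are skipped, so only the sample table is returned.
targets_in <- read.delim(targets_file, comment.char = "#", stringsAsFactors = FALSE)
print(targets_in)
```

A paired-end table would follow the same pattern with two file name columns instead of one.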

SYSargs: targets & param

SYSargs instances are constructed from a targets file and a param file. The param file contains the settings for running command-line software.

parampath <- system.file("extdata", "tophat.param", package="systemPipeR")
(args <- suppressWarnings(systemArgs(sysma=parampath, mytargets=targetspath)))

Slots and accessor functions have the same names

names(args)[c(5,8,13)]

Return the command-line arguments for the given software, here Tophat2, for the first sample.

sysargs(args)[1]
## tophat -p 4 -o SRR446027_1.fastq.tophat tair10.fasta SRR446027_1.fastq .SRR446027_2.fastq
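
Conceptually, each command string combines fixed settings from the param file with per-sample values from the targets file. The base-R sketch below illustrates only that idea; `build_commands()` and its arguments are hypothetical and not part of systemPipeR, the second sample name is illustrative, and the flags simply mirror the Tophat2 example above.

```r
# Hypothetical helper (not a systemPipeR function): combine fixed settings
# with per-sample input files to build one command string per sample.
build_commands <- function(fastq, reference, threads = 4, software = "tophat") {
  outdir <- paste0(basename(fastq), ".", software)  # e.g. sample.fastq.tophat
  cmds <- paste(software, "-p", threads, "-o", outdir, reference, fastq)
  names(cmds) <- basename(fastq)                    # label each command by its input file
  cmds
}

cmds <- build_commands(
  fastq     = c("SRR446027_1.fastq", "SRR446028_1.fastq"),
  reference = "tair10.fasta"
)
print(cmds[1])
```

Keeping the commands in a named character vector makes it easy to inspect or run a single sample before launching the full set.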

Run on single machines or clusters

Run a command-line tool, here Tophat2, on a single machine. The tool must be installed on the system for this to work.

runCommandline(args)

Submit command-line or R processes to a computer cluster with a queueing system.

clusterRun(args, ...) 

The last step requires additional resource allocation arguments. For details, please see the main systemPipeR manual.

Workflow templates

Generate workflow template, e.g. "rnaseq", "varseq" or "chipseq"

genWorkenvir(workflow="varseq", mydirname=NULL)
setwd("varseq")



Command-line alternative for generating workflow environments

```sh
echo 'library(systemPipeRdata); genWorkenvir(workflow="varseq", mydirname=NULL)' | R --slave
```

Workflow template structure

The workflow templates generated by `genWorkenvir` contain the following preconfigured directory structure:

```r
workflow_name/            # *.Rnw/*.Rmd scripts, targets file, etc.
                param/    # parameter files for command-line software
                data/     # inputs e.g. FASTQ, reference, annotations
                results/  # analysis result files
```



The above structure can be customized as needed, but for first-time users it is easier to keep changes to a minimum.
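
As a rough illustration of this layout, the base-R sketch below creates the same empty skeleton in a temporary directory. It does not reproduce what `genWorkenvir` does internally, since that function also populates the directories; `make_skeleton()` and the directory name `my_workflow` are hypothetical.

```r
# Hypothetical helper: create an empty param/data/results skeleton under `root`.
make_skeleton <- function(root) {
  subdirs <- file.path(root, c("param", "data", "results"))
  for (d in subdirs) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(subdirs)
}

# Build the skeleton in a temporary location and list its top-level entries.
root <- file.path(tempdir(), "my_workflow")
make_skeleton(root)
print(sort(list.files(root)))
```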

Run workflows

Overview Vignette

Future development


tgirke/systemPipeRdata documentation built on Oct. 19, 2024, 7:49 p.m.