An Ultra-Fast All-in-One FASTQ preprocessor
In Rfastp: An Ultra-Fast and All-in-One Fastq Preprocessor (Quality Control, Adapter, low quality and polyX trimming) and UMI Sequence Parsing.

title: "An Ultra-Fast All-in-One FASTQ preprocessor"
author: "Wei Wang periwinkle.david@gmail.com"
date: "`r format(Sys.Date(), '%m/%d/%Y')`"
package: Rfastp
output:
  BiocStyle::html_document:
    number_sections: yes
    toc: true
vignette: >
  %\VignetteIndexEntry{Rfastp}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
  %\usepackage[utf8]{inputenc}
bibliography:
  - fastp.bib

knitr::opts_chunk$set(tidy=FALSE, cache=FALSE,
                      #dev="png",
                      message=FALSE, error=FALSE, warning=TRUE)
options(width=100)

Introduction

The Rfastp package provides an R interface to fastp, an all-in-one preprocessing toolkit for FASTQ files [@10.1093/bioinformatics/bty560].

Installation

Use the BiocManager package to download and install the package from Bioconductor as follows:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("Rfastp")

If required, the latest development version of the package can also be installed from GitHub.

BiocManager::install("remotes")
BiocManager::install("RockefellerUniversity/Rfastp")

Once the package is installed, load it into your R session:

library(Rfastp)

FastQ Quality Control with rfastp

The first examples use three FASTQ files shipped with the package: one single-end file and one pair of paired-end files.

se_read1 <- system.file("extdata","Fox3_Std_small.fq.gz",package="Rfastp")
pe_read1 <- system.file("extdata","reads1.fastq.gz",package="Rfastp")
pe_read2 <- system.file("extdata","reads2.fastq.gz",package="Rfastp")
outputPrefix <- tempfile(tmpdir = tempdir())

A normal QC run for a single-end FASTQ file.

Rfastp supports multiple threads; set the number of threads with the thread parameter.

se_json_report <- rfastp(read1 = se_read1, 
    outputFastq = paste0(outputPrefix, "_se"), thread = 4)
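
The thread count above is hard-coded to 4. A minimal sketch of choosing it from the machine instead, assuming that capping at 4 and leaving one core free is a reasonable policy (that policy and the variable names n_cores, n_threads and out_prefix are illustrative choices, not package recommendations):

```r
# Pick a thread count that never exceeds the cores R can see.
n_cores <- parallel::detectCores()
if (is.na(n_cores)) n_cores <- 1L            # detectCores() can return NA
n_threads <- max(1L, min(4L, n_cores - 1L))  # cap at 4, leave one core free

# Only run fastp when Rfastp is actually installed.
if (requireNamespace("Rfastp", quietly = TRUE)) {
  se_read1 <- system.file("extdata", "Fox3_Std_small.fq.gz", package = "Rfastp")
  out_prefix <- tempfile()
  report <- Rfastp::rfastp(read1 = se_read1,
                           outputFastq = paste0(out_prefix, "_se"),
                           thread = n_threads)
}
```

On a single-core machine this falls back to one thread rather than oversubscribing the CPU.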

A normal QC run for paired-end FASTQ files.

pe_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2,
    outputFastq = paste0(outputPrefix, "_pe"))

Merge paired-end FASTQ files after QC.

pe_merge_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2, merge = TRUE,
    outputFastq = paste0(outputPrefix, '_unpaired'),
    mergeOut = paste0(outputPrefix, "_merged.fastq.gz"))

UMI processing

Normal UMI processing for a 10X single-cell library.

umi_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2, 
    outputFastq = paste0(outputPrefix, '_umi1'), umi = TRUE, umiLoc = "read1",
    umiLength = 16)

Set a customized UMI prefix and location in the sequence name.

The following example adds a prefix string before the UMI sequence in the sequence name. By default, an "_" is added between the prefix string and the UMI sequence, and the UMI sequence is inserted into the sequence name before the first space. The call here sets umiNoConnection = TRUE and umiIgnoreSeqNameSpace = TRUE to change both defaults.

umi_json_report <- rfastp(read1 = pe_read1, read2 = pe_read2, 
    outputFastq = paste0(outputPrefix, '_umi2'), umi = TRUE, umiLoc = "read1",
    umiLength = 16, umiPrefix = "#", umiNoConnection = TRUE, 
    umiIgnoreSeqNameSpace = TRUE)
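
To make the renaming concrete, here is a conceptual base-R sketch of UMI extraction on one toy read. It is not the code Rfastp runs. The helper tag_name, the toy read, the ":" separator placed before the tag, and the reading of umiIgnoreSeqNameSpace as "append the tag at the very end of the name" are all assumptions made for illustration.

```r
# Toy read; the name and sequence are invented for illustration.
read_name  <- "@read1 1:N:0:ATCACG"
read_seq   <- "AAAACCCCGGGGTTTTACGTACGTACGT"
umi_length <- 16

# The UMI is the first umi_length bases; the rest is kept as the read.
umi     <- substr(read_seq, 1, umi_length)
trimmed <- substr(read_seq, umi_length + 1, nchar(read_seq))

# Hypothetical helper: build the tag and place it in the read name.
tag_name <- function(name, umi, prefix = "", connect = TRUE,
                     ignore_space = FALSE) {
  tag <- if (nzchar(prefix)) {
    paste0(prefix, if (connect) "_" else "", umi)
  } else {
    umi
  }
  if (ignore_space || !grepl(" ", name, fixed = TRUE)) {
    paste0(name, ":", tag)                               # append at the end
  } else {
    sub(" ", paste0(":", tag, " "), name, fixed = TRUE)  # before first space
  }
}

default_name <- tag_name(read_name, umi)
custom_name  <- tag_name(read_name, umi, prefix = "#",
                         connect = FALSE, ignore_space = TRUE)
print(c(default_name, custom_name))
```

Comparing the two printed names shows the effect of the prefix, the connector and the space handling side by side.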

A QC example with customized cutoffs and adapter sequence.

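Trim low-quality bases from the 3' end one base at a time (window size 1) until a base with quality of at least 5 is reached; trim low-quality bases from the 5' end with a 29 bp sliding window until the window's mean quality is at least 20; discard reads shorter than 29 bp; disable polyG trimming; and specify the adapter sequence for read1.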

clipr_json_report <- rfastp(read1 = se_read1, 
    outputFastq = paste0(outputPrefix, '_clipr'),
    disableTrimPolyG = TRUE,
    cutLowQualFront = TRUE,
    cutFrontWindowSize = 29,
    cutFrontMeanQual = 20,
    cutLowQualTail = TRUE,
    cutTailWindowSize = 1,
    cutTailMeanQual = 5,
    minReadLength = 29,
    adapterSequenceRead1 = 'GTGTCAGTCACTTCCAGCGG'
)
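
The sliding-window trimming configured above can be pictured with a small base-R sketch. It is a simplified illustration under stated assumptions, not fastp's implementation: the helpers trim_front and trim_tail, the one-base step, the rule that a read with no passing window is dropped entirely, and the toy quality scores are all made up here. A window of 3 replaces the 29 bp window to keep the example readable.

```r
# Drop bases from the 5' end while the window's mean quality is too low.
trim_front <- function(qual, window, min_mean) {
  n <- length(qual)
  start <- 1L
  while (start + window - 1L <= n &&
         mean(qual[start:(start + window - 1L)]) < min_mean) {
    start <- start + 1L
  }
  if (start + window - 1L > n) return(qual[0])  # no window passed: drop read
  qual[start:n]
}

# Trimming from the 3' end is the same procedure on the reversed read.
trim_tail <- function(qual, window, min_mean) {
  rev(trim_front(rev(qual), window, min_mean))
}

qual <- c(2, 3, 30, 32, 35, 34, 33, 4, 2)  # toy Phred scores

after_tail  <- trim_tail(qual, window = 1, min_mean = 5)
after_front <- trim_front(after_tail, window = 3, min_mean = 20)
print(after_tail)
print(after_front)
```

In this toy read the base with quality 3 survives the 5' trimming, because the first window that contains it already has a passing mean. That is the practical difference between a wide window and base-by-base trimming.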

Multiple input files for read1/2 in a vector.

rfastp accepts multiple input files; it concatenates them into one file and then runs fastp.

pe001_read1 <- system.file("extdata","splited_001_R1.fastq.gz",
    package="Rfastp")
pe002_read1 <- system.file("extdata","splited_002_R1.fastq.gz",
    package="Rfastp")
pe003_read1 <- system.file("extdata","splited_003_R1.fastq.gz",
    package="Rfastp")
pe004_read1 <- system.file("extdata","splited_004_R1.fastq.gz",
    package="Rfastp")
inputfiles <- c(pe001_read1, pe002_read1, pe003_read1, pe004_read1)
cat_rjson_report <- rfastp(read1 = inputfiles, 
    outputFastq = paste0(outputPrefix, "_merged1"))
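
When the chunk files follow a regular naming pattern, the input vector can be built programmatically instead of one system.file call per file. A minimal sketch, assuming the four "splited_00N" example files used above (chunk_ids, r1_names and r1_files are illustrative names):

```r
# Build the four read1 file names from their numeric chunk IDs.
chunk_ids <- sprintf("%03d", 1:4)
r1_names  <- paste0("splited_", chunk_ids, "_R1.fastq.gz")

# Resolve the names inside the installed package, if it is available.
if (requireNamespace("Rfastp", quietly = TRUE)) {
  # system.file() is vectorised but silently skips files it cannot find,
  # so check that every chunk was resolved before running fastp.
  r1_files <- system.file("extdata", r1_names, package = "Rfastp")
  stopifnot(length(r1_files) == length(r1_names))
  report <- Rfastp::rfastp(read1 = r1_files,
                           outputFastq = paste0(tempfile(), "_merged1"))
}
```

Sorting by a zero-padded ID keeps the chunks in a stable order, which matters when read1 and read2 vectors must line up file by file.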

Concatenate multiple FASTQ files.

catfastq concatenates all the input files into a new file.

pe001_read2 <- system.file("extdata","splited_001_R2.fastq.gz",
    package="Rfastp")
pe002_read2 <- system.file("extdata","splited_002_R2.fastq.gz",
    package="Rfastp")
pe003_read2 <- system.file("extdata","splited_003_R2.fastq.gz",
    package="Rfastp")
pe004_read2 <- system.file("extdata","splited_004_R2.fastq.gz",
    package="Rfastp")
inputR2files <- c(pe001_read2, pe002_read2, pe003_read2, pe004_read2)
catfastq(output = paste0(outputPrefix,"_merged2_R2.fastq.gz"), 
    inputFiles = inputR2files)
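
What concatenation means for gzipped FASTQ files can be shown with base R alone. This is only an illustration of the result, not how catfastq is implemented. The helpers write_gz and read_gz and the two one-read toy files are assumptions made for the example.

```r
# Small helpers that open and close gzip connections explicitly.
write_gz <- function(path, lines) {
  con <- gzfile(path, "w")
  on.exit(close(con))
  writeLines(lines, con)
}
read_gz <- function(path) {
  con <- gzfile(path, "r")
  on.exit(close(con))
  readLines(con)
}

# Two toy FASTQ files with one 4-line record each.
work_dir <- tempfile("catdemo_")
dir.create(work_dir)
part1 <- file.path(work_dir, "part1.fastq.gz")
part2 <- file.path(work_dir, "part2.fastq.gz")
write_gz(part1, c("@r1", "ACGT", "+", "IIII"))
write_gz(part2, c("@r2", "TTGA", "+", "IIII"))

# Append the records of each input, in order, to one gzipped output.
combined <- file.path(work_dir, "combined.fastq.gz")
out <- gzfile(combined, "w")
for (f in c(part1, part2)) writeLines(read_gz(f), out)
close(out)

combined_lines <- read_gz(combined)
print(combined_lines)
```

The output keeps the records in input order, so concatenating R1 and R2 chunks in the same order preserves read pairing.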

Generate report tables/plots

A data frame of the QC summary.

dfsummary <- qcSummary(pe_json_report)
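
A before/after comparison of this kind can also be assembled by hand from a nested report list. The sketch below uses a hand-made stand-in with made-up numbers that mimics two fields of the summary section of fastp's JSON report. It does not show the layout qcSummary returns, and the names toy_report and tab are illustrative.

```r
# Hand-made stand-in for a parsed report; all numbers are invented.
toy_report <- list(summary = list(
  before_filtering = list(total_reads = 1000, total_bases = 150000),
  after_filtering  = list(total_reads = 950,  total_bases = 140000)
))

before <- unlist(toy_report$summary$before_filtering)
after  <- unlist(toy_report$summary$after_filtering)[names(before)]

# One row per metric, with the percentage retained after filtering.
tab <- data.frame(metric = names(before),
                  before = unname(before),
                  after  = unname(after))
tab$retained_pct <- round(100 * tab$after / tab$before, 1)
print(tab)
```

Indexing after by names(before) keeps the two columns aligned even if the report lists its fields in a different order.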

A ggplot2 object of the base quality plot.

p1 <- curvePlot(se_json_report)
p1

A ggplot2 object of the GC content plot.

p2 <- curvePlot(se_json_report, curve="content_curves")
p2

A data frame of the trimming summary.

dfTrim <- trimSummary(pe_json_report)

Miscellaneous helper functions

usage of rfastp:

?rfastp

usage of catfastq:

?catfastq

usage of qcSummary:

?qcSummary

usage of trimSummary:

?trimSummary

usage of curvePlot:

?curvePlot

Acknowledgments

Thank you to Ji-Dung Luo for testing/vignette review/critical feedback, Doug Barrows for critical feedback/vignette review and Ziwei Liang for their support.

Session info

sessionInfo()

References

Rfastp documentation built on Nov. 8, 2020, 5:52 p.m.
