trim_fastq: An R-based wrapper for fastp


Description

Runs the fastp tool from R to trim and filter FASTQ read files.

Usage

trim_fastq(fastp = "fastp", fastq1, fastq2 = NULL, dest.dir = NULL,
  disable_adapter_trimming = FALSE, adapter_sequence = NULL,
  adapter_sequence_r2 = NULL, trim_front1 = 0, trim_front2 = 0,
  trim_tail1 = 0, trim_tail2 = 0, trim_poly_g = FALSE,
  poly_g_min_len = 10, trim_poly_x = FALSE, poly_x_min_len = 10,
  cut_by_quality5 = FALSE, cut_by_quality3 = FALSE,
  cut_window_size = 4, cut_mean_quality = 20,
  disable_quality_filtering = FALSE, qualified_quality_phred = 15,
  unqualified_percent_limit = 40, n_base_limit = 5,
  disable_length_filtering = FALSE, length_required = 15,
  length_limit = 0, low_complexity_filter = FALSE,
  complexity_threshold = 30, filter_by_index1 = NULL,
  filter_by_index2 = NULL, filter_by_index_threshold = 0,
  correction = FALSE, overlap_len_require = 30,
  overlap_diff_limit = 5, overrepresentation_analysis = FALSE,
  overrepresentation_sampling = 20, threads = NULL)

Arguments

fastp

a character string specifying the path to the fastp executable. [DEFAULT = "fastp"].

fastq1

a character vector indicating the read files to be trimmed.

fastq2

(optional) a character vector indicating the read files to be trimmed. If specified, the reads are assumed to be paired, and this vector MUST be in the same order as the files listed in fastq1. If NULL, the reads are assumed to be single-end. [DEFAULT = NULL]
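
The ordering requirement can be checked before calling the function. The following is a minimal sketch, assuming a hypothetical naming scheme in which mates differ only by an "_R1"/"_R2" tag; the helper pair_fastq_files is illustrative and not part of this package.

```r
# Hypothetical helper: builds fastq1/fastq2 vectors in matching order.
# Assumes mates differ only by an "_R1"/"_R2" tag in the file name.
pair_fastq_files <- function(files) {
  r1 <- sort(files[grepl("_R1", files, fixed = TRUE)])
  # Derive each mate's name from its R1 file so the order always matches.
  r2 <- sub("_R1", "_R2", r1, fixed = TRUE)
  missing <- setdiff(r2, files)
  if (length(missing) > 0) {
    stop("No mate found for: ", paste(missing, collapse = ", "))
  }
  list(fastq1 = r1, fastq2 = r2)
}

files <- c("b_R2.fastq.gz", "a_R1.fastq.gz", "b_R1.fastq.gz", "a_R2.fastq.gz")
pairs <- pair_fastq_files(files)
```

The resulting pairs$fastq1 and pairs$fastq2 can then be passed as the fastq1 and fastq2 arguments.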

dest.dir

a character string specifying the output directory. If NULL a directory named "TRIMMED_FASTQC" is created in the current working directory. [DEFAULT = NULL].

disable_adapter_trimming

logical, if TRUE adapter trimming is disabled. [DEFAULT = FALSE]

adapter_sequence

character string specifying the adapter for read1. For SE data, if not specified, the adapter will be auto-detected. For PE data, this is used if R1 and R2 are found not to overlap. [DEFAULT = NULL]

adapter_sequence_r2

character string specifying the adapter for read2 (PE data only). This is used if R1 and R2 are found not to overlap. If not specified, it will be the same as <adapter_sequence>. [DEFAULT = NULL]

trim_front1

integer specifying number of bases to trim at 5' end for read1 [DEFAULT = 0]

trim_front2

integer specifying number of bases to trim at 5' end for read2 [DEFAULT = 0]

trim_tail1

integer specifying number of bases to trim at 3' end for read1 [DEFAULT = 0]

trim_tail2

integer specifying number of bases to trim at 3' end for read2 [DEFAULT = 0]

trim_poly_g

logical, if TRUE, force polyG tail trimming [DEFAULT = FALSE].

poly_g_min_len

integer specifying the minimum length to detect polyG in the read tail. [DEFAULT = 10]

trim_poly_x

logical, if TRUE, enable polyX trimming at the 3' end. [DEFAULT = FALSE]

poly_x_min_len

integer specifying the minimum length to detect polyX in the read tail. [DEFAULT = 10]

cut_by_quality5

logical, if TRUE enable per-read cutting by quality at the 5' end (WARNING: this will interfere with deduplication for both PE and SE data) [DEFAULT = FALSE]

cut_by_quality3

logical, if TRUE enable per-read cutting by quality at the 3' end (WARNING: this will interfere with deduplication for SE data) [DEFAULT = FALSE]

cut_window_size

integer specifying the base pair size of the sliding window for sliding window trimming [DEFAULT = 4]

cut_mean_quality

integer specifying the mean phred quality threshold within a sliding window for removing bases [DEFAULT = 20]
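
To illustrate how cut_window_size and cut_mean_quality interact, here is a simplified sketch of 3' sliding-window cutting. It assumes Phred+33 quality strings and a window that moves one base at a time from the tail; fastp's own implementation may differ in its details, and the function name cut_tail_by_quality is hypothetical.

```r
# Simplified 3' sliding-window cut (not fastp's exact algorithm).
# Assumes Phred+33 encoded quality strings.
cut_tail_by_quality <- function(qual, cut_window_size = 4, cut_mean_quality = 20) {
  q <- utf8ToInt(qual) - 33L
  keep <- length(q)
  # Slide the window from the tail towards the front, one base at a time,
  # and stop at the first window whose mean quality meets the threshold.
  while (keep >= cut_window_size) {
    window <- q[(keep - cut_window_size + 1):keep]
    if (mean(window) >= cut_mean_quality) break
    keep <- keep - 1
  }
  substr(qual, 1, keep)
}

trimmed <- cut_tail_by_quality("IIIIIIII####")
```

Because the decision is based on the window mean, a few low-quality bases can remain at the end of the kept region in this sketch.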

disable_quality_filtering

logical, if TRUE then quality filtering is disabled. [DEFAULT = FALSE]

qualified_quality_phred

integer specifying the base quality threshold. [DEFAULT = 15]

unqualified_percent_limit

numeric specifying the percentage of bases allowed to be below the threshold before a read/pair is discarded. [DEFAULT = 40]
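
The relationship between qualified_quality_phred and unqualified_percent_limit can be sketched as follows. This is an illustration only: it assumes Phred+33 encoding and that a read is kept when its unqualified percentage does not exceed the limit. The helper names are hypothetical, and the exact boundary handling inside fastp is not confirmed here.

```r
# Percentage of bases whose quality is below the qualified threshold.
# Assumes Phred+33 encoded quality strings.
percent_unqualified <- function(qual, qualified_quality_phred = 15) {
  q <- utf8ToInt(qual) - 33L
  100 * mean(q < qualified_quality_phred)
}

# A read passes if its unqualified percentage is within the limit
# (boundary behaviour is an assumption of this sketch).
passes_quality_filter <- function(qual, qualified_quality_phred = 15,
                                  unqualified_percent_limit = 40) {
  percent_unqualified(qual, qualified_quality_phred) <= unqualified_percent_limit
}

mostly_good <- passes_quality_filter("IIIIIII###")  # 3 of 10 bases below Q15
half_bad    <- passes_quality_filter("IIIII#####")  # 5 of 10 bases below Q15
```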

n_base_limit

integer specifying the number of uncallable bases (N) allowed before a read/pair is discarded. [DEFAULT = 5]

disable_length_filtering

logical, if TRUE then length filtering is disabled. [DEFAULT = FALSE]

length_required

integer specifying the length below which reads will be discarded. [DEFAULT = 15]

length_limit

integer specifying the length above which reads will be discarded; if 0 then no limit applied. [DEFAULT = 0]

low_complexity_filter

logical, if TRUE then enable low complexity filter. The complexity is defined as the percentage of base that is different from its next base (base[i] != base[i+1]). [DEFAULT = FALSE]

complexity_threshold

numeric specifying the threshold for the low complexity filter (0~100). [DEFAULT = 30]
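
The complexity definition used by low_complexity_filter can be worked through with a short sketch. It assumes the denominator is the number of adjacent base pairs (read length minus one); the function name sequence_complexity is hypothetical.

```r
# Percentage of positions where a base differs from the next base,
# i.e. base[i] != base[i+1]. The (n - 1) denominator is an assumption.
sequence_complexity <- function(seq) {
  bases <- strsplit(seq, "", fixed = TRUE)[[1]]
  n <- length(bases)
  if (n < 2) return(0)
  100 * sum(bases[-n] != bases[-1]) / (n - 1)
}

homopolymer <- sequence_complexity("AAAAAAAA")  # no adjacent base differs
two_blocks  <- sequence_complexity("AAAACCCC")  # one change in seven pairs
mixed       <- sequence_complexity("ACGTACGT")  # every adjacent base differs
```

With the default complexity_threshold of 30, the first two reads in this sketch would fall below the threshold and the third would not.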

filter_by_index1

character string specifying a file containing a list of barcodes of index1 to be filtered out, one barcode per line. [DEFAULT = NULL]

filter_by_index2

character string specifying a file containing a list of barcodes of index2 to be filtered out, one barcode per line. [DEFAULT = NULL]

filter_by_index_threshold

the allowed difference of index barcode for index filtering; 0 means completely identical. [DEFAULT = 0]

correction

logical, if TRUE then enable base correction in overlapped regions (only for PE data). [DEFAULT = FALSE]

overlap_len_require

integer specifying the minimum length of the overlapped region for overlap analysis based adapter trimming and correction. [DEFAULT = 30]

overlap_diff_limit

integer specifying the maximum difference of the overlapped region for overlap analysis based adapter trimming and correction. [DEFAULT = 5]

overrepresentation_analysis

logical, if TRUE then enable overrepresented sequence analysis. [DEFAULT = FALSE]

overrepresentation_sampling

integer specifying how many reads are sampled for overrepresentation analysis, e.g. if set to 20, then 1 in 20 reads will be sampled. May range from 1 to 10000; smaller is slower. [DEFAULT = 20]

threads

an integer value indicating the number of workers to be used. If NULL then one less than the maximum number of cores will be used. [DEFAULT = NULL].
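
The NULL default described above can be approximated as follows. This is a sketch of the stated behaviour rather than the package's actual code; the helper name resolve_threads is hypothetical, and the fallback to a single worker is an added assumption.

```r
# Sketch of the documented default: one less than the number of cores.
resolve_threads <- function(threads = NULL) {
  if (!is.null(threads)) return(as.integer(threads))
  cores <- parallel::detectCores()
  # detectCores() can return NA on some platforms; fall back to one worker.
  if (is.na(cores)) return(1L)
  # Never return fewer than one worker, even on a single-core machine.
  max(1L, cores - 1L)
}

n_workers <- resolve_threads()
```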

Details

This script runs the fastp tool and requires installation of fastp. Pre-compiled binaries and installation instructions may be found at https://github.com/OpenGene/fastp
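
A paired-end call might look like the sketch below. The file names and output directory are hypothetical, the package name chompR is taken from this page's footer, and it is assumed that trim_fastq is exported. The call is guarded so that it only runs when the package, the fastp executable, and the input files are all present.

```r
# Hypothetical inputs; replace with real FASTQ paths.
args <- list(
  fastq1 = "sampleA_R1.fastq.gz",
  fastq2 = "sampleA_R2.fastq.gz",
  dest.dir = "trimmed",
  cut_by_quality3 = TRUE,   # sliding-window cutting at the 3' end
  correction = TRUE,        # base correction in overlapped regions (PE only)
  threads = 2
)

# Only attempt the call when everything it needs is available.
can_run <- requireNamespace("chompR", quietly = TRUE) &&
  nzchar(Sys.which("fastp")) &&
  all(file.exists(c(args$fastq1, args$fastq2)))

if (can_run) {
  do.call(chompR::trim_fastq, args)
}
```

All other arguments keep the defaults listed under Usage.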

fastp path

If the executable is in $PATH, then the default value of the fastp argument ("fastp") will work. If it is not in $PATH, then the absolute path to the executable should be given. On Windows 10, it is assumed that fastp has been installed in WSL, and the same rules apply.
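
On a Unix-like system, whether the executable can be found may be checked from R before running the function. This is a minimal sketch using base R's Sys.which; the helper name fastp_available is hypothetical, and the check does not cover the WSL case, where fastp lives inside the Linux subsystem.

```r
# Returns TRUE if `fastp` resolves on the search path or exists as a file.
fastp_available <- function(fastp = "fastp") {
  # Sys.which() typically returns "" when the command is not found.
  resolved <- unname(Sys.which(fastp))
  found_on_path <- !is.na(resolved) && nzchar(resolved)
  found_on_path || file.exists(fastp)
}

has_fastp <- fastp_available()
```

If has_fastp is FALSE, pass the absolute path to the executable as the fastp argument instead.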

References

Shifu Chen, Yanqing Zhou, Yaru Chen, Jia Gu (2018): fastp: an ultra-fast all-in-one FASTQ preprocessor. BioRxiv 274100; https://doi.org/10.1101/274100


anilchalisey/chompR documentation built on May 9, 2019, 3:59 a.m.