
View source: R/make_cutadapt.R

make_cutadapt    R Documentation

Trims/filters fastq files using external cutadapt/fastq_quality_filter

Description

make_cutadapt trims adaptors and filters reads by quality using the external tools cutadapt and fastq_quality_filter.

Usage

make_cutadapt(input, output, parse = NULL, threads = 1)

Arguments

input

Character path to a directory containing input fastq files. The function recursively searches this directory for files with the .fastq or .fastq.gz extension.
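The recursive search described above can be approximated with base R's list.files(). This is a minimal sketch, not seqpac's own code: the helper name find_fastq() is hypothetical, and the regular expression is an assumption about how the .fastq/.fastq.gz match is done (it is case sensitive, so a file ending in .FASTQ would be skipped).

```r
# Hypothetical helper (not part of seqpac): list .fastq and .fastq.gz files
# under a directory, searching all subfolders.
find_fastq <- function(input) {
  list.files(
    path = input,
    pattern = "\\.fastq(\\.gz)?$",  # matches names ending in .fastq or .fastq.gz
    recursive = TRUE,               # descend into subdirectories
    full.names = TRUE               # return full paths, not bare file names
  )
}

# Usage with a throwaway directory tree in tempdir():
demo_dir <- file.path(tempdir(), "fastq_demo")
dir.create(file.path(demo_dir, "lane1"), recursive = TRUE, showWarnings = FALSE)
file.create(file.path(demo_dir, c("a.fastq", "lane1/b.fastq.gz", "notes.txt")))
basename(find_fastq(demo_dir))  # "a.fastq" "b.fastq.gz"
```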

output

Character path to the output directory where trimmed fastq files will be stored and temporary files will be generated.

parse

A list of two character strings. The first is passed to cutadapt and the second to fastq_quality_filter. If either is NULL, the corresponding command is not run and that trimming or filtering step is skipped. For example, with parse = list(cutadapt=NULL, fastq_quality_filter="-q 20 -p 80"), only the quality filter is applied.
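A sketch of building parse for the two cases described above, assuming only what the text states (element order and NULL meaning "skip"). The helper steps_to_run() is hypothetical and only mirrors the documented rule; it is not how make_cutadapt checks the list internally.

```r
# Both steps: adaptor trimming with cutadapt, then quality filtering.
parse_both <- list(
  cutadapt = "-j 1 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAT -m 7",
  fastq_quality_filter = "-q 20 -p 80"
)

# Quality filtering only. Building the list with list() keeps the NULL entry.
parse_filter_only <- list(cutadapt = NULL, fastq_quality_filter = "-q 20 -p 80")

# Hypothetical helper (not part of seqpac): names of the steps that are not NULL.
steps_to_run <- function(parse) {
  names(parse)[!vapply(parse, is.null, logical(1))]
}

steps_to_run(parse_both)         # "cutadapt" "fastq_quality_filter"
steps_to_run(parse_filter_only)  # "fastq_quality_filter"
```

Note that assigning parse$cutadapt <- NULL to an existing list deletes the element rather than storing NULL; whether make_cutadapt treats a missing element the same way as a NULL one is not stated here, so constructing the list directly with list() is the safer choice.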

threads

Integer stating the number of parallel jobs. Note that reading multiple fastq files consumes memory quickly, using up to 10 GB per fastq file. To avoid crashing the system due to memory shortage, make sure that each thread on the machine has at least 10 GB of memory available, unless your fastq files are very small. Use parallel::detectCores() to see the available threads on the machine.
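The rule of thumb above (about 10 GB per parallel job) can be turned into a thread count. This is a minimal sketch under stated assumptions: safe_threads() is a hypothetical helper, mem_gb must be supplied by the user because base R has no portable way to query free memory, and the 10 GB default comes from the note above rather than from any measurement.

```r
# Hypothetical helper (not part of seqpac): cap the number of threads so that
# each parallel job gets roughly per_file_gb of memory.
safe_threads <- function(mem_gb, per_file_gb = 10,
                         cores = parallel::detectCores()) {
  if (is.na(cores)) cores <- 1L        # detectCores() can return NA
  by_memory <- floor(mem_gb / per_file_gb)
  as.integer(max(1, min(cores, by_memory)))  # always at least one job
}

safe_threads(mem_gb = 64, cores = 16)   # 6: memory is the limit
safe_threads(mem_gb = 256, cores = 8)   # 8: cores are the limit
safe_threads(mem_gb = 8, cores = 4)     # 1: never fewer than one job
```

The result could then be passed as threads = safe_threads(...). With very small fastq files the 10 GB assumption is overly conservative, so per_file_gb can be lowered.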

Details

Given a path to sequence files in fastq format, this function trims adaptor sequences and removes low-quality reads.
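To illustrate how the two parse strings could become two chained command lines, here is a hedged sketch that only builds argument vectors for system2() and runs nothing. It is not seqpac's implementation: build_commands(), split_args() and the intermediate file names are hypothetical, and it assumes uncompressed intermediate files, since fastq_quality_filter is given a plain -i input path. The argument layouts follow the tools' documented usage (cutadapt [options] -o OUTPUT INPUT; fastq_quality_filter [options] -i INFILE -o OUTFILE).

```r
# Hypothetical helper: split an option string on whitespace into a vector.
split_args <- function(x) strsplit(trimws(x), "\\s+")[[1]]

# Hypothetical sketch: one argument vector per step that is not NULL.
build_commands <- function(infile, trimmed, filtered, parse) {
  cmds <- list()
  if (!is.null(parse$cutadapt)) {
    # cutadapt [options] -o OUTPUT INPUT
    cmds$cutadapt <- c(split_args(parse$cutadapt), "-o", trimmed, infile)
    infile <- trimmed  # the filter step then reads the trimmed file
  }
  if (!is.null(parse$fastq_quality_filter)) {
    # fastq_quality_filter [options] -i INFILE -o OUTFILE
    cmds$fastq_quality_filter <- c(split_args(parse$fastq_quality_filter),
                                   "-i", infile, "-o", filtered)
  }
  cmds
}

prs <- list(cutadapt = "-j 1 -a AGATCGGAAGAGC -m 7",
            fastq_quality_filter = "-q 20 -p 80")
cmds <- build_commands("sample1.fastq", "sample1_trim.fastq",
                       "sample1_filt.fastq", prs)
cmds$fastq_quality_filter
# system2("cutadapt", cmds$cutadapt)  # would only work with cutadapt installed
```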

Value

Externally, the function generates trimmed and/or quality-filtered fastq files in the output folder. Internally, it returns a list of logs that can be used to generate a progress report.

See Also

https://cutadapt.readthedocs.io/en/stable/ for download and documentation on cutadapt. http://hannonlab.cshl.edu/fastx_toolkit/commandline.html for download and documentation on fastq_quality_filter. https://github.com/Danis102 for updates on seqpac.

Other PAC generation: PAC_check(), create_PAC(), make_PAC(), make_counts(), make_pheno(), make_trim(), merge_lanes()

Examples

 
############################################################      
### Principle of trimming using the make_cutadapt function
### (Important: Need external installations of cutadapt 
###  and fastq_quality_filter to work)
#  
#   input = "/some/path/to/input/folder"
#   output =  "/some/path/to/output/folder"
# 
## The parse argument of make_cutadapt is a list of 2 character strings.
## The first is passed to cutadapt and the second to fastq_quality_filter.
## For parallel processing, '-j 1' is recommended, since seqpac
## parallelizes across samples, not within them.
## Run system2("cutadapt", "-h", stdout=TRUE) and
## system2("fastq_quality_filter", "-h", stdout=TRUE)
## for more options.
#   
## String to pass to cutadapt:
# cut_prs <- paste0("-j 1 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCACAT",
#                    " --discard-untrimmed --nextseq-trim=20",
#                    " -O 10 -m 7 -M 70")
# 
## Add string to pass to fastq_quality_filter:
#  parse = list(
#            cutadapt=cut_prs,
#            fastq_quality_filter="-q 20 -p 80")
#               
#  logs  <-  make_cutadapt(input, output, threads=8, parse=parse)

## Clean up temp files
closeAllConnections()
fls_temp  <- list.files(tempdir(), recursive=TRUE, full.names = TRUE)
suppressWarnings(file.remove(fls_temp))
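Since the example above only works with external installations of cutadapt and fastq_quality_filter, it can help to check that both are on the PATH before calling make_cutadapt. A minimal sketch using base R's Sys.which(), which returns an empty string for commands it cannot find; check_tools() is a hypothetical helper, not part of seqpac.

```r
# Hypothetical helper: TRUE/FALSE per tool, depending on whether it is on the PATH.
check_tools <- function(tools = c("cutadapt", "fastq_quality_filter")) {
  found <- nzchar(Sys.which(tools))  # Sys.which() gives "" when not found
  names(found) <- tools
  found
}

tools_ok <- check_tools()
if (!all(tools_ok)) {
  message("Missing external tools: ",
          paste(names(tools_ok)[!tools_ok], collapse = ", "))
}
```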
 

Danis102/seqpac documentation built on Aug. 26, 2023, 10:15 a.m.