collapse.fastq: Very fast fastq/fasta collapser


View source: R/fastq_helpers.R

Description

For each unique read sequence in the file, collapse all copies into a single record and state in the fasta header how many reads had that sequence. Collapsing is usually done after trimming and works best for short reads (under 50 nt). It is less effective for longer reads, such as 150 bp mRNA-seq, where exact duplicates are less common.
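The idea of collapsing can be illustrated with a minimal base R sketch. This is not ORFik's implementation (which is optimised for speed); the toy `reads` vector and the choice to sort by abundance are assumptions made for illustration only.

```r
# Hypothetical toy reads (already trimmed); not taken from a real file
reads <- c("ACGT", "ACGT", "TTGA", "ACGT", "TTGA", "GGCC")

# Count how many times each unique sequence occurs
counts <- sort(table(reads), decreasing = TRUE)

# One row per unique sequence, with its duplicate count
collapsed <- data.frame(
  sequence = names(counts),
  n = as.integer(counts),
  stringsAsFactors = FALSE
)
print(collapsed)
```

Six input reads collapse into three records here. With long reads, most sequences occur only once, so the output is barely smaller than the input.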

Usage

collapse.fastq(
  files,
  outdir = file.path(dirname(files[1]), "collapsed"),
  header.out.format = "ribotoolkit",
  compress = FALSE
)

Arguments

files

paths to fasta / fastq files to collapse.

outdir

Directory in which to save the output files, default: file.path(dirname(files[1]), "collapsed"). By default, a subfolder "collapsed" is created inside the folder of the first input file, and the output file names in that folder get the prefix "collapsed_".

header.out.format

character, default "ribotoolkit", else must be "fastx". How the read header of the output fasta should be formatted: ribotoolkit: ">seq1_x55", sequence 1 has 55 duplicated reads collapsed. fastx: ">1-55", sequence 1 has 55 duplicated reads collapsed.

compress

logical, default FALSE. Whether to compress the output files.
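As a rough sketch of how the documented defaults resolve, the base R snippet below builds the default outdir and the two header styles by hand. The input paths and duplicate counts are hypothetical, the ">" prefix assumes the standard fasta header convention, and none of this calls ORFik itself. The final output file names (including extension) are decided by collapse.fastq and are not reproduced here.

```r
# Hypothetical input paths, for illustration only
files <- c("/data/run1/sample_A.fastq", "/data/run1/sample_B.fastq")

# Default outdir: a "collapsed" subfolder next to the first input file
outdir <- file.path(dirname(files[1]), "collapsed")

# Hypothetical collapsed result: sequence index and duplicate count
seq_index <- 1:2
dup_count <- c(55L, 12L)

# header.out.format = "ribotoolkit"
ribotoolkit_headers <- paste0(">seq", seq_index, "_x", dup_count)

# header.out.format = "fastx"
fastx_headers <- paste0(">", seq_index, "-", dup_count)

print(outdir)
print(ribotoolkit_headers)
print(fastx_headers)
```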

Value

invisible(NULL)

Examples

fastq.folder <- tempdir() # <- Your fastq files
# pattern is a regular expression, not a glob: match names ending in ".fastq"
infiles <- dir(fastq.folder, pattern = "\\.fastq$", full.names = TRUE)
# collapse.fastq(infiles)

ORFik documentation built on March 27, 2021, 6 p.m.