clean_fq: Removes the noise of an individual fastq file

View source: R/clean_fq.R

Description

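This function reads the fastq file of an individual and cleans it by removing distinct reads with coverage at or below a minimum threshold, distinct reads with coverage at or above a maximum threshold, and, optionally, high-coverage unique reads (likely paralogs or transposable elements).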

Usage

clean_fq(
  fq.files,
  min.coverage.threshold = 2L,
  max.coverage.threshold = "high.coverage.unique.reads",
  remove.unique.reads = TRUE,
  write.blacklist = TRUE,
  write.blacklist.fasta = TRUE,
  compress = FALSE,
  output.dir = NULL,
  parallel.core = parallel::detectCores() - 1
)

Arguments

fq.files

(character, path). The path to the individual fastq file to check, e.g. fq.files = "my-sample.fq.gz".

min.coverage.threshold

(integer). Minimum coverage threshold. The function removes distinct reads with coverage <= this threshold. To turn off, use min.coverage.threshold = NULL or 0L. Default: min.coverage.threshold = 2L.

max.coverage.threshold

(integer, character). Maximum coverage threshold. The function removes distinct reads with coverage >= this threshold. To turn off, use max.coverage.threshold = NULL. The default uses the starting depth at which high coverage unique reads are observed. Default: max.coverage.threshold = "high.coverage.unique.reads".

remove.unique.reads

(logical). Remove distinct unique reads with high coverage; these are likely paralogs or transposable elements. Default: remove.unique.reads = TRUE.

write.blacklist

(logical). Write the blacklisted reads to a file. Default: write.blacklist = TRUE.

write.blacklist.fasta

(logical). Write the blacklisted reads to a fasta file. Default: write.blacklist.fasta = TRUE.

compress

(logical). Compress the output files. If you have the disk space, don't compress: writing uncompressed files is much faster. Default: compress = FALSE.

output.dir

(path) Write the cleaned fq files in a specific directory. Default: output.dir = NULL, uses the working directory.

parallel.core

(integer) Enable parallel execution with the number of threads. Default: parallel.core = parallel::detectCores() - 1.

Details

coming soon, just try it in the meantime...
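
Until the details are written, the two coverage thresholds described under Arguments can be pictured with a small base R sketch. This is a toy illustration only, not stackr's internal code: the helper name filter_by_coverage and the sample reads are hypothetical, "coverage" is assumed to mean the number of identical copies of a distinct read, and only integer thresholds are handled (the "high.coverage.unique.reads" default is not reproduced here).

```r
# Toy sketch (hypothetical helper, NOT part of stackr):
# drop distinct reads whose coverage is <= the minimum threshold
# or >= the maximum threshold, and report what was blacklisted.
filter_by_coverage <- function(reads,
                               min.coverage.threshold = 2L,
                               max.coverage.threshold = NULL) {
  coverage <- table(reads)        # copies of each distinct read
  seqs <- names(coverage)
  counts <- as.vector(coverage)
  keep <- rep(TRUE, length(counts))

  # NULL or 0L turns the minimum filter off
  if (!is.null(min.coverage.threshold) && min.coverage.threshold > 0L) {
    keep <- keep & counts > min.coverage.threshold
  }
  # NULL turns the maximum filter off
  if (!is.null(max.coverage.threshold)) {
    keep <- keep & counts < max.coverage.threshold
  }

  list(
    cleaned = reads[reads %in% seqs[keep]],  # reads that survive
    blacklist = seqs[!keep]                  # distinct reads removed
  )
}

# Made-up reads: coverage of 4, 1, 2 and 6 respectively
reads <- c(rep("ACGTAC", 4), "TTGACA", rep("GGCCAA", 2), rep("CATCAT", 6))
res <- filter_by_coverage(reads,
                          min.coverage.threshold = 2L,
                          max.coverage.threshold = 6L)
res$cleaned    # the four copies of "ACGTAC"
res$blacklist  # "CATCAT" "GGCCAA" "TTGACA"
```

With these thresholds only the read seen four times is kept: the reads seen once or twice fall at or below the minimum, and the read seen six times reaches the maximum.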

Value

The function returns a cleaned fq file with the name of the sample and -cleaned appended to the filename.

Examples

## Not run: 
require(vroom)

# for one sample
clean.id <- stackr::clean_fq(
  fq.files = "my-sample.fq.gz",
  min.coverage.threshold = 7L,
  max.coverage.threshold = "high.coverage.unique.reads"
  )

# for multiple samples in parallel
# require(progressr)

progressr::with_progress({
  clean <- stackr::clean_fq(
    fq.files = "04_process_radtags",  # path must be a quoted string
    min.coverage.threshold = 2L,
    max.coverage.threshold = "high.coverage.unique.reads",
    write.blacklist = TRUE,
    write.blacklist.fasta = TRUE,
    compress = FALSE,
    output.dir = "04_process_radtags/cleaned_fq"
  )
})

## End(Not run)

thierrygosselin/stackr documentation built on Nov. 11, 2020, 11 a.m.