raw2data.proc: From raw data to data.proc()
In carlopacioni/amplicR: An R package to process amplicon data

Description

This function is a wrapper for rmEndAdapter, deconv and data.proc. It takes a raw fastq file, removes the end adapter (if rmEnd = TRUE), and separates the reads based on their forward primers. Within each of the identified groups, it separates the reads based on barcodes (indexes) and finally calls data.proc to process (quality checking, denoising and chimera filtering) the retained data from the NGS run.

Usage

raw2data.proc(
  fn,
  nRead = 1e+08,
  rmEnd = FALSE,
  EndAdapter = "P7_last10",
  adapter.mismatch = 0,
  info.file,
  sample.IDs = "Sample_IDs",
  Fprimer = "F_Primer",
  Rprimer = "R_Primer",
  primer.mismatch = 0,
  Find = "F_ind",
  Rind = "R_ind",
  index.mismatch = 0,
  gene = "Gene",
  amplic.size = "Amplicon",
  truncQ = 2,
  qrep = FALSE,
  dada = TRUE,
  pool = FALSE,
  plot.err = FALSE,
  chim = TRUE,
  orderBy = "abundance"
)

Arguments

`fn`	Fully qualified name (i.e. the complete path) of the fastq file
`nRead`	The number of bytes or characters to be read at one time. See `FastqStreamer` for details
`rmEnd`	Whether `rmEndAdapter` should be performed (default: FALSE)
`EndAdapter`	A character vector with the sequence of the end adapter, "P7" or "P7_last10" (See details)
`adapter.mismatch`	The maximum number of mismatches allowed in the end adapter (See details)
`info.file`	Fully qualified name (i.e. the complete path) of the CSV file with the information needed on primers, indexes etc. (See details)
`sample.IDs`	A character vector with the name of the column in info.file containing the sample IDs
`Fprimer, Rprimer`	A character vector with the name of the column in info.file containing the forward and reverse primer sequences, respectively
`primer.mismatch`	The maximum number of mismatches allowed in the primers
`Find, Rind`	A character vector with the name of the column in info.file containing the forward and reverse index sequences, respectively
`index.mismatch`	The maximum number of mismatches allowed in the indexes
`gene`	A character vector with the name of the column in info.file containing the name of the gene or other group identifiers (see details)
`amplic.size`	A character vector with the name of the column in info.file containing the amplicon size of the PCR product
`truncQ`	Truncate reads at the first instance of a quality score less than or equal to truncQ when conducting quality filtering. See `fastqFilter` for details
`qrep`	Logical. Should the quality report be generated? (default `FALSE`)
`dada`	Logical. Should the dada analysis be conducted? (default `TRUE`)
`pool`	Logical. Should samples be pooled together prior to sample inference? (default `FALSE`). See `dada` for details
`plot.err`	Logical. Whether error rates obtained from `dada` should be plotted
`chim`	Logical. Should the bimera search and removal be performed? (default `TRUE`)
`orderBy`	Character vector specifying how the returned sequence table should be sorted. Default "abundance". See `makeSequenceTable` for details
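
As a rough illustration of how the column-name arguments relate to info.file, the sketch below builds a small CSV that uses the default column names listed in Usage. All sample IDs, primer and index sequences, the amplicon size and the file paths are made-up placeholders, not values from the package. The call to raw2data.proc is guarded, so it is only attempted when amplicR is installed and the (hypothetical) fastq file exists.

```r
# Hypothetical sample sheet using the default column names expected by
# raw2data.proc(); every value below is a made-up placeholder.
info <- data.frame(
  Sample_IDs = c("S1", "S2"),
  F_Primer   = c("ACGTACGTACGTACGTAC", "ACGTACGTACGTACGTAC"),
  R_Primer   = c("TGCATGCATGCATGCATG", "TGCATGCATGCATGCATG"),
  F_ind      = c("AACCGG", "TTGGCC"),
  R_ind      = c("GGTTAA", "CCAATT"),
  Gene       = c("geneA", "geneA"),
  Amplicon   = c(150, 150),
  stringsAsFactors = FALSE
)

# Write the sheet as a comma-delimited file, as info.file requires.
info.path <- file.path(tempdir(), "info.csv")
write.csv(info, info.path, row.names = FALSE)

# Hypothetical location of the raw fastq file; it is not created here.
fastq.path <- file.path(tempdir(), "run.fastq")

# Only attempt the call if the package and the input file are available.
if (requireNamespace("amplicR", quietly = TRUE) && file.exists(fastq.path)) {
  res <- amplicR::raw2data.proc(fn = fastq.path, info.file = info.path)
}
```

If the columns in your own file have different headings, pass those headings through sample.IDs, Fprimer, Rprimer, Find, Rind, gene and amplic.size instead of renaming the columns.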

Details

Note that the amplicon size for data.proc is obtained from the comma-delimited file info.file, from the column whose heading is given in amplic.size. Zeros can be used in this column if no truncation is wanted. For each entry in the column indicated by the argument gene, the function uses the first entry found in amplic.size for that gene. If the same gene identifier is used for multiple forward primers, refer to the documentation for deconv to see how multiple PCR products can be grouped together using the gene column. Note that within each gene, the same amplicon length is used by raw2data.proc. To use different amplicon sizes within a gene, run the three functions (rmEndAdapter, deconv and data.proc) manually rather than through raw2data.proc.
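
The "first entry per gene" rule can be illustrated in base R. This is only a sketch of the behaviour described here, run on an assumed toy table; it is not the package's own code, and the gene names and sizes are invented.

```r
# Toy stand-in for the contents of info.file (values are illustrative only).
info <- data.frame(
  Gene     = c("geneA", "geneA", "geneB"),
  Amplicon = c(150, 200, 0),
  stringsAsFactors = FALSE
)

# Keep the first Amplicon entry found for each gene identifier.
first.size <- info$Amplicon[!duplicated(info$Gene)]
names(first.size) <- unique(info$Gene)

# geneA takes 150 (the later 200 is ignored); geneB is 0, i.e. no truncation.
first.size
```

In this toy table the second geneA row has no effect, which is why different amplicon sizes within one gene require running the three functions separately.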

By default, dir.out is set to the directory containing the input file, and verbose = FALSE.

Please see the documentation for each function for more information.

Value

A list whose elements are the output of data.proc for each PCR product.

In addition to the output files described in the documentation for rmEndAdapter, deconv and data.proc, a text file named "summary_nReads.txt" is saved in the same location as the raw data. It summarises the number of reads retained at each step of the analysis.
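
A minimal sketch for locating and printing the summary file after a run, assuming only what is stated here: the file is called "summary_nReads.txt" and sits next to the raw fastq file. The fastq path is a hypothetical placeholder, and the internal layout of the file is not assumed, so it is simply printed line by line.

```r
# Hypothetical path to the raw fastq file that was passed as fn.
fn <- file.path(tempdir(), "run.fastq")

# The summary is saved in the same directory as the raw data.
summary.path <- file.path(dirname(fn), "summary_nReads.txt")

# Print the file as plain text if the run has produced it.
if (file.exists(summary.path)) {
  writeLines(readLines(summary.path))
}
```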

carlopacioni/amplicR documentation built on Aug. 19, 2023, 7:59 p.m.