raw2data.proc | R Documentation |
This function is a wrapper for rmEndAdapter, deconv and data.proc. It takes in a raw fastq file, removes the end adapter, and separates the reads based on their forward primers. Within each of the identified groups, it separates the reads based on barcodes (indexes) and eventually calls data.proc to process (quality checking, denoising and chimera filtering) the retained data from the NGS run.
raw2data.proc(
  fn,
  nRead = 1e+08,
  rmEnd = FALSE,
  EndAdapter = "P7_last10",
  adapter.mismatch = 0,
  info.file,
  sample.IDs = "Sample_IDs",
  Fprimer = "F_Primer",
  Rprimer = "R_Primer",
  primer.mismatch = 0,
  Find = "F_ind",
  Rind = "R_ind",
  index.mismatch = 0,
  gene = "Gene",
  amplic.size = "Amplicon",
  truncQ = 2,
  qrep = FALSE,
  dada = TRUE,
  pool = FALSE,
  plot.err = FALSE,
  chim = TRUE,
  orderBy = "abundance"
)
fn |
Fully qualified name (i.e. the complete path) of the fastq file |
nRead |
The number of bytes or characters to be read at one time. See
|
rmEnd |
Whether the end adapter should be removed (default FALSE) |
EndAdapter |
A character vector with the sequence of the end adapter, "P7" or "P7_last10" (See details) |
adapter.mismatch |
The maximum number of mismatches allowed (see details) |
info.file |
Fully qualified name (i.e. the complete path) of the CSV file with the information needed on primers, indexes etc. (See details) |
sample.IDs |
A character vector with the name of the column in info.file containing the sample IDs |
Fprimer, Rprimer |
Character vectors with the names of the columns in info.file containing the forward and reverse primer sequences, respectively |
primer.mismatch |
The maximum number of primer mismatches allowed |
Find, Rind |
Character vectors with the names of the columns in info.file containing the forward and reverse index sequences, respectively |
index.mismatch |
The maximum number of index mismatches allowed |
gene |
A character vector with the name of the column in info.file containing the name of the gene or other group identifiers (see details) |
amplic.size |
A character vector with the name of the column in info.file containing the amplicon size of the PCR product |
truncQ |
Truncate reads at the first instance of a quality score less
than or equal to truncQ when conducting quality filtering. See
|
qrep |
Logical. Should the quality report be generated? (default FALSE) |
dada |
Logical. Should the dada analysis be conducted? (default TRUE) |
pool |
Logical. Should samples be pooled together prior to sample
inference? (default FALSE) |
plot.err |
Logical. Whether the error rates obtained from the dada analysis should be plotted (default FALSE) |
chim |
Logical. Should the bimera search and removal be performed?
(default TRUE) |
orderBy |
Character vector specifying how the returned sequence table
should be sorted. Default "abundance". See
|
Note that the amplicon size for data.proc is obtained from the comma-delimited file info.file, searching in the column with the heading indicated in amplic.size. Zeros can be used in this column if no truncation is wanted. For each entry in the column indicated with the argument gene, the function will use the first entry found in amplic.size for the relevant gene. The same gene identifier can be used for multiple forward primers (refer to the documentation for deconv to see how multiple PCR products can be grouped together using the gene column). Note that within each gene, the same amplicon length is used by raw2data.proc. To use different amplicon sizes within a gene, run the three functions (rmEndAdapter, deconv and data.proc) manually, rather than with raw2data.proc.
By default, dir.out is set to the location of the input file and verbose=FALSE.
Please see the documentation for each function for more information.
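As an illustration of the info.file layout described above, the following is a minimal sketch of a sample sheet that uses the default column names from the usage (Sample_IDs, F_Primer, R_Primer, F_ind, R_ind, Gene, Amplicon). All sample IDs, sequences and amplicon sizes are placeholders invented for the example, and the tapply step only mimics the documented "first entry per gene" rule; it is not the package's internal code.

```r
# Hypothetical sample sheet using the default column names.
# All IDs and sequences below are placeholders, not real primers or indexes.
info <- data.frame(
  Sample_IDs = c("S1", "S2", "S3"),
  F_Primer   = c("ACGTACGTACGT", "ACGTACGTACGT", "TTGACCGGTTAA"),
  R_Primer   = c("TGCATGCATGCA", "TGCATGCATGCA", "CCAATTGGCCAA"),
  F_ind      = c("AAAA", "CCCC", "AAAA"),
  R_ind      = c("GGGG", "GGGG", "TTTT"),
  Gene       = c("geneA", "geneA", "geneB"),
  Amplicon   = c(250, 250, 0),  # 0 means no truncation for geneB
  stringsAsFactors = FALSE
)

# Write the sheet as a comma-delimited file (a temporary file here).
info.file <- tempfile(fileext = ".csv")
write.csv(info, info.file, row.names = FALSE)

# Read it back and take the first amplicon size listed for each gene,
# which is the rule the documentation describes for amplic.size.
sheet <- read.csv(info.file, stringsAsFactors = FALSE)
amplicon.by.gene <- tapply(sheet$Amplicon, sheet$Gene, function(x) x[1])
print(amplicon.by.gene)
```

If the columns in your own file have different headings, pass those headings through the sample.IDs, Fprimer, Rprimer, Find, Rind, gene and amplic.size arguments.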
A list whose elements are the output of data.proc for each PCR product.
In addition to the output files described in the documentation for rmEndAdapter, deconv and data.proc, a text file named "summary_nReads.txt" is saved in the same location as the raw data, summarising the number of reads retained at each step of the analysis.
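A call might look like the sketch below. The file paths are hypothetical and the argument values are only one possible choice; the guard around the call is there so that the snippet runs harmlessly when the function or the input files are not available.

```r
# Hypothetical paths; replace with the locations of your own files.
fn        <- "/path/to/run/raw_reads.fastq"
info.file <- "/path/to/run/info.csv"

# Arguments for the call; anything not listed keeps the default from the usage.
args <- list(
  fn = fn,
  info.file = info.file,
  rmEnd = TRUE,              # non-default value, chosen for illustration
  EndAdapter = "P7_last10",
  primer.mismatch = 0,
  index.mismatch = 0,
  dada = TRUE,
  chim = TRUE
)

# Only run when the function is loaded and the input files exist.
if (exists("raw2data.proc") && file.exists(fn) && file.exists(info.file)) {
  res <- do.call("raw2data.proc", args)
  # One element per PCR product, each holding the output of data.proc.
  str(res, max.level = 1)
}
```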