demultiplex | R Documentation |
De-multiplexing Illumina data based on the extra forward barcode used by the MiDiv lab.
demultiplex(metadata.tbl, in.folder, out.folder, trim.primers = TRUE)
metadata.tbl |
Table with data for each sample, see Details below. |
in.folder |
Name of folder where raw fastq files are located. |
out.folder |
Name of folder to output de-multiplexed fastq files. |
trim.primers |
Logical indicating if PCR-primers should be trimmed from start of R1 and R2 reads. |
compress.out |
Logical to indicate compressed output or not. |
pattern |
Pattern used to distinguish the raw fastq files from other files. |
The input metadata.tbl must be a table (tibble or data.frame) with one row for each sample. It must follow the MiDiv metadata table standard format. The columns used by this function are:
* ProjectID
* SequencingRunID
* SampleID
* Rawfile_R1
* Rawfile_R2
* Barcode
* Forward_primer
* Reverse_primer
The ProjectID, SequencingRunID and SampleID should all be short texts (SampleID may be just an integer). The names of the de-multiplexed fastq files follow the format ProjectID_SequencingRunID_SampleID_Rx.fastq.gz, where x is 1 or 2, so avoid symbols that are not recommended in filenames (e.g. space, slash).
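As a minimal sketch of the table layout and the resulting file names, the following builds a two-sample table in base R. Only the column names are taken from the list above; every value (IDs, raw file names, barcodes, primers) is invented for illustration, and the filename check is a suggestion, not something the function is known to perform.

```r
# Hypothetical two-sample metadata table; all values are invented
metadata.tbl <- data.frame(
  ProjectID       = c("ProjA", "ProjA"),
  SequencingRunID = c("Run01", "Run01"),
  SampleID        = c("1", "2"),
  Rawfile_R1      = c("raw_R1.fastq.gz", "raw_R1.fastq.gz"),
  Rawfile_R2      = c("raw_R2.fastq.gz", "raw_R2.fastq.gz"),
  Barcode         = c("ACGTAC", "TGCATG"),
  Forward_primer  = c("ACGTACGTACGTACGTA", "ACGTACGTACGTACGTA"),
  Reverse_primer  = c("TGCATGCATGCATGCATGCA", "TGCATGCATGCATGCATGCA"),
  stringsAsFactors = FALSE
)

# File name stems following ProjectID_SequencingRunID_SampleID
stem <- paste(metadata.tbl$ProjectID,
              metadata.tbl$SequencingRunID,
              metadata.tbl$SampleID,
              sep = "_")
out.R1 <- paste0(stem, "_R1.fastq.gz")
out.R2 <- paste0(stem, "_R2.fastq.gz")

# Flag stems with characters other than letters, digits, dot, underscore, hyphen
bad <- grepl("[^A-Za-z0-9._-]", stem)
```

Running a check like `bad` before de-multiplexing is one way to catch spaces or slashes in the ID columns early.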
De-multiplexing means extracting subsets of reads from raw fastq files, those named in columns Rawfile_R1 and Rawfile_R2 (if single-end reads, only Rawfile_R1). The subset of read-pairs for each sample is identified by a barcode sequence, and this must be listed in the Barcode column. The Barcode sequence is matched at the start of the R1-reads only.
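The idea of matching a barcode at the start of the R1 reads can be sketched with toy data in base R. This is an illustration only, not the function's actual implementation: it assumes exact matching, and the reads, barcodes and sample names are all invented.

```r
# Toy R1 and R2 reads, kept in the same order so that index i is one read-pair
r1 <- c("ACGTACTTGGCA", "TGCATGAACCGT", "ACGTACGGGTTT", "GGGGGGACGTAC")
r2 <- c("CCAATT", "GGTTAA", "TTAACC", "AATTGG")

# Hypothetical barcodes, one per sample
barcodes <- c(sample1 = "ACGTAC", sample2 = "TGCATG")

# For each barcode, find the read-pairs whose R1 read starts with it.
# Only R1 is inspected; the R2 read follows its R1 partner.
idx <- lapply(barcodes, function(bc) which(startsWith(r1, bc)))

r1.by.sample <- lapply(idx, function(i) r1[i])
r2.by.sample <- lapply(idx, function(i) r2[i])

# Number of read-pairs assigned to each sample
n.pairs <- lengths(idx)
```

The fourth read-pair contains a barcode sequence, but not at the start of its R1 read, so it is assigned to no sample in this sketch.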
If trim.primers=TRUE the start of the R1 read is trimmed by the length of Forward_primer, and the start of the R2 read is trimmed by the length of Reverse_primer. NOTE: There is no primer-matching here. No reads are discarded, only trimmed by primer lengths.
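Length-based trimming can be sketched as follows in base R. The helper trim_start and all sequences are hypothetical, and trimming the quality string alongside the sequence is an assumption based on the fastq format, where both strings have the same length.

```r
# Hypothetical helper: drop the first n characters of each string
trim_start <- function(x, n) substring(x, n + 1)

forward.primer <- "ACGTNACG"            # invented primer, 8 bases long
r1.seq  <- "TTTTTTTTGGGCCCAAA"          # read does NOT start with the primer
r1.qual <- "IIIIIIIIHHHGGGFFF"          # quality string, same length as the read

# Only the primer LENGTH is used; the primer sequence is never compared
n <- nchar(forward.primer)
r1.seq.trimmed  <- trim_start(r1.seq, n)
r1.qual.trimmed <- trim_start(r1.qual, n)
```

Because nothing is matched, a read whose start differs from the primer is still shortened by the same number of bases, as in this toy case.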
The files listed in Rawfile_R1 and Rawfile_R2 must all be in the in.folder. These files may be compressed (.gz).
The function will output the de-multiplexed fastq-files to the out.folder. The name of each file consists of the corresponding ProjectID_SequencingRunID_SampleID, with the extension _R1.fastq.gz or _R2.fastq.gz.
The function returns a table with the number of read-pairs for each sample. You may then add this as a new column to the existing metadata.tbl by full_join(metadata.tbl, demultiplex.tbl, by = c("ProjectID", "SequencingRunID", "SampleID")), where demultiplex.tbl is the output from this function.
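The join can be sketched with toy tables. The count column name n_readpairs is hypothetical, since the column names of the returned table are not listed here, and all values are invented. Base R's merge(..., all = TRUE) is used as the counterpart of dplyr's full_join so that the sketch runs without extra packages.

```r
# Toy metadata with three samples (values invented)
metadata.tbl <- data.frame(
  ProjectID       = c("ProjA", "ProjA", "ProjA"),
  SequencingRunID = c("Run01", "Run01", "Run01"),
  SampleID        = c("1", "2", "3"),
  stringsAsFactors = FALSE
)

# Toy stand-in for the returned table; 'n_readpairs' is a hypothetical name
demultiplex.tbl <- data.frame(
  ProjectID       = c("ProjA", "ProjA"),
  SequencingRunID = c("Run01", "Run01"),
  SampleID        = c("1", "2"),
  n_readpairs     = c(1200L, 950L),
  stringsAsFactors = FALSE
)

# Full join on the three ID columns; rows without a match get NA
joined.tbl <- merge(metadata.tbl, demultiplex.tbl,
                    by = c("ProjectID", "SequencingRunID", "SampleID"),
                    all = TRUE)
```

Joining on all three ID columns keeps samples apart even when the same SampleID occurs in several projects or sequencing runs.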
Lars Snipen.