demultiplex: Demultiplexing sequencing reads

View source: R/demultiplex.R

demultiplex    R Documentation

Demultiplexing sequencing reads

Description

Demultiplexes sequencing reads delivered in the common format produced by sequencers such as Illumina, typically for 16S data. The function takes a matrix of sample names and barcodes, a .fastq file holding the barcode for each read (matched to the reads by sequence header), and a .fastq file of the corresponding reads. For each barcode given, it extracts all reads whose index matches that barcode and writes them to a separate .fastq file.

Usage

demultiplex(
  barcodeFile,
  indexFile,
  readFile,
  rcBarcodes = TRUE,
  location = "./demultiplex_fastq",
  cores = 1,
  hammingDist = 0
)

Arguments

barcodeFile

Path to a tab-separated (.tsv) file containing a header row, followed by sample names (column 1) and barcodes (column 2).
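As a minimal sketch of the expected layout, the following base R code writes a two-column tab-separated barcode file and reads it back. The sample names, barcodes, and column headers are invented for illustration; the documentation only requires a header row with sample names in column 1 and barcodes in column 2.

```r
# Hypothetical sample names and barcodes, for illustration only
barcodes <- data.frame(
  SampleName = c("Sample1", "Sample2"),
  Barcode = c("GGAATTATCGGT", "ACGTACGTACGT"),
  stringsAsFactors = FALSE
)

# Write a tab-separated file with a header row
barcodeFile <- tempfile(fileext = ".tsv")
write.table(barcodes, barcodeFile, sep = "\t", quote = FALSE,
            row.names = FALSE)

# Read it back: column 1 holds sample names, column 2 holds barcodes
bc <- read.delim(barcodeFile, stringsAsFactors = FALSE)
```

The resulting path in barcodeFile could then be passed as the first argument to demultiplex().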

indexFile

Path to a .fastq file that contains the barcode for each read. The headers should be the same as (and in the same order as) those in readFile, and each sequence in the indexFile should be the barcode for the corresponding read. Quality scores are not considered.
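The pairing between the index and read files can be illustrated with a small base R sketch. The records below are invented, and the helper names headerOf and sequenceOf are hypothetical; this is not how the package itself parses the files, only an illustration of the four-line FASTQ layout and the header-order requirement.

```r
# Toy FASTQ contents (four lines per record), invented for illustration
indexLines <- c("@read1", "GGAATTATCGGT", "+", "IIIIIIIIIIII",
                "@read2", "ACGTACGTACGT", "+", "IIIIIIIIIIII")
readRecs <- c("@read1", "TTGACCA", "+", "IIIIIII",
              "@read2", "CCGTAAG", "+", "IIIIIII")

# Each record spans four lines: header, sequence, "+", qualities
headerOf <- function(x) x[seq(1, length(x), by = 4)]
sequenceOf <- function(x) x[seq(2, length(x), by = 4)]

# Headers must agree, in the same order, between the two files
sameOrder <- identical(headerOf(indexLines), headerOf(readRecs))

# The barcode of each read is the sequence line of its index record
readBarcodes <- sequenceOf(indexLines)
```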

readFile

Path to the sequencing read .fastq file that corresponds to the indexFile.

rcBarcodes

Should the barcode indexes in the barcodeFile be reverse complemented to match the sequences in the indexFile? Defaults to TRUE.
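To make the reverse-complement step concrete, here is a small base R sketch. The helper reverseComplement below is a hypothetical stand-in written for this example, not the package's internal routine, and it assumes barcodes contain only the uppercase bases A, C, G, and T.

```r
# Hypothetical helper: reverse complement of DNA strings (A, C, G, T only)
reverseComplement <- function(x) {
  vapply(x, function(s) {
    # Complement each base, then reverse the order of the bases
    comp <- chartr("ACGT", "TGCA", s)
    paste(rev(strsplit(comp, "", fixed = TRUE)[[1]]), collapse = "")
  }, character(1), USE.NAMES = FALSE)
}

rc <- reverseComplement("GGAATTATCGGT")
rc
# "ACCGATAATTCC"
```

If the barcodes in the barcodeFile are already in the same orientation as the sequences in the indexFile, rcBarcodes = FALSE skips this conversion.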

location

A directory in which to store the demultiplexed read files. By default, a new subdirectory is created at './demultiplex_fastq'.

cores

The number of cores to use for parallelization (via BiocParallel). The function parallelizes over the barcodes, extracting the reads for each barcode and writing them to a separate demultiplexed file. Defaults to 1.

hammingDist

The maximum Hamming distance (number of base differences) allowed when matching indexes to barcodes, which permits inexact matches. Defaults to 0. Warning: if the Hamming distance is >= 1 and a read's index inexactly matches more than one barcode, that read will be written to more than one demultiplexed read file.
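The warning above can be illustrated with a short base R sketch. The hamming helper and the four-base barcodes are hypothetical and chosen only to show how a single index can fall within distance 1 of two different barcodes.

```r
# Hypothetical helper: number of differing positions between two
# equal-length strings
hamming <- function(a, b) {
  stopifnot(nchar(a) == nchar(b))
  sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Invented barcodes and a read index that is one base away from both
barcodes <- c(S1 = "AAAA", S2 = "AATT")
index <- "AAAT"

dists <- vapply(barcodes, hamming, integer(1), b = index)

# With a maximum distance of 1 the read matches both samples;
# with a maximum distance of 0 it matches neither
matchedAt1 <- names(barcodes)[dists <= 1]
matchedAt0 <- names(barcodes)[dists <= 0]
```

Choosing barcodes that differ from one another by more than twice the allowed distance avoids this kind of ambiguous assignment.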

Value

Returns multiple .fastq files that contain all reads whose index matches the barcodes given. These files are written to the location directory and are named after the given sample names and barcodes, e.g. './demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz'.
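The naming pattern in the example path can be sketched in base R as follows. The pattern (sample name, underscore, barcode, then the .fastq.gz extension) is inferred from the single example above and is an assumption about the general case, not a statement of the package's internals.

```r
# Values taken from the example path above
location <- "./demultiplex_fastq"
sampleName <- "SampleName1"
barcode <- "GGAATTATCGGT"

# Assumed pattern: <location>/<sampleName>_<barcode>.fastq.gz
outFile <- file.path(location,
                     paste0(sampleName, "_", barcode, ".fastq.gz"))
outFile
# "./demultiplex_fastq/SampleName1_GGAATTATCGGT.fastq.gz"
```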

Examples


## Get barcode, index, and read data locations
barcodePath <- system.file("extdata", "barcodes.txt", package = "MetaScope")
indexPath <- system.file("extdata", "virus_example_index.fastq",
                         package = "MetaScope")
readPath <- system.file("extdata", "virus_example.fastq",
                        package = "MetaScope")

## Demultiplex the reads, allowing up to 2 mismatches per barcode
demult <- demultiplex(barcodePath, indexPath, readPath, rcBarcodes = FALSE,
                      hammingDist = 2)
demult


compbiomed/MetaScope documentation built on Aug. 9, 2022, 10:41 a.m.