demultiplex: Demultiplex cell barcodes and assign cell specific reads

View source: R/demultiplex.R

demultiplexR Documentation

Demultiplex cell barcodes and assign cell specific reads

Description

Demultiplex fastq files and write cell specific reads in compressed fastq format to output directory

Usage

demultiplex(
  project = paste0("project_", Sys.Date()),
  experiment,
  lane,
  read1Path,
  read2Path,
  bc,
  bcStart,
  bcStop,
  bcEdit = 0,
  umiStart,
  umiStop,
  keep,
  minQual = 10,
  yieldReads = 1e+06,
  outDir = "./Demultiplex",
  summaryPrefix = "demultiplex",
  overwrite = FALSE,
  cores = max(1, parallelly::availableCores() - 2),
  verbose = FALSE,
  logfilePrefix = format(Sys.time(), "%Y%m%d_%H%M%S")
)

Arguments

project

The project name. Default is paste0("project_", Sys.Date()).

experiment

A character vector of experiment names. Represents the group label for each FASTQ file, e.g. "patient1, patient2, ...". The number of cells in a experiment equals the length of cell barcodes bc. The length of experiment equals the number of FASTQ files to be processed.

lane

A character or character vector of flow cell lane numbers. FASTQ files from lanes having the same experiment will be concatenated. If FASTQ files from multiple lanes are already concatenated, any placeholder would be sufficient, e.g. "L001".

read1Path

A character vector of file paths to the read 1 FASTQ files. These are the read files containing UMI and cell barcode sequences.

read2Path

A character vector of file paths to the read 2 FASTQ files. These read files contain genomic transcript sequences.

bc

A character vector of pre-determined cell barcodes. For example, see ?barcodeExample.

bcStart

Integer or vector of integers containing the cell barcode start positions (inclusive, one-based numbering).

bcStop

Integer or vector of integers containing the cell barcode stop positions (inclusive, one-based numbering).

bcEdit

Maximally allowed Hamming distance for barcode correction. Barcodes with mismatches equal or fewer than this will be assigned a corrected barcode if the inferred barcode matches uniquely in the provided predetermined barcode list. Default is 0, meaning no cell barcode correction is performed.

umiStart

Integer or vector of integers containing the start positions (inclusive, one-based numbering) of UMI sequences.

umiStop

Integer or vector of integers containing the stop positions (inclusive, one-based numbering) of UMI sequences.

keep

Read trimming. Read length or number of nucleotides to keep for read 2 (the read that contains transcript sequence information). Longer reads will be clipped at 3' end. Shorter reads will not be affected.

minQual

Minimally acceptable Phred quality score for barcode and UMI sequences. Phread quality scores are calculated for each nucleotide in the sequence. Sequences with at least one nucleotide with score lower than this will be filtered out. Default is 10.

yieldReads

The number of reads to yield when drawing successive subsets from a fastq file, providing the number of successive records to be returned on each yield. This parameter is passed to the n argument of the FastqStreamer function in ShortRead package. Default is 1e06.

outDir

Output folder path for demultiplex results. Demultiplexed cell specifc FASTQ files will be stored in folders in this path, respectively. Make sure the folder is empty. Default is "./Demultiplex".

summaryPrefix

Prefix for demultiplex summary filename. Default is "demultiplex".

overwrite

Boolean indicating whether to overwrite the output directory. Default is FALSE.

cores

Number of cores used for parallelization. Default is max(1, parallelly::availableCores() - 2), i.e. the number of available cores minus 2.

verbose

Poolean indicating whether to print log messages. Useful for debugging. Default to FALSE.

logfilePrefix

Prefix for log file. Default is current date and time in the format of format(Sys.time(), "%Y%m%d_%H%M%S").

Value

A SingleCellExperiment object containing the demultiplex summary information in the colData slot.

Examples

# Demultiplex example FASTQ files
data(barcodeExample, package = "scruff")
fastqs <- list.files(system.file("extdata", package = "scruff"),
    pattern = "\\.fastq\\.gz", full.names = TRUE)

de <- demultiplex(
    project = "example",
    experiment = c("1h1"),
    lane = c("L001"),
    read1Path = c(fastqs[1]),
    read2Path = c(fastqs[2]),
    barcodeExample,
    bcStart = 1,
    bcStop = 8,
    umiStart = 9,
    umiStop = 12,
    keep = 75,
    overwrite = TRUE)

campbio/scruff documentation built on April 2, 2024, 12:53 a.m.