In 87875172/scuff: Single Cell RNA-Seq UMI Filtering Facilitator (scruff)

countUMI

R Documentation

Count the number of UMIs for each gene and output count matrix

Description

Count unique UMI:gene pairs in single-cell RNA-sequencing alignment files and write the resulting count matrix to the output directory. Rows are gene IDs and columns are samples (cells). The input alignment files must be derived from FASTQ files produced by the demultiplex function in the scruff package. Returns a SingleCellExperiment object containing the count matrix, cell and gene annotations, and all QC metrics.
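
The counting rule described above can be illustrated with a small base R sketch. The `alignments` data frame and its column names are hypothetical toy data, not scruff's internal representation, and the code is only an approximation of the idea (collapse duplicate cell/gene/UMI combinations, then tabulate), not the package's implementation.

```r
# Toy alignment records for two cells (hypothetical data, for illustration only).
alignments <- data.frame(
    cell = c("cell1", "cell1", "cell1", "cell1", "cell2", "cell2"),
    gene = c("geneA", "geneA", "geneA", "geneB", "geneA", "geneA"),
    umi  = c("AACG",  "AACG",  "TTGC",  "AACG",  "GGTA",  "GGTA"),
    stringsAsFactors = FALSE
)

# Reads sharing the same cell, gene, and UMI collapse to a single molecule.
uniquePairs <- unique(alignments)

# Tabulate the remaining UMI:gene pairs: rows are genes, columns are cells.
countMatrix <- table(gene = uniquePairs$gene, cell = uniquePairs$cell)
print(countMatrix)
```

In this toy input, the two identical `cell1`/`geneA`/`AACG` reads count once, so `geneA` has two molecules in `cell1` and one in `cell2`.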

Usage

countUMI(
  sce,
  reference,
  umiEdit = 0,
  format = "BAM",
  outDir = "./Count",
  cellPerWell = 1,
  cores = max(1, parallelly::availableCores() - 2),
  outputPrefix = "countUMI",
  verbose = FALSE,
  logfilePrefix = format(Sys.time(), "%Y%m%d_%H%M%S")
)

Arguments

`sce`	A `SingleCellExperiment` object whose `colData` slot contains an alignment_path column with the paths to the input cell-specific sequence alignment files (BAM or SAM format).
`reference`	Path to the reference GTF file. A TxDb object is generated from the GTF file and saved in the current working directory with a ".sqlite" suffix.
`umiEdit`	Maximum Hamming distance allowed for UMI correction. Among the read alignments for each gene, a UMI with fewer reads is reassigned to a more abundant UMI (the corrected UMI) when the two differ by `umiEdit` or fewer mismatches. Default is 0, meaning no UMI correction is performed. UMI correction can reduce the number of transcripts counted per gene.
`format`	Format of input sequence alignment files. "BAM" or "SAM". Default is "BAM".
`outDir`	Output directory for UMI counting results. The UMI-corrected count matrix is stored in this directory. Default is `"./Count"`.
`cellPerWell`	Number of cells per well. Either a single integer (e.g. 1) giving the number of cells in every well, or a vector whose length equals the total number of cells in the input alignment files, giving the number of cells in each file. Default is 1.
`cores`	Number of cores used for parallelization. Default is `max(1, parallelly::availableCores() - 2)`, i.e. the number of available cores minus 2, but never fewer than 1.
`outputPrefix`	Prefix for expression table filename. Default is `"countUMI"`.
`verbose`	Print log messages. Useful for debugging. Default is `FALSE`.
`logfilePrefix`	Prefix for the log file. Default is the current date and time, formatted as `format(Sys.time(), "%Y%m%d_%H%M%S")`.
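
To make the `umiEdit` description more concrete, here is a minimal base R sketch of Hamming-distance UMI correction for the reads of a single gene. The helpers `hammingDistance` and `correctUmis` are hypothetical names introduced for illustration, and the greedy merge into the most abundant matching UMI is an assumption; scruff's actual algorithm may differ in its details.

```r
# Hamming distance between two equal-length strings.
hammingDistance <- function(a, b) {
    sum(strsplit(a, "")[[1]] != strsplit(b, "")[[1]])
}

# Hypothetical helper: merge low-count UMIs into more abundant neighbours.
# `umiCounts` is a named vector of read counts per UMI for one gene.
correctUmis <- function(umiCounts, umiEdit = 0) {
    if (umiEdit == 0) return(umiCounts)  # no correction requested
    # Visit UMIs from most to least abundant.
    umiCounts <- sort(umiCounts, decreasing = TRUE)
    kept <- integer(0)
    for (umi in names(umiCounts)) {
        # Distance from this UMI to every UMI already kept.
        dists <- vapply(names(kept), hammingDistance, numeric(1), b = umi)
        hit <- which(dists <= umiEdit)
        if (length(hit) > 0) {
            # Reassign the reads to the first (earliest kept) matching UMI.
            kept[hit[1]] <- kept[hit[1]] + umiCounts[[umi]]
        } else {
            kept[umi] <- umiCounts[[umi]]
        }
    }
    kept
}

# Toy read counts for one gene (hypothetical data).
umiCounts <- c(AACG = 10L, AACT = 2L, TTGC = 5L)
uncorrected <- correctUmis(umiCounts, umiEdit = 0)
corrected <- correctUmis(umiCounts, umiEdit = 1)
print(corrected)
```

With `umiEdit = 1`, `AACT` is one mismatch away from the more abundant `AACG` and is merged into it, so the gene is counted with two UMIs instead of three. This is why correction can reduce the number of transcripts per gene.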

Value

A SingleCellExperiment object containing the count matrix, cell and gene annotations, and QC metrics.

Examples

## Not run: 
data(barcodeExample, package = "scruff")
# The SingleCellExperiment object returned by alignRsubread function and the
# alignment BAM files are required for running countUMI function
# First demultiplex example FASTQ files
fastqs <- list.files(system.file("extdata", package = "scruff"),
    pattern = "\\.fastq\\.gz", full.names = TRUE)

de <- demultiplex(
    project = "example",
    experiment = c("1h1"),
    lane = c("L001"),
    read1Path = c(fastqs[1]),
    read2Path = c(fastqs[2]),
    barcodeExample,
    bcStart = 1,
    bcStop = 8,
    umiStart = 9,
    umiStop = 12,
    keep = 75,
    overwrite = TRUE)

# Alignment
library(Rsubread)
# Create index files for GRCm38_MT.
fasta <- system.file("extdata", "GRCm38_MT.fa", package = "scruff")
# Specify the basename for Rsubread index
indexBase <- "GRCm38_MT"
buildindex(basename = indexBase, reference = fasta, indexSplit = FALSE)

al <- alignRsubread(de, indexBase, overwrite = TRUE)

# Counting
gtf <- system.file("extdata", "GRCm38_MT.gtf", package = "scruff")
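# cellPerWell has one entry per alignment file: 46 files with one cell each,
# followed by 2 files with zero cells
sce <- countUMI(al, gtf, cellPerWell = c(rep(1, 46), 0, 0))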

## End(Not run)

# or use the built-in SingleCellExperiment object generated using
# example dataset (see ?sceExample)
data(sceExample, package = "scruff")

87875172/scuff documentation built on July 28, 2024, 6:11 p.m.