tenxBamqc: Generate and output 10X read alignment data quality metrics
In compbiomed/scuff: Single Cell RNA-Seq UMI Filtering Facilitator (scruff)

tenxBamqc


Description

Reads BAM files generated by the Cell Ranger pipeline and outputs QC metrics, including the number of reads aligned to the genome and the number of reads aligned to genes.

Usage

tenxBamqc(
  bam,
  experiment,
  filter,
  validCb = NA,
  tags = c("NH", "GX", "CB", "MM"),
  yieldSize = 1e+06,
  outDir = "./",
  cores = max(1, parallelly::availableCores() - 2)
)

Arguments

`bam`	Paths to input BAM files generated by the Cell Ranger pipeline. Each file is usually named "possorted_genome_bam.bam" and is located in the "outs" folder of the corresponding top-level project output folder.
`experiment`	A character vector of experiment names giving the group label for each BAM file, e.g. c("patient1", "patient2", ...). The length of `experiment` must equal the number of BAM files to be processed.
`filter`	Paths to the filtered barcode files. Must have the same length and order as the input BAM files. These files are named "barcodes.tsv" and are located at outs/filtered_gene_bc_matrices/<reference_genome>/barcodes.tsv.
`validCb`	Path to the cell barcode whitelist file. By default, the file "737K-august-2016.txt", which is compatible with the v2 chemistry protocol, is used. It can be inspected by calling `data(validCb, package = "scruff")`. If the library was generated with the v1 chemistry protocol, provide the path to the v1 barcode whitelist file ("737K-april-2014_rc.txt"). For libraries generated with the v3 chemistry protocol, provide the path to "3M-february-2018.txt".
`tags`	BAM tags used for collecting QC metrics. Includes non-standard tags defined locally by the Cell Ranger pipeline. In most cases this should not be changed.
`yieldSize`	The number of records (alignments) returned on each yield when reading successive chunks of a BAM file. This value is passed to the `yieldSize` argument of the `BamFile` function in the `Rsamtools` package. Default is 1e+06.
`outDir`	Output directory in which the resulting QC table is written.
`cores`	Number of cores used for parallelization. Default is `max(1, parallelly::availableCores() - 2)`, i.e. the number of available cores minus 2, but never fewer than 1.
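For several samples, `bam`, `experiment`, and `filter` have to line up element by element. The sketch below shows one way to assemble them with base R, assuming the standard "outs" layout described above. The sample folder names ("patient1", "patient2") and the reference genome folder ("GRCh38") are hypothetical placeholders. The last lines compute a base-R stand-in for the default `cores` value using `parallel::detectCores()` instead of `parallelly::availableCores()`; the two can disagree, for example under job schedulers or container CPU limits.

```r
# Hypothetical top-level Cell Ranger output folders, one per sample
sampleDirs <- c("patient1", "patient2")

# One BAM file per sample, following the "outs/possorted_genome_bam.bam" layout
bam <- file.path(sampleDirs, "outs", "possorted_genome_bam.bam")

# Filtered barcode files in the same order as `bam`;
# "GRCh38" is an assumed reference genome folder name
filter <- file.path(sampleDirs, "outs", "filtered_gene_bc_matrices",
                    "GRCh38", "barcodes.tsv")

# One experiment label per BAM file, taken from the folder names
experiment <- basename(sampleDirs)

# The three vectors must have matching lengths (and matching order)
stopifnot(length(experiment) == length(bam),
          length(filter) == length(bam))

# Base-R analogue of the default `cores`: detected cores minus 2, at least 1.
# detectCores() can return NA, so fall back to a single core in that case.
nCores <- parallel::detectCores()
cores <- max(1L, if (is.na(nCores)) 1L else nCores - 2L)
```

The resulting vectors could then be passed as `tenxBamqc(bam = bam, experiment = experiment, filter = filter, cores = cores)` once the files exist on disk.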

Value

A SingleCellExperiment object whose colData contains the number of reads aligned to the genome (reads_mapped_to_genome) and the number of reads aligned to genes (reads_mapped_to_genes).
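To make the two colData columns concrete, the toy sketch below counts them per cell barcode from a small, hypothetical table of reads, where a read counts toward genes only if it carries a gene ID (the Cell Ranger `GX` tag). This is an illustration of what the quantities mean, not scruff's internal implementation, which reads the BAM file via `Rsamtools` and may also use the `NH` and `MM` tags listed under `tags` (for example, to handle multi-mapped reads).

```r
# Toy reads, one row per read (hypothetical values).
# CB = cell barcode, GX = gene ID (NA when the read is not assigned to a gene)
reads <- data.frame(
  CB = c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCAT-1",
         "AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-1",
         "AAACCTGAGAAACCGC-1"),
  GX = c("ENSMUSG00000000001", NA, "ENSMUSG00000000028",
         NA, "ENSMUSG00000000001"),
  stringsAsFactors = FALSE
)

# One row per cell barcode, in sorted order
cells <- sort(unique(reads$CB))

qc <- data.frame(
  # every aligned read, counted per barcode
  reads_mapped_to_genome = as.vector(table(factor(reads$CB, levels = cells))),
  # only reads that carry a gene ID, counted per barcode
  reads_mapped_to_genes = as.vector(
    table(factor(reads$CB[!is.na(reads$GX)], levels = cells))),
  row.names = cells
)
qc
```

Here the first barcode has 3 reads aligned to the genome, 2 of which are assigned to genes; the second has 2 and 1.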

Examples

# The first 5000 records of the BAM file downloaded from:
# http://sra-download.ncbi.nlm.nih.gov/srapub_files/
# SRR5167880_E18_20160930_Neurons_Sample_01.bam
# For details, see:
# https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP096558
# and:
# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93421
bamfile10x <- system.file("extdata",
    "SRR5167880_E18_20160930_Neurons_Sample_01_5000.bam",
    package = "scruff")

# library(TENxBrainData)
# library(data.table)
# tenx <- TENxBrainData()
# # get filtered barcodes for sample 01
# filteredBcIndex <- tstrsplit(colData(tenx)[, "Barcode"], "-")[[2]] == 1
# filteredBc <- colData(tenx)[filteredBcIndex, ][["Barcode"]]

filteredBc <- system.file("extdata",
    "SRR5167880_E18_20160930_Neurons_Sample_01_filtered_barcode.tsv",
    package = "scruff")
# QC results are saved to the current working directory (the default outDir)
sce <- tenxBamqc(bam = bamfile10x,
    experiment = "Neurons_Sample_01",
    filter = filteredBc)
sce
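The commented-out lines in the example select sample 01 barcodes from TENxBrainData by splitting each barcode on "-" with `data.table::tstrsplit` and keeping those whose suffix is 1. The sketch below does the same selection in base R on a few hypothetical barcodes, then writes them to a temporary file, since `filter` expects file paths rather than barcode vectors. Whether the suffix should be kept in the file is an assumption here; check it against the CB values in your BAM file.

```r
# Hypothetical 10X barcodes; the "-N" suffix identifies the sample
barcodes <- c("AAACCTGAGAAACCAT-1", "AAACCTGAGAAACCGC-2",
              "AAACCTGAGAAACCTA-1", "AAACCTGAGAAACGAG-3")

# Second field after splitting on "-", like tstrsplit(barcodes, "-")[[2]]
suffix <- vapply(strsplit(barcodes, "-", fixed = TRUE), `[`, character(1), 2)

# Keep barcodes belonging to sample 01
filteredBc <- barcodes[suffix == "1"]

# `filter` expects a path, so write one barcode per line to a temporary file
filterFile <- tempfile(fileext = ".tsv")
writeLines(filteredBc, filterFile)
```

`filterFile` could then be supplied as `filter = filterFile` in the call to `tenxBamqc`.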

compbiomed/scuff documentation built on March 28, 2024, 10:54 a.m.