tenxBamqc: Generate and output 10X read alignment data quality metrics

View source: R/tenxBamqc.R

tenxBamqc R Documentation

Description

Reads BAM files generated by the Cell Ranger pipeline and outputs QC metrics, including the number of aligned reads and the number of reads aligned to a gene.

Usage

tenxBamqc(
  bam,
  experiment,
  filter,
  validCb = NA,
  tags = c("NH", "GX", "CB", "MM"),
  yieldSize = 1e+06,
  outDir = "./",
  cores = max(1, parallelly::availableCores() - 2)
)

Arguments

bam

Paths to the input BAM files generated by the Cell Ranger pipeline. Each file is usually named "possorted_genome_bam.bam" and is located in the "outs" folder of the corresponding top-level project output folder.

experiment

A character vector of experiment names. Each element is the group label for the corresponding BAM file, e.g. "patient1, patient2, ...". The length of experiment must equal the number of BAM files to be processed.

filter

Paths to the filtered barcode files. Must have the same length and order as the input BAM files. These files are named "barcodes.tsv" and are located at outs/filtered_gene_bc_matrices/<reference_genome>/barcodes.tsv.

validCb

Path to the cell barcode whitelist file. By default, the file "737K-august-2016.txt" is used, which is compatible with the v2 chemistry protocol. This whitelist can be inspected by calling data(validCb, package = "scruff"). If the library was generated using the v1 chemistry protocol, provide the path to the v1 barcode whitelist file ("737K-april-2014_rc.txt"). For libraries generated using the v3 chemistry protocol, provide the path to "3M-february-2018.txt".

tags

BAM tags used for collecting QC metrics. The default set includes non-standard tags defined locally by the Cell Ranger pipeline. Should not be changed in most cases.

yieldSize

The number of records (alignments) returned on each yield when drawing successive subsets from a BAM file. This parameter is passed to the yieldSize argument of the BamFile function in the Rsamtools package. Default is 1e06.

outDir

Output directory. The location where the resulting QC table is written. Default is "./".

cores

Number of cores used for parallelization. Default is max(1, parallelly::availableCores() - 2), i.e. the number of available cores minus 2, but never fewer than 1.
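
The effect of the yieldSize argument described above can be sketched with Rsamtools directly. This example assumes Rsamtools is installed and uses the small ex1.bam file shipped with that package rather than a Cell Ranger BAM; the chunk size of 1000 is arbitrary. It illustrates chunked reading in general and is not a description of how tenxBamqc is implemented internally.

```r
library(Rsamtools)

# Example BAM shipped with Rsamtools (not a 10X file).
bamPath <- system.file("extdata", "ex1.bam", package = "Rsamtools")

# Each scanBam() call on the open file returns at most yieldSize records.
bf <- BamFile(bamPath, yieldSize = 1000)
open(bf)

# Only read names are fetched here, to keep each chunk small.
param <- ScanBamParam(what = "qname")
chunkSizes <- integer(0)
repeat {
    chunk <- scanBam(bf, param = param)[[1]]
    n <- length(chunk$qname)
    if (n == 0) break  # no records left
    chunkSizes <- c(chunkSizes, n)
}
close(bf)

chunkSizes       # records returned by each yield
sum(chunkSizes)  # total records read
```

A larger yieldSize means fewer, bigger chunks and therefore more memory per chunk; a smaller value trades memory for more iterations.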

Value

A SingleCellExperiment object. The colData contains the number of aligned reads (reads_mapped_to_genome) and reads aligned to genes (reads_mapped_to_genes).
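
As a rough illustration of how these two colData columns might be used, the sketch below computes the fraction of aligned reads that fall in genes. To stay self-contained it uses a small made-up data frame in place of the real colData; the barcode column name, the barcodes, and the counts are toy values chosen for illustration, not real output.

```r
# Toy stand-in for the colData of the returned object; values are invented.
# The "barcode" column is a hypothetical name used only for this sketch.
qc <- data.frame(
    barcode = c("AAAC", "AAAG", "AACC"),
    reads_mapped_to_genome = c(100, 50, 0),
    reads_mapped_to_genes = c(80, 20, 0)
)

# Fraction of aligned reads assigned to a gene; NA when nothing aligned.
qc$fraction_in_genes <- ifelse(qc$reads_mapped_to_genome > 0,
    qc$reads_mapped_to_genes / qc$reads_mapped_to_genome,
    NA_real_)
qc
```

With a real result, as.data.frame(SummarizedExperiment::colData(sce)) would take the place of the toy table.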

Examples

# The first 5000 records of the BAM file downloaded from here:
# http://sra-download.ncbi.nlm.nih.gov/srapub_files/
# SRR5167880_E18_20160930_Neurons_Sample_01.bam
# see details here:
# https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP096558
# and here:
# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93421
bamfile10x <- system.file("extdata",
    "SRR5167880_E18_20160930_Neurons_Sample_01_5000.bam",
    package = "scruff")

# library(TENxBrainData)
# library(data.table)
# tenx <- TENxBrainData()
# # get filtered barcodes for sample 01
# filteredBcIndex <- tstrsplit(colData(tenx)[, "Barcode"], "-")[[2]] == 1
# filteredBc <- colData(tenx)[filteredBcIndex, ][["Barcode"]]

filteredBc <- system.file("extdata",
    "SRR5167880_E18_20160930_Neurons_Sample_01_filtered_barcode.tsv",
    package = "scruff")
# QC results are saved to the current working directory (outDir = "./")
sce <- tenxBamqc(bam = bamfile10x,
    experiment = "Neurons_Sample_01",
    filter = filteredBc)
sce
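
When several samples are processed, bam, experiment, and filter have to line up element by element. The following is a minimal sketch of assembling them in base R. The project folder names, the reference genome folder "mm10", the "whitelists" folder, and the chemistry label are all assumptions made for illustration, and the tenxBamqc call is commented out because these hypothetical paths do not exist.

```r
# Hypothetical top-level Cell Ranger output folders (assumed names).
projects <- c("run_patient1", "run_patient2")
genome <- "mm10"  # assumed reference genome folder name

# One BAM per project, at the location described for the bam argument.
bam <- file.path(projects, "outs", "possorted_genome_bam.bam")

# One filtered barcode file per project, in the same order as bam.
filter <- file.path(projects, "outs", "filtered_gene_bc_matrices",
    genome, "barcodes.tsv")

# One experiment label per BAM file.
experiment <- c("patient1", "patient2")

# Whitelist file names listed under validCb, keyed by chemistry version.
whitelists <- c(v1 = "737K-april-2014_rc.txt",
    v2 = "737K-august-2016.txt",
    v3 = "3M-february-2018.txt")
chemistry <- "v3"  # assumed chemistry of the hypothetical libraries
validCb <- file.path("whitelists", whitelists[[chemistry]])  # assumed folder

# The three vectors must have equal length.
stopifnot(length(bam) == length(experiment),
    length(bam) == length(filter))

# sce <- tenxBamqc(bam = bam, experiment = experiment,
#     filter = filter, validCb = validCb)
```

Building all three vectors from the same projects vector keeps their order consistent, which is what the filter argument requires.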

campbio/scruff documentation built on April 2, 2024, 12:53 a.m.