tenxBamqc: Generate and output 10X read alignment data quality metrics


View source: R/tenxBamqc.R

Description

Reads BAM files generated by the Cell Ranger pipeline and outputs QC metrics, including the number of aligned reads and the number of reads aligned to a gene.

Usage

tenxBamqc(bam, experiment, filter, validCb = NA, tags = c("NH", "GX", "CB",
  "MM"), yieldSize = 1e+06, outDir = "./", cores = max(1,
  parallel::detectCores() - 2))

Arguments

bam

Paths to input BAM files generated by the Cell Ranger pipeline. Each file is usually named "possorted_genome_bam.bam" and is located in the "outs" folder of its top-level project output folder.

experiment

A character vector of experiment names. Represents the group label for each BAM file, e.g. "patient1, patient2, ...". The length of experiment must equal the number of BAM files to be processed.

filter

Paths to the filtered barcode files. Must have the same length and order as the input BAM files. These files are named "barcodes.tsv" and are located at outs/filtered_gene_bc_matrices/<reference_genome>/.

validCb

Path to the cell barcode whitelist file. By default, the file "737K-august-2016.txt" is used, which is compatible with the v2 chemistry protocol. The file can be inspected by calling data(validCb, package = "scruff"). If the library was generated using the v1 chemistry protocol, the path to the v1 barcode whitelist file ("737K-april-2014_rc.txt") needs to be provided.

tags

BAM tags used for collecting QC metrics. Includes non-standard tags defined locally by the Cell Ranger pipeline. Should not be changed in most cases.

yieldSize

The number of records (alignments) returned on each yield when drawing successive subsets from a BAM file. This parameter is passed to the yieldSize argument of the BamFile function in the Rsamtools package. Default is 1e06.

outDir

Output directory. The location to write the resulting QC table. Default is "./".

cores

Number of cores used for parallelization. Default is max(1, parallel::detectCores() - 2), i.e. the number of available cores minus 2, with a minimum of 1.
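The bam, experiment, and filter arguments are parallel vectors, so it can help to build them together and check their lengths before calling the function. The following is a minimal sketch in base R, not part of scruff itself. The project folder names ("patient1", "patient2") and the reference genome name ("mm10") are hypothetical placeholders; the file and folder names follow the layout described above.

```r
# Hypothetical top-level Cell Ranger output folders (assumption for illustration)
projects <- c("patient1", "patient2")
# Hypothetical reference genome folder name (assumption for illustration)
genome <- "mm10"

# One BAM path per project, following the "outs" layout described above
bam <- file.path(projects, "outs", "possorted_genome_bam.bam")

# Matching filtered barcode files, in the same order as the BAM files
filter <- file.path(projects, "outs", "filtered_gene_bc_matrices",
    genome, "barcodes.tsv")

# One group label per BAM file
experiment <- basename(projects)

# All three vectors must have the same length
stopifnot(length(bam) == length(experiment),
    length(bam) == length(filter))

# Same rule as the default for cores; na.rm = TRUE guards against
# detectCores() returning NA on platforms where the count is unknown
cores <- max(1, parallel::detectCores() - 2, na.rm = TRUE)
```

The resulting vectors could then be passed as the bam, experiment, filter, and cores arguments, provided the files actually exist at those paths.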

Value

A ggplot object showing the number of aligned reads and the number of reads aligned to a gene.

Examples

# first 5000 records in the bam file downloaded from here:
# http://sra-download.ncbi.nlm.nih.gov/srapub_files/
# SRR5167880_E18_20160930_Neurons_Sample_01.bam
# see details here:
# https://trace.ncbi.nlm.nih.gov/Traces/sra/?study=SRP096558
# and here:
# https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE93421
bamfile10x <- system.file("extdata",
    "SRR5167880_E18_20160930_Neurons_Sample_01_5000.bam",
    package = "scruff")

# library(TENxBrainData)
# library(data.table)
# tenx <- TENxBrainData()
# # get filtered barcodes for sample 01
# filteredBcIndex <- tstrsplit(colData(tenx)[, "Barcode"], "-")[[2]] == 1
# filteredBc <- colData(tenx)[filteredBcIndex, ][["Barcode"]]

filteredBc <- system.file("extdata",
    "SRR5167880_E18_20160930_Neurons_Sample_01_filtered_barcode.tsv",
    package = "scruff")
# QC results are saved to current working directory
qcDt <- tenxBamqc(bam = bamfile10x,
    experiment = "Neurons_Sample_01",
    filter = filteredBc)
qcDt
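# The commented-out lines above select the barcodes whose suffix after "-"
# is 1 (sample 01). A minimal base R sketch of the same idea follows, using
# strsplit in place of data.table::tstrsplit. The barcodes and the variable
# names keptBarcodes and barcodeFile are made up for illustration; they are
# not taken from TENxBrainData or scruff.

```r
# Hypothetical barcodes in "<sequence>-<sample index>" form
barcodes <- c("AAACCTGAGATAGGAG-1",
    "AAACCTGAGCGGCTTC-2",
    "AAACCTGCAACGCACC-1")

# Take the part after the "-" as the sample index of each barcode
sampleIndex <- vapply(strsplit(barcodes, "-", fixed = TRUE),
    function(parts) parts[2], character(1))

# Keep only the barcodes that belong to sample 1
keptBarcodes <- barcodes[sampleIndex == "1"]

# Write one barcode per line to a temporary file, which mirrors the
# one-column layout of a filtered "barcodes.tsv" file
barcodeFile <- tempfile(fileext = ".tsv")
writeLines(keptBarcodes, barcodeFile)
```

# A file written this way could be supplied through the filter argument
# in place of the packaged example file.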

compbiomed/scruff documentation built on May 30, 2019, 12:48 p.m.