bam2R_10x: Read nucleotide counts from a 10x Genomics .bam file

View source: R/bam_functions.R

bam2R_10x R Documentation

Read nucleotide counts from a 10x Genomics .bam file

Description

This function uses a C interface to read the nucleotide counts at each position of a .bam alignment. The counts are tabulated individually for each cell barcode as specified by the user. The counts of both strands are reported separately and nucleotides below a quality cutoff are masked.

Usage

bam2R_10x(
  file,
  sites = "MT:1-16569",
  q = 25,
  mq = 0,
  s = 2,
  head.clip = 0,
  max.depth = 1e+06,
  verbose = FALSE,
  mask = 0,
  keepflag = 0,
  max.mismatches = NULL,
  ncores = 1,
  ignore_nonstandard = FALSE
)

Arguments

file

The file location of the BAM file as a string.

sites

The chromosome locations of interest in BED format as a string. Alternatively a single GRanges object will also work.
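To make the region string concrete, the sketch below splits a string such as the default "MT:1-16569" into its parts with base R and derives the number of positions it covers (stop - start + 1). The helper parse_site is hypothetical and written only for illustration; it is not part of mitoClone2, which may parse the string differently.

```r
# Hypothetical helper (not part of mitoClone2): split a "chr:start-stop"
# region string into its components using base R only.
parse_site <- function(site) {
  m <- regmatches(site, regexec("^([^:]+):([0-9]+)-([0-9]+)$", site))[[1]]
  if (length(m) != 4) stop("expected a string of the form 'chr:start-stop'")
  list(chr = m[2], start = as.integer(m[3]), stop = as.integer(m[4]))
}

region <- parse_site("MT:1-16569")

# Number of positions covered by the region: stop - start + 1
n_rows <- region$stop - region$start + 1L
n_rows  # 16569
```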

q

An optional cutoff for the nucleotide Phred quality. Default q = 25. Nucleotides with Q < q are masked as 'N'.
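As a rough guide to what the cutoff means, a Phred score Q corresponds to an error probability of 10^(-Q/10). The sketch below uses made-up qualities and bases to illustrate the masking rule described above; it does not call mitoClone2, and phred_to_prob is a hypothetical helper name.

```r
# Convert a Phred quality score Q into an error probability: P = 10^(-Q / 10)
phred_to_prob <- function(q) 10^(-q / 10)

phred_to_prob(25)             # default cutoff, roughly 0.0032
phred_to_prob(c(10, 20, 30))  # 0.1, 0.01, 0.001

# Illustrative (made-up) base qualities: bases with Q < q are masked as 'N'
q     <- 25
quals <- c(12, 25, 31, 24)
bases <- c("A", "C", "G", "T")
masked <- ifelse(quals < q, "N", bases)
masked  # "N" "C" "G" "N"
```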

mq

An optional cutoff for the read mapping quality. Default mq = 0 (no filter). Reads with MQ < mq will be discarded.

s

Optional choice of the strand. Defaults to s = 2 (both).

head.clip

Should n nucleotides from the head of reads be clipped? Default 0.

max.depth

The maximal depth for the pileup command. Default 1,000,000.

verbose

Boolean. Set to TRUE if you want to get additional output.

mask

Integer indicating which flags to filter. Default 0 (no mask). Try 1796 (BAM_DEF_MASK).

keepflag

Integer indicating which flags to keep. Default 0 (no mask). Try 3 (PAIRED|PROPERLY_PAIRED).
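The values suggested for mask and keepflag are sums of SAM flag bits. The sketch below decodes them with base R; the bit values come from the SAM specification, while decode_flag is a hypothetical helper that is not part of mitoClone2. How the function applies these values internally is not shown here.

```r
# Standard SAM flag bits (values from the SAM specification)
sam_flags <- c(
  PAIRED = 1L, PROPER_PAIR = 2L, UNMAP = 4L, MUNMAP = 8L,
  REVERSE = 16L, MREVERSE = 32L, READ1 = 64L, READ2 = 128L,
  SECONDARY = 256L, QCFAIL = 512L, DUP = 1024L, SUPPLEMENTARY = 2048L
)

# Hypothetical helper: list the names of the bits set in an integer flag value
decode_flag <- function(flag) {
  is_set <- vapply(sam_flags, function(bit) bitwAnd(flag, bit) != 0L, logical(1))
  names(sam_flags)[is_set]
}

decode_flag(1796L)  # "UNMAP" "SECONDARY" "QCFAIL" "DUP"
decode_flag(3L)     # "PAIRED" "PROPER_PAIR"

# Build a mask value from individual bits
mask_value <- sum(sam_flags[c("UNMAP", "SECONDARY", "QCFAIL", "DUP")])
mask_value  # 1796
```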

max.mismatches

Integer indicating the maximum NM value (edit distance) to allow in a read. Default NULL (no filter).

ncores

Integer indicating the number of threads to use for the parallel function call that summarizes the results for each BAM file. Default 1.

ignore_nonstandard

Boolean indicating whether gapped alignments, insertions, and deletions should be ignored, i.e. excluded from the final output. Default FALSE. If you have an inflation of spliced mitochondrial reads, it is recommended to set this to TRUE.

Details

This code is an adaptation of code originally written by Moritz Gerstung for the deepSNV package.

Value

A named list of matrices with rows corresponding to genomic positions and columns for the nucleotide counts (A, T, C, G, -), masked nucleotides (N), (INS)ertions, and (DEL)etions. Each member of the list corresponds to an individual cell or entity based on the cell barcode of interest, and the names of the list elements are the respective cell barcodes. For the intents and purposes of the mitoClone2 package, this object is equivalent to the output of the baseCountsFromBamList function. The returned list has a variable length depending on the ignore_nonstandard parameter, and each element is a matrix with 8 columns and (stop - start + 1) rows. The two strands have their counts merged. If no counts are present in the provided sites parameter, nothing will be returned.

IMPORTANT: The names of the list will NOT reflect the source filename; they are based exclusively on the barcodes extracted from that file. If merging multiple datasets, it is important to change the list's names once imported to avoid naming collisions.
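One possible way to avoid naming collisions when merging datasets is to prefix the barcodes with a sample label. The sketch below uses mock lists in place of real bam2R_10x output; the barcodes, the sample labels, the column order, and the mock_counts helper are all assumptions made for illustration.

```r
# Mock stand-ins for two bam2R_10x() results: named lists of 8-column matrices.
# Barcodes, counts, and column order are made up for illustration.
cols <- c("A", "T", "C", "G", "-", "N", "INS", "DEL")
mock_counts <- function(barcodes, n_pos = 3) {
  setNames(lapply(barcodes, function(bc) {
    matrix(0L, nrow = n_pos, ncol = length(cols), dimnames = list(NULL, cols))
  }), barcodes)
}
sample1 <- mock_counts(c("AAACCTGA-1", "AAACGGGT-1"))
sample2 <- mock_counts(c("AAACCTGA-1", "TTTGTCAC-1"))

# The same barcode occurs in both samples, so prefix the names before merging
names(sample1) <- paste0("sample1_", names(sample1))
names(sample2) <- paste0("sample2_", names(sample2))
merged <- c(sample1, sample2)

anyDuplicated(names(merged))  # 0, i.e. no name collisions
```

Without the prefixes, the merged list would contain two elements named "AAACCTGA-1", and lookups by name would only ever reach the first one.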

Author(s)

Benjamin Story (adapted from original code with permission from Moritz Gerstung)

Examples

bamCounts <- bam2R_10x(file = system.file("extdata",
"mm10_10x.bam", package="mitoClone2"), sites="chrM:1-15000")

benstory/mitoClone2 documentation built on Oct. 30, 2024, 3:20 p.m.