bam2R: Read nucleotide counts from a .bam file

View source: R/deepSNV-functions.R

bam2R    R Documentation

Read nucleotide counts from a .bam file

Description

This function uses a C interface to read the nucleotide counts at each position of a .bam alignment. The counts of both strands are reported separately, and nucleotides below a quality cutoff are masked. It is called by deepSNV to parse the alignments of the test and control experiments, respectively.

Usage

bam2R(
  file,
  chr,
  start,
  stop,
  q = 25,
  mq = 0,
  s = 2,
  head.clip = 0,
  max.depth = 1e+06,
  verbose = FALSE,
  mask = 0,
  keepflag = 0,
  max.mismatches = NULL
)

Arguments

file

The name of the .bam file as a string.

chr

The chromosome as a string.

start

The start position (1-indexed).

stop

The end position (1-indexed).

q

An optional cutoff for the nucleotide Phred quality. Default q = 25. Nucleotides with Q < q are masked as 'N'.

mq

An optional cutoff for the read mapping quality. Default mq = 0 (no filter). Reads with MQ < mq are discarded.

s

Optional choice of the strand: s = 0 (reference strand), s = 1 (complement), or s = 2 (both). Defaults to s = 2.

head.clip

The number of nucleotides to clip from the head of each read. Default 0 (no clipping).

max.depth

The maximal depth for the pileup command. Default 1,000,000.

verbose

Boolean. Set to TRUE if you want to get additional output.

mask

Integer indicating which flags to filter. Default 0 (no mask). Try 3844 (UNMAP|SECONDARY|QCFAIL|DUP|SUPPLEMENTARY).

keepflag

Integer indicating which flags to keep. Default 0 (no mask). Try 3 (PAIRED|PROPERLY_PAIRED).

max.mismatches

Integer indicating maximum NM value to allow in a read. Default NULL (no filter).
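
The suggested values for mask and keepflag are bitwise combinations of SAM flag bits. As a minimal sketch, the following base R code shows where 3844 and 3 come from. The numeric bit values follow the SAM format specification; the helper combine_flags is hypothetical and not part of deepSNV.

```r
# SAM flag bits (values from the SAM format specification).
sam_flags <- c(
  PAIRED          = 1L,
  PROPERLY_PAIRED = 2L,
  UNMAP           = 4L,
  SECONDARY       = 256L,
  QCFAIL          = 512L,
  DUP             = 1024L,
  SUPPLEMENTARY   = 2048L
)

# Hypothetical helper (not part of deepSNV): bitwise OR of the named flag bits.
combine_flags <- function(flag_names) {
  Reduce(bitwOr, sam_flags[flag_names], 0L)
}

# Value suggested for the mask argument.
mask <- combine_flags(c("UNMAP", "SECONDARY", "QCFAIL", "DUP", "SUPPLEMENTARY"))

# Value suggested for the keepflag argument.
keepflag <- combine_flags(c("PAIRED", "PROPERLY_PAIRED"))

print(c(mask = mask, keepflag = keepflag))
```

Building the integers from named bits makes it easier to see which read categories a given value refers to, and to add or drop a single flag.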

Value

A named matrix with one row per genomic position and columns for the nucleotide counts (A, T, C, G, -), masked nucleotides (N), insertions (INS), deletions (DEL), the number of reads that begin (HEAD) and end (TAIL) at the given position, and the sum of alignment qualities (QUAL), which can be indicative of alignment problems. Counts from matches on the reference strand (s=0) are in uppercase columns; counts on the complement (s=1) are in lowercase columns. The returned matrix has 11 * 2 (strands) = 22 columns and (stop - start + 1) rows.
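
To illustrate the layout described above, here is a sketch that builds a mock matrix of the same shape and collapses the two strands into a per-position coverage. It does not call bam2R. The exact column names, in particular "_" as the lowercase counterpart of "-", and the values filled in are assumptions made for illustration only.

```r
# Assumed column names: 11 reference-strand columns, then 11 complement columns.
fwd_cols <- c("A", "T", "C", "G", "-", "N", "INS", "DEL", "HEAD", "TAIL", "QUAL")
rev_cols <- c("a", "t", "c", "g", "_", "n", "ins", "del", "head", "tail", "qual")

# Mock region of three positions; a real call would return stop - start + 1 rows.
first_pos <- 3120L
last_pos  <- 3122L
n_pos     <- last_pos - first_pos + 1L

counts <- matrix(
  0L,
  nrow = n_pos,
  ncol = length(fwd_cols) + length(rev_cols),
  dimnames = list(NULL, c(fwd_cols, rev_cols))
)

# Made-up counts for the first position.
counts[1, "A"] <- 40L
counts[1, "a"] <- 38L
counts[1, "g"] <- 2L

# Sum the nucleotide columns of both strands, then total them per position.
nt_fwd   <- c("A", "T", "C", "G", "-")
nt_rev   <- c("a", "t", "c", "g", "_")
both     <- counts[, nt_fwd] + counts[, nt_rev]
coverage <- rowSums(both)

print(dim(counts))
print(coverage)
```

Keeping the strands separate until this point allows strand-specific checks before they are merged.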

Author(s)

Moritz Gerstung

Examples

## Simple example:
counts <- bam2R(file = system.file("extdata", "test.bam", package="deepSNV"), chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 3120, stop=3140, q = 10, mask = 3844)
show(counts)
## Not run: Requires an internet connection, but try it yourself.
# bam <- bam2R(file = "http://www.bsse.ethz.ch/cbg/software/deepSNV/data/test.bam", chr="B.FR.83.HXB2_LAI_IIIB_BRU_K034", start = 2074, stop=3585, q=10, mask = 3844)
# head(bam)

gerstung-lab/deepSNV documentation built on June 3, 2022, 3:05 p.m.