bamsignals-methods: Functions for extracting count signals from a bam file.
In bamsignals: Extract read count signals from bam files

Description Usage Arguments Details Value See Also Examples

Functions for extracting count signals from a bam file.

bamCount: for each range, count the reads whose 5' end map in it.

bamProfile: for each base pair in the ranges, compute the number of reads whose 5' end maps there.

bamCoverage: for each base pair in the ranges, compute the number of reads covering it.

## S4 method for signature 'character,GenomicRanges'
bamCount(bampath, gr, mapqual = 0,
  shift = 0, ss = FALSE, paired.end = c("ignore", "filter", "midpoint"),
  tlenFilter = NULL, filteredFlag = -1, verbose = TRUE)

## S4 method for signature 'character,GenomicRanges'
bamProfile(bampath, gr, binsize = 1,
  mapqual = 0, shift = 0, ss = FALSE, paired.end = c("ignore", "filter",
  "midpoint"), tlenFilter = NULL, filteredFlag = -1, verbose = TRUE)

## S4 method for signature 'character,GenomicRanges'
bamCoverage(bampath, gr, mapqual = 0,
  paired.end = c("ignore", "extend"), tlenFilter = NULL,
  filteredFlag = -1, verbose = TRUE)

`bampath`	path to the bam file storing the read. The file must be indexed.
`gr`	GenomicRanges object used to specify the regions. If a range is on the negative strand the profile will be reverse-complemented.
`mapqual`	discard reads with mapping quality strictly lower than this parameter. The value 0 ensures that no read will be discarded, the value 254 that only reads with the highest possible mapping quality will be considered.
`shift`	shift the read position by a user-defined number of basepairs. This can be handy in the analysis of chip-seq data.
`ss`	produce a strand-specific profile or ignore the strand of the read. This option does not have any effect on the strand of the region. Strand-specific profiles are twice as long then strand-independent profiles.
`paired.end`	a character string indicating how to handle paired-end reads. If `paired.end!="ignore"` then only first reads in proper mapped pairs will be considered (SAMFLAG 66, i.e. in the flag of the read, the bits in the mask 66 must be all ones). If `paired.end=="midpoint"` then the midpoint of a filtered fragment is considered, where `mid = fragment_start + int(abs(tlen)/2)`, and where tlen is the template length stored in the bam file. For even tlen, the given midpoint will be moved of 0.5 basepairs in the 3' direction (bamCount, bamProfile). If `paired.end=="extend"` then the whole fragment is treated as a single read (bamCoverage).
`tlenFilter`	A filter on fragment length as estimated from alignment in paired end experiments (TLEN). If set to `c(min,max)` only reads are considered where `min <= TLEN <= max`. If `paired.end=="ignore"`, this argument is set to `NULL` and no filtering is done. If `paired.end!="ignore"`, this argument defaults to `c(0,1000)`.
`filteredFlag`	Filter out reads with a certain flag set, e.g. "1024" to filter out PCR or optical duplicates.
`verbose`	a logical value indicating whether verbose output is desired
`binsize`	If the value is set to 1, the method will return basepair-resolution read densities, for bigger values the density profiles will be binned (and the memory requirements will scale accordingly).

A read position is always specified by its 5' end, so a read mapping to the reference strand is positioned at its leftmost coordinate, a read mapping to the alternative strand is positioned at its rightmost coordinate. This can be changed using the shift parameter.

bamProfile and bamCoverage: a CountSignals object with a signal per region
bamCount: a vector with one element per region or, if ss==TRUE, a matrix with one column per region and two rows (sense and antisense).

CountSignals for handling the return value of bamProfile and bamCoverage

## TOY DATA ##
library(GenomicRanges)
bampath <- 
system.file("extdata", "randomBam.bam", package="bamsignals")
genes <- 
get(load(system.file("extdata", "randomAnnot.Rdata", package="bamsignals")))


## THE FUNCTION 'count' ##
#count how many reads map in each region (according to 5' end)
v <- bamCount(bampath, genes)
#plot it
labs <- paste0(seqnames(genes), ":", start(genes), "-", end(genes))
par(mar=c(5, 6, 4, 2))
barplot(v, names.arg=labs, main="read counts per region", las=2, 
	horiz=TRUE, cex.names=.6)

#distinguish between strands
v2 <- bamCount(bampath, genes, ss=TRUE)
#plot it
par(mar=c(5, 6, 4, 2))
barplot(v2, names.arg=labs, main="read counts per region", las=2, 
	horiz=TRUE, cex.names=.6, col=c("blue", "red"), legend=TRUE)



## THE FUNCTIONS 'bamProfile' and 'bamCoverage' ##
#count how many reads map to each base pair (according to 5' end)
pu <- bamProfile(bampath, genes)
#count how many reads cover each base pair
du <- bamCoverage(bampath, genes)
#plot it
xlab <- "offset from start of the region"
ylab <- "reads per base pair"
main <- paste0("read coverage and profile of the region ", labs[1])
plot(du[1], ylim=c(0, max(du[1])), ylab=ylab, xlab=xlab, main=main, type="l")
lines(pu[1], lty=2)
llab <- c("covering the base pair", "5' end maps to the base pair")
legend("topright", llab, lty=c(1,2), bg="white")



## REGIONS OF THE SAME SIZE AND OPTIONS FOR 'bamProfile' ##
proms <- promoters(genes, upstream=150, downstream=150)
#pileup according to strand
pu_ss <- bamProfile(bampath, proms, ss=TRUE)
#compute average over regions
avg_ss <- apply(alignSignals(pu_ss), 2, rowMeans)

#profile using a strand-specific shift
pu_shift <- bamProfile(bampath, proms, shift=75)
#compute average over regions
avg_shift <- rowMeans(alignSignals(pu_shift))

#profile using a strand-specific shift and a binning scheme
binsize <- 20
pu_shift_bin <- bamProfile(bampath, proms, shift=75, binsize=binsize)
#compute average over regions
avg_shift_bin <- rowMeans(alignSignals(pu_shift_bin))

#plot it
xs <- -149:150
main <- paste0("average read profile over ", length(proms), " promoters")
xlab <- "distance from TSS"
ylab <- "average reads per base pair"
plot(xs, avg_shift, xlab=xlab, ylab=ylab, main=main, type="l", 
	ylim=c(0, max(avg_shift)))
lines(xs, avg_ss["sense",], col="blue")
lines(xs, avg_ss["antisense",], col="red")
lines(xs, rep(avg_shift_bin/binsize, each=binsize), lty=2)
llabs <- 
c("sense reads", "antisense reads", "with shift", "binned and with shift")
legend("topright", llabs, col=c("blue", "red", "black", "black"),
	lty=c(1,1,1,2), bg="white")