Apply a user-provided function to calculate pile-up statistics across multiple BAM files.

Share:

Description

applyPileups scans one or more BAM files, returning position-specific sequence and quality summaries.

Usage

1
applyPileups(files, FUN, ..., param)

Arguments

files

A PileupFiles instances.

FUN

A function of 1 argument, x, to be evaluated for each yield (see yieldSize, yieldBy, yieldAll). The argument x is a list, with elements describing the current pile-up. The elements of the list are determined by the argument what, and include:

seqnames:

(Always returned) A named integer() representing the seqnames corresponding to each position reported in the pile-up. This is a run-length encoding, where the names of the elements represent the seqnames, and the values the number of successive positions corresponding to that seqname.

pos:

Always returned) A integer() representing the genomic coordinate of each pile-up position.

seq:

An array of dimensions nucleotide x file x position.

The ‘nucleotide’ dimension is length 5, corresponding to ‘A’, ‘C’, ‘G’, ‘T’, and ‘N’ respectively.

Entries in the array represent the number of times the nucleotide occurred in reads in the file overlapping the position.

qual:

Like seq, but summarizing quality; the first dimension is the Phred-encoded quality score, ranging from ‘!’ (0) to ‘~’ (93).

...

Additional arguments, passed to methods.

param

An instance of the object returned by ApplyPileupsParam.

Details

Regardless of param values, the algorithm follows samtools by excluding reads flagged as unmapped, secondary, duplicate, or failing quality control.

Value

applyPileups returns a list equal in length to the number of times FUN has been called, with each element containing the result of FUN.

ApplyPileupsParam returns an object describing the parameters.

Author(s)

Martin Morgan

References

http://samtools.sourceforge.net/

See Also

ApplyPileupsParam.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)

fls <- PileupFiles(c(fl, fl))

calcInfo <-
    function(x)
{
    ## information at each pile-up position
    info <- apply(x[["seq"]], 2, function(y) {
        y <- y[c("A", "C", "G", "T"),,drop=FALSE]
        y <- y + 1L                     # continuity
        cvg <- colSums(y)
        p <- y / cvg[col(y)]
        h <- -colSums(p * log(p))
        ifelse(cvg == 4L, NA, h)
    })
    list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info)
}
which <- GRanges(c("seq1", "seq2"), IRanges(c(1000, 1000), 2000))
param <- ApplyPileupsParam(which=which, what="seq")
res <- applyPileups(fls, calcInfo, param=param)
str(res)
head(res[[1]][["pos"]])		# positions matching param
head(res[[1]][["info"]])		# inforamtion in each file

## 'param' as part of 'files'
fls1 <- PileupFiles(c(fl, fl), param=param)
res1 <- applyPileups(fls1, calcInfo)
identical(res, res1)

## yield by position, across ranges
param <- ApplyPileupsParam(which=which, yieldSize=500L,
                           yieldBy="position", what="seq")
res <- applyPileups(fls, calcInfo, param=param)
sapply(res, "[[", "seqnames")

Want to suggest features or report bugs for rdrr.io? Use the GitHub issue tracker.