applyPileups: Apply a user-provided function to calculate pile-up...

Description Usage Arguments Details Value Author(s) References See Also Examples

Description

applyPileups scans one or more BAM files, returning position-specific sequence and quality summaries.

Usage

1
applyPileups(files, FUN, ..., param)

Arguments

files

A PileupFiles instances.

FUN

A function of 1 argument, x, to be evaluated for each yield (see yieldSize, yieldBy, yieldAll). The argument x is a list, with elements describing the current pile-up. The elements of the list are determined by the argument what, and include:

seqnames:

(Always returned) A named integer() representing the seqnames corresponding to each position reported in the pile-up. This is a run-length encoding, where the names of the elements represent the seqnames, and the values the number of successive positions corresponding to that seqname.

pos:

Always returned) A integer() representing the genomic coordinate of each pile-up position.

seq:

An array of dimensions nucleotide x file x position.

The ‘nucleotide’ dimension is length 5, corresponding to ‘A’, ‘C’, ‘G’, ‘T’, and ‘N’ respectively.

Entries in the array represent the number of times the nucleotide occurred in reads in the file overlapping the position.

qual:

Like seq, but summarizing quality; the first dimension is the Phred-encoded quality score, ranging from ‘!’ (0) to ‘~’ (93).

...

Additional arguments, passed to methods.

param

An instance of the object returned by ApplyPileupsParam.

Details

Regardless of param values, the algorithm follows samtools by excluding reads flagged as unmapped, secondary, duplicate, or failing quality control.

Value

applyPileups returns a list equal in length to the number of times FUN has been called, with each element containing the result of FUN.

ApplyPileupsParam returns an object describing the parameters.

Author(s)

Martin Morgan

References

http://samtools.sourceforge.net/

See Also

ApplyPileupsParam.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
## The examples below are currently broken and have been disabled for now
## Not run: 
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)

fls <- PileupFiles(c(fl, fl))

calcInfo <-
    function(x)
{
    ## information at each pile-up position
    info <- apply(x[["seq"]], 2, function(y) {
        y <- y[c("A", "C", "G", "T"),,drop=FALSE]
        y <- y + 1L                     # continuity
        cvg <- colSums(y)
        p <- y / cvg[col(y)]
        h <- -colSums(p * log(p))
        ifelse(cvg == 4L, NA, h)
    })
    list(seqnames=x[["seqnames"]], pos=x[["pos"]], info=info)
}
which <- GRanges(c("seq1", "seq2"), IRanges(c(1000, 1000), 2000))
param <- ApplyPileupsParam(which=which, what="seq")
res <- applyPileups(fls, calcInfo, param=param)
str(res)
head(res[[1]][["pos"]])		# positions matching param
head(res[[1]][["info"]])		# inforamtion in each file

## 'param' as part of 'files'
fls1 <- PileupFiles(c(fl, fl), param=param)
res1 <- applyPileups(fls1, calcInfo)
identical(res, res1)

## yield by position, across ranges
param <- ApplyPileupsParam(which=which, yieldSize=500L,
                           yieldBy="position", what="seq")
res <- applyPileups(fls, calcInfo, param=param)
sapply(res, "[[", "seqnames")

## End(Not run)

Example output

Loading required package: GenomeInfoDb
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, basename, cbind, colMeans, colSums, colnames,
    dirname, do.call, duplicated, eval, evalq, get, grep, grepl,
    intersect, is.unsorted, lapply, lengths, mapply, match, mget,
    order, paste, pmax, pmax.int, pmin, pmin.int, rank, rbind,
    rowMeans, rowSums, rownames, sapply, setdiff, sort, table, tapply,
    union, unique, unsplit, which, which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit


Attaching package: 'Rsamtools'

The following object is masked from 'package:BiocGenerics':

    path

List of 2
 $ :List of 3
  ..$ seqnames: Named int 570
  .. ..- attr(*, "names")= chr "seq1"
  ..$ pos     : int [1:570] 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 ...
  ..$ info    : num [1:570, 1:2] 0.356 0.356 0.336 0.336 0.33 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "ex1.bam" "ex1.bam"
 $ :List of 3
  ..$ seqnames: Named int 568
  .. ..- attr(*, "names")= chr "seq2"
  ..$ pos     : int [1:568] 1000 1001 1002 1003 1004 1005 1006 1007 1008 1009 ...
  ..$ info    : num [1:568, 1:2] 0.238 0.235 0.295 0.241 0.235 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : NULL
  .. .. ..$ : chr [1:2] "ex1.bam" "ex1.bam"
[1] 1000 1001 1002 1003 1004 1005
       ex1.bam   ex1.bam
[1,] 0.3556980 0.3556980
[2,] 0.3556980 0.3556980
[3,] 0.3357909 0.3357909
[4,] 0.3357909 0.3357909
[5,] 0.3296843 0.3296843
[6,] 0.3296843 0.3296843
[1] TRUE
[[1]]
seq1 
 500 

[[2]]
seq1 seq2 
  70  430 

[[3]]
seq2 
 138 

Rsamtools documentation built on Nov. 8, 2020, 8:11 p.m.