ScanBamParam-class: Parameters for scanning BAM files

Description Usage Arguments Objects from the Class Slots Functions and methods Author(s) See Also Examples

Description

Use ScanBamParam() to create a parameter object influencing what fields and which records are imported from a (binary) BAM file. Use of which requires that a BAM index file (<filename>.bai) exists.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
# Constructor
ScanBamParam(flag = scanBamFlag(), simpleCigar = FALSE,
    reverseComplement = FALSE, tag = character(0), tagFilter = list(),
    what = character(0), which, mapqFilter=NA_integer_)

# Constructor helpers
scanBamFlag(isPaired = NA, isProperPair = NA, isUnmappedQuery = NA, 
    hasUnmappedMate = NA, isMinusStrand = NA, isMateMinusStrand = NA,
    isFirstMateRead = NA, isSecondMateRead = NA, isNotPrimaryRead = NA,
    isSecondaryAlignment = NA, isNotPassingQualityControls = NA,
    isDuplicate = NA, isSupplementaryAlignment = NA)

scanBamWhat()

# Accessors
bamFlag(object, asInteger=FALSE)
bamFlag(object) <- value
bamReverseComplement(object)
bamReverseComplement(object) <- value
bamSimpleCigar(object)
bamSimpleCigar(object) <- value
bamTag(object)
bamTag(object) <- value
bamTagFilter(object)
bamTagFilter(object) <- value
bamWhat(object)
bamWhat(object) <- value
bamWhich(object)
bamWhich(object) <- value
bamMapqFilter(object)
bamMapqFilter(object) <- value

## S4 method for signature 'ScanBamParam'
show(object)

# Flag utils
bamFlagAsBitMatrix(flag, bitnames=FLAG_BITNAMES)
bamFlagAND(flag1, flag2)
bamFlagTest(flag, value)

Arguments

flag

For ScanBamParam, an integer(2) vector used to filter reads based on their 'flag' entry. This is most easily created with the scanBamFlag() helper function.

For bamFlagAsBitMatrix, bamFlagTest an integer vector where each element represents a 'flag' entry.

simpleCigar

A logical(1) vector which, when TRUE, returns only those reads for which the cigar (run-length encoded representation of the alignment) is missing or contains only matches / mismatches ('M').

reverseComplement

A logical(1) vectors. BAM files store reads mapping to the minus strand as though they are on the plus strand. Rsamtools obeys this convention by default (reverseComplement=FALSE), but when this value is set to TRUE returns the sequence and quality scores of reads mapped to the minus strand in the reverse complement (sequence) and reverse (quality) of the read as stored in the BAM file. This might be useful if wishing to recover read and quality scores as represented in fastq files, but is NOT appropriate for variant calling or other alignment-based operations.

tag

A character vector naming tags to be extracted. A tag is an optional field, with arbitrary information, stored with each record. Tags are identified by two-letter codes, so all elements of tag must have exactly 2 characters.

tagFilter

A named list of atomic vectors. The name of each list element is the tag name (two-letter code), and the corresponding atomic vector is the set of acceptable values for the tag. Only reads with specified tags are included. NULLs, NAs, and empty strings are not allowed in the atomic vectors.

what

A character vector naming the fields to return scanBamWhat() returns a vector of available fields. Fields are described on the scanBam help page.

mapqFilter

A non-negative integer(1) specifying the minimum mapping quality to include. BAM records with mapping qualities less than mapqFilter are discarded.

which

A GRanges, IntegerRangesList, or any object that can be coerced to a IntegerRangesList, or missing object, from which a IRangesList instance will be constructed. Names of the IRangesList correspond to reference sequences, and ranges to the regions on that reference sequence for which matches are desired. Because data types are coerced to IRangesList, which does not include strand information (use the flag argument instead). Only records with a read overlapping the specified ranges are returned. All ranges must have ends less than or equal to 536870912. When one record overlaps two ranges in which, the record is returned twice.

isPaired

A logical(1) indicating whether unpaired (FALSE), paired (TRUE), or any (NA) read should be returned.

isProperPair

A logical(1) indicating whether improperly paired (FALSE), properly paired (TRUE), or any (NA) read should be returned. A properly paired read is defined by the alignment algorithm and might, e.g., represent reads aligning to identical reference sequences and with a specified distance.

isUnmappedQuery

A logical(1) indicating whether unmapped (TRUE), mapped (FALSE), or any (NA) read should be returned.

hasUnmappedMate

A logical(1) indicating whether reads with mapped (FALSE), unmapped (TRUE), or any (NA) mate should be returned.

isMinusStrand

A logical(1) indicating whether reads aligned to the plus (FALSE), minus (TRUE), or any (NA) strand should be returned.

isMateMinusStrand

A logical(1) indicating whether mate reads aligned to the plus (FALSE), minus (TRUE), or any (NA) strand should be returned.

isFirstMateRead

A logical(1) indicating whether the first mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA).

isSecondMateRead

A logical(1) indicating whether the second mate read should be returned (TRUE) or not (FALSE), or whether mate read number should be ignored (NA).

isNotPrimaryRead

Deprecated; use isSecondaryAlignment.

isSecondaryAlignment

A logical(1) indicating whether alignments that are secondary (TRUE), are not (FALSE) or whose secondary status does not matter (NA) should be returned. A non-primary alignment (“secondary alignment” in the SAM specification) might result when a read aligns to multiple locations. One alignment is designated as primary and has this flag set to FALSE; the remainder, for which this flag is TRUE, are designated by the aligner as secondary.

isNotPassingQualityControls

A logical(1) indicating whether reads passing quality controls (FALSE), reads not passing quality controls (TRUE), or any (NA) read should be returned.

isDuplicate

A logical(1) indicating that un-duplicated (FALSE), duplicated (TRUE), or any (NA) reads should be returned. 'Duplicated' reads may represent PCR or optical duplicates.

isSupplementaryAlignment

A logical(1) indicating that non-supplementary (FALSE), supplementary (TRUE), or any (NA) reads should be returned. The SAM specification indicates that 'supplementary' reads are part of a chimeric alignment.

object

An instance of class ScanBamParam.

value

An instance of the corresponding slot, to be assigned to object or, for bamFlagTest, a character(1) name of the flag to test, e.g., “isUnmappedQuery”, from the arguments to scanBamFlag.

asInteger

logical(1) indicating whether ‘flag’ should be returned as an encoded integer vector (TRUE) or human-readable form (FALSE).

bitnames

Names of the flag bits to extract. Will be the colnames of the returned matrix.

flag1, flag2

Integer vectors containing ‘flag’ entries.

Objects from the Class

Objects are created by calls of the form ScanBamParam().

Slots

flag

Object of class integer encoding flags to be kept when they have their '0' (keep0) or '1' (keep1) bit set.

simpleCigar

Object of class logical indicating, when TRUE, that only 'simple' cigars (empty or 'M') are returned.

reverseComplement

Object of class logical indicating, when TRUE, that reads on the minus strand are to be reverse complemented (sequence) and reversed (quality).

tag

Object of class character indicating what tags are to be returned.

tagFilter

Object of class list (named) indicating tags to filter by, and the set of acceptable values for each tag.

what

Object of class character indicating what fields are to be returned.

which

Object of class IntegerRangesList indicating which reference sequence and coordinate reads must overlap.

mapqFilter

Object of class integer indicating the minimum mapping quality required for input, or NA to indicate no filtering.

Functions and methods

See 'Usage' for details on invocation.

Constructor:

ScanBamParam:

Returns a ScanBamParam object. The which argument to the constructor can be one of several different types, as documented above.

Accessors:

bamTag, bamTag<-

Returns or sets a character vector of tags to be extracted.

bamTagFilter, bamTagFilter<-

Returns or sets a named list of tags to filter by, and the set of their acceptable values.

bamWhat, bamWhat<-

Returns or sets a character vector of fields to be extracted.

bamWhich, bamWhich<-

Returns or sets a IntegerRangesList of bounds on reads to be extracted. A length 0 IntegerRangesList represents all reads.

bamFlag, bamFlag<-

Returns or sets an integer(2) representation of reads flagged to be kept or excluded.

bamSimpleCigar, bamSimpleCigar<-

Returns or sets a logical(1) vector indicating whether reads without indels or clipping be kept.

bamReverseComplement, bamReverseComplement<-

Returns or sets a logical(1) vector indicating whether reads on the minus strand will be returned with sequence reverse complemented and quality reversed.

Methods:

show

Compactly display the object.

Author(s)

Martin Morgan

See Also

scanBam

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
## defaults
p0 <- ScanBamParam()

## subset of reads based on genomic coordinates
which <- IRangesList(seq1=IRanges(1000, 2000),
                     seq2=IRanges(c(100, 1000), c(1000, 2000)))
p1 <- ScanBamParam(what=scanBamWhat(), which=which)

## subset of reads based on 'flag' value
p2 <- ScanBamParam(what=scanBamWhat(),
                   flag=scanBamFlag(isMinusStrand=FALSE))

## subset of fields
p3 <- ScanBamParam(what=c("rname", "strand", "pos", "qwidth"))
                
## use
fl <- system.file("extdata", "ex1.bam", package="Rsamtools",
                  mustWork=TRUE)
res <- scanBam(fl, param=p2)[[1]]
lapply(res, head)

## tags; NM: edit distance; H1: 1-difference hits
p4 <- ScanBamParam(tag=c("NM", "H1"), what="flag")
bam4 <- scanBam(fl, param=p4)
str(bam4[[1]][["tag"]])

## tagFilter
p5 <- ScanBamParam(tag=c("NM", "H1"), tagFilter=list(NM=c(2, 3, 4)))
bam5 <- scanBam(fl, param=p5)
table(bam5[[1]][["tag"]][["NM"]])

## flag utils
flag <- scanBamFlag(isUnmappedQuery=FALSE, isMinusStrand=TRUE)

p6 <- ScanBamParam(what="flag")
bam6 <- scanBam(fl, param=p6)
flag6 <- bam6[[1]][["flag"]]
head(bamFlagAsBitMatrix(flag6[1:9]))
colSums(bamFlagAsBitMatrix(flag6))
flag
bamFlagAsBitMatrix(flag)

Example output

Loading required package: GenomeInfoDb
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: IRanges
Loading required package: GenomicRanges
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit

$qname
[1] "B7_591:4:96:693:509"    "EAS54_65:7:152:368:113" "EAS51_64:8:5:734:57"   
[4] "B7_591:1:289:587:906"   "EAS56_59:8:38:671:758"  "EAS56_61:6:18:467:281" 

$flag
[1]  73  73 137 137 137  73

$rname
[1] seq1 seq1 seq1 seq1 seq1 seq1
Levels: seq1 seq2

$strand
[1] + + + + + +
Levels: + - *

$pos
[1]  1  3  5  6  9 13

$qwidth
[1] 36 35 35 36 35 35

$mapq
[1] 99 99 99 63 99 99

$cigar
[1] "36M" "35M" "35M" "36M" "35M" "35M"

$mrnm
[1] <NA> <NA> <NA> <NA> <NA> <NA>
Levels: seq1 seq2

$mpos
[1] NA NA NA NA NA NA

$isize
[1] NA NA NA NA NA NA

$seq
  A DNAStringSet instance of length 6
    width seq
[1]    36 CACTAGTGGCTCATTGTAAATGTGTGGTTTAACTCG
[2]    35 CTAGTGGCTCATTGTAAATGTGTGGTTTAACTCGT
[3]    35 AGTGGCTCATTGTAAATGTGTGGTTTAACTCGTCC
[4]    36 GTGGCTCATTGTAATTTTTTGTTTTAACTCTTCTCT
[5]    35 GCTCATTGTAAATGTGTGGTTTAACTCGTCCATGG
[6]    35 ATTGTAAATGTGTGGTTTAACTCGTCCCTGGCCCA

$qual
  A PhredQuality instance of length 6
    width seq
[1]    36 <<<<<<<<<<<<<<<;<<<<<<<<<5<<<<<;:<;7
[2]    35 <<<<<<<<<<0<<<<655<<7<<<:9<<3/:<6):
[3]    35 <<<<<<<<<<<7;71<<;<;;<7;<<3;);3*8/5
[4]    36 (-&----,----)-)-),'--)---',+-,),''*,
[5]    35 <<<<<<<<<<<<<<<;<;7<<<<<<<<7<<;:<5%
[6]    35 <<<<<<<<;<<<8<<<<<;8:;6/686&;(16666

List of 2
 $ NM: int [1:3307] 0 0 0 5 0 1 0 1 0 1 ...
 $ H1: int [1:3307] 0 0 0 0 0 1 0 1 0 0 ...

  2   3   4 
104  45  22 
     isPaired isProperPair isUnmappedQuery hasUnmappedMate isMinusStrand
[1,]        1            0               0               1             0
[2,]        1            0               0               1             0
[3,]        1            0               0               1             0
[4,]        1            0               0               1             0
[5,]        1            0               0               1             0
[6,]        1            0               0               1             0
     isMateMinusStrand isFirstMateRead isSecondMateRead isSecondaryAlignment
[1,]                 0               1                0                    0
[2,]                 0               1                0                    0
[3,]                 0               0                1                    0
[4,]                 0               0                1                    0
[5,]                 0               0                1                    0
[6,]                 0               1                0                    0
     isNotPassingQualityControls isDuplicate
[1,]                           0           0
[2,]                           0           0
[3,]                           0           0
[4,]                           0           0
[5,]                           0           0
[6,]                           0           0
                   isPaired                isProperPair 
                       3307                        3144 
            isUnmappedQuery             hasUnmappedMate 
                         36                         127 
              isMinusStrand           isMateMinusStrand 
                       1641                        1606 
            isFirstMateRead            isSecondMateRead 
                       1654                        1653 
       isSecondaryAlignment isNotPassingQualityControls 
                          0                           0 
                isDuplicate 
                          0 
keep0 keep1 
 2031  2043 
      isPaired isProperPair isUnmappedQuery hasUnmappedMate isMinusStrand
keep0        1            1               1               1             0
keep1        1            1               0               1             1
      isMateMinusStrand isFirstMateRead isSecondMateRead isSecondaryAlignment
keep0                 1               1                1                    1
keep1                 1               1                1                    1
      isNotPassingQualityControls isDuplicate
keep0                           1           1
keep1                           1           1

Rsamtools documentation built on Nov. 8, 2020, 8:11 p.m.