Import binary ‘BAM’ files into a list structure, with facilities for selecting what fields and which records are imported, and other operations to manipulate BAM files.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38
scanBam(file, index=file, ..., param=ScanBamParam(what=scanBamWhat())) countBam(file, index=file, ..., param=ScanBamParam()) idxstatsBam(file, index=file, ...) scanBamHeader(files, ...) ## S4 method for signature 'character' scanBamHeader(files, ...) asBam(file, destination, ...) ## S4 method for signature 'character' asBam(file, destination, ..., overwrite=FALSE, indexDestination=TRUE) asSam(file, destination, ...) ## S4 method for signature 'character' asSam(file, destination, ..., overwrite=FALSE) filterBam(file, destination, index=file, ...) ## S4 method for signature 'character' filterBam(file, destination, index=file, ..., filter=FilterRules(), indexDestination=TRUE, param=ScanBamParam(what=scanBamWhat())) sortBam(file, destination, ...) ## S4 method for signature 'character' sortBam(file, destination, ..., byQname=FALSE, maxMemory=512) indexBam(files, ...) ## S4 method for signature 'character' indexBam(files, ...) mergeBam(files, destination, ...) ## S4 method for signature 'character' mergeBam(files, destination, ..., region = GRanges(), overwrite = FALSE, header = character(), byQname = FALSE, addRG = FALSE, compressLevel1 = FALSE, indexDestination = FALSE)
The character(1) file name of the ‘BAM’ ('SAM' for
The character() file names of the ‘BAM’ file to be
The character(1) name of the index file of the 'BAM' file being processed; this is given without the '.bai' extension.
The character(1) file name of the location where
the sorted, filtered, or merged output file will be created. For
A GRanges() instance with
Additional arguments, passed to methods.
A logical(1) indicating whether the destination can be over-written if it already exists.
A logical(1) indicating whether the created destination file should also be indexed.
A logical(1) indicating whether the sorted destination file should be sorted by Query-name (TRUE) or by mapping position (FALSE).
A character(1) file path for the header information to be used in the merged BAM file.
A logical(1) indicating whether the file name should be used as RG (read group) tag in the merged BAM file.
A logical(1) indicating whether the merged BAM file should be compressed to zip level 1.
A numerical(1) indicating the maximal amount of memory (in MB) that the function is allowed to use.
An instance of
scanBam function parses binary BAM files; text SAM files
can be parsed using R's
scan function, especially with
what to control the fields that are parsed.
countBam returns a count of records consistent with
idxstatsBam visit the index in
index(file), and quickly
returns the number of mapped and unmapped reads on each seqname.
scanBamHeader visits the header information in a BAM file,
returning for each file a list containing elements
text, as described below. The SAM / BAM specification does not
require that the content of the header be consistent with the content
of the file, e.g., more targets may be present that are represented by
reads in the file. An optional character vector argument containing
one or two elements of
what=c("targets", "text") can be used to
specify which elements of the header are returned.
asBam converts 'SAM' files to 'BAM' files, equivalent to
samtools view -Sb file > destination. The 'BAM' file is sorted
and an index created on the destination (with extension '.bai') when
asSam converts 'BAM' files to 'SAM' files, equivalent to
samtools view file > destination.
filterBam parses records in
file. Records satisfying the
bamSimpleCigar criteria of
param are accumulated to a default of
1000000 records (change this by specifying
BamFile instance; see
BamFile-class). These records are then parsed to
DataFrame and made available for further filtering by
FilterRules. Functions in the
instance should expect a single
DataFrame argument representing
all information specified by
param. Each function must return a
logical vector equal to the number of rows of the
DataFrame. Return values are used to include (when
corresponding records in the filtered BAM file. The BAM file is
destination. An index file is created on the
indexDestination=TRUE. It is more space- and
time-efficient to filter using
bamSimpleCigar, if appropriate, than to supply
filter may be a list of
instances, in which case
destination must be a character vector
of equal length. The original
file is then separately filtered
filter[[i]] as the filter
sortBam sorts the BAM file given as its first argument,
analogous to the “samtools sort” function.
indexBam creates an index for each BAM file specified,
analogous to the ‘samtools index’ function.
mergeBam merges 2 or more sorted BAM files. As with samtools,
the RG (read group) dictionary in the header of the BAM files is not
Details of the
ScanBamParam class are provide on its help page;
several salient points are reiterated here.
contain a field
what, specifying the components of the BAM
records to be returned. Valid values of
what are available with
ScanBamParam can contain an argument
which that specifies a subset of reads to return. This requires
that the BAM file be indexed, and that the file be named following
samtools convention as
can contain an argument
tag to specify which tags will be
scanBam,character-method returns a list of lists. The outer
list groups results from each
Ranges list of
bamWhich(param); the outer list is of length one when
bamWhich(param) has length 0. Each inner list contains elements
scanBamWhat(); elements omitted from
bamWhat(param) are removed. The content of non-null elements
are as follows, taken from the description in the samtools API
qname: This is the QNAME field in SAM Spec v1.4. The query name, i.e., identifier, associated with the read.
flag: This is the FLAG field in SAM Spec v1.4.
A numeric value summarizing details of the read. See
ScanBamParam and the
flag argument, and
rname: This is the RNAME field in SAM Spec v1.4. The name of the reference to which the read is aligned.
strand: The strand to which the read is aligned.
pos: This is the POS field in SAM Spec v1.4.
The genomic coordinate at the start of the alignment.
Coordinates are ‘left-most’, i.e., at the 3' end of a
read on the '-' strand, and 1-based. The position excludes
clipped nucleotides, even though soft-clipped nucleotides are
qwidth: The width of the query, as calculated from the
cigar encoding; normally equal to the width of the query
mapq: This is the MAPQ field in SAM Spec v1.4. The MAPping Quality.
cigar: This is the CIGAR field in SAM Spec v1.4. The CIGAR string.
mrnm: This is the RNEXT field in SAM Spec v1.4. The reference to which the mate (of a paired end or mate pair read) aligns.
mpos: This is the PNEXT field in SAM Spec v1.4. The position to which the mate aligns.
isize: This is the TLEN field in SAM Spec v1.4. Inferred insert size for paired end alignments.
seq: This is the SEQ field in SAM Spec v1.4. The query sequence, in the 5' to 3' orientation. If aligned to the minus strand, it is the reverse complement of the original sequence.
qual: This is the QUAL field in SAM Spec v1.4.
Phred-encoded, phred-scaled base quality score, oriented as
groupid: This is an integer vector of unique group ids
asMates=TRUE in a BamFile object.
groupid values are used to create the partitioning
mate_status: Returned (always) when
asMates=TRUE in a BamFile
object. This is a factor indicating status (
unmated) of each record.
idxstatsBam returns a
data.frame with columns
mapped (number of mapped
unmapped (number of unmapped
scanBamHeader returns a list, with one element for each file
files. The list contains two element. The
targets element contains target (reference) sequence
text element is itself a list with each element a
list corresponding to tags (e.g., ‘@SQ’) found in the header,
and the associated tag values.
asSam return the file name of the destination file.
sortBam returns the file name of the sorted file.
indexBam returns the file name of the index file created.
filterBam returns the file name of the destination file
Martin Morgan <email@example.com>. Thomas Unterhiner
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79
fl <- system.file("extdata", "ex1.bam", package="Rsamtools", mustWork=TRUE) ## ## scanBam ## res0 <- scanBam(fl)[] # always list-of-lists names(res0) length(res0[["qname"]]) lapply(res0, head, 3) table(width(res0[["seq"]])) # query widths table(res0[["qwidth"]], useNA="always") # query widths derived from cigar table(res0[["cigar"]], useNA="always") table(res0[["strand"]], useNA="always") table(res0[["flag"]], useNA="always") which <- RangesList(seq1=IRanges(1000, 2000), seq2=IRanges(c(100, 1000), c(1000, 2000))) p1 <- ScanBamParam(which=which, what=scanBamWhat()) res1 <- scanBam(fl, param=p1) names(res1) names(res1[]) p2 <- ScanBamParam(what=c("rname", "strand", "pos", "qwidth")) res2 <- scanBam(fl, param=p2) p3 <- ScanBamParam( what="flag", # information to query from BAM file flag=scanBamFlag(isMinusStrand=FALSE)) length(scanBam(fl, param=p3)[]$flag) ## ## idxstatsBam ## idxstatsBam(fl) ## ## filterBam ## param <- ScanBamParam( flag=scanBamFlag(isUnmappedQuery=FALSE), what="seq") dest <- filterBam(fl, tempfile(), param=param) countBam(dest) ## 3271 records ## filter to a single file filter <- FilterRules(list(MinWidth = function(x) width(x$seq) > 35)) dest <- filterBam(fl, tempfile(), param=param, filter=filter) countBam(dest) ## 398 records res3 <- scanBam(dest, param=ScanBamParam(what="seq"))[] table(width(res3$seq)) ## filter 1 file to 2 destinations filters <- list( FilterRules(list(long=function(x) width(x$seq) > 35)), FilterRules(list(short=function(x) width(x$seq) <= 35)) ) destinations <- replicate(2, tempfile()) dest <- filterBam(fl, destinations, param=param, filter=filters) lapply(dest, countBam) ## ## sortBam ## sorted <- sortBam(fl, tempfile()) ## ## scanBamParam re-orders 'which'; recover original order ## gwhich <- as(which, "GRanges")[c(2, 1, 3)] # example data cnt <- countBam(fl, param=ScanBamParam(which=gwhich)) reorderIdx <- unlist(split(seq_along(gwhich), seqnames(gwhich))) cnt cnt[reorderIdx,]