countBamInGRanges: Count reads from BAM file in genomic ranges


Description

Counts the number of reads with a specified minimum mapping quality from BAM files in genomic ranges specified by a GRanges object. This is a convenience function for counting the reads in ranges covering the targeted regions, such as the exons in exome enrichment experiments, from each sample. These read counts are used by exomeCopy in predicting CNVs in samples.

With the default setting (read.width=1), only the read starts are used for counting purposes (the leftmost position regardless of the strandedness of the read).

When read.width is set to the actual read width, or when get.width = TRUE, the function returns the number of overlapping reads, as returned by countOverlaps in the GenomicRanges package.
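The difference between counting read starts and counting overlapping reads can be illustrated with a small sketch that calls countOverlaps directly. The sequence name "seq1", the window coordinates, the read positions and the 50 bp read length are all made up for illustration; this is not the internal code of countBamInGRanges.

```r
library(GenomicRanges)

## Two adjacent 100 bp windows on a hypothetical sequence "seq1"
windows <- GRanges("seq1", IRanges(start = c(1, 101), end = c(100, 200)))

## Three toy reads given by their leftmost position (assumed 50 bp long)
read.starts <- c(10, 80, 150)
starts.only <- GRanges("seq1", IRanges(start = read.starts, width = 1))
full.reads  <- GRanges("seq1", IRanges(start = read.starts, width = 50))

## Read starts only: every read falls in exactly one window
start.counts <- countOverlaps(windows, starts.only)
start.counts    ## returns 2 1

## Full-width reads: the read starting at 80 spans both windows
overlap.counts <- countOverlaps(windows, full.reads)
overlap.counts  ## returns 2 2
```

In the second call the read starting at position 80 is counted in both windows, which is why overlap counts in adjacent ranges are no longer independent.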

The function subdivideGRanges can be used first to subdivide ranges of different size into ranges of nearly equal width.

The BAM file requires an associated index file (see the man page for indexBam in the Rsamtools package).
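A minimal sketch of creating the index with indexBam follows. It assumes the BAM file shipped with exomeCopy is coordinate-sorted, and it works on a copy in a temporary directory (an arbitrary choice made here) so that the index can be written without touching the installed package.

```r
library(Rsamtools)

## BAM file shipped with exomeCopy (the same file used in the Examples)
pkg.bam <- system.file("extdata", "mapping.bam", package = "exomeCopy")

## Work on a copy in a temporary directory so the index can be written there
bam.file <- file.path(tempdir(), "mapping.bam")
file.copy(pkg.bam, bam.file, overwrite = TRUE)

## indexBam() expects a coordinate-sorted BAM and writes "<bam.file>.bai"
index.file <- indexBam(bam.file)
file.exists(paste0(bam.file, ".bai"))
```

For your own data, replace the path with the BAM file of the sample and run sortBam from Rsamtools first if the file is not already sorted.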

Usage

  countBamInGRanges(bam.file, granges, min.mapq = 1, read.width = 1,
                    stranded.start = FALSE, get.width = FALSE,
                    remove.dup = FALSE)

Arguments

bam.file

The path of the BAM file for the sample to be counted.

granges

An object of type GRanges with the ranges in which to count reads.

min.mapq

The minimum mapping quality required for a read to be counted. Defaults to 1. Set to 0 to count all reads.

read.width

The width of a read, used in counting overlaps of mapped reads with the genomic ranges. The default is 1, resulting in the counting of only read starts in genomic ranges. If the length of fixed width reads is used, e.g. 100 for 100bp reads, then the function will return the count of all overlapping reads with the genomic ranges. However, counting all overlapping reads introduces dependency between the counts in adjacent windows.

stranded.start

If true, the function will create reads of length read.width using the strand to determine the read location. A read with + or * strand will start at the given start position, and a read with - strand will end at (start position + CIGAR width - 1).

get.width

If true, the function retrieves the read width from the CIGAR encoding rather than assigning the value from read.width.

remove.dup

If true, the function will count only one read for each unique combination of position, strand and read width.
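The arguments above can be compared side by side with a short sketch. It rebuilds the subdivided ranges from the targets.bed file shipped with exomeCopy and then calls countBamInGRanges with different settings; the variable names counts.default, counts.all, counts.overlap and counts.nodup are made up for illustration.

```r
library(exomeCopy)

## Subdivided target ranges built from the BED file shipped with exomeCopy
target.file <- system.file("extdata", "targets.bed", package = "exomeCopy")
target.df <- read.delim(target.file, header = FALSE,
                        col.names = c("seqname", "start", "end"))
target <- GRanges(seqnames = target.df$seqname,
                  IRanges(start = target.df$start, end = target.df$end))
target.sub <- subdivideGRanges(target)

bam.file <- system.file("extdata", "mapping.bam", package = "exomeCopy")

## Defaults: count read starts of reads with mapping quality >= 1
counts.default <- countBamInGRanges(bam.file, target.sub)

## Count read starts regardless of mapping quality
counts.all <- countBamInGRanges(bam.file, target.sub, min.mapq = 0)

## Count overlapping reads, taking each read width from its CIGAR string
counts.overlap <- countBamInGRanges(bam.file, target.sub, get.width = TRUE)

## Count one read per unique combination of position, strand and read width
counts.nodup <- countBamInGRanges(bam.file, target.sub, remove.dup = TRUE)

## One row per range, one column per setting
cbind(counts.default, counts.all, counts.overlap, counts.nodup)
```

Each call returns one count per range in target.sub, so the columns can be compared range by range.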

Value

An integer vector giving the number of reads counted in each range of the input GRanges.

See Also

Rsamtools, GRanges, subdivideGRanges

Examples

  ## get subdivided genomic ranges covering targeted region
  ## using subdivideGRanges()
  example(subdivideGRanges)

  ## BAM file included in the exomeCopy package
  bam.file <- system.file("extdata", "mapping.bam", package="exomeCopy")

  ## extract read counts from the BAM file in these genomic ranges
  mcols(target.sub)$sample <- countBamInGRanges(bam.file,target.sub)

Example output

Loading required package: IRanges
Loading required package: BiocGenerics
Loading required package: parallel

Attaching package: 'BiocGenerics'

The following objects are masked from 'package:parallel':

    clusterApply, clusterApplyLB, clusterCall, clusterEvalQ,
    clusterExport, clusterMap, parApply, parCapply, parLapply,
    parLapplyLB, parRapply, parSapply, parSapplyLB

The following objects are masked from 'package:stats':

    IQR, mad, sd, var, xtabs

The following objects are masked from 'package:base':

    Filter, Find, Map, Position, Reduce, anyDuplicated, append,
    as.data.frame, cbind, colMeans, colSums, colnames, do.call,
    duplicated, eval, evalq, get, grep, grepl, intersect, is.unsorted,
    lapply, lengths, mapply, match, mget, order, paste, pmax, pmax.int,
    pmin, pmin.int, rank, rbind, rowMeans, rowSums, rownames, sapply,
    setdiff, sort, table, tapply, union, unique, unsplit, which,
    which.max, which.min

Loading required package: S4Vectors
Loading required package: stats4

Attaching package: 'S4Vectors'

The following object is masked from 'package:base':

    expand.grid

Loading required package: GenomicRanges
Loading required package: GenomeInfoDb
Loading required package: Rsamtools
Loading required package: Biostrings
Loading required package: XVector

Attaching package: 'Biostrings'

The following object is masked from 'package:base':

    strsplit


sbdvGR>   ## read in target region BED file
sbdvGR>   target.file <- system.file("extdata", "targets.bed", package="exomeCopy")

sbdvGR>   target.df <- read.delim(target.file, header=FALSE,
sbdvGR+ col.names=c("seqname","start","end")) 

sbdvGR>   ## create GRanges object with 5 ranges over 2 sequences
sbdvGR>   target <- GRanges(seqname=target.df$seqname,
sbdvGR+                IRanges(start=target.df$start,end=target.df$end))

sbdvGR>   ## subdivide into 7 smaller genomic ranges
sbdvGR>   target.sub <- subdivideGRanges(target)

exomeCopy documentation built on Nov. 8, 2020, 7:45 p.m.