unmapped_read: Extract unmapped reads with a mapped mate from a bam file
In cancer-genomics/trellis: Somatic structural variant analysis

Description

This function extracts all unmapped reads whose mapped mate overlaps a set of query genomic intervals. Internally, it uses `scanBam` from Rsamtools to read the BAM file in chunks of `yield_size` records. In our application, we typically first create a BAM file that keeps only the unmapped read of each mapped-unmapped read pair, which greatly reduces the size of the file to query. The `samtools view` command we use to create this file is given under Details.

Usage

unmapped_read(
  bam.file,
  query,
  yield_size = 1e+06,
  maxgap = 500,
  what = scanBamWhat()
)

Arguments

`bam.file`	a character string giving the full path to the BAM file
`query`	a `GRanges` object of genomic intervals in which to look for mapped reads with an unmapped mate, for example a set of rearrangement intervals
`yield_size`	the number of records read from the BAM file per call to `scanBam`
`maxgap`	the maximum gap, in base pairs, allowed between a query interval and a mapped read for the two to be considered overlapping
`what`	a character vector of fields to keep from the bam file. Defaults to `scanBamWhat()`.
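The `maxgap` rule can be illustrated on plain integer coordinates. The sketch below assumes 1-based closed intervals and counts the gap as the number of positions strictly between two intervals, which is how `maxgap` is defined for IRanges/GenomicRanges `findOverlaps` (adjacent ranges have a gap of 0). Whether `unmapped_read` applies exactly this test internally is an assumption, and `within_maxgap` is a hypothetical helper, not part of trellis.

```r
## Hypothetical helper (not part of trellis): does the read interval
## [start, end] on `chrom` lie within `maxgap` bases of any query interval?
## Coordinates are 1-based and closed, as in GRanges.
within_maxgap <- function(chrom, start, end,
                          q_chrom, q_start, q_end, maxgap = 500L) {
  ## gap = number of positions strictly between the two intervals:
  ## adjacent intervals give 0, overlapping intervals give a negative value
  gap <- pmax(q_start - end, start - q_end) - 1L
  any(q_chrom == chrom & gap <= maxgap)
}

## Query intervals; the first is the chr8 region used in Examples
q_chrom <- c("chr8", "chr8")
q_start <- c(128691748L, 128800000L)
q_end   <- c(128692097L, 128800500L)

within_maxgap("chr8", 128692300L, 128692400L, q_chrom, q_start, q_end)  # gap 202: TRUE
within_maxgap("chr8", 128693000L, 128693100L, q_chrom, q_start, q_end)  # gap 902: FALSE
within_maxgap("chr8", 128693000L, 128693100L, q_chrom, q_start, q_end,
              maxgap = 1000L)                                            # gap 902: TRUE
```

Using `any()` collapses the test over all query intervals, like a `subsetByOverlaps`-style filter. With real `GRanges` objects the same test is more naturally expressed as `findOverlaps(reads, query, maxgap = maxgap)`.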

Details

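The filtered BAM file, which keeps only the unmapped read of each mapped-unmapped pair, is created from the original BAM file (`$input`) with:

samtools view -b -f 4 -F 8 $input > "unmapped-mapped/${input}"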
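The `-f 4 -F 8` filter can be read off the SAM FLAG bits: `-f 4` keeps records with bit 0x4 (read unmapped) set, and `-F 8` drops records with bit 0x8 (mate unmapped) set. The base-R sketch below reproduces that test on integer FLAG values; `is_unmapped_with_mapped_mate` is a hypothetical helper, not part of trellis.

```r
## Hypothetical helper (not part of trellis): reproduce the
## `samtools view -f 4 -F 8` filter on integer SAM FLAG values.
is_unmapped_with_mapped_mate <- function(flag) {
  read_unmapped <- bitwAnd(flag, 4L) != 0L  # 0x4: read unmapped (required by -f 4)
  mate_unmapped <- bitwAnd(flag, 8L) != 0L  # 0x8: mate unmapped (excluded by -F 8)
  read_unmapped & !mate_unmapped
}

## Example FLAG values:
##   73  = 0x1 + 0x8 + 0x40          mapped read, unmapped mate
##   133 = 0x1 + 0x4 + 0x80          unmapped read, mapped mate
##   77  = 0x1 + 0x4 + 0x8 + 0x40    both reads unmapped
##   99  = 0x1 + 0x2 + 0x20 + 0x40   proper pair, both reads mapped
flags <- c(73L, 133L, 77L, 99L)
is_unmapped_with_mapped_mate(flags)
## [1] FALSE  TRUE FALSE FALSE
```

Only the unmapped member of a mapped-unmapped pair passes; its mapped mate (FLAG 73 here) is excluded, which is why the filtered file is so much smaller than the original.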

The `GRanges` object returned by this function includes the read sequences, so that they can subsequently be written to disk in FASTA format and realigned with a local alignment tool, such as BLAT, that supports split-read alignments.
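Writing the reads to FASTA for BLAT needs only a name and a sequence per read. The base-R sketch below assumes the names and sequences have already been pulled out of the returned `GRanges` (the metadata column names are not documented here, so the inputs are plain character vectors). `write_fasta` is a hypothetical helper; `Biostrings::writeXStringSet` is the usual Bioconductor alternative.

```r
## Hypothetical helper (not part of trellis): write reads to a FASTA file,
## one ">id" header line followed by one sequence line per read.
write_fasta <- function(ids, seqs, path) {
  stopifnot(length(ids) == length(seqs))
  ## rbind() puts headers in row 1 and sequences in row 2; reading the
  ## matrix column by column interleaves them as header, sequence, ...
  lines <- as.vector(rbind(paste0(">", ids), seqs))
  writeLines(lines, path)
  invisible(path)
}

## Toy reads, for illustration only
ids  <- c("read1", "read2")
seqs <- c("ACGTTGCA", "TTGACCGT")
fa <- write_fasta(ids, seqs, tempfile(fileext = ".fa"))
readLines(fa)
```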

Value

a `GRanges` object of the unmapped reads whose mapped mates overlap `query` (within `maxgap`).

Examples

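library(trellis)
library(GenomicRanges)  # GRanges(); also attaches IRanges for IRanges()
## example BAM shipped with the svbams data package
extdata <- system.file("extdata", package="svbams")
bam <- file.path(extdata, "cgov44t_revised.bam")
## query a single interval on chr8
region <- GRanges(seqnames = "chr8",
                  ranges = IRanges(start = 128691748, end = 128692097))
unmapped_read(bam.file = bam, query = region, yield_size = 1e6)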

cancer-genomics/trellis documentation built on Aug. 20, 2024, 5:48 p.m.