filter_junction_reads: Junction reads from BAM file

View source: R/junction_reads.R

Description

The function takes a BAM file as input and returns all properly paired reads that have an "N" (skipped region, i.e. a splice junction) in their CIGAR string.
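The two selection criteria can be illustrated on a toy table of alignment records. This is only a sketch in base R, not the package's implementation: the read names, FLAG values and CIGAR strings are made up, and it assumes that "properly paired" corresponds to bit 0x2 of the SAM FLAG field.

```r
# Hypothetical alignment records; in a real BAM file these values come
# from the FLAG and CIGAR fields of each record.
reads <- data.frame(
  qname = c("r1", "r2", "r3", "r4"),
  flag  = c(99L, 97L, 147L, 83L),
  cigar = c("30M500N45M", "20M1000N55M", "75M", "10M200N65M"),
  stringsAsFactors = FALSE
)

# An "N" operation in the CIGAR string marks a skipped reference region
# (for RNA-seq data, typically an intron).
has_junction <- grepl("N", reads$cigar, fixed = TRUE)

# Assumption: "properly paired" means bit 0x2 of the SAM FLAG is set.
properly_paired <- bitwAnd(reads$flag, 2L) != 0L

# Keep only the reads that satisfy both criteria.
junction_reads <- reads[has_junction & properly_paired, ]
print(junction_reads$qname)
```

In this toy data, r2 spans a junction but is not flagged as properly paired, and r3 is properly paired but has no "N", so neither is kept.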

Usage

filter_junction_reads(
  bam,
  lib_type = "PE",
  stranded = "reverse",
  cores = 1,
  yield_size = 2e+05,
  tile_width = 1e+07
)

Arguments

bam

Character string. The path to the BAM file.

lib_type

Character string. Type of the sequencing library: either "SE" (single-end) or "PE" (paired-end). Default "PE".

stranded

Character string. Strand type of the sequencing protocol: "unstranded" for unstranded protocols; "forward" or "reverse" for stranded protocols. In a "forward" protocol, the first read in a pair (or a single-end read) comes from the forward (sense) strand; in a "reverse" protocol, it comes from the reverse (antisense) strand. See the Salmon documentation for an explanation of the different fragment library types. Default "reverse".

cores

Integer scalar. Number of cores to use. Default 1.

yield_size

Integer scalar. The BAM file is read in chunks of this many reads. Default 2e5.

tile_width

Integer scalar. The genome will be partitioned into tiles of this width (in bp). The reads in the BAM file are imported tile by tile, either one tile after another or in parallel if cores > 1. Default 1e7.

Details

The yield_size parameter affects the runtime: the larger the value, the faster the function runs. If possible, use at least 200000. The same holds for the tile_width parameter: the larger, the faster. In addition, the BAM file can be read in parallel (with mclapply) if cores > 1.
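As a rough illustration of how tiling and mclapply fit together, the following sketch partitions a single sequence into tiles and processes each tile with parallel::mclapply. It is not the package's code: chrom_length and process_tile are hypothetical, and process_tile is a placeholder that returns the tile coordinates instead of reading a BAM file.

```r
library(parallel)

chrom_length <- 2.5e7  # hypothetical sequence length in bp
tile_width   <- 1e7
cores        <- 1L     # values > 1 fork worker processes (not supported on Windows)

# Start and end coordinates of consecutive tiles; the last tile is
# truncated at the end of the sequence.
tile_starts <- seq(1, chrom_length, by = tile_width)
tile_ends   <- pmin(tile_starts + tile_width - 1, chrom_length)

# Hypothetical placeholder for the per-tile work (here it only returns
# the tile coordinates; the real function would import reads).
process_tile <- function(i) {
  c(start = tile_starts[i], end = tile_ends[i])
}

# Tiles are processed one after another if cores == 1, else in parallel.
tiles <- mclapply(seq_along(tile_starts), process_tile, mc.cores = cores)
```

With fewer, larger tiles there are fewer separate import steps, which is consistent with the advice above that a larger tile_width is faster.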

Per tile, we only keep the reads that have their end location inside the tile. This prevents having duplicate reads in the output, in case the read overlaps the tile boundary.
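The effect of this rule can be seen on a small example with made-up coordinates. The sketch below uses plain data frames instead of GAlignments objects and is only meant to show why filtering on the end location yields each read exactly once.

```r
# Three tiles of width 100 (toy values).
tile_starts <- c(1L, 101L, 201L)
tile_ends   <- tile_starts + 99L

# Hypothetical reads; "b" and "c" each overlap a tile boundary.
reads <- data.frame(
  qname = c("a", "b", "c"),
  start = c(10L, 90L, 150L),
  end   = c(60L, 130L, 250L),
  stringsAsFactors = FALSE
)

per_tile <- lapply(seq_along(tile_starts), function(i) {
  # Importing a tile returns every read that overlaps it ...
  overlapping <- reads[reads$start <= tile_ends[i] &
                       reads$end >= tile_starts[i], ]
  # ... but only reads whose end lies inside the tile are kept.
  kept <- overlapping[overlapping$end >= tile_starts[i] &
                      overlapping$end <= tile_ends[i], ]
  list(n_overlapping = nrow(overlapping), kept = kept)
})

n_imported <- sum(vapply(per_tile, function(x) x$n_overlapping, integer(1)))
result <- do.call(rbind, lapply(per_tile, function(x) x$kept))
```

Here five read records are imported across the three tiles because "b" and "c" are each seen twice, but result contains every read only once.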

Note: The human genome has ~3 billion base pairs. For a BAM file with 100 million reads and assuming uniform read coverage, this gives 0.033 reads per bp, so a genome tile of 1e7 bp contains about 333k reads.
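The estimate in the note can be reproduced with a few lines of arithmetic. The variable names are chosen for illustration only, and the uniform-coverage assumption is the same simplification as in the note.

```r
genome_size <- 3e9    # approximate size of the human genome in bp
n_reads     <- 100e6  # reads in the BAM file
tile_width  <- 1e7    # default tile width in bp

# Expected number of reads per base pair under uniform coverage.
reads_per_bp <- n_reads / genome_size

# Expected number of reads in one tile, and number of tiles in the genome.
reads_per_tile <- reads_per_bp * tile_width
n_tiles        <- ceiling(genome_size / tile_width)

print(round(reads_per_bp, 3))
print(round(reads_per_tile))
print(n_tiles)
```

Under these assumptions a default tile holds more reads than the default yield_size of 2e5.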

Value

GAlignments object with all junction reads from the BAM file.


khembach/DISCERNS documentation built on June 23, 2020, 3:35 p.m.