View source: R/junction_reads.R
The function takes a BAM file as input and returns all properly paired reads that have an "N" in the CIGAR string.
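The selection rule described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the toy `reads` table and its flag and CIGAR values are made up for illustration. The sketch only assumes the standard SAM conventions that flag bit 0x2 marks a proper pair and that the CIGAR operation "N" marks a skipped region.

```r
# Toy alignment records (hypothetical values, for illustration only).
reads <- data.frame(
  qname = c("r1", "r2", "r3", "r4"),
  flag  = c(99L, 147L, 73L, 163L),
  cigar = c("50M", "20M1000N30M", "10M500N40M", "25M2000N25M"),
  stringsAsFactors = FALSE
)

# SAM flag bit 0x2 is set when the read is mapped in a proper pair.
is_proper_pair <- bitwAnd(reads$flag, 2L) != 0L

# An "N" operation in the CIGAR string denotes a skipped reference region,
# which is how spliced (junction) alignments are represented.
has_junction <- grepl("N", reads$cigar, fixed = TRUE)

# Keep only reads that satisfy both conditions.
junction_reads <- reads[is_proper_pair & has_junction, ]
print(junction_reads$qname)
```

In this toy table, r1 has no "N" in its CIGAR string and r3 is not flagged as properly paired, so only r2 and r4 remain.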
filter_junction_reads(
  bam,
  lib_type = "PE",
  stranded = "reverse",
  cores = 1,
  yield_size = 2e+05,
  tile_width = 1e+07
)
bam
Character string. The path to the BAM file.
lib_type
Character string. Type of the sequencing library: either "SE" (single-end) or "PE" (paired-end). Default "PE".
stranded
Character string. Strand type of the sequencing protocol: "unstranded" for unstranded protocols; "forward" or "reverse" for stranded protocols. In a "forward" protocol, the first read in a pair (or a single-end read) comes from the forward (sense) strand, and in a "reverse" protocol it comes from the reverse (antisense) strand. See the Salmon documentation for an explanation of the different fragment library types. Default "reverse".
cores
Integer scalar. Number of cores to use. Default 1.
yield_size
Integer scalar. Read the BAM file in chunks of this size.
tile_width
Integer scalar. The genome will be partitioned into tiles of this size. The reads in the
The yield_size parameter strongly affects the runtime: the larger the value, the faster the function runs. If possible, use at least 200000. The same goes for the tile_width parameter: the larger, the faster. The BAM file can also be read in parallel (with mclapply) if cores > 1.
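To make the role of tile_width and cores more concrete, here is a rough base R sketch of partitioning a genome into tiles and dispatching one task per tile with mclapply. The helper `make_tiles`, the genome length, and the placeholder per-tile work are assumptions for illustration, not the package's code. Note that mclapply with more than one core relies on forking, which is not available on Windows.

```r
library(parallel)

genome_len <- 1e8   # hypothetical genome length, for illustration
cores      <- 1L    # set > 1 to process tiles in parallel (not on Windows)

# Hypothetical helper: split [1, genome_len] into tiles of width tile_width.
make_tiles <- function(genome_len, tile_width) {
  start <- seq(1, genome_len, by = tile_width)
  data.frame(start = start, end = pmin(start + tile_width - 1, genome_len))
}

# A larger tile_width gives fewer tiles, and therefore fewer tasks.
tiles       <- make_tiles(genome_len, tile_width = 1e7)
small_tiles <- make_tiles(genome_len, tile_width = 1e6)

# One task per tile; the body is placeholder work standing in for
# "read and filter the alignments that fall into this tile".
tile_sizes <- mclapply(seq_len(nrow(tiles)), function(i) {
  tiles$end[i] - tiles$start[i] + 1
}, mc.cores = cores)

print(c(nrow(tiles), nrow(small_tiles)))
```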
Per tile, we only keep the reads that have their end location inside the tile. This prevents having duplicate reads in the output, in case the read overlaps the tile boundary.
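The effect of this rule can be checked on a toy example. The sketch below uses made-up coordinates and plain data frames rather than the package's actual data structures. It compares a naive "any overlap" rule with the "end location inside the tile" rule.

```r
tile_width <- 30L
chrom_len  <- 100L   # toy chromosome length (hypothetical)

# Tile boundaries: [1,30], [31,60], [61,90], [91,100].
tile_start <- seq(1L, chrom_len, by = tile_width)
tile_end   <- pmin(tile_start + tile_width - 1L, chrom_len)

# Toy reads; "b" and "c" span tile boundaries.
reads <- data.frame(
  id    = c("a", "b", "c"),
  start = c(5L, 25L, 58L),
  end   = c(20L, 40L, 95L),
  stringsAsFactors = FALSE
)

# Naive rule: collect every read that overlaps the tile.
by_overlap <- unlist(lapply(seq_along(tile_start), function(i) {
  reads$id[reads$start <= tile_end[i] & reads$end >= tile_start[i]]
}))

# Rule from the text: collect only reads whose end lies inside the tile.
by_end <- unlist(lapply(seq_along(tile_start), function(i) {
  reads$id[reads$end >= tile_start[i] & reads$end <= tile_end[i]]
}))

print(by_overlap)  # boundary-spanning reads show up more than once
print(by_end)      # each read shows up exactly once
```

Because every read has exactly one end position, and the tiles do not overlap, each read is assigned to exactly one tile under the second rule.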
Note: The human genome has ~3 billion base pairs. For a BAM file with 100 million reads and uniform read coverage, this gives about 0.033 reads per bp, so a genome tile of 1e7 bp contains about 333k reads.
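The estimate in the note can be reproduced with a few lines of R. The numbers are the ones given above; the uniform-coverage assumption is a simplification.

```r
genome_size <- 3e9   # approximate size of the human genome in bp
n_reads     <- 1e8   # 100 million reads in the BAM file
tile_width  <- 1e7   # default tile width in bp

# Average number of reads per base pair under uniform coverage.
reads_per_bp <- n_reads / genome_size

# Expected number of reads in a single tile.
reads_per_tile <- reads_per_bp * tile_width

# Number of tiles needed to cover the genome.
n_tiles <- ceiling(genome_size / tile_width)

print(c(reads_per_bp = reads_per_bp,
        reads_per_tile = reads_per_tile,
        n_tiles = n_tiles))
```

With these inputs the genome is covered by 300 tiles of roughly 333k reads each.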
A GAlignments object containing all junction reads from the BAM file.