contig.support: contig.support

contig.supportR Documentation

contig.support

Description

Takes as input a GRanges of bam alignments (e.g. outputted from bamUtils::read.bam) and a GRanges of rearranged reference aligned contigs (e.g. output of RSeqLib::BWA).

It identifies the subset of reads that support each of the contigs and "lifts" those reads through the read –> contig and contig –> reference alignments, returning supporting reads in reference coordinates.

The criteria for support include min.bases aligning to at least two chunks of the rearranged contig, and requirement that min.aligned.frac fraction of bases in every supporting read is aligned to that contig.

Additional requirements for support include not allowing split alignment of individual reads to the contigs (note: this does not mean we don't detect split reads that support the structural variant, this is captured by the contig -> reference alignment, we are just requiring the reads align (near) perfectly to the contig). and requiring alla alignments from a read pair (oriented to R1 frame of the fragment) to align to the same strand of the contig.

Finally, reads are not included in support if they align better to the reference than their native alignment, which is determined by comparing the $AS of their contig alignment with their original alignment score, stored in the provided metadata $AS field. If reference AS is not provided as metadata, it will is assumed to be zero.

$AS can be optionally recomputed against a DNAStringSet "ref" that represent the reference sequence. (Note that this "ref" does not have to be the full genome reference, it is just used to compute the alignment scores, and in fact for this to work efficiently, it's recommended that the provided reference sequence is local to the regions of interest, e.g. a few kb flanking each SV breakpoint, rather than the whole genome.)

The outputted reads include additional metadata including number of bases aligning to each chunk of the aligned contig.

Usage

contig.support(
  reads,
  contig,
  ref = NULL,
  chimeric = TRUE,
  strict = TRUE,
  cg.contig = gChain::cgChain(contig),
  isize.diff = 1000,
  min.bases = 20,
  min.aligned.frac = 0.95,
  new = TRUE,
  verbose = TRUE
)

Arguments

reads

GRanges in SAM / BAM format e.g. output of read.bam or BWA, with fields $qname, $cigar, $flag $seq all populated in standard fashion, and optionally $AS

contig

GRanges in SAM / BAM format wth fields $qname, $cigar and $seq all [populated

ref

optional DNAStringSet representing a reference sequence to compute alignments against

chimeric

logical flag whether to require reads to support junctions in chimericcontigs (ie discontiguous chunks on the reference), chimeric = FALSE

strict

strict requires that the alignment score of the read to contig alignment needs to be better for at least one read (and also not worse for any of the reads)

Value

reads re-aligned to the reference through the contigs with additional metadata describing features of the alignment

Author(s)

Marcin Imielinski mimielinski Saturday, Feb 12, 2022 01:14:28 AM WTF?


mskilab/skitools documentation built on Aug. 31, 2023, 1:13 p.m.