svOverlap: Overlap SVs

View source: R/svOverlap.R

svOverlap    R Documentation

Overlap SVs

Description

Overlap SVs by SV type using one of the overlap approaches described in Details. Variants covering a genomic range (e.g. deletions, duplications, inversions) are overlapped, while insertions are clustered (using max.ins.dist) and their sizes, or their sequences if ins.seq.comp=TRUE, are compared.
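The type-stratified pairing of range SVs can be illustrated with a small base-R sketch. It is not sveval's implementation: it assumes plain data frames with hypothetical columns chr, start, end and type instead of GRanges, and it covers only range SVs (insertions are clustered instead). A query/subject pair is kept as a candidate only if both variants have the same type and their intervals overlap.

```r
# Hypothetical sketch (not sveval's implementation): list candidate
# query/subject pairs for range SVs, requiring the same chromosome,
# the same SV type and overlapping closed intervals [start, end].
candidate_pairs <- function(query, subject) {
  hits <- expand.grid(queryHits = seq_len(nrow(query)),
                      subjectHits = seq_len(nrow(subject)))
  q <- query[hits$queryHits, ]
  s <- subject[hits$subjectHits, ]
  keep <- q$chr == s$chr & q$type == s$type &
    q$start <= s$end & s$start <= q$end
  hits <- hits[keep, ]
  hits$type <- q$type[keep]
  rownames(hits) <- NULL
  hits
}

query <- data.frame(chr = c("1", "1"), start = c(100, 500),
                    end = c(200, 600), type = c("DEL", "INV"),
                    stringsAsFactors = FALSE)
subject <- data.frame(chr = c("1", "1"), start = c(150, 520),
                      end = c(260, 580), type = c("DEL", "DEL"),
                      stringsAsFactors = FALSE)
candidate_pairs(query, subject)
```

Query 2 (an inversion) overlaps subject 2 (a deletion) but is dropped because the types differ, so only the deletion pair (query 1, subject 1) is returned.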

Usage

svOverlap(
  query,
  subject,
  min.ol = 0.5,
  method = c("reciprocal", "coverage", "bipartite"),
  max.ins.dist = 20,
  min.del.rol = 0.1,
  range.seq.comp = FALSE,
  ins.seq.comp = FALSE,
  simprep = NULL,
  nb.cores = 1,
  log.level = c("CRITICAL", "WARNING", "INFO")
)

Arguments

query

a GRanges object with SVs

subject

another GRanges object with SVs

min.ol

the minimum overlap/coverage for two variants to be considered a match. Default is 0.5.

method

the method used to annotate the overlap: 'reciprocal' (default) for the reciprocal overlap; 'coverage' for the cumulative coverage (e.g. to deal with fragmented calls); or 'bipartite' for a one-to-one matching of variants between the two sets (e.g. calls and truth set). See Details.

max.ins.dist

maximum distance for insertions to be clustered. Default is 20.

min.del.rol

minimum reciprocal overlap for deletions. Default is 0.1.

range.seq.comp

compare sequences instead of overlapping the ranges of deletions, inversions, etc. Default is FALSE.

ins.seq.comp

compare sequence instead of insertion sizes. Default is FALSE.

simprep

optional simple repeat annotation, as a GRanges object. If non-NULL, it is used to extend variants before overlapping/clustering (see Details). Default is NULL.

nb.cores

number of processors to use. Default is 1.

log.level

the level of information in the log. Default is "CRITICAL" (basically no log).
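As a rough illustration of how max.ins.dist and min.ol interact for insertions (with ins.seq.comp=FALSE), the sketch below pairs insertions whose positions are at most max.ins.dist bp apart and keeps pairs whose reciprocal size similarity, min(size)/max(size), reaches min.ol. The function name pair_insertions and the data-frame columns chr, pos and size are hypothetical, and sveval's actual clustering may differ in details such as how the distance is measured.

```r
# Hypothetical sketch: cluster insertions closer than max.ins.dist and score
# each pair by reciprocal size similarity (size comparison only).
pair_insertions <- function(query, subject, max.ins.dist = 20, min.ol = 0.5) {
  hits <- expand.grid(queryHits = seq_len(nrow(query)),
                      subjectHits = seq_len(nrow(subject)))
  q <- query[hits$queryHits, ]
  s <- subject[hits$subjectHits, ]
  # cluster: same chromosome and positions at most max.ins.dist bp apart
  close <- q$chr == s$chr & abs(q$pos - s$pos) <= max.ins.dist
  hits <- hits[close, ]
  # reciprocal size similarity of each clustered pair
  hits$olScore <- pmin(q$size[close], s$size[close]) /
    pmax(q$size[close], s$size[close])
  hits <- hits[hits$olScore >= min.ol, ]
  rownames(hits) <- NULL
  hits
}

ins.q <- data.frame(chr = "1", pos = c(1000, 5000), size = c(50, 300),
                    stringsAsFactors = FALSE)
ins.s <- data.frame(chr = "1", pos = c(1015, 5100), size = c(60, 300),
                    stringsAsFactors = FALSE)
pair_insertions(ins.q, ins.s)
```

Only the first pair is returned (15 bp apart, score 50/60). The two 300 bp insertions have identical sizes but are 100 bp apart, beyond the default max.ins.dist = 20; with max.ins.dist = 100 both pairs would match.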

Details

Available overlap approaches, passed with method=, are: reciprocal, coverage and bipartite. If you are using this function directly, you are probably interested in the 'reciprocal' method (default). When evaluating SVs against a truth set, svevalOl uses the 'coverage' method to compare calls (presence/absence) and the 'bipartite' method to compare genotypes (when run with the recommended settings).

The "reciprocal" method corresponds to the simple reciprocal overlap for the variants covering a genomic range (e.g. deletions, duplications, inversions), or the reciprocal size/sequence similarity for insertions.
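For range SVs, the reciprocal overlap of two intervals is commonly defined as the intersection length divided by the length of the larger interval, i.e. the smaller of the two overlap fractions. The sketch below assumes 1-based closed intervals as in GRanges; reciprocal_overlap is a hypothetical helper and only approximates what olScore reports. For insertions, the analogous score is the ratio of the smaller size to the larger one.

```r
# Reciprocal overlap of closed intervals [q.start, q.end] and [s.start, s.end]:
# intersection length / length of the larger interval (vectorized).
reciprocal_overlap <- function(q.start, q.end, s.start, s.end) {
  inter <- pmax(0, pmin(q.end, s.end) - pmax(q.start, s.start) + 1)
  q.size <- q.end - q.start + 1
  s.size <- s.end - s.start + 1
  inter / pmax(q.size, s.size)
}

reciprocal_overlap(101, 200, 111, 210)  # 0.9: two 100 bp deletions sharing 90 bp
reciprocal_overlap(101, 200, 151, 300)  # 1/3: below the default min.ol = 0.5
```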

With the "coverage" approach, a variant needs to be covered enough by variants from the other set to be counted as "matched" or "overlapped". Here again, the ranges are overlapped for SVs spanning a genomic region, while for insertions the sizes or aligned sequences are summed.
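The coverage idea can be sketched for range SVs as the fraction of a variant's bases covered by the union of the other set's ranges, so that several fragmented calls can jointly match one variant. coverage_fraction is a hypothetical helper working on plain numeric vectors (1-based, inclusive positions); it does not handle insertions, for which sizes or aligned sequences would be summed instead.

```r
# Fraction of the variant [start, end] covered by the union of the ranges
# [other.starts, other.ends] from the other set (1-based, inclusive).
coverage_fraction <- function(start, end, other.starts, other.ends) {
  # clip the other ranges to the variant and drop those that do not overlap
  s <- pmax(other.starts, start)
  e <- pmin(other.ends, end)
  keep <- s <= e
  s <- s[keep]; e <- e[keep]
  if (length(s) == 0) return(0)
  # merge overlapping/adjacent clipped ranges so bases are not counted twice
  o <- order(s); s <- s[o]; e <- e[o]
  covered <- 0
  cur.s <- s[1]; cur.e <- e[1]
  for (i in seq_along(s)[-1]) {
    if (s[i] <= cur.e + 1) {
      cur.e <- max(cur.e, e[i])
    } else {
      covered <- covered + (cur.e - cur.s + 1)
      cur.s <- s[i]; cur.e <- e[i]
    }
  }
  covered <- covered + (cur.e - cur.s + 1)
  covered / (end - start + 1)
}

# A 1,000 bp deletion called as two 450 bp fragments
coverage_fraction(1001, 2000, c(1001, 1551), c(1450, 2000))  # 0.9
```

Each fragment alone has a reciprocal overlap of only 0.45 with the deletion, below the default min.ol = 0.5, but together they cover 90% of it.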

With the "bipartite" approach, the variants are first matched using the reciprocal overlap method (see "reciprocal") and then matched one-to-one using bipartite matching. This ensures that a variant in one set is matched to at most one variant in the other set, which is useful when redundancy should be penalized, for example when comparing genotypes.
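One-to-one matching can be illustrated with Kuhn's augmenting-path algorithm for maximum-cardinality bipartite matching, applied to candidate pairs that already passed the reciprocal-overlap threshold. This is an assumption for illustration: bipartite_match is hypothetical, and it maximizes the number of matched pairs while ignoring olScore, whereas a weighted assignment (e.g. the Hungarian algorithm) would favor higher-scoring pairs. sveval's own implementation may differ.

```r
# Maximum-cardinality bipartite matching (Kuhn's augmenting paths) on a data
# frame of candidate pairs (queryHits, subjectHits). Returns a one-to-one subset.
bipartite_match <- function(pairs) {
  adj <- split(pairs$subjectHits, pairs$queryHits)  # query id -> subject ids
  state <- new.env()
  state$owner <- list()  # subject id (as character) -> matched query id
  augment <- function(q, visited) {
    for (s in adj[[as.character(q)]]) {
      key <- as.character(s)
      if (key %in% visited$ids) next
      visited$ids <- c(visited$ids, key)
      cur <- state$owner[[key]]
      # take a free subject, or re-route its current query elsewhere
      if (is.null(cur) || augment(cur, visited)) {
        state$owner[[key]] <- q
        return(TRUE)
      }
    }
    FALSE
  }
  for (q in as.integer(names(adj))) {
    visited <- new.env()
    visited$ids <- character(0)
    augment(q, visited)
  }
  owner <- state$owner
  data.frame(queryHits = as.integer(unlist(owner, use.names = FALSE)),
             subjectHits = as.integer(names(owner)))
}

# query 1 overlaps subjects 1 and 2; query 2 overlaps only subject 1
pairs <- data.frame(queryHits = c(1, 1, 2), subjectHits = c(1, 2, 1))
bipartite_match(pairs)
```

A greedy pass would match query 1 to subject 1 and leave query 2 unmatched; the augmenting path reassigns query 1 to subject 2 so that both queries are matched.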

Equivalent SVs are sometimes recorded as quite different variants because they are placed at different locations within a short tandem repeat. For example, imagine a large 100 bp tandem repeat in the reference genome. A 50 bp expansion might be represented as a 50 bp insertion at the beginning of the repeat in the callset but at the end of the repeat in the truth set. Because the two insertions are about 100 bp apart, they might not match. Instead of increasing the distance threshold too much, passing an annotation of known simple repeats to the simprep= parameter provides a more flexible way of matching variants: they are first extended with nearby simple repeats. In this example, because the tandem repeat is annotated, both insertions are extended to span the full reference repeat, ensuring that they are matched and compared (e.g. by reciprocal size or sequence alignment distance).
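The repeat-aware extension can be sketched as follows: each variant that overlaps an annotated simple repeat is widened to cover the whole repeat before overlapping or clustering. extend_with_simprep and the data-frame columns are hypothetical; in svOverlap the annotation is passed as a GRanges through simprep=, and its exact extension rules may differ.

```r
# Hypothetical sketch: widen each variant to span every annotated simple
# repeat it overlaps on the same chromosome (insertions use start == end).
extend_with_simprep <- function(vars, simprep) {
  for (i in seq_len(nrow(vars))) {
    # annotated repeats on the same chromosome overlapping variant i
    hit <- simprep$chr == vars$chr[i] &
      simprep$start <= vars$end[i] & vars$start[i] <= simprep$end
    if (any(hit)) {
      # widen the variant to span all the overlapping repeats
      vars$start[i] <- min(vars$start[i], simprep$start[hit])
      vars$end[i] <- max(vars$end[i], simprep$end[hit])
    }
  }
  vars
}

# A 100 bp repeat and a 50 bp insertion placed at either end of it
simprep <- data.frame(chr = "1", start = 1001, end = 1100, stringsAsFactors = FALSE)
calls <- data.frame(chr = "1", start = 1001, end = 1001, size = 50, stringsAsFactors = FALSE)
truth <- data.frame(chr = "1", start = 1100, end = 1100, size = 50, stringsAsFactors = FALSE)
ext.calls <- extend_with_simprep(calls, simprep)
ext.truth <- extend_with_simprep(truth, simprep)
```

After extension both insertions span positions 1001 to 1100 and overlap, so they can be compared by size (here 50 vs 50) even though their original positions are 99 bp apart, beyond the default max.ins.dist = 20.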

Value

a GRanges object with information about the pairs of SVs in query and subject that overlap:

GRange

the intersected ranges (informative for SVs covering a genomic range, e.g. deletions)

queryHits

the index of the SV in query

subjectHits

the index of the SV in subject

querySize

the size of the query SV

subjectSize

the size of the subject SV

interSize

the size of the intersection (e.g. range, ins size, ins seq alignment)

olScore

the overlap score (usually the value of the reciprocal overlap)

type

the SV type of the pair

Author(s)

Jean Monlong


jmonlong/sveval documentation built on July 31, 2023, 7:50 p.m.