svOverlap | R Documentation |
Overlap SVs by SV type using one of the overlap approaches (see below). Variants covering a
genomic range (e.g. deletions, duplications, inversions) are overlapped, while insertions are
clustered (using max.ins.dist) and their size or sequence (if ins.seq.comp) are compared.
svOverlap(
query,
subject,
min.ol = 0.5,
method = c("reciprocal", "coverage", "bipartite"),
max.ins.dist = 20,
min.del.rol = 0.1,
range.seq.comp = FALSE,
ins.seq.comp = FALSE,
simprep = NULL,
nb.cores = 1,
log.level = c("CRITICAL", "WARNING", "INFO")
)
query |
a GRanges object with SVs |
subject |
another GRanges object with SVs |
min.ol |
the minimum overlap/coverage to be considered a match. Default is 0.5 |
method |
the method to annotate the overlap. One of 'reciprocal' (default) for the simple reciprocal overlap; 'coverage' for the cumulative coverage (e.g. to deal with fragmented calls); or 'bipartite' for a 1-to-1 matching of variants in the calls and truth sets. |
max.ins.dist |
maximum distance for insertions to be clustered. Default is 20. |
min.del.rol |
minimum reciprocal overlap for deletions. Default is 0.1 |
range.seq.comp |
compare sequence instead of overlapping deletions/inversions/etc. Default is FALSE. |
ins.seq.comp |
compare sequence instead of insertion sizes. Default is FALSE. |
simprep |
optional simple repeat annotation. Default is NULL. If non-NULL, GRanges to be used to extend variants when overlapping/clustering |
nb.cores |
number of processors to use. Default is 1. |
log.level |
the level of information in the log. Default is "CRITICAL" (basically no log). |
Available overlap approaches, passed with method=, include: reciprocal, coverage,
bipartite. If you are using this function directly, you might be interested in the
'reciprocal' method (default). When evaluating SVs versus a truthset, svevalOl uses
the 'coverage' method to compare calls (absence/presence) and 'bipartite' to compare
genotypes (when run with the recommended settings).
The "reciprocal" method corresponds to the simple reciprocal overlap for the variants covering a genomic range (e.g. deletions, duplications, inversions), or the reciprocal size/sequence similarity for insertions.
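The reciprocal overlap of two range variants can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper names `reciprocal_overlap` and `size_similarity` are hypothetical, coordinates are assumed to be 1-based and inclusive, and the insertion score is assumed to be the smaller size divided by the larger one.

```r
# Reciprocal overlap of two closed intervals [s1, e1] and [s2, e2]:
# the intersection size divided by each width, keeping the smaller ratio.
reciprocal_overlap <- function(s1, e1, s2, e2) {
  inter <- max(0, min(e1, e2) - max(s1, s2) + 1)
  min(inter / (e1 - s1 + 1), inter / (e2 - s2 + 1))
}

# Assumed analogue for insertions: ratio of the smaller to the larger size.
size_similarity <- function(size1, size2) {
  min(size1, size2) / max(size1, size2)
}

# A 100 bp deletion against a 150 bp deletion sharing 50 bp.
rol <- reciprocal_overlap(100, 199, 150, 299)
# A 50 bp insertion against a 60 bp insertion.
sim <- size_similarity(50, 60)

min.ol <- 0.5
rol >= min.ol  # FALSE: 50/150 is below the threshold
sim >= min.ol  # TRUE: 50/60 is above the threshold
```

Taking the smaller of the two ratios means a small variant nested in a much larger one does not count as a match, even though it is fully covered.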
With the "coverage" approach, a variant needs to be sufficiently covered by variants from the other set to be counted as "matched" or "overlapped". Here again, the ranges are overlapped for SVs spanning a genomic region, while for insertions the sizes or aligned sequences are summed.
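As a rough illustration of cumulative coverage for range variants, the base R sketch below computes the fraction of one interval covered by the union of several others. The function name `coverage_fraction` and the coordinates are hypothetical, and the sketch ignores the insertion case; it is only meant to show why fragmented calls can still match.

```r
# Fraction of the closed interval [s, e] covered by the union of the
# intervals [starts[i], ends[i]] (1-based, inclusive coordinates).
coverage_fraction <- function(s, e, starts, ends) {
  covered <- rep(FALSE, e - s + 1)
  for (i in seq_along(starts)) {
    lo <- max(s, starts[i])
    hi <- min(e, ends[i])
    # Mark the overlapping bases, if any, as covered.
    if (lo <= hi) covered[(lo - s + 1):(hi - s + 1)] <- TRUE
  }
  mean(covered)
}

# A 100 bp deletion in one set, called as three fragments in the other.
cov <- coverage_fraction(100, 199, c(100, 140, 180), c(129, 159, 250))
cov >= 0.5  # TRUE: 70 of the 100 bases are covered
```

None of the three fragments reaches a reciprocal overlap of 0.5 on its own, but together they cover 70% of the deletion.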
With the "bipartite" approach, the variants are first matched using the reciprocal overlap method (see "reciprocal"), and then matched one-to-one using bipartite clustering. This ensures that a variant in one set is matched to at most one variant in the other set. This is useful, for example, when comparing genotypes, where redundant calls should be penalized.
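The one-to-one constraint can be sketched in base R with a greedy pass over candidate pairs, highest score first. This is a deliberate simplification and not necessarily the algorithm used by the package (a greedy pass does not always find a maximum matching). The function name `one_to_one` and the scores are hypothetical; the column names follow the return value documented on this page.

```r
# Hypothetical candidate pairs that already passed the reciprocal overlap filter.
pairs <- data.frame(
  queryHits   = c(1, 1, 2, 3),
  subjectHits = c(1, 2, 1, 2),
  olScore     = c(0.9, 0.6, 0.8, 0.7)
)

# Greedy one-to-one selection: visit pairs by decreasing score and keep a
# pair only if neither its query nor its subject has been used already.
one_to_one <- function(pairs) {
  pairs <- pairs[order(-pairs$olScore), ]
  keep <- logical(nrow(pairs))
  used_q <- c()
  used_s <- c()
  for (i in seq_len(nrow(pairs))) {
    q <- pairs$queryHits[i]
    s <- pairs$subjectHits[i]
    if (!(q %in% used_q) && !(s %in% used_s)) {
      keep[i] <- TRUE
      used_q <- c(used_q, q)
      used_s <- c(used_s, s)
    }
  }
  pairs[keep, ]
}

matched <- one_to_one(pairs)
matched
```

Here query 2 is left unmatched because its only candidate, subject 1, is already paired with query 1; under the "bipartite" approach such a redundant call would count against the call set.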
Equivalent SVs are sometimes recorded as quite different variants because they are placed at
different locations within a tandem repeat. For example, imagine a 100 bp
tandem repeat in the reference genome. An expansion of 50 bp might be represented
as a 50 bp insertion at the beginning of the repeat in the callset but at the end
of the repeat in the truth set. Because they are about 100 bp apart, they might not
match. Instead of increasing the distance threshold too much, passing an annotation of
known simple repeats in the simprep= parameter provides
a more flexible way of matching variants by first extending them with nearby simple
repeats. In this example, because we know of this tandem repeat, both insertions will
be extended to span the full annotated reference repeat, hence ensuring that they are
matched and compared (e.g. by reciprocal size or sequence alignment distance).
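The tandem repeat example can be sketched in base R. The helper `extend_with_repeats`, its `flank` argument and the coordinates are hypothetical and only illustrate the idea of extending variants to the annotated repeat; the package works on GRanges objects and may handle details differently.

```r
# Extend the interval [s, e] so that it spans every annotated repeat that
# overlaps it or lies within `flank` bp of it.
extend_with_repeats <- function(s, e, rep_starts, rep_ends, flank = 0) {
  hit <- rep_starts <= e + flank & rep_ends >= s - flank
  if (any(hit)) {
    s <- min(s, rep_starts[hit])
    e <- max(e, rep_ends[hit])
  }
  c(start = s, end = e)
}

# A 100 bp tandem repeat annotated at positions 1001-1100.
rep_start <- 1001
rep_end <- 1100

# The same 50 bp expansion, placed at either end of the repeat.
call_ins  <- extend_with_repeats(1001, 1001, rep_start, rep_end)
truth_ins <- extend_with_repeats(1100, 1100, rep_start, rep_end)

# Before extension the insertions are 99 bp apart, beyond max.ins.dist = 20.
abs(1100 - 1001) <= 20
# After extension both span the whole repeat and can be compared by size.
identical(call_ins, truth_ins)
```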
a GRanges with information about pairs of SVs in query and subject that overlap
GRange |
intersected ranges (informative for "ranges" SVs) |
queryHits |
the id of the input query |
subjectHits |
the id of the input subject |
querSize |
the size of the input query |
subjectSize |
the size of the input subject |
interSize |
the size of the intersection (e.g. range, ins size, ins seq alignment) |
olScore |
the overlap score (usually the value of the reciprocal overlap) |
type |
the SV type of the pair |
Jean Monlong