clusterSVs | R Documentation |
Cluster SVs based on overlap/similarity
clusterSVs(
svs.gr,
min.rol = 0.8,
max.ins.dist = 20,
range.seq.comp = FALSE,
ins.seq.comp = FALSE,
simprep = NULL,
nb.cores = 1,
batch.maxsize = 5000,
log.level = c("CRITICAL", "WARNING", "INFO")
)
svs.gr |
A GRanges with SVs (for example read by |
min.rol |
minimum reciprocal overlap for deletions and other "ranges" SVs. Default is 0.8 |
max.ins.dist |
maximum distance for insertions to be clustered. Default is 20. |
range.seq.comp |
compare sequences instead of overlapping ranges for deletions, inversions, etc. Default is FALSE. |
ins.seq.comp |
compare sequence instead of insertion sizes. Default is FALSE. |
simprep |
optional simple repeat annotation. Default is NULL. If non-NULL, GRanges to be used to |
nb.cores |
number of processors to use. Default is 1. |
batch.maxsize |
target batch size, used to reduce memory usage (see Details). Default is 5000. |
log.level |
the level of information in the log. Default is "CRITICAL" (basically no log). |
SVs are overlapped with each other. A graph is then built where nodes (SVs) are connected if they overlap/match. A cluster is a connected component of this graph.
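The idea above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy coordinates, the helper `recip_overlap`, and the label-propagation loop are assumptions made for illustration, and only simple "range" SVs on a single chromosome are considered.

```r
# Toy deletions on one chromosome (hypothetical coordinates).
svs <- data.frame(
  id    = c("a", "b", "c", "d"),
  start = c(100, 110, 125, 1000),
  end   = c(200, 215, 230, 1100)
)

# Hypothetical helper: overlap width divided by the larger of the two widths.
recip_overlap <- function(s1, e1, s2, e2) {
  ol <- pmax(0, pmin(e1, e2) - pmax(s1, s2) + 1)
  ol / pmax(e1 - s1 + 1, e2 - s2 + 1)
}

# Pairwise reciprocal overlap, thresholded to get the graph's adjacency matrix.
n <- nrow(svs)
rol <- outer(seq_len(n), seq_len(n), function(i, j) {
  recip_overlap(svs$start[i], svs$end[i], svs$start[j], svs$end[j])
})
adj <- rol >= 0.8

# Connected components by label propagation: each node repeatedly takes the
# smallest label among itself and its neighbours until nothing changes.
label <- seq_len(n)
repeat {
  new_label <- vapply(seq_len(n), function(i) min(label[adj[i, ]]), integer(1))
  if (all(new_label == label)) break
  label <- new_label
}
svs$svsite <- match(label, unique(label))
```

In this toy input, "a" matches "b" and "b" matches "c", but "a" and "c" fall below the threshold. All three still end up in the same component, while "d" forms its own cluster.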
To reduce memory usage, the SVs are first coarsely clustered into batches. The actual clustering (and graph construction) is then performed separately for each batch, potentially in parallel.
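One way such batching could work is sketched below in base R. This is an assumption about the general approach, not the package's actual procedure: SVs are grouped into loci made of chained overlapping intervals, and whole loci are packed into batches so that no locus is split. The variable `batch_maxsize` stands in for the batch.maxsize argument and is set very low for illustration.

```r
# Toy SVs on one chromosome (hypothetical coordinates).
svs <- data.frame(
  start = c(100, 150, 180, 1000, 1050, 5000, 9000),
  end   = c(200, 250, 300, 1100, 1200, 5100, 9100)
)
batch_maxsize <- 4  # small value for illustration

# Coarse pass: a new locus starts when an SV begins after every previous end.
svs <- svs[order(svs$start), ]
prev_max_end <- c(-Inf, head(cummax(svs$end), -1))
locus <- cumsum(svs$start > prev_max_end)

# Pack whole loci into batches holding at most about batch_maxsize SVs.
locus_size <- tabulate(locus)
batch_of_locus <- integer(length(locus_size))
current <- 1L
filled <- 0L
for (k in seq_along(locus_size)) {
  if (filled > 0 && filled + locus_size[k] > batch_maxsize) {
    current <- current + 1L
    filled <- 0L
  }
  batch_of_locus[k] <- current
  filled <- filled + locus_size[k]
}
svs$batch <- batch_of_locus[locus]

# Each batch can then be clustered independently; lapply could be swapped
# for parallel::mclapply(..., mc.cores = nb.cores) on platforms that support it.
batches <- split(svs, svs$batch)
batch_sizes <- lapply(batches, nrow)
```

Because SVs in different loci never overlap, no graph edge can cross a batch boundary, so clustering each batch separately gives the same components as clustering everything at once.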
the svs.gr object annotated with two new columns:
svsite |
the ID of the cluster (or SV site) |
clique |
whether this cluster is a clique, i.e. every SV overlaps/matches every other SV in the cluster |
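The distinction between a cluster and a clique can be made concrete with a short base R sketch. The adjacency matrix and cluster labels here are made up for illustration, and the check is an assumption about what the clique column conveys rather than the package's code.

```r
# Hypothetical adjacency matrix for four SVs: a-b and b-c match, a-c does not.
adj <- matrix(FALSE, 4, 4, dimnames = list(letters[1:4], letters[1:4]))
diag(adj) <- TRUE
adj["a", "b"] <- adj["b", "a"] <- TRUE
adj["b", "c"] <- adj["c", "b"] <- TRUE

# Cluster (connected component) of each SV.
svsite <- c(a = 1, b = 1, c = 1, d = 2)

# A cluster is a clique when every pair of its members is connected.
is_clique <- vapply(split(names(svsite), svsite), function(members) {
  all(adj[members, members, drop = FALSE])
}, logical(1))

# Expand the per-cluster flag back to one value per SV.
clique <- unname(is_clique[as.character(svsite)])
```

Here the first cluster is connected only through "b", so it is not a clique, whereas the singleton cluster containing "d" trivially is.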
Jean Monlong
## Not run:
svs = readSVvcf('svs.vcf.gz', keep.ids=TRUE)
svs = clusterSVs(svs)
## End(Not run)