clusterSVs: Cluster SVs based on overlap/similarity

clusterSVs    R Documentation

Description

Cluster SVs based on overlap/similarity

Usage

clusterSVs(
  svs.gr,
  min.rol = 0.8,
  max.ins.dist = 20,
  range.seq.comp = FALSE,
  ins.seq.comp = FALSE,
  simprep = NULL,
  nb.cores = 1,
  batch.maxsize = 5000,
  log.level = c("CRITICAL", "WARNING", "INFO")
)

Arguments

svs.gr

A GRanges object with SVs (for example, as read by readSVvcf or readSVvcf.multisamps).

min.rol

minimum reciprocal overlap for deletions and other "ranges" SVs. Default is 0.8.

max.ins.dist

maximum distance between insertions for them to be clustered. Default is 20.

range.seq.comp

compare sequences instead of overlapping ranges for deletions, inversions, etc. Default is FALSE.

ins.seq.comp

compare sequences instead of sizes for insertions. Default is FALSE.

simprep

optional simple repeat annotation. Default is NULL. If non-NULL, GRanges to be used to

nb.cores

number of processors to use. Default is 1.

batch.maxsize

batch size to aim for. Batching reduces memory usage (see Details). Default is 5000.

log.level

the level of information in the log. Default is "CRITICAL" (basically no log).

Details

SVs are overlapped with each other. A graph is then built where nodes (SVs) are connected if they overlap/match. A cluster is a connected component in this graph.
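
The graph-based clustering described above can be illustrated with a minimal base R sketch. It is not the package's implementation: the toy coordinates, the helper names recip_overlap and label_components, and the exact definition of reciprocal overlap (overlap width divided by the larger of the two SV widths) are assumptions made for illustration only.

```r
# Toy deletions on one chromosome (made-up coordinates, not real data)
svs <- data.frame(
  id    = c("a", "b", "c", "d"),
  start = c(100, 110, 120, 1000),
  end   = c(200, 205, 215, 1100)
)

# Assumed definition: overlap width divided by the larger of the two widths
recip_overlap <- function(s1, e1, s2, e2) {
  ol <- pmax(0, pmin(e1, e2) - pmax(s1, s2) + 1)
  ol / pmax(e1 - s1 + 1, e2 - s2 + 1)
}

# Pairwise reciprocal overlap, then an edge when it reaches the threshold
n <- nrow(svs)
rol <- outer(seq_len(n), seq_len(n), function(i, j) {
  recip_overlap(svs$start[i], svs$end[i], svs$start[j], svs$end[j])
})
adj <- rol >= 0.8  # 0.8 mirrors the min.rol default

# Label connected components with a simple breadth-first search
label_components <- function(adj) {
  n <- nrow(adj)
  comp <- rep(NA_integer_, n)
  current <- 0L
  for (seed in seq_len(n)) {
    if (!is.na(comp[seed])) next
    current <- current + 1L
    comp[seed] <- current
    queue <- seed
    while (length(queue) > 0) {
      node <- queue[1]
      queue <- queue[-1]
      # unlabelled neighbours of the current node join its component
      nb <- which(adj[node, ] & is.na(comp))
      comp[nb] <- current
      queue <- c(queue, nb)
    }
  }
  comp
}

svs$svsite <- label_components(adj)
```

In this toy input the first three deletions end up in one component and the fourth in its own, which is the kind of grouping the svsite column is meant to record.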

To reduce memory usage, the SVs are first coarsely grouped into batches. The actual clustering (and graph construction) is then performed separately for each batch, potentially in parallel.
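
One way such batching could work is sketched below in base R. This is an assumption, not the package's actual strategy: SVs are first split into groups of intervals that cannot overlap each other, and whole groups are then packed into batches up to a size limit. The variable batch_maxsize is a hypothetical stand-in for batch.maxsize, and the coordinates are made up.

```r
# Toy intervals (made-up coordinates, not real data)
svs <- data.frame(
  start = c(100, 150, 400, 420, 900, 2000),
  end   = c(200, 250, 500, 480, 950, 2100)
)
batch_maxsize <- 3  # hypothetical stand-in for batch.maxsize

svs <- svs[order(svs$start), ]

# Coarse groups: a new group starts when an SV begins after all previous SVs ended,
# so SVs from different groups can never overlap
prev_max_end <- c(-Inf, head(cummax(svs$end), -1))
group <- cumsum(svs$start > prev_max_end)

# Pack whole groups into batches; open a new batch when the limit would be exceeded
group_sizes <- tabulate(group)
batch_of_group <- integer(length(group_sizes))
batch <- 1L
filled <- 0L
for (g in seq_along(group_sizes)) {
  if (filled > 0 && filled + group_sizes[g] > batch_maxsize) {
    batch <- batch + 1L
    filled <- 0L
  }
  batch_of_group[g] <- batch
  filled <- filled + group_sizes[g]
}
svs$batch <- batch_of_group[group]

# Each batch can now be processed independently; lapply could be swapped
# for parallel::mclapply to use several cores
batch_sizes <- lapply(split(svs, svs$batch), nrow)
```

Because overlapping SVs always share a group, and groups are never split across batches, clustering each batch separately would give the same components as clustering everything at once, while each pairwise comparison stays small.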

Value

the svs.gr object annotated with two new columns:

svsite

the ID of the cluster (or SV site)

clique

whether the cluster is a clique, i.e. every SV overlaps/matches every other SV in the cluster
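
The difference between a cluster and a clique can be shown with a small base R sketch. The adjacency matrix, the SV names and the helper is_clique are hypothetical and only illustrate the definition; they are not taken from the package.

```r
# Toy match matrix (made-up): a-b and b-c match, a-c do not; d-e match
ids <- c("a", "b", "c", "d", "e")
adj <- matrix(FALSE, 5, 5, dimnames = list(ids, ids))
adj["a", "b"] <- adj["b", "a"] <- TRUE
adj["b", "c"] <- adj["c", "b"] <- TRUE
adj["d", "e"] <- adj["e", "d"] <- TRUE

# Cluster assignment, as a connected-component search would produce
svsite <- c(a = 1, b = 1, c = 1, d = 2, e = 2)

# A cluster is a clique when every pair of its members is connected
is_clique <- function(members, adj) {
  sub <- adj[members, members, drop = FALSE]
  diag(sub) <- TRUE  # ignore self-comparisons
  all(sub)
}

clique <- vapply(split(names(svsite), svsite), is_clique, logical(1), adj = adj)
```

Here cluster 1 is connected only through "b", so it is a cluster but not a clique, whereas cluster 2 is both.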

Author(s)

Jean Monlong

Examples

## Not run: 

svs = readSVvcf('svs.vcf.gz', keep.ids=TRUE)
svs = clusterSVs(svs)


## End(Not run)

jmonlong/sveval documentation built on July 31, 2023, 7:50 p.m.