cluster_multihits: Identify clusters for multihit integration sites.


View source: R/cluster_multihits.R


Given a GRanges object of integration sites that align to multiple locations on the reference genome (multihit sites), this function appends a column with a cluster designation. Multihits are considered part of the same cluster if they share any alignments with other multihits. Because this process is memory intensive, the function works iteratively: it randomly samples from the given data to start cluster formation, refines the clusters, and then resamples from the remaining data until all of it has been used.
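
The "share any alignments" rule can be illustrated with a minimal base R sketch. It is not the gintools implementation: the helper name cluster_by_shared_alignment, the toy data, and the loc column (a string combining chromosome, strand, and position) are assumptions made for illustration, and the sketch skips the random sampling and refinement steps described above. It simply propagates the smallest label across reads and locations until nothing changes.

```r
# Toy data: one row per alignment. Reads (ID) that share any alignment
# location (loc) should end up in the same cluster.
hits <- data.frame(
  ID  = c("r1", "r1", "r2", "r2", "r3", "r4", "r4", "r5"),
  loc = c("chr1+100", "chr2+500", "chr2+500", "chr3-900",
          "chr3-900", "chr5+42", "chr6-77", "chr6-77"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of gintools): min-label propagation.
cluster_by_shared_alignment <- function(id, loc) {
  # Start with one label per read.
  label <- as.integer(factor(id, levels = unique(id)))
  repeat {
    # Smallest label among rows that share a location ...
    by_loc <- ave(label, loc, FUN = min)
    # ... then smallest label among rows that share a read.
    new_label <- ave(by_loc, id, FUN = min)
    if (identical(new_label, label)) break
    label <- new_label
  }
  # Renumber the surviving labels as 1..k.
  as.integer(factor(label))
}

hits$cluster <- cluster_by_shared_alignment(hits$ID, hits$loc)
```

In this toy set, r1, r2, and r3 are chained together through chr2+500 and chr3-900, while r4 and r5 are linked through chr6-77, so two clusters result.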


cluster_multihits(multihits, read_col = NULL)

cluster_multihits(multihits, read_col = NULL, max_gap = 5L, iterations = 5L)



multihits: a GRanges object containing sets of integration sites or ranges (one alignment per row) with several metadata columns, including "ID" (read identifier) and "key_pair" (unique identifier for the R2 and R1 sequences).


max_gap: an integer giving the nucleotide distance (window) within which integration sites are grouped together.
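
As a rough illustration of a gap-based window, the base R sketch below groups sorted positions whenever the distance to the previous site is at most the window size. The positions are made up, and whether cluster_multihits treats the boundary as inclusive, or applies the window in exactly this way, is an assumption.

```r
# Hypothetical positions on a single chromosome and strand.
pos <- c(100L, 103L, 108L, 250L, 252L, 900L)
max_gap <- 5L

ord <- order(pos)
# Start a new group whenever the gap to the previous site exceeds max_gap.
grp_sorted <- cumsum(c(TRUE, diff(pos[ord]) > max_gap))

# Map the group labels back to the original row order.
grp <- integer(length(pos))
grp[ord] <- grp_sorted
```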


iterations: an integer giving the number of rounds of cluster generation to perform, which is also the number of random subsets made from the data. More iterations reduce memory overhead but increase run time.
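
The trade-off comes from working on smaller pieces at a time. A minimal base R sketch of splitting reads into random subsets follows; the read names are hypothetical and the way gintools actually draws its subsets may differ.

```r
set.seed(1)  # only to make the illustration reproducible

ids <- paste0("read", 1:10)  # hypothetical read identifiers
iterations <- 5L

# Randomly assign each read to one of `iterations` near-equal subsets.
subset_idx <- sample(rep_len(seq_len(iterations), length(ids)))
subsets <- split(ids, subset_idx)
```

Each element of subsets could then be processed in its own round, so only about length(ids) / iterations reads need to be held in the working set at once.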


read_col: a character string matching the name of the column in the GRanges object given in 'multihits' that indicates which ranges belong together. For example, the read name can be used here to show which alignments came from the same read.


cluster_multihits returns a GRanges object with cluster information.


Christopher Nobles, Ph.D.

cnobles/gintools documentation built on Aug. 22, 2019, 10:36 a.m.