normalize_multihit_clusters: Normalize multihit cluster IDs from multiple samples.
In cnobles/gintools: Genomic DNA Integration Analysis Tools


View source: R/normalize_multihit_clusters.R

As the INSPIIRED pipeline calls multihits (integration sites that cannot be placed at a single location on the reference genome), it assigns multihitIDs to the various locations where the integration site may exist. Because each replicate is analyzed individually, the multihitIDs differ between replicates even though they may refer to the same integration site. For this reason, normalize_multihit_clusters uses the previously assigned multihitIDs and the genomic positions to reassign multihitIDs consistently across multiple samples. The input must be a GRanges object with a metadata column labeled "multihitid". Because of the large amount of computation involved, this function requires the 'parallel' package, and the number of cores to use can be set with the `cores` argument.

normalize_multihit_clusters(multihits.gr)

normalize_multihit_clusters(multihits.gr, gap = 5L, grouping = NULL, cores = NULL)

`multihits.gr`	GRanges object with a column named 'multihitid'.
`gap`	Integer, the range within which sites are considered identical.
`grouping`	Character, name of the column used to assign groups that will not be compared to one another, such as 'patient'.
`cores`	Integer, the number of cores to use during processing. Data will be split by grouping and each group will be processed on a single core.
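
The split-by-group behaviour described for `grouping` and `cores` can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's code: the helper `process_by_group_sketch`, the `relabel` function, and the use of `parallel::mclapply` are hypothetical choices made here, and a plain data frame stands in for the GRanges input.

```r
library(parallel)

# Hypothetical helper: split a table by a grouping column and process
# each group independently, one group per worker.
process_by_group_sketch <- function(df, grouping = NULL, cores = NULL, fun) {
  # Without a grouping column, everything is treated as a single group.
  groups <- if (is.null(grouping)) list(all = df) else split(df, df[[grouping]])
  cores <- if (is.null(cores)) 1L else as.integer(cores)
  # mclapply relies on forking, which is unavailable on Windows.
  if (.Platform$OS.type == "windows") cores <- 1L
  results <- mclapply(groups, fun, mc.cores = cores)
  do.call(rbind, unname(results))
}

# Stand-in for the per-group work: relabel IDs as 1, 2, 3, ... within a group.
relabel <- function(d) {
  d$group_id <- match(d$multihitid, unique(d$multihitid))
  d
}

toy <- data.frame(
  patient    = c(1, 1, 1, 2, 2, 2),
  multihitid = c(4, 4, 9, 12, 15, 15)
)

out <- process_by_group_sketch(toy, grouping = "patient", cores = 2, fun = relabel)
print(out)
```

Because each group is handled on its own, IDs from different patients are never compared with one another, and using more cores than there are groups would bring no additional speed-up in this sketch.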

normalize_multihit_clusters normalizes multihit cluster IDs across multiple samples so that multihit sites can be identified across time points, cell types, etc.
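
The idea of reassigning IDs from the old multihitID plus genomic position can be sketched in base R as a connected-components problem: two per-sample clusters are merged whenever any of their candidate locations fall on the same chromosome and strand within `gap` of each other. This is a minimal sketch under those assumptions, not the function's actual implementation; the data frame layout, the column names, and `normalize_ids_sketch` are hypothetical.

```r
# Toy table: one row per candidate location of a multihit.
# Column names are illustrative only.
hits <- data.frame(
  sample     = c("rep1", "rep1", "rep1", "rep2", "rep2", "rep2"),
  chr        = c("chr1", "chr2", "chr3", "chr1", "chr2", "chr3"),
  strand     = c("+", "+", "-", "+", "+", "-"),
  position   = c(100L, 500L, 900L, 102L, 503L, 2000L),
  multihitid = c(1L, 1L, 2L, 7L, 7L, 8L),
  stringsAsFactors = FALSE
)

normalize_ids_sketch <- function(hits, gap = 5L) {
  # Make the per-sample IDs globally unique before comparing samples.
  key <- paste(hits$sample, hits$multihitid, sep = ":")
  keys <- unique(key)
  parent <- seq_along(keys)  # union-find parent pointers

  # Follow parent pointers until reaching a cluster's representative.
  find_root <- function(i) {
    while (parent[[i]] != i) i <- parent[[i]]
    i
  }

  n <- nrow(hits)
  for (i in seq_len(n - 1L)) {
    for (j in seq(i + 1L, n)) {
      same_site <- hits$chr[i] == hits$chr[j] &&
        hits$strand[i] == hits$strand[j] &&
        abs(hits$position[i] - hits$position[j]) <= gap
      if (same_site) {
        a <- find_root(match(key[i], keys))
        b <- find_root(match(key[j], keys))
        # Merge the two clusters under the smaller root index.
        if (a != b) parent[max(a, b)] <- min(a, b)
      }
    }
  }

  roots <- vapply(seq_along(keys), find_root, integer(1))
  # Relabel merged clusters as consecutive integers.
  new_id <- match(roots, unique(roots))
  hits$norm_multihitid <- new_id[match(key, keys)]
  hits
}

res <- normalize_ids_sketch(hits, gap = 5L)
print(res)
```

In this toy data, cluster 1 of rep1 and cluster 7 of rep2 share locations within 5 positions on chr1 and chr2, so they receive the same normalized ID, while the two chr3 clusters stay separate. The pairwise loop is quadratic and is kept only for readability.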

Christopher Nobles, Ph.D.

dfr <- data.frame(
  "chr" = c("chr1", "chr2", "chr2", "chr3"),
  "position" = c(5379927, 92775920, 2719573, 7195924),
  "breakpoint" = c(5380070, 92775995, 2719450, 7195890),
  "strand" = c("+", "+", "-", "-"),
  "sampleName" = rep("GTSP1234-1", 4),
  stringsAsFactors = FALSE)

gr1 <- granges(db_to_granges(dfr))
gr2 <- gr1
gr1$multihitid <- c(1, 1, 2, 3)
gr2$multihitid <- c(4, 4, 5, 6)
gr1$patient <- rep(1, 4)
gr2$patient <- rep(2, 4)
gr <- c(gr1, gr2)

normalize_multihit_clusters(gr)

# Grouping by patient keeps the two samples from being normalized to
# each other

normalize_multihit_clusters(gr, grouping = 'patient', cores = 2)