normalize_multihit_clusters: Normalize multihit cluster IDs from multiple samples.


View source: R/normalize_multihit_clusters.R

Description

As the INSPIIRED pipeline calls multihits, that is, integration sites that cannot be placed at a single location on the reference genome, it assigns multihitIDs to the various locations where the integration site may exist. Because each replicate is analyzed individually, the multihitIDs differ between replicates even though they may refer to the same integration site. For this reason, normalize_multihit_clusters uses the previously assigned multihitIDs and genomic positions to reassign multihitIDs across multiple samples. The input must be a GRanges object with a metadata column labeled "multihitid". Because of the large amount of computation involved, this function requires the 'parallel' package and the number of cores to use.

Usage

normalize_multihit_clusters(multihits.gr)

normalize_multihit_clusters(multihits.gr, gap = 5L, grouping = NULL, cores = NULL)

Arguments

multihits.gr

GRanges object with a column named 'multihitid'.

gap

Integer, the range within which sites are considered identical.

grouping

Character, the name of the column used to assign groups that will not be compared to one another, such as 'patient'.

cores

integer, the number of cores to use during processing. Data will be split by grouping and each group will be processed on a single core.
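
To make the roles of gap and grouping more concrete, the following base-R sketch labels positions so that neighbours no more than gap apart share a label, and then applies that labelling separately within each group. It is an illustration only: group_by_gap is a hypothetical helper, not a gintools function, and treating gap as an inclusive distance between adjacent sorted positions is an assumption rather than documented behaviour.

```r
# Hypothetical helper (not part of gintools): assign a label to each position
# so that sorted neighbours no more than `gap` apart share the same label.
group_by_gap <- function(positions, gap = 5L) {
  ord <- order(positions)
  # Start a new label whenever the distance to the previous sorted
  # position exceeds `gap` (assumed inclusive threshold).
  labels_sorted <- cumsum(c(TRUE, diff(positions[ord]) > gap))
  labels <- integer(length(positions))
  labels[ord] <- labels_sorted
  labels
}

positions <- c(5379927, 5379930, 5379940, 2719573)
all_labels <- group_by_gap(positions, gap = 5L)

# Splitting by a grouping variable first means sites from different groups
# are never compared to one another.
patient <- c("A", "A", "B", "B")
by_patient <- lapply(split(positions, patient), group_by_gap, gap = 5L)
```

Because each group is handled independently, the lapply() call is the step that could be handed to a parallel equivalent such as parallel::mclapply(); whether the package does it this way is not stated here.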

Details

normalize_multihit_clusters will normalize multihit clusterIDs across multiple samples so that multihit sites can be identified across time points, cell types, etc.
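
The general idea of reconciling cluster IDs across samples can be sketched in base R. This is not the gintools implementation: normalize_ids_sketch is a hypothetical function that works on a plain data frame rather than a GRanges object, matches sites only on exact chromosome, strand and position (no gap tolerance), and merges any original IDs that share a site using a simple union-find.

```r
# Toy table: two replicates report the same sites under different IDs
# (1-3 in the first replicate, 4-6 in the second).
sites <- data.frame(
  chr = rep(c("chr1", "chr2", "chr2", "chr3"), 2),
  strand = rep(c("+", "+", "-", "-"), 2),
  position = rep(c(5379927, 92775920, 2719573, 7195924), 2),
  multihitid = c(1, 1, 2, 3, 4, 4, 5, 6),
  stringsAsFactors = FALSE
)

# Hypothetical sketch (not part of gintools): merge original IDs that share
# at least one exact site, then renumber the merged clusters from 1.
normalize_ids_sketch <- function(df) {
  site_key <- paste(df$chr, df$strand, df$position, sep = ":")
  old_ids <- as.numeric(df$multihitid)
  ids <- sort(unique(old_ids))
  parent <- setNames(ids, as.character(ids))  # each ID starts as its own root

  # Follow parent links until reaching an ID that is its own parent.
  find_root <- function(id) {
    while (parent[[as.character(id)]] != id) {
      id <- parent[[as.character(id)]]
    }
    id
  }

  # All original IDs observed at the same site belong to one cluster.
  for (members in split(old_ids, site_key)) {
    roots <- vapply(unique(members), find_root, numeric(1))
    parent[as.character(roots)] <- min(roots)
  }

  final_roots <- vapply(old_ids, find_root, numeric(1))
  df$normalized_id <- match(final_roots, sort(unique(final_roots)))
  df
}

result <- normalize_ids_sketch(sites)
result[, c("multihitid", "normalized_id")]
```

In this toy input, original IDs 1 and 4, 2 and 5, and 3 and 6 each end up sharing one normalized ID, which is the kind of cross-sample agreement the function is meant to provide.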

Author(s)

Christopher Nobles, Ph.D.

Examples

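library(GenomicRanges)  # provides granges()
library(gintools)       # provides db_to_granges() and normalize_multihit_clusters()

dfr <- data.frame(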
  "chr" = c("chr1", "chr2", "chr2", "chr3"),
  "position" = c(5379927, 92775920, 2719573, 7195924),
  "breakpoint" = c(5380070, 92775995, 2719450, 7195890),
  "strand" = c("+", "+", "-", "-"),
  "sampleName" = rep("GTSP1234-1", 4),
  stringsAsFactors = FALSE)

gr1 <- granges(db_to_granges(dfr))
gr2 <- gr1
gr1$multihitid <- c(1, 1, 2, 3)
gr2$multihitid <- c(4, 4, 5, 6)
gr1$patient <- rep(1, 4)
gr2$patient <- rep(2, 4)
gr <- c(gr1, gr2)

normalize_multihit_clusters(gr)

# Grouping by patient keeps the two samples from being normalized to
# each other

normalize_multihit_clusters(gr, grouping = 'patient', cores = 2)

cnobles/gintools documentation built on Aug. 22, 2019, 10:36 a.m.