resolve_cluster_sources: Resolve primary sources from clusters with multiple sources...


View source: R/resolve_cluster_sources.R

Description

Given a set of unique integration site positions (a reduced GRanges object) and a directed graph of connected components, this function identifies clusters that contain multiple source nodes. It then determines which source should be considered the primary source node, first by abundance and then, if abundances are tied, by position according to bias.
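The first step described above is finding clusters that contain more than one source node. It can be sketched in base R. `find_multi_source_clusters` is a hypothetical helper and is not part of gintools. It makes three assumptions: the graph is given as a two-column matrix of directed edges over nodes 1..n (one node per row of red.sites), cluster membership is a plain vector, and a "source" is a node with no incoming edges.

```r
# Hypothetical helper (not part of gintools): list the source nodes of every
# cluster that has more than one. Assumes nodes are numbered 1..n, `edges` is
# a two-column (from, to) matrix, and a source is a node with in-degree zero.
find_multi_source_clusters <- function(n, edges, clus_id) {
  # In-degree of each node = number of times it appears as an edge target.
  in_deg <- tabulate(edges[, 2], nbins = n)
  src <- which(in_deg == 0)
  # Group the source nodes by cluster; keep clusters with two or more.
  by_clus <- split(src, clus_id[src])
  by_clus[lengths(by_clus) > 1]
}

# Toy graph: nodes 1 and 3 both point at node 2 (cluster 1); 4 -> 5 (cluster 2).
edges <- rbind(c(1, 2), c(3, 2), c(4, 5))
clus_id <- c(1, 1, 1, 2, 2)
multi <- find_multi_source_clusters(5, edges, clus_id)
# Only cluster 1 is returned, with source nodes 1 and 3.
multi
```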

Usage

resolve_cluster_sources(red.sites, graph, bias)

Arguments

red.sites

A GRanges object that has been reduced to single-nucleotide positions and contains the revmap from the original GRanges object. The object must also contain a column for cluster membership (clusID) and a column for abundance (fragLengths).

graph

A directed graph (igraph object) built from the red.sites object. Each node corresponds to a row in the red.sites object.

bias

Either "upstream" or "downstream", designating which position to choose if the other decision metrics are tied.
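The selection rule for one cluster can be sketched in R: take the candidate source with the highest abundance, and use bias only to break ties. `choose_primary` is a hypothetical helper, not the gintools implementation. It assumes that "upstream" is strand-aware, meaning a smaller coordinate on "+" and a larger coordinate on "-". This matches the is.upstream test in the example below. It also assumes all candidates in a cluster share a strand.

```r
# Hypothetical helper: return the index of the primary source among the
# candidates, choosing by highest abundance and breaking ties by `bias`.
# "Upstream" is strand-aware: smaller coordinate on "+", larger on "-".
choose_primary <- function(pos, abundance, strand,
                           bias = c("upstream", "downstream")) {
  bias <- match.arg(bias)
  top <- which(abundance == max(abundance))
  if (length(top) == 1) return(top)
  # Signed position: a larger value is further downstream on either strand.
  signed <- ifelse(strand[top] == "+", pos[top], -pos[top])
  if (bias == "upstream") top[which.min(signed)] else top[which.max(signed)]
}

pos <- c(100, 102, 105)
abundance <- c(4, 7, 7)
strands <- c("+", "+", "+")
choose_primary(pos, abundance, strands, "upstream")    # 2 (position 102)
choose_primary(pos, abundance, strands, "downstream")  # 3 (position 105)
```

When one candidate is strictly more abundant, bias has no effect, which mirrors the description of bias as a tie-breaker only.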

Details

resolve_cluster_sources returns a graph in which each cluster has only a single primary source node.
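One plausible way to reach the state described above is to add an edge from the chosen primary to every other source in the cluster. This page does not confirm that gintools resolves secondary sources this way; it is an assumption. Once those edges exist, only the primary keeps an in-degree of zero. `link_secondary_sources` is a hypothetical base-R helper working on a two-column edge matrix.

```r
# Hypothetical helper (an assumption, not necessarily how gintools does it):
# make `primary` the only source among `srcs` by adding an edge
# primary -> s for every other source s in the same cluster.
link_secondary_sources <- function(edges, srcs, primary) {
  secondary <- setdiff(srcs, primary)
  if (length(secondary) == 0) return(edges)
  # Column 1 = from (the primary), column 2 = to (each secondary source).
  new_edges <- matrix(c(rep(primary, length(secondary)), secondary), ncol = 2)
  rbind(edges, new_edges)
}

# Cluster 1 has sources 1 and 3; suppose node 3 was chosen as primary.
edges <- rbind(c(1, 2), c(3, 2), c(4, 5))
linked <- link_secondary_sources(edges, srcs = c(1, 3), primary = 3)
# Remaining sources: node 3 (cluster 1) and node 4 (cluster 2).
which(tabulate(linked[, 2], nbins = 5) == 0)
```

A cluster that already has a single source is returned unchanged.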

Author(s)

Christopher Nobles, Ph.D.

Examples

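library(gintools)
library(GenomicRanges)
library(igraph)
library(dplyr)

# Simulated integration sites (internal gintools test helper).
gr <- gintools:::generate_test_granges(stdev = 3)

# Reduce each range to its 5' nucleotide and collapse identical positions;
# revmap records which original ranges were merged into each site.
red.sites <- reduce(
  flank(gr, -1, start = TRUE),
  min.gapwidth = 0L,
  with.revmap = TRUE)
red.sites$siteID <- seq_along(red.sites)
revmap <- as.list(red.sites$revmap)
# Abundance = number of original ranges collapsed into each site.
red.sites$abundance <- lengths(revmap)

# Pairs of distinct sites that overlap or are directly adjacent.
red.hits <- GenomicRanges::as.data.frame(
  findOverlaps(red.sites, maxgap = 0L, drop.self = TRUE))
# Keep a hit when the query is more abundant than the subject or, on equal
# abundance, when the query lies upstream of the subject.
red.hits <- red.hits %>%
  mutate(q_pos = start(red.sites[queryHits])) %>%
  mutate(s_pos = start(red.sites[subjectHits])) %>%
  mutate(q_abund = red.sites[queryHits]$abundance) %>%
  mutate(s_abund = red.sites[subjectHits]$abundance) %>%
  mutate(strand = unique(strand(
    c(red.sites[queryHits], red.sites[subjectHits])))) %>%
  mutate(is.upstream = ifelse(
    strand == "+",
    q_pos < s_pos,
    q_pos > s_pos)) %>%
  mutate(keep = q_abund > s_abund) %>%
  mutate(keep = ifelse(
    q_abund == s_abund,
    is.upstream,
    keep)) %>%
  filter(keep)

# Directed graph: one node per site, one edge query -> subject per kept hit.
g <- make_empty_graph(n = length(red.sites), directed = TRUE) %>%
  add_edges(unlist(mapply(
    c, red.hits$queryHits, red.hits$subjectHits, SIMPLIFY = FALSE)))
red.sites$clusID <- components(g)$membership

# Refine the clusters with the other gintools graph helpers, updating
# cluster membership after each step.
g <- connect_satalite_vertices(red.sites, g, gap = 2L, "upstream")
red.sites$clusID <- components(g)$membership
g <- break_connecting_source_paths(red.sites, g, "upstream")
red.sites$clusID <- components(g)$membership
g <- connect_adjacent_clusters(red.sites, g, gap = 5L, "upstream")
red.sites$clusID <- components(g)$membership

# Leave each cluster with a single primary source node.
resolve_cluster_sources(red.sites, g, "upstream")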

cnobles/gintools documentation built on Aug. 22, 2019, 10:36 a.m.