establish_overlap1d: Establish m:n Mapping Between Peaks from Replicate 1 and 2

Description Usage Arguments Value Examples

View source: R/auxiliary.R

Description

This method returns all overlapping interactions between two replicates. For each pair of overlapping interactions, the ambiguity resolution value (ARV) is calculated, which helps to reduce the m:n mapping to a 1:1 mapping. The semantics of the ARV depend on the specified ambiguity_resolution_method, but in general interaction pairs with lower ARVs have priority over interaction pairs with higher ARVs when the bijective mapping is established.

Usage

1
2
3
4
5
6
establish_overlap1d(
  rep1_df,
  rep2_df,
  ambiguity_resolution_method = c("overlap", "midpoint", "value"),
  max_gap = -1L
)

Arguments

rep1_df

data frame of observations (i.e., genomic peaks) of replicate 1, with at least the following columns (position of columns matter, column names are irrelevant):

column 1: chr character; genomic location of peak - chromosome (e.g., "chr3")
column 2: start integer; genomic location of peak - start coordinate
column 3: end integer; genomic location of peak - end coordinate
column 4: value numeric; p-value, FDR, or heuristic used to rank the interactions
rep2_df

data frame of observations (i.e., genomic peaks) of replicate 2, with the following columns (position of columns matter, column names are irrelevant):

column 1: chr character; genomic location of peak - chromosome (e.g., "chr3")
column 2: start integer; genomic location of peak - start coordinate
column 3: end integer; genomic location of peak - end coordinate
column 4: value numeric; p-value, FDR, or heuristic used to rank the interactions
ambiguity_resolution_method

defines how ambiguous assignments (when one interaction in replicate 1 overlaps with multiple interactions in replicate 2 or vice versa) are resolved. Available methods:

"value" interactions are prioritized by ascending or descending value column (see sorting_direction), e.g., if two interactions in replicate 1 overlap with one interaction in replicate 2, the interaction from replicate 1 is chosen which has a lower (if sorting_direction is "ascending") or higher (if "descending") value
"overlap" the interaction pair is chosen which has the highest relative overlap, i.e., overlap in nucleotides of replicate 1 interaction anchor A and replicate 2 interaction anchor A, plus replicate 1 interaction anchor B and replicate 2 interaction anchor B, normalized by their lengths
"midpoint" the interaction pair is chosen which has the smallest distance between their anchor midpoints, i.e., distance from midpoint of replicate 1 interaction anchor A to midpoint of replicate 2 interaction anchor A, plus distance from midpoint of replicate 1 interaction anchor B to midpoint of replicate 2 interaction anchor B
max_gap

integer; maximum gap in nucleotides allowed between two anchors for them to be considered as overlapping (defaults to -1, i.e., overlapping anchors)

Value

data frame with the following columns:

column 1: rep1_idx index of interaction in replicate 1 (i.e., row index in rep1_df)
column 2: rep2_idx index of interaction in replicate 2 (i.e., row index in rep2_df)
column 3: arv ambiguity resolution value used turn m:n mapping into 1:1 mapping. Interaction pairs with lower arv are prioritized.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
rep1_df <- idr2d:::chipseq$rep1_df
rep1_df$value <- preprocess(rep1_df$value, "log_additive_inverse")

rep2_df <- idr2d:::chipseq$rep2_df
rep2_df$value <- preprocess(rep2_df$value, "log_additive_inverse")

# shuffle to break preexisting order
rep1_df <- rep1_df[sample.int(nrow(rep1_df)), ]
rep2_df <- rep2_df[sample.int(nrow(rep2_df)), ]

# sort by value column
rep1_df <- dplyr::arrange(rep1_df, value)
rep2_df <- dplyr::arrange(rep2_df, value)

pairs_df <- establish_overlap1d(rep1_df, rep2_df)

idr2d documentation built on Nov. 8, 2020, 6:16 p.m.