establish_bijection2d: Finds One-to-One Correspondence between Interactions from...
In kkrismer/idr2d: Irreproducible Discovery Rate for Genomic Interactions Data

establish_bijection2d

R Documentation

Finds One-to-One Correspondence between Interactions from Replicate 1 and 2

Description

This method establishes a bijective (one-to-one) assignment between interactions from replicate 1 and replicate 2. An interaction in replicate 1 is assigned to an interaction in replicate 2 if and only if (1) both anchors of the two interactions overlap (or the gap between the corresponding anchors A and B in replicates 1 and 2 is less than or equal to `max_gap`), and (2) no other interaction in replicate 2 overlaps with the replicate 1 interaction and has a lower ambiguity resolution value.
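The selection rule can be illustrated with a simplified greedy sketch in base R. This is not the package's implementation: the `candidates` table, its column names, and `greedy_bijection` are hypothetical, and the sketch assumes that all overlapping pairs and their ambiguity resolution values (`arv`) have already been computed.

```r
# Hypothetical candidate pairs: each row is one overlapping
# (replicate 1, replicate 2) pair with its ambiguity resolution value.
candidates <- data.frame(
  idx1 = c(1L, 1L, 2L, 3L),
  idx2 = c(1L, 2L, 2L, 3L),
  arv  = c(0.4, 0.1, 0.3, 0.2)
)

# Greedy one-to-one matching: visit pairs in ascending order of arv and
# keep a pair only if neither of its interactions has been used yet.
greedy_bijection <- function(candidates) {
  candidates <- candidates[order(candidates$arv), ]
  keep <- logical(nrow(candidates))
  used1 <- integer(0)
  used2 <- integer(0)
  for (i in seq_len(nrow(candidates))) {
    free1 <- !(candidates$idx1[i] %in% used1)
    free2 <- !(candidates$idx2[i] %in% used2)
    if (free1 && free2) {
      keep[i] <- TRUE
      used1 <- c(used1, candidates$idx1[i])
      used2 <- c(used2, candidates$idx2[i])
    }
  }
  candidates[keep, c("idx1", "idx2")]
}

pairs <- greedy_bijection(candidates)
```

In this toy input, interaction 2 of replicate 1 and interaction 1 of replicate 2 remain unmatched because their only candidate partners were claimed by pairs with lower values.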

Usage

establish_bijection2d(
  rep1_df,
  rep2_df,
  ambiguity_resolution_method = c("overlap", "midpoint", "value"),
  max_gap = -1L
)

Arguments

rep1_df

data frame of observations (i.e., genomic interactions) of replicate 1, with at least the following columns (the position of the columns matters, column names are irrelevant):

column 1:	`chr_a`	character; genomic location of anchor A - chromosome (e.g., `"chr3"`)
column 2:	`start_a`	integer; genomic location of anchor A - start coordinate
column 3:	`end_a`	integer; genomic location of anchor A - end coordinate
column 4:	`chr_b`	character; genomic location of anchor B - chromosome (e.g., `"chr3"`)
column 5:	`start_b`	integer; genomic location of anchor B - start coordinate
column 6:	`end_b`	integer; genomic location of anchor B - end coordinate
column 7:	`value`	numeric; p-value, FDR, or heuristic used to rank the interactions

rep2_df

data frame of observations (i.e., genomic interactions) of replicate 2, with the following columns (the position of the columns matters, column names are irrelevant):

column 1:	`chr_a`	character; genomic location of anchor A - chromosome (e.g., `"chr3"`)
column 2:	`start_a`	integer; genomic location of anchor A - start coordinate
column 3:	`end_a`	integer; genomic location of anchor A - end coordinate
column 4:	`chr_b`	character; genomic location of anchor B - chromosome (e.g., `"chr3"`)
column 5:	`start_b`	integer; genomic location of anchor B - start coordinate
column 6:	`end_b`	integer; genomic location of anchor B - end coordinate
column 7:	`value`	numeric; p-value, FDR, or heuristic used to rank the interactions
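Both inputs share the seven-column layout described above. As a minimal sketch, such data frames can be built by hand in base R; the coordinates and values below are made up for illustration, and the column names are only a convenience since columns are identified by position.

```r
# Toy replicate 1: two interactions, columns in the required order.
rep1_df <- data.frame(
  chr_a   = c("chr3", "chr3"),
  start_a = c(100L, 5000L),
  end_a   = c(199L, 5099L),
  chr_b   = c("chr3", "chr3"),
  start_b = c(1000L, 9000L),
  end_b   = c(1099L, 9099L),
  value   = c(0.01, 0.20),
  stringsAsFactors = FALSE  # keep chromosome names as character
)

# Toy replicate 2: one interaction that overlaps the first one above.
rep2_df <- data.frame(
  chr_a   = "chr3",
  start_a = 150L,
  end_a   = 249L,
  chr_b   = "chr3",
  start_b = 1000L,
  end_b   = 1099L,
  value   = 0.03,
  stringsAsFactors = FALSE
)
```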

ambiguity_resolution_method

defines how ambiguous assignments (when one interaction in replicate 1 overlaps with multiple interactions in replicate 2 or vice versa) are resolved. Available methods:

`"value"`	interactions are prioritized by their `value` column in ascending or descending order (see `sorting_direction`), e.g., if two interactions in replicate 1 overlap with one interaction in replicate 2, the replicate 1 interaction with the lower (if `sorting_direction` is `"ascending"`) or higher (if `"descending"`) value is chosen
`"overlap"`	the interaction pair is chosen which has the highest relative overlap, i.e., overlap in nucleotides of replicate 1 interaction anchor A and replicate 2 interaction anchor A, plus replicate 1 interaction anchor B and replicate 2 interaction anchor B, normalized by their lengths
`"midpoint"`	the interaction pair is chosen which has the smallest distance between their anchor midpoints, i.e., distance from midpoint of replicate 1 interaction anchor A to midpoint of replicate 2 interaction anchor A, plus distance from midpoint of replicate 1 interaction anchor B to midpoint of replicate 2 interaction anchor B
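The `"overlap"` and `"midpoint"` criteria can be sketched in base R for a single pair of interactions. The helper names are hypothetical, coordinates are assumed to be closed integer intervals, and the normalization used for the relative overlap (overlap divided by the combined span of the two anchors) is an assumption; the package may normalize differently.

```r
# Number of shared positions of closed intervals [s1, e1] and [s2, e2].
overlap_len <- function(s1, e1, s2, e2) {
  pmax(0L, pmin(e1, e2) - pmax(s1, s2) + 1L)
}

# Assumed normalization: overlap divided by the span covered by both anchors.
relative_overlap_1d <- function(s1, e1, s2, e2) {
  overlap_len(s1, e1, s2, e2) / (pmax(e1, e2) - pmin(s1, s2) + 1L)
}

midpoint <- function(s, e) (s + e) / 2

# Made-up interactions from replicate 1 (r1) and replicate 2 (r2).
r1 <- list(start_a = 100L, end_a = 199L, start_b = 1000L, end_b = 1099L)
r2 <- list(start_a = 150L, end_a = 249L, start_b = 1000L, end_b = 1099L)

# "overlap": anchor A term plus anchor B term (higher is better).
rel_overlap <-
  relative_overlap_1d(r1$start_a, r1$end_a, r2$start_a, r2$end_a) +
  relative_overlap_1d(r1$start_b, r1$end_b, r2$start_b, r2$end_b)

# "midpoint": anchor A distance plus anchor B distance (lower is better).
mid_dist <-
  abs(midpoint(r1$start_a, r1$end_a) - midpoint(r2$start_a, r2$end_a)) +
  abs(midpoint(r1$start_b, r1$end_b) - midpoint(r2$start_b, r2$end_b))
```

Here the A anchors share 50 of the 150 positions they span together and the B anchors are identical, so `rel_overlap` is 1/3 + 1 and `mid_dist` is 50 + 0.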

max_gap

integer; maximum gap in nucleotides allowed between two anchors for them to be considered overlapping (defaults to -1, i.e., the anchors must actually overlap)
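The effect of `max_gap` on a single pair of anchors can be sketched as follows. The helper names are hypothetical, and the sketch assumes closed integer intervals on the same chromosome, with the gap defined as the number of positions strictly between the two anchors (so adjacent anchors have gap 0 and overlapping anchors have a negative gap).

```r
# Number of positions between two closed intervals; negative if they overlap.
anchor_gap <- function(s1, e1, s2, e2) {
  pmax(s1, s2) - pmin(e1, e2) - 1L
}

# Two anchors are treated as overlapping if their gap is at most max_gap.
anchors_match <- function(s1, e1, s2, e2, max_gap = -1L) {
  anchor_gap(s1, e1, s2, e2) <= max_gap
}

anchors_match(1L, 10L, 10L, 20L)                 # share position 10
anchors_match(1L, 10L, 11L, 20L)                 # adjacent, gap of 0
anchors_match(1L, 10L, 11L, 20L, max_gap = 0L)   # adjacency now allowed
anchors_match(1L, 10L, 15L, 20L, max_gap = 1000L)
```

With the default of -1, only the first call returns `TRUE` among the first two; raising `max_gap` to 0 admits adjacent anchors, and larger values admit anchors separated by up to that many nucleotides.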

Value

Data frames rep1_df and rep2_df with the following columns:

column 1:	`chr_a`	character; genomic location of anchor A - chromosome (e.g., `"chr3"`)
column 2:	`start_a`	integer; genomic location of anchor A - start coordinate
column 3:	`end_a`	integer; genomic location of anchor A - end coordinate
column 4:	`chr_b`	character; genomic location of anchor B - chromosome (e.g., `"chr3"`)
column 5:	`start_b`	integer; genomic location of anchor B - start coordinate
column 6:	`end_b`	integer; genomic location of anchor B - end coordinate
column 7:	`value`	numeric; p-value, FDR, or heuristic used to rank the interactions
column 8:	`rep_value`	numeric; value of corresponding replicate interaction. If no corresponding interaction was found, `rep_value` is set to `NA`.
column 9:	`rank`	integer; rank of the interaction, established by `value` column, ascending order
column 10:	`rep_rank`	integer; rank of corresponding replicate interaction. If no corresponding interaction was found, `rep_rank` is set to `NA`.
column 11:	`idx`	integer; interaction index, primary key
column 12:	`rep_idx`	integer; specifies the index of the corresponding interaction in the other replicate (foreign key). If no corresponding interaction was found, `rep_idx` is set to `NA`.
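The `idx` and `rep_idx` columns make it possible to line up matched interactions across the two returned data frames. A minimal base R sketch, using hand-made data frames that contain only the relevant columns (the values are invented, not output of the function):

```r
# Toy stand-ins for the returned data frames (only three columns shown).
rep1 <- data.frame(idx = 1:3, value = c(0.9, 0.5, 0.7),
                   rep_idx = c(2L, NA, 1L))
rep2 <- data.frame(idx = 1:2, value = c(0.6, 0.8),
                   rep_idx = c(3L, 1L))

# Keep replicate 1 interactions that have a partner (rep_idx is not NA).
matched <- rep1[!is.na(rep1$rep_idx), ]

# Look up each partner in replicate 2 via the foreign key.
partner <- rep2[match(matched$rep_idx, rep2$idx), ]

# One row per matched pair, with the values from both replicates.
paired <- data.frame(
  idx1   = matched$idx,
  idx2   = partner$idx,
  value1 = matched$value,
  value2 = partner$value
)
```

Interaction 2 of replicate 1 has no partner and is dropped, which leaves two matched pairs.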

Examples

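library(idr2d)

# load example ChIA-PET interactions of replicate 1 and transform FDR values
rep1_df <- idr2d:::chiapet$rep1_df
rep1_df$fdr <- preprocess(rep1_df$fdr, "log_additive_inverse")

# same for replicate 2
rep2_df <- idr2d:::chiapet$rep2_df
rep2_df$fdr <- preprocess(rep2_df$fdr, "log_additive_inverse")

# match interactions between replicates (default ambiguity resolution method)
mapping <- establish_bijection2d(rep1_df, rep2_df)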

kkrismer/idr2d documentation built on Feb. 7, 2024, 2:23 p.m.