IDR2D extends the original irreproducible discovery rate (IDR) method [@li2011], which was designed for one-dimensional genomic data such as ChIP-seq peaks. This package applies the method to two-dimensional genomic data, such as interactions between two genomic loci (also called anchors). Genomic interaction data are generated by genome-wide methods such as Hi-C [@pmid20461051], ChIA-PET [@pmid19247990], and HiChIP [@pmid25128017].
knitr::opts_chunk$set(
  fig.width = 7, fig.height = 7,
  echo = FALSE, warning = FALSE, message = FALSE,
  dev = 'png',
  out.extra = 'style="border-width: 0;"'
)
Load example data:
rep1_df <- idr2d:::chiapet$rep1_df
rep2_df <- idr2d:::chiapet$rep2_df

library(DT)

# two-row table header: anchor A and anchor B span three columns each
header <- htmltools::withTags(table(
  class = 'display',
  thead(
    tr(
      th(colspan = 3, "anchor A"),
      th(colspan = 3, "anchor B"),
      th(rowspan = 2, "FDR")
    ),
    tr(
      lapply(rep(c("chr.", "start coordinate", "end coordinate"), 2), th)
    )
  )
))

datatable(rep1_df[seq_len(min(nrow(rep1_df), 1000)), ],
          container = header, rownames = FALSE,
          options = list(searching = FALSE)) %>%
  formatRound("fdr", 3)
Only the first 1000 interactions are shown.
datatable(rep2_df[seq_len(min(nrow(rep2_df), 1000)), ],
          container = header, rownames = FALSE,
          options = list(searching = FALSE)) %>%
  formatRound("fdr", 3)
Only the first 1000 interactions are shown.
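For readers preparing their own input, the layout shown in the tables above (three columns per anchor, followed by a value column in seventh position) can be mimicked with a small data frame. This is a minimal sketch with made-up coordinates; all column names other than fdr are assumptions chosen for illustration, not names the package is known to require.

```r
# Toy replicate with the same seven-column layout as rep1_df:
# anchor A (chr, start, end), anchor B (chr, start, end), then the value.
# Column names except "fdr" are hypothetical; coordinates are made up.
toy_rep_df <- data.frame(
  chr_a   = c("chr1", "chr1", "chr2"),
  start_a = c(10000L, 50000L, 20000L),
  end_a   = c(10500L, 50500L, 20500L),
  chr_b   = c("chr1", "chr1", "chr2"),
  start_b = c(90000L, 75000L, 60000L),
  end_b   = c(90500L, 75500L, 60500L),
  fdr     = c(0.001, 0.2, 0.04),
  stringsAsFactors = FALSE
)

# The value used for ranking sits in the seventh column.
value_column <- toy_rep_df[[7]]

# Inspect the column types of the toy replicate.
str(toy_rep_df)
```

Selecting the value by position rather than by name mirrors the fact that the value is identified by its column index.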
Load the package:
library(idr2d)
Estimate IDR:
idr_results <- estimate_idr2d(rep1_df, rep2_df,
                              value_transformation = "log_additive_inverse")
rep1_idr_df <- idr_results$rep1_df
Note that the appropriate value transformation depends on the semantics of the value column (always the seventh column) in rep1_df and rep2_df. This column is used to establish a ranking between interactions, with highly significant interactions at the top of the list and the least significant interactions (i.e., most likely noise) at the bottom. The ranking is established by the value column, sorted in descending order. Since our value column contains FDRs (the lower, the more significant), we need to transform the values to comply with the assumption that high values indicate high significance. For p-values and p-value derived measures (like Q values), the log_additive_inverse transformation (-log(x)) is recommended.
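To see why -log(x) suits FDR-like values, the following base R sketch applies it to a handful of hypothetical FDRs and checks that sorting the transformed values in descending order puts the smallest FDR first. The numbers are invented for illustration, and the code only mimics the transformation as described here; it does not call into the package.

```r
# Hypothetical FDR values for five interactions (lower = more significant).
fdr <- c(0.20, 0.001, 0.05, 0.50, 0.01)

# -log(x) maps small FDRs to large values.
transformed <- -log(fdr)

# Rank 1 = largest transformed value, i.e., descending order.
rank_by_transformed <- rank(-transformed)

# Rank 1 = smallest FDR, i.e., ascending order on the raw values.
rank_by_fdr <- rank(fdr)

print(data.frame(fdr, transformed, rank = rank_by_transformed))
```

Both rankings agree, which is the property the transformation is meant to provide. In base R, -log(0) evaluates to Inf, so exact zeros in the value column deserve a look before transforming; how the package treats them is not covered here.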
# avoid CRAN warnings
rank <- rep_rank <- value <- rep_value <- idr <- NULL

header <- htmltools::withTags(table(
  class = 'display',
  thead(
    tr(
      th("rank in R1"),
      th("rank in R2"),
      th("transformed value in R1"),
      th("transformed value in R2"),
      th("IDR")
    )
  )
))

df <- dplyr::select(rep1_idr_df, rank, rep_rank, value, rep_value, idr)

datatable(df[seq_len(min(nrow(df), 1000)), ],
          rownames = FALSE,
          options = list(searching = FALSE),
          container = header) %>%
  formatRound(c("value", "rep_value", "idr"), 3)
Only the first 1000 observations are shown.
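A typical next step is to keep only the interactions whose IDR falls below some cutoff. The sketch below does this on a toy data frame that reuses the column names from the table above; the values and the 0.05 cutoff are illustrative assumptions, not recommendations from the package, and the is.na guard is a precaution in case some rows lack an IDR value.

```r
# Toy result table with the same column names as the selected columns
# of rep1_idr_df; all values are made up for illustration.
toy_idr_df <- data.frame(
  rank      = 1:5,
  rep_rank  = c(2L, 1L, 3L, 5L, 4L),
  value     = c(6.9, 4.6, 3.0, 1.6, 0.7),
  rep_value = c(5.8, 6.2, 2.5, 0.4, 0.9),
  idr       = c(0.002, 0.004, 0.03, 0.4, 0.6)
)

# Hypothetical cutoff; choose one appropriate for your analysis.
idr_cutoff <- 0.05

# Keep rows with a defined IDR below the cutoff (low IDR = reproducible).
keep <- !is.na(toy_idr_df$idr) & toy_idr_df$idr < idr_cutoff
reproducible_df <- toy_idr_df[keep, ]

print(reproducible_df)
```

The same subsetting expression could be applied to rep1_idr_df, assuming its idr column behaves like the one in this toy table.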
summary(idr_results)
draw_idr_distribution_histogram(rep1_idr_df)
draw_rank_idr_scatterplot(rep1_idr_df)
draw_value_idr_scatterplot(rep1_idr_df)
Most of the functionality of the IDR2D package is also offered through the website at https://idr2d.mit.edu.
For a more detailed discussion on IDR2D, please have a look at the IDR2D paper:
IDR2D identifies reproducible genomic interactions
Konstantin Krismer, Yuchun Guo, and David K. Gifford
Nucleic Acids Research, Volume 48, Issue 6, 06 April 2020, Page e31; DOI: https://doi.org/10.1093/nar/gkaa030