estimate_idr2d: Estimates IDR for Genomic Interaction Data

View source: R/main.R

estimate_idr2dR Documentation

Estimates IDR for Genomic Interaction Data

Description

This method estimates Irreproducible Discovery Rates (IDR) between two replicates of experiments identifying genomic interactions, such as Hi-C, ChIA-PET, and HiChIP.

Usage

estimate_idr2d(
  rep1_df,
  rep2_df,
  value_transformation = c("identity", "additive_inverse", "multiplicative_inverse",
    "log", "log_additive_inverse"),
  ambiguity_resolution_method = c("overlap", "midpoint", "value"),
  remove_nonstandard_chromosomes = TRUE,
  max_factor = 1.5,
  jitter_factor = 1e-04,
  max_gap = -1L,
  mu = 0.1,
  sigma = 1,
  rho = 0.2,
  p = 0.5,
  eps = 0.001,
  max_iteration = 30,
  local_idr = TRUE
)

Arguments

rep1_df

data frame of observations (i.e., genomic interactions) of replicate 1, with at least the following columns (position of columns matter, column names are irrelevant):

column 1: chr_a character; genomic location of anchor A - chromosome (e.g., "chr3")
column 2: start_a integer; genomic location of anchor A - start coordinate
column 3: end_a integer; genomic location of anchor A - end coordinate
column 4: chr_b character; genomic location of anchor B - chromosome (e.g., "chr3")
column 5: start_b integer; genomic location of anchor B - start coordinate
column 6: end_b integer; genomic location of anchor B - end coordinate
column 7: value numeric; p-value, FDR, or heuristic used to rank the interactions
rep2_df

data frame of observations (i.e., genomic interactions) of replicate 2, with the following columns (position of columns matter, column names are irrelevant):

column 1: chr_a character; genomic location of anchor A - chromosome (e.g., "chr3")
column 2: start_a integer; genomic location of anchor A - start coordinate
column 3: end_a integer; genomic location of anchor A - end coordinate
column 4: chr_b character; genomic location of anchor B - chromosome (e.g., "chr3")
column 5: start_b integer; genomic location of anchor B - start coordinate
column 6: end_b integer; genomic location of anchor B - end coordinate
column 7: value numeric; p-value, FDR, or heuristic used to rank the interactions
value_transformation

the values in x have to be transformed in a way such that when ordered in descending order, more significant interactions end up on top of the list. If the values in x are p-values, "log_additive_inverse" is recommended. The following transformations are supported:

"identity" no transformation is performed on x
"additive_inverse" x. = -x
"multiplicative_inverse" x. = 1 / x
"log" x. = log(x). Note: zeros are replaced by .Machine$double.xmin
"log_additive_inverse" x. = -log(x), recommended if x are p-values. Note: zeros are replaced by .Machine$double.xmin

either "ascending" (more significant interactions have lower value in value column) or "descending" (more significant interactions have higher value in value column)

ambiguity_resolution_method

defines how ambiguous assignments (when one interaction in replicate 1 overlaps with multiple interactions in replicate 2 or vice versa) are resolved. Available methods:

"value" interactions are prioritized by ascending or descending value column (see sorting_direction), e.g., if two interactions in replicate 1 overlap with one interaction in replicate 2, the interaction from replicate 1 is chosen which has a lower (if sorting_direction is "ascending") or higher (if "descending") value
"overlap" the interaction pair is chosen which has the highest relative overlap, i.e., overlap in nucleotides of replicate 1 interaction anchor A and replicate 2 interaction anchor A, plus replicate 1 interaction anchor B and replicate 2 interaction anchor B, normalized by their lengths
"midpoint" the interaction pair is chosen which has the smallest distance between their anchor midpoints, i.e., distance from midpoint of replicate 1 interaction anchor A to midpoint of replicate 2 interaction anchor A, plus distance from midpoint of replicate 1 interaction anchor B to midpoint of replicate 2 interaction anchor B
remove_nonstandard_chromosomes

removes interactions containing genomic locations on non-standard chromosomes using keepStandardChromosomes (default is TRUE)

max_factor

numeric; controls the replacement values for Inf and -Inf. Inf are replaced by max(x) * max_factor and -Inf are replaced by min(x) / max_factor.

jitter_factor

numeric; controls the magnitude of the noise that is added to x. This is done to break ties in x. Set jitter_factor = NULL for no jitter.

max_gap

integer; maximum gap in nucleotides allowed between two anchors for them to be considered as overlapping (defaults to -1, i.e., overlapping anchors)

mu

a starting value for the mean of the reproducible component.

sigma

a starting value for the standard deviation of the reproducible component.

rho

a starting value for the correlation coefficient of the reproducible component.

p

a starting value for the proportion of reproducible component.

eps

Stopping criterion. Iterations stop when the increment of log-likelihood is < eps*log-likelihood, Default=0.001.

max_iteration

integer; maximum number of iterations for IDR estimation (defaults to 30)

local_idr

see est.IDR

Value

List with three components, (rep1_df, rep2_df, and analysis_type) containing the interactions from input data frames rep1_df and rep2_df with the following additional columns:

column 1: chr_a character; genomic location of anchor A - chromosome (e.g., "chr3")
column 2: start_a integer; genomic location of anchor A - start coordinate
column 3: end_a integer; genomic location of anchor A - end coordinate
column 4: chr_b character; genomic location of anchor B - chromosome (e.g., "chr3")
column 5: start_b integer; genomic location of anchor B - start coordinate
column 6: end_b integer; genomic location of anchor B - end coordinate
column 7: value numeric; p-value, FDR, or heuristic used to rank the interactions
column 8: "rep_value" numeric; value of corresponding replicate interaction. If no corresponding interaction was found, rep_value is set to NA.
column 9: "rank" integer; rank of the interaction, established by value column, ascending order
column 10: "rep_rank" integer; rank of corresponding replicate interaction. If no corresponding interaction was found, rep_rank is set to NA.
column 11: "idx" integer; interaction index, primary key
column 12: "rep_idx" integer; specifies the index of the corresponding interaction in the other replicate (foreign key). If no corresponding interaction was found, rep_idx is set to NA.
idr IDR of the interaction and the corresponding interaction in the other replicate. If no corresponding interaction was found, idr is set to NA.

References

Q. Li, J. B. Brown, H. Huang and P. J. Bickel. (2011) Measuring reproducibility of high-throughput experiments. Annals of Applied Statistics, Vol. 5, No. 3, 1752-1779.

Examples

idr_results <- estimate_idr2d(idr2d:::chiapet$rep1_df,
                              idr2d:::chiapet$rep2_df,
                              value_transformation = "log_additive_inverse")
summary(idr_results)


kkrismer/idr2d documentation built on Feb. 7, 2024, 2:23 p.m.