estimate_idr1d: Estimates IDR for Genomic Peak Data
In idr2d: Irreproducible Discovery Rate for Genomic Interactions Data

Description Usage Arguments Value References Examples

This method estimates Irreproducible Discovery Rates (IDR) for peaks in replicated ChIP-seq experiments.

estimate_idr1d(
  rep1_df,
  rep2_df,
  value_transformation = c("identity", "additive_inverse", "multiplicative_inverse",
    "log", "log_additive_inverse"),
  ambiguity_resolution_method = c("overlap", "midpoint", "value"),
  remove_nonstandard_chromosomes = TRUE,
  max_factor = 1.5,
  jitter_factor = 1e-04,
  max_gap = -1L,
  mu = 0.1,
  sigma = 1,
  rho = 0.2,
  p = 0.5,
  eps = 0.001,
  max_iteration = 30,
  local_idr = TRUE
)

rep1_df

data frame of observations (i.e., genomic peaks) of replicate 1, with at least the following columns (position of columns matter, column names are irrelevant):

column 1:	`chr`	character; genomic location of peak - chromosome (e.g., `"chr3"`)
column 2:	`start`	integer; genomic location of peak - start coordinate
column 3:	`end`	integer; genomic location of peak - end coordinate
column 4:	`value`	numeric; p-value, FDR, or heuristic used to rank the interactions

rep2_df

data frame of observations (i.e., genomic peaks) of replicate 2, with the following columns (position of columns matter, column names are irrelevant):

column 1:	`chr`	character; genomic location of peak - chromosome (e.g., `"chr3"`)
column 2:	`start`	integer; genomic location of peak - start coordinate
column 3:	`end`	integer; genomic location of peak - end coordinate
column 4:	`value`	numeric; p-value, FDR, or heuristic used to rank the interactions

value_transformation

the values in x have to be transformed in a way such that when ordered in descending order, more significant interactions end up on top of the list. If the values in x are p-values, "log_additive_inverse" is recommended. The following transformations are supported:

`"identity"`	no transformation is performed on `x`
`"additive_inverse"`	`x. = -x`
`"multiplicative_inverse"`	`x. = 1 / x`
`"log"`	`x. = log(x)`. Note: zeros are replaced by `.Machine$double.xmin`
`"log_additive_inverse"`	`x. = -log(x)`, recommended if `x` are p-values. Note: zeros are replaced by `.Machine$double.xmin`

either "ascending" (more significant interactions have lower value in value column) or "descending" (more significant interactions have higher value in value column)

ambiguity_resolution_method

defines how ambiguous assignments (when one interaction in replicate 1 overlaps with multiple interactions in replicate 2 or vice versa) are resolved. Available methods:

`"value"`	interactions are prioritized by ascending or descending `value` column (see `sorting_direction`), e.g., if two interactions in replicate 1 overlap with one interaction in replicate 2, the interaction from replicate 1 is chosen which has a lower (if `sorting_direction` is `"ascending"`) or higher (if `"descending"`) value
`"overlap"`	the interaction pair is chosen which has the highest relative overlap, i.e., overlap in nucleotides of replicate 1 interaction anchor A and replicate 2 interaction anchor A, plus replicate 1 interaction anchor B and replicate 2 interaction anchor B, normalized by their lengths
`"midpoint"`	the interaction pair is chosen which has the smallest distance between their anchor midpoints, i.e., distance from midpoint of replicate 1 interaction anchor A to midpoint of replicate 2 interaction anchor A, plus distance from midpoint of replicate 1 interaction anchor B to midpoint of replicate 2 interaction anchor B

remove_nonstandard_chromosomes

removes peaks containing genomic locations on non-standard chromosomes using keepStandardChromosomes (default is TRUE)

max_factor

numeric; controls the replacement values for Inf and -Inf. Inf are replaced by max(x) * max_factor and -Inf are replaced by min(x) / max_factor.

jitter_factor

numeric; controls the magnitude of the noise that is added to x. This is done to break ties in x. Set jitter_factor = NULL for no jitter.

max_gap

integer; maximum gap in nucleotides allowed between two anchors for them to be considered as overlapping (defaults to -1, i.e., overlapping anchors)

mu

a starting value for the mean of the reproducible component.

sigma

a starting value for the standard deviation of the reproducible component.

rho

a starting value for the correlation coefficient of the reproducible component.

p

a starting value for the proportion of reproducible component.

eps

Stopping criterion. Iterations stop when the increment of log-likelihood is < eps*log-likelihood, Default=0.001.

max_iteration

integer; maximum number of iterations for IDR estimation (defaults to 30)

local_idr

see est.IDR

List with three components, (rep1_df, rep2_df, and analysis_type) containing the interactions from input data frames rep1_df and rep2_df with the following additional columns:

column 1:	`chr`	character; genomic location of peak - chromosome (e.g., `"chr3"`)
column 2:	`start`	integer; genomic location of peak - start coordinate
column 3:	`end`	integer; genomic location of peak - end coordinate
column 4:	`value`	numeric; p-value, FDR, or heuristic used to rank the peaks
column 5:	`rep_value`	numeric; value of corresponding replicate peak. If no corresponding peak was found, `rep_value` is set to `NA`.
column 6:	`rank`	integer; rank of the peak, established by value column, ascending order
column 7:	`rep_rank`	integer; rank of corresponding replicate peak. If no corresponding peak was found, `rep_rank` is set to `NA`.
column 8:	`idx`	integer; peak index, primary key
column 9:	`rep_idx`	integer; specifies the index of the corresponding peak in the other replicate (foreign key). If no corresponding peak was found, `rep_idx` is set to `NA`.
column 10:	`idr`	IDR of the peak and the corresponding peak in the other replicate. If no corresponding peak was found, `idr` is set to `NA`.

Q. Li, J. B. Brown, H. Huang and P. J. Bickel. (2011) Measuring reproducibility of high-throughput experiments. Annals of Applied Statistics, Vol. 5, No. 3, 1752-1779.

idr_results <- estimate_idr1d(idr2d:::chipseq$rep1_df,
                              idr2d:::chipseq$rep2_df,
                              value_transformation = "log")
summary(idr_results)