estimate_idr1d: Estimates IDR for Genomic Peak Data

Description Usage Arguments Value References Examples

View source: R/main.R

Description

This method estimates Irreproducible Discovery Rates (IDR) for peaks in replicated ChIP-seq experiments.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
estimate_idr1d(
  rep1_df,
  rep2_df,
  value_transformation = c("identity", "additive_inverse", "multiplicative_inverse",
    "log", "log_additive_inverse"),
  ambiguity_resolution_method = c("overlap", "midpoint", "value"),
  remove_nonstandard_chromosomes = TRUE,
  max_factor = 1.5,
  jitter_factor = 1e-04,
  max_gap = -1L,
  mu = 0.1,
  sigma = 1,
  rho = 0.2,
  p = 0.5,
  eps = 0.001,
  max_iteration = 30,
  local_idr = TRUE
)

Arguments

rep1_df

data frame of observations (i.e., genomic peaks) of replicate 1, with at least the following columns (position of columns matter, column names are irrelevant):

column 1: chr character; genomic location of peak - chromosome (e.g., "chr3")
column 2: start integer; genomic location of peak - start coordinate
column 3: end integer; genomic location of peak - end coordinate
column 4: value numeric; p-value, FDR, or heuristic used to rank the interactions
rep2_df

data frame of observations (i.e., genomic peaks) of replicate 2, with the following columns (position of columns matter, column names are irrelevant):

column 1: chr character; genomic location of peak - chromosome (e.g., "chr3")
column 2: start integer; genomic location of peak - start coordinate
column 3: end integer; genomic location of peak - end coordinate
column 4: value numeric; p-value, FDR, or heuristic used to rank the interactions
value_transformation

the values in x have to be transformed in a way such that when ordered in descending order, more significant interactions end up on top of the list. If the values in x are p-values, "log_additive_inverse" is recommended. The following transformations are supported:

"identity" no transformation is performed on x
"additive_inverse" x. = -x
"multiplicative_inverse" x. = 1 / x
"log" x. = log(x). Note: zeros are replaced by .Machine$double.xmin
"log_additive_inverse" x. = -log(x), recommended if x are p-values. Note: zeros are replaced by .Machine$double.xmin

either "ascending" (more significant interactions have lower value in value column) or "descending" (more significant interactions have higher value in value column)

ambiguity_resolution_method

defines how ambiguous assignments (when one interaction in replicate 1 overlaps with multiple interactions in replicate 2 or vice versa) are resolved. Available methods:

"value" interactions are prioritized by ascending or descending value column (see sorting_direction), e.g., if two interactions in replicate 1 overlap with one interaction in replicate 2, the interaction from replicate 1 is chosen which has a lower (if sorting_direction is "ascending") or higher (if "descending") value
"overlap" the interaction pair is chosen which has the highest relative overlap, i.e., overlap in nucleotides of replicate 1 interaction anchor A and replicate 2 interaction anchor A, plus replicate 1 interaction anchor B and replicate 2 interaction anchor B, normalized by their lengths
"midpoint" the interaction pair is chosen which has the smallest distance between their anchor midpoints, i.e., distance from midpoint of replicate 1 interaction anchor A to midpoint of replicate 2 interaction anchor A, plus distance from midpoint of replicate 1 interaction anchor B to midpoint of replicate 2 interaction anchor B
remove_nonstandard_chromosomes

removes peaks containing genomic locations on non-standard chromosomes using keepStandardChromosomes (default is TRUE)

max_factor

numeric; controls the replacement values for Inf and -Inf. Inf are replaced by max(x) * max_factor and -Inf are replaced by min(x) / max_factor.

jitter_factor

numeric; controls the magnitude of the noise that is added to x. This is done to break ties in x. Set jitter_factor = NULL for no jitter.

max_gap

integer; maximum gap in nucleotides allowed between two anchors for them to be considered as overlapping (defaults to -1, i.e., overlapping anchors)

mu

a starting value for the mean of the reproducible component.

sigma

a starting value for the standard deviation of the reproducible component.

rho

a starting value for the correlation coefficient of the reproducible component.

p

a starting value for the proportion of reproducible component.

eps

Stopping criterion. Iterations stop when the increment of log-likelihood is < eps*log-likelihood, Default=0.001.

max_iteration

integer; maximum number of iterations for IDR estimation (defaults to 30)

local_idr

see est.IDR

Value

List with three components, (rep1_df, rep2_df, and analysis_type) containing the interactions from input data frames rep1_df and rep2_df with the following additional columns:

column 1: chr character; genomic location of peak - chromosome (e.g., "chr3")
column 2: start integer; genomic location of peak - start coordinate
column 3: end integer; genomic location of peak - end coordinate
column 4: value numeric; p-value, FDR, or heuristic used to rank the peaks
column 5: rep_value numeric; value of corresponding replicate peak. If no corresponding peak was found, rep_value is set to NA.
column 6: rank integer; rank of the peak, established by value column, ascending order
column 7: rep_rank integer; rank of corresponding replicate peak. If no corresponding peak was found, rep_rank is set to NA.
column 8: idx integer; peak index, primary key
column 9: rep_idx integer; specifies the index of the corresponding peak in the other replicate (foreign key). If no corresponding peak was found, rep_idx is set to NA.
column 10: idr IDR of the peak and the corresponding peak in the other replicate. If no corresponding peak was found, idr is set to NA.

References

Q. Li, J. B. Brown, H. Huang and P. J. Bickel. (2011) Measuring reproducibility of high-throughput experiments. Annals of Applied Statistics, Vol. 5, No. 3, 1752-1779.

Examples

1
2
3
4
idr_results <- estimate_idr1d(idr2d:::chipseq$rep1_df,
                              idr2d:::chipseq$rep2_df,
                              value_transformation = "log")
summary(idr_results)

idr2d documentation built on Nov. 8, 2020, 6:16 p.m.