BBUM_corr: BBUM FDR correction of p values
In wyppeter/bbum: BBUM correction for significance testing of p values

BBUM_corr

R Documentation

Description

Fits the BBUM model to the dataset and transforms raw p values into values corrected for the false discovery rate (FDR) of null and secondary signal according to the BBUM model, as a multiple testing correction. Optionally, it automatically detects extreme-value outliers in the background set and resolves correction issues by trimming them.

Usage

BBUM_corr(
  pvals,
  signal_set,
  add_starts = list(),
  only_start = FALSE,
  limits = list(),
  pBBUM.alpha = 0.05,
  auto_outliers = TRUE,
  rthres = 1,
  rtrimmax = 0.05,
  atrimmax = 10,
  two_tailed = FALSE,
  quiet = FALSE
)

Arguments

`pvals`	A vector of all numerical p values, including both signal and background sets.
`signal_set`	A vector of booleans signifying which values among `pvals` are signal set data points. Should be the same length as `pvals`.
`add_starts`	List of named vectors for additional starts of fitting algorithm beyond the default set.
`only_start`	Whether the algorithm should only use the given starts (`add_starts`) to fit.
`limits`	Named list of custom limits for specific parameters. Parameters not listed keep their default limits.
`pBBUM.alpha`	Cutoff level of BBUM-FDR-adjusted p values for significance testing. Only used here to generate appropriate default limits.
`auto_outliers`	Toggle automatic outlier trimming.
`rthres`	Threshold value of `r` parameter to trigger a failed `r.pass` value. For automatic trimming methods in other functions and not meant for use in isolation.
`rtrimmax`	Maximum fraction of data points allowed to be outliers in the background set of data (to be trimmed).
`atrimmax`	Maximum absolute number of data points allowed to be outliers in the background set of data (to be trimmed).
`two_tailed`	Toggle the "two-tailed" case of BBUM correction, if the background assumption is weak and bona fide hits in the background class are relevant. See Details. Default behavior is off.
`quiet`	Suppress printed messages and warnings.

Details

pBBUM represents the expected overall FDR level if the cutoff were set at that particular p value. This is similar to the interpretation of p values corrected through the typical p.adjust(method = "fdr").
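For comparison, the standard correction mentioned above can be run in base R. This is a minimal sketch with made-up p values; it uses the Benjamini-Hochberg procedure only and does not fit the BBUM model.

```r
# Toy p values for illustration only (not from the package or any dataset).
p <- c(0.001, 0.008, 0.039, 0.041, 0.20, 0.74)

# Benjamini-Hochberg adjustment; in p.adjust, "fdr" is an alias for "BH".
p_bh <- p.adjust(p, method = "fdr")

# Calling hits at a 5% FDR cutoff: each adjusted value is read as the
# expected FDR if the cutoff were placed at that data point.
hits <- p_bh < 0.05
```

pBBUM values are read the same way, but they come from the fitted BBUM model rather than from the rank-based formula used here.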

pBBUM values are designed for the signal set p values only. Values for the background set are reported, but they are not valid as a significance testing adjustment and should not be used to call hits. They are provided primarily so that the equivalent transformation can be compared against the signal set to assess the adjustment strategy.
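The rule above can be sketched as a small filter for the default (one-tailed) case. `call_signal_hits` is a hypothetical helper, not a function of the package, and the input values are made up to stand in for the `pBBUMs` element of the output.

```r
# Hypothetical helper (not part of bbum): flag hits within the signal set only.
call_signal_hits <- function(pBBUMs, signal_set, alpha = 0.05) {
  stopifnot(length(pBBUMs) == length(signal_set), is.logical(signal_set))
  # Background points are never called, whatever their pBBUM value.
  signal_set & !is.na(pBBUMs) & pBBUMs < alpha
}

# Made-up adjusted values for illustration.
pBBUMs_toy     <- c(0.01, 0.20, 0.03, 0.60)
signal_set_toy <- c(TRUE, TRUE, FALSE, FALSE)

# Only the first point is called; the third is below alpha but is background.
hits_toy <- call_signal_hits(pBBUMs_toy, signal_set_toy)
```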

BBUM_corr works best when poor-quality data points have been filtered out of the p values beforehand. Such points tend to have high p values and may disrupt the uniform null distribution.

Default starts for BBUM fitting are implemented. If additional starts should be included, or only custom starts should be considered, use the add_starts and/or only_start arguments.

If more than one start achieves the same maximum likelihood, one of them is chosen at random.
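Random tie-breaking of this kind can be sketched in base R as follows. `pick_best_start` is a hypothetical helper written for illustration; it is not the package's internal code, and the log-likelihood values are made up.

```r
# Hypothetical helper: return the index of the start with the highest
# log-likelihood, breaking exact ties at random.
pick_best_start <- function(logliks) {
  best <- which(logliks == max(logliks))
  # sample.int() avoids the sample(x, 1) pitfall when `best` has length 1.
  best[sample.int(length(best), 1)]
}

set.seed(1)
chosen <- pick_best_start(c(-12.4, -10.1, -10.1, -15.0))  # index 2 or 3
```

Because of this random choice, setting a seed beforehand is a reasonable precaution when exact reproducibility matters.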

Automatic outlier detection relies on the model fitting a value of r > 1. Such a result suggests that a stronger signal (presumably outliers) exists in the background set than in the signal set, which violates the assumptions of the model. This is a conservative strategy. The ideal way to deal with outliers is to identify and handle them before any statistical analyses. For benchmarking of the trimming strategy, see Wang & Bartel, 2022.
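The two trimming caps, rtrimmax and atrimmax, are both maxima. How they are combined is not stated here; the sketch below assumes that both must hold at once and that the fractional cap is rounded down. `max_trim` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper: largest number of background points that could be
# trimmed, ASSUMING both caps apply together and the fraction is floored.
max_trim <- function(n_background, rtrimmax = 0.05, atrimmax = 10) {
  min(atrimmax, floor(rtrimmax * n_background))
}

max_trim(100)   # fractional cap binds: 5
max_trim(1000)  # absolute cap binds: 10
```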

Adding too many starts or allowing too much outlier trimming can increase computation time.

If the background assumption is weak, such that a small number of bona fide hits among the data points designated "background class" are anticipated and relevant to the hypothesis at hand, the FDR can be made to include the background class. This is akin to a two-tailed test (despite a one-tailed assumption to begin with) and allows genuine FDR-corrected p values to be generated for the background class points as well. Toggle this using the two_tailed argument.

Because the function behaves asymptotically when any p value equals 0, any p values below .Machine$double.xmin*10 are constrained to .Machine$double.xmin*10.
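An equivalent constraint can be written in base R as below. This is a sketch of the described behavior with made-up inputs; the package's internal implementation may differ.

```r
# Lower bound described above: ten times the smallest normalized double.
p_floor <- .Machine$double.xmin * 10

# Made-up p values, including an exact zero and a denormal value.
p_raw <- c(0, 1e-320, 1e-300, 0.02)

# Raise anything below the bound to the bound; leave other values unchanged.
p_clamped <- pmax(p_raw, p_floor)
```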

Value

A named list with the following items:

pvals: Vector of input p values.
pBBUMs: Vector of p values corrected for FDR by BBUM modeling.
estim: A named list of fitted parameter values.
LL: Value of the maximized log-likelihood.
convergence: Convergence code from optim.
outlier_trim: Number of outliers trimmed in the background set.
r.passed: Boolean for whether the fitted r value was under the threshold for flagging outliers.

Examples

library(bbum)  # provides BBUM_corr

BBUM_corr(
  pvals         = c(0.501, 0.203, 0.109, 0.071, 0.019, 0.031, 0.001,
                    0.000021, 0.00010, 0.03910,
                    0.0001,
                    0.11, 0.27, 0.36, 0.43, 0.50, 0.61, 0.77, 0.87, 0.91,
                    0.13, 0.21, 0.38, 0.42, 0.52, 0.60, 0.73, 0.81, 0.97),
  signal_set    = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
                    TRUE,  TRUE,  TRUE,
                    FALSE,
                    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
                    TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE),
  add_starts = list(c(lambda = 0.9, a = 0.6, theta = 0.1, r = 0.1)),
  limits = list(a = c(0.1,0.7))
)

wyppeter/bbum documentation built on Oct. 3, 2023, 3:29 p.m.