BBUM_fit: BBUM statistical modeling
In wyppeter/bbum: BBUM correction for significance testing of p values

Description

Fits the BBUM model to a dataset containing a set with primary signal (the signal set) and a set without (the background set), using maximum likelihood estimation (MLE) with the BFGS algorithm (optim). The fit is run from every start provided, and the best solution among them is returned.
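The multi-start strategy described above can be sketched in base R. This is only an illustration under stated assumptions: it uses a simplified two-parameter beta-uniform mixture rather than the four-parameter BBUM likelihood, it uses method "L-BFGS-B" so that box limits can be applied, and the names bum_negLL, pvals, fits and best are hypothetical, not part of the package.

```r
# Toy beta-uniform mixture (BUM) density: f(p) = lambda + (1 - lambda) * a * p^(a - 1).
# NOT the BBUM likelihood; it only illustrates MLE from several starts.
bum_negLL <- function(par, pvals) {
  lambda <- par[1]
  a <- par[2]
  -sum(log(lambda + (1 - lambda) * a * pvals^(a - 1)))
}

p_values <- c(0.000021, 0.0001, 0.001, 0.031, 0.0391,
              0.13, 0.21, 0.38, 0.42, 0.52, 0.60, 0.73, 0.81, 0.97)

# A list of named start vectors, in the same spirit as the starts argument
starts <- list(c(lambda = 0.9, a = 0.6), c(lambda = 0.5, a = 0.3))

# Run one bounded optimization per start
fits <- lapply(starts, function(s) {
  optim(par = s, fn = bum_negLL, pvals = p_values, method = "L-BFGS-B",
        lower = c(1e-3, 1e-3), upper = c(1 - 1e-3, 1 - 1e-3))
})

# optim minimizes, so the log-likelihood is the negated objective value
LLs <- vapply(fits, function(f) -f$value, numeric(1))
best <- fits[[which.max(LLs)]]
```

Here best$par holds the estimates from the start that reached the highest log-likelihood, and best$convergence is the convergence code reported by optim.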

Usage

BBUM_fit(
  dt_signal_set = NULL,
  dt_bg_set = NULL,
  dt_all = NULL,
  signal_set = NULL,
  starts,
  limits = list(),
  pBBUM.alpha = 0.05,
  rcap = TRUE,
  outlier_trim = 0,
  rthres = 1
)

Arguments

`dt_signal_set, dt_bg_set`	Vectors of numerical p values, belonging to the signal set and background set respectively.
`dt_all`	A vector of all numerical p values, including both signal and background sets.
`signal_set`	A vector of booleans signifying which values among `dt_all` are signal set data points. Used only in conjunction with `dt_all`. Should be the same length as `dt_all`.
`starts`	A list of named vectors of starts for the four BBUM parameters.
`limits`	Named list of custom limits for specific parameters. Parameters not mentioned keep their default limits.
`pBBUM.alpha`	Cutoff level of BBUM-FDR-adjusted p values for significance testing. Only used here to generate appropriate default limits.
`rcap`	Whether the parameter r should have a stringent upper bound in this instance (for smart toggling of outlier detection).
`outlier_trim`	Number of strongest points among the background class to be trimmed as outliers. Intended for the automatic trimming methods in other functions; not meant for use in isolation.
`rthres`	Threshold value of the `r` parameter above which the returned `r.passed` value is a fail. Intended for the automatic trimming methods in other functions; not meant for use in isolation.
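As a rough illustration of what trimming the strongest background points could look like, here is a minimal base R sketch. It assumes that "strongest" means the smallest p values, which the documentation does not state explicitly, and trim_strongest is a hypothetical helper, not a function of the package.

```r
# Hypothetical helper: drop the outlier_trim smallest p values from a background set.
# Assumption: the "strongest" points are those with the smallest p values.
trim_strongest <- function(p_bg, outlier_trim = 0) {
  n_drop <- min(outlier_trim, length(p_bg))
  if (n_drop <= 0) return(p_bg)
  drop_idx <- order(p_bg)[seq_len(n_drop)]  # indices of the smallest values
  p_bg[-drop_idx]
}

bg <- c(0.501, 0.203, 0.0001, 0.071, 0.019)
trimmed <- trim_strongest(bg, outlier_trim = 1)
```

In this sketch the remaining points keep their original order, and a trim count of 0 returns the input unchanged.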

Details

Either use dt_signal_set and dt_bg_set to input the two sets separately, or use dt_all and signal_set to input all data together with a mask marking the signal set. When both pairs are defined, dt_all and signal_set are ignored.
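The relationship between the two input styles can be sketched with plain logical indexing. This is an assumed equivalent for illustration, not the package's internal code, and the data values are made up.

```r
# All p values in one vector, plus a logical mask marking the signal set
dt_all     <- c(0.501, 0.031, 0.203, 0.001, 0.109)
signal_set <- c(FALSE, TRUE,  FALSE, TRUE,  FALSE)

# The mask must line up with the data, one flag per p value
stopifnot(length(dt_all) == length(signal_set))

# Split into the two vectors that the separate-input style expects
dt_signal_set <- dt_all[signal_set]
dt_bg_set     <- dt_all[!signal_set]
```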

If more than one start achieves the identical maximum likelihood, a random start is chosen among them.
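One way to pick at random among tied starts is sketched below. The log-likelihood values are invented for illustration, and this is not necessarily how the package implements the choice.

```r
set.seed(1)  # only to make the illustration reproducible

# Hypothetical maximized log-likelihoods, one per start; starts 2 and 3 tie
LLs <- c(-12.4, -10.1, -10.1)

# Indices of all starts that reach the maximum
tied <- which(LLs == max(LLs))

# sample() on a single number n draws from 1:n, so guard the no-tie case
chosen <- if (length(tied) > 1) sample(tied, 1) else tied
```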

Each set should have at least 10 points for modeling.

rcap is used internally to decide on the default limits for r.

A failed r.passed value is not triggered if lambda is too large for a reliable fit of a to begin with.

Due to the asymptotic behavior of the function at p = 0, any p values < .Machine$double.xmin*10 are constrained to .Machine$double.xmin*10.
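The flooring of very small p values can be reproduced in base R as follows. This is a minimal sketch of the stated rule using pmax; the variable names are hypothetical and the package may implement the constraint differently.

```r
# Smallest p value allowed, as stated above
p_floor <- .Machine$double.xmin * 10

# Example input with an exact zero and a value below the floor
p <- c(0, 1e-320, 1e-5, 0.5)

# Raise anything below the floor to the floor; leave other values untouched
p_clamped <- pmax(p, p_floor)

# log() is now finite for every value, which it would not be for p = 0
log_p <- log(p_clamped)
```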

Value

A named list with the following items:

estim: A named list of fitted parameter values.
LL: Value of the maximized log-likelihood.
convergence: Convergence code from optim.
outlier_trim: The input value of the outlier_trim argument.
r.passed: Boolean for whether the fitted r value was under the threshold for flagging outliers.

Examples

# Input the signal set and background set as separate vectors
BBUM_fit(
  dt_signal_set = c(0.000021, 0.00010, 0.03910, 0.031, 0.001,
                    0.13, 0.21, 0.38, 0.42, 0.52, 0.60, 0.73, 0.81, 0.97),
  dt_bg_set     = c(0.501, 0.203, 0.109, 0.071, 0.019,
                    0.11, 0.27, 0.36, 0.43, 0.50, 0.61, 0.77, 0.87, 0.91),
  starts = list(c(lambda = 0.9, a = 0.6, theta = 0.1, r = 0.1))
)
# Input all p values together with a logical mask, trimming one background outlier
BBUM_fit(
  dt_all        = c(0.501, 0.203, 0.109, 0.071, 0.019, 0.031, 0.001,
                    0.000021, 0.00010, 0.03910,
                    0.0001,
                    0.11, 0.27, 0.36, 0.43, 0.50, 0.61, 0.77, 0.87, 0.91,
                    0.13, 0.21, 0.38, 0.42, 0.52, 0.60, 0.73, 0.81, 0.97),
  signal_set    = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
                    TRUE,  TRUE,  TRUE,
                    FALSE,
                    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
                    TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE),
  starts = list(c(lambda = 0.9, a = 0.6, theta = 0.1, r = 0.1)),
  outlier_trim = 1
)
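Since the best solution is chosen among all starts provided, it can help to pass several. The sketch below builds a list of named start vectors from a small grid using base R only. The grid values are arbitrary illustrations, not recommended settings; the parameter names follow the examples above.

```r
# Every combination of a few candidate values per parameter (values are arbitrary)
grid <- expand.grid(lambda = c(0.5, 0.9),
                    a      = c(0.3, 0.6),
                    theta  = c(0.1, 0.5),
                    r      = c(0.05, 0.1))

# Convert each row into a named numeric vector, giving a list of starts
starts <- lapply(seq_len(nrow(grid)), function(i) unlist(grid[i, ]))
```

The resulting list has the same shape as the starts argument in the examples above: one named vector per start, with elements lambda, a, theta and r.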

wyppeter/bbum documentation built on Oct. 3, 2023, 3:29 p.m.