BBUM_fit: BBUM statistical modeling

View source: R/BBUM_fit.R

BBUM_fit    R Documentation

BBUM statistical modeling

Description

Fits the BBUM model to a dataset containing a set with primary signal (the signal set) and a set without it (the background set), using maximum likelihood estimation (MLE) with the BFGS algorithm (optim). The solution with the highest likelihood among all provided starts is returned.
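The multi-start MLE strategy can be sketched in base R as below. This is not the package's code: it substitutes the simpler two-parameter beta-uniform mixture (BUM) of Pounds and Morris, f(p) = lambda + (1 - lambda) * a * p^(a - 1), for the four-parameter BBUM likelihood, and the helper names `bum_nll` and `multi_start_fit` are hypothetical. The logit transform is an assumption of this sketch, used so that the unconstrained BFGS search keeps both parameters inside (0, 1).

```r
# Negative log-likelihood of the two-parameter BUM density, used here only
# as a stand-in for the BBUM likelihood. `par` holds logit-scale values.
bum_nll <- function(par, pv) {
  lambda <- plogis(par[["lambda"]])
  a <- plogis(par[["a"]])
  -sum(log(lambda + (1 - lambda) * a * pv^(a - 1)))
}

# Fit from every start with optim(method = "BFGS") and keep the solution
# with the highest log-likelihood (ties go to the first such start).
multi_start_fit <- function(pv, starts) {
  fits <- lapply(starts, function(s) {
    optim(par = setNames(qlogis(s), names(s)), fn = bum_nll,
          pv = pv, method = "BFGS")
  })
  LL <- vapply(fits, function(f) -f$value, numeric(1))
  best <- fits[[which.max(LL)]]
  list(estim = as.list(setNames(plogis(best$par), names(best$par))),
       LL = max(LL), convergence = best$convergence)
}

# Usage: starts are given as a list of named vectors, one per start
set.seed(1)
pv <- c(runif(200), rbeta(100, 0.3, 1))
fit <- multi_start_fit(pv, list(c(lambda = 0.5, a = 0.5),
                                c(lambda = 0.9, a = 0.2)))
```

Running every start to completion and comparing the final log-likelihoods guards against BFGS stopping at a local optimum from any single start.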

Usage

BBUM_fit(
  dt_signal_set = NULL,
  dt_bg_set = NULL,
  dt_all = NULL,
  signal_set = NULL,
  starts,
  limits = list(),
  pBBUM.alpha = 0.05,
  rcap = TRUE,
  outlier_trim = 0,
  rthres = 1
)

Arguments

dt_signal_set, dt_bg_set

Vectors of numerical p values, belonging to the signal set and background set respectively.

dt_all

A vector of all numerical p values, including both signal and background sets.

signal_set

A logical vector indicating which values in dt_all belong to the signal set. Used only in conjunction with dt_all, and must be the same length as dt_all.

starts

A list of named numeric vectors, each giving starting values for the four BBUM parameters (lambda, a, theta, r).

limits

Named list of custom limits for specific parameters. Parameters not listed keep their default limits.

pBBUM.alpha

Cutoff level of BBUM-FDR-adjusted p values for significance testing. Only used here to generate appropriate default limits.

rcap

Whether the parameter r should have a stringent upper bound in this instance (for smart toggling of outlier detection).

outlier_trim

Number of strongest points in the background set to trim as outliers. Intended for the automatic trimming methods in other functions, not for standalone use.

rthres

Threshold on the r parameter that triggers a failed r.passed value. Intended for the automatic trimming methods in other functions, not for standalone use.
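One plausible reading of "strongest points" is "smallest p values". Under that assumption, trimming the background set can be sketched as below; `trim_bg_outliers` is a hypothetical helper, not a documented function of the package.

```r
# Hypothetical helper: remove the k strongest background points, assumed
# here to be the k smallest p values.
trim_bg_outliers <- function(dt_bg_set, k = 0) {
  if (k <= 0) return(dt_bg_set)
  stopifnot(k < length(dt_bg_set))  # keep at least one point
  # order() lists indices from smallest to largest p value
  dt_bg_set[-order(dt_bg_set)[seq_len(k)]]
}

# Usage: with k = 1, the smallest value (0.0001) is dropped
trimmed <- trim_bg_outliers(c(0.501, 0.203, 0.0001, 0.11), k = 1)
```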

Details

Supply data either separately, with dt_signal_set and dt_bg_set, or combined, with dt_all and signal_set. When both pairs are given, dt_all and signal_set are ignored.
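The input rules can be sketched as a small base R helper. `split_inputs` is a hypothetical name, and emitting a warning for small sets is an assumption of this sketch; the documentation only recommends at least 10 points in each set and does not say how BBUM_fit reacts when there are fewer.

```r
# Hypothetical helper mirroring the documented input rules.
split_inputs <- function(dt_signal_set = NULL, dt_bg_set = NULL,
                         dt_all = NULL, signal_set = NULL) {
  sets <- if (!is.null(dt_signal_set) && !is.null(dt_bg_set)) {
    # Separate vectors take precedence; dt_all and signal_set are ignored
    list(signal = dt_signal_set, bg = dt_bg_set)
  } else {
    # Otherwise split the combined vector with the logical mask
    stopifnot(!is.null(dt_all), is.logical(signal_set),
              length(signal_set) == length(dt_all))
    list(signal = dt_all[signal_set], bg = dt_all[!signal_set])
  }
  # The documentation recommends at least 10 points in each set
  if (min(lengths(sets)) < 10) warning("each set should have at least 10 points")
  sets
}

# Usage: the first 10 values are the signal set, the last 10 the background
s <- split_inputs(dt_all = (1:20) / 100,
                  signal_set = rep(c(TRUE, FALSE), each = 10))
```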

If more than one start reaches the same maximum likelihood, one of them is chosen at random.
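The tie-breaking rule can be sketched as follows, given a vector of maximized log-likelihoods, one per start. `pick_best_start` is a hypothetical helper. The length check matters because `sample(x, 1)` with a single number `x` samples from `1:x` rather than returning `x`.

```r
# Hypothetical helper: index of the start with the highest log-likelihood,
# with exact ties broken uniformly at random.
pick_best_start <- function(LL) {
  best <- which(LL == max(LL))
  if (length(best) == 1) best else sample(best, 1)
}

# Usage: starts 1 and 3 tie, so either index may be returned
set.seed(42)
picks <- replicate(50, pick_best_start(c(-5, -10, -5)))
```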

Each set should contain at least 10 points for modeling.

rcap is used internally to decide on the default limits for r.

A failed r.passed value is not triggered if lambda is too large for a to be fitted reliably in the first place.
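The r.passed logic can be sketched as below. The cutoff `lambda_max` and its default of 0.95 are invented for illustration, because the documentation does not state when lambda counts as too large; `r_passed` is likewise a hypothetical name.

```r
# Hypothetical sketch of the r.passed check. `lambda_max` is an invented
# stand-in for "lambda too large to fit a reliably".
r_passed <- function(estim, rthres = 1, lambda_max = 0.95) {
  # If a cannot be fitted reliably, do not flag a failure
  if (estim$lambda > lambda_max) return(TRUE)
  # Otherwise pass only when the fitted r is under the threshold
  estim$r < rthres
}

# Usage
ok  <- r_passed(list(lambda = 0.5, r = 0.3))   # r under threshold
bad <- r_passed(list(lambda = 0.5, r = 2))     # r over threshold
```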

Because of the function's asymptotic behavior when a p value equals 0, any p value below .Machine$double.xmin*10 is set to .Machine$double.xmin*10.
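The flooring rule amounts to a single pmax() call in base R. The helper name `clamp_p` is hypothetical. Flooring keeps every p value strictly positive, so terms such as log(p) stay finite.

```r
# Hypothetical helper: raise any p value below the documented floor
# (.Machine$double.xmin * 10) to the floor itself.
clamp_p <- function(p) {
  floor_p <- .Machine$double.xmin * 10
  pmax(p, floor_p)
}

# Usage: 0 and the subnormal 1e-320 are both raised to the floor
x <- clamp_p(c(0, 1e-320, 0.5))
```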

Value

A named list with the following items:

  • estim: A named list of fitted parameter values.

  • LL: Value of the maximized log-likelihood.

  • convergence: Convergence code from optim.

  • outlier_trim: The input value of the outlier_trim argument.

  • r.passed: Boolean for whether the fitted r value was under the threshold for flagging outliers.

Examples

# Signal and background p values supplied as separate vectors
BBUM_fit(
  dt_signal_set = c(0.000021, 0.00010, 0.03910, 0.031, 0.001,
                    0.13, 0.21, 0.38, 0.42, 0.52, 0.60, 0.73, 0.81, 0.97),
  dt_bg_set     = c(0.501, 0.203, 0.109, 0.071, 0.019,
                    0.11, 0.27, 0.36, 0.43, 0.50, 0.61, 0.77, 0.87, 0.91),
  starts = list(c(lambda = 0.9, a = 0.6, theta = 0.1, r = 0.1))
)
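# All p values in one vector, with a logical mask marking the signal set;
# outlier_trim = 1 trims the strongest background point (0.0001)
BBUM_fit(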
  dt_all        = c(0.501, 0.203, 0.109, 0.071, 0.019, 0.031, 0.001,
                    0.000021, 0.00010, 0.03910,
                    0.0001,
                    0.11, 0.27, 0.36, 0.43, 0.50, 0.61, 0.77, 0.87, 0.91,
                    0.13, 0.21, 0.38, 0.42, 0.52, 0.60, 0.73, 0.81, 0.97),
  signal_set    = c(FALSE, FALSE, FALSE, FALSE, FALSE, TRUE, TRUE,
                    TRUE,  TRUE,  TRUE,
                    FALSE,
                    FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE,
                    TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE,  TRUE),
  starts = list(c(lambda = 0.9, a = 0.6, theta = 0.1, r = 0.1)),
  outlier_trim = 1
)


wyppeter/bbum documentation built on Oct. 3, 2023, 3:29 p.m.