gps: Empirical Bayes Gamma-Poisson Shrinker
In mdsstat: Statistical Trending for Medical Devices Surveillance

Description Usage Arguments Details Value Methods (by class) References Examples

Test on device-events using William DuMouchel's Empirical Bayes Gamma-Poisson Shrinker. From the family of disproportionality analyses (DPA) used to generate signals of disproportionate reporting (SDRs).

gps(df, ...)

## S3 method for class 'mds_ts'
gps(df, ts_event = c(Count = "nA"), analysis_of = NA, ...)

## Default S3 method:
gps(
  df,
  analysis_of = NA,
  eval_period = 1,
  null_ratio = 1,
  cred_interval = 0.9,
  init_prior = c(0.2, 0.02, 2, 4, 1/3),
  gamma_lower = 1e-05,
  gamma_upper = 20,
  quantiles = c(0.05, 0.95),
  cont_adj = 0,
  ...
)

`df`	Required input data frame of class `mds_ts` or, for generic usage, any data frame with the following columns: time Unique times of class `Date` nA Cell A count (class `numeric`) of the 2x2 table: device/event of interest. nB Cell B count (class `numeric`) of the 2x2 table: device/non-event of interest. nC Cell C count (class `numeric`) of the 2x2 table: non-device/event of interest. nD Cell D count (class `numeric`) of the 2x2 table: non-device/non-event of interest.
`...`	Further arguments passed onto `gps` methods
`ts_event`	Required if `df` is of class `mds_ts`. Named string indicating the variable corresponding to the event count (cell A in the 2x2 contingency table). In most cases, the default is the appropriate setting. See details for alternative options. Default: `c("Count"="nA")` corresponding to the event count column in `mds_ts` objects. Name is generated from `mds_ts` metadata.
`analysis_of`	Optional string indicating the English description of what was analyzed. If specified, this will override the name of the `ts_event` string parameter. Default: `NA` indicates no English description for plain `df` data frames, or `ts_event` English description for `df` data frames of class `mds_ts`. Example: `"Count of bone cement leakages"`
`eval_period`	Required positive integer indicating the number of unique times counting in reverse chronological order to sum over to create the 2x2 contingency table. Default: `1` considers only the most recent time in `df`. Example: `12` sums over the last 12 time periods to create the 2x2 contingency table.
`null_ratio`	Numeric value representing the null relative reporting ratio (RR), used with `cred_interval` to establish the signal status. This `null_ratio` is saved in the output as the signal threshold. See details for more. Default: `1` indicates a null RR of 1 and tests if the lower bound of the `cred_interval` exceeds `1`.
`cred_interval`	Numeric value between 0 and 1 representing the width of the Bayesian posterior credible interval, where the lower bound of the interval is assessed against the `null_ratio`. The interval bounds are returned as the lcl and ucl. See details for more. Default: `0.90` indicates a 90% credible interval with bounds at 5% and 95%. The signal test is against the lower 5% bound, effectively creating the EB05 test.
`init_prior`	A numeric vector of length 5 representing the initialization parameters for the prior gamma mixture distribution in this order: `alpha1, beta1, alpha2, beta2, p`. See details for more. Default: `c(.2, .02, 2, 4, 1/3)` as suggested in openEBGM package v0.7.0.
`gamma_lower`	Positive mumeric value representing the lower bound for the two alphas and betas of the prior during PORT optimization. Default: `1e-5` is a value suggested in openEBGM package v0.7.0.
`gamma_upper`	Positive mumeric value representing the upper bound for the two alphas and betas of the prior during PORT optimization. Default: `20` is a value suggested in openEBGM package v0.7.0.
`quantiles`	Vector of quantiles between 0 and 1. `gps()` will return an equal length vector of estimated empirical Bayes quantiles from the posterior distribution. Specify `quantiles=NULL` if no quantiles are desired. Default: `c(.05, .95)` corresponds to the 5% (EB05) and 95% (EB95) quantiles.
`cont_adj`	Positive integer representing the continuity adjustment to be added to each cell of the 2x2 contingency table. A value greater than 0 allows for contingency tables with 0 cells to run the algorithm. Adding a continuity adjustment will adversely affect the algorithm estimates, user discretion is advised. See details for more. Default: `0` adds zero to each cell, thus an unadjusted table.

null_ratio and cred_interval are used together to establish the signal criteria. The null_ratio is conceptually similar to the relative reporting ratio under a null hypothesis of no signal. Common values are 1 and, more conservatively (fewer false signals), 2. The cred_interval is the posterior credibility interval used to test for a signal. A value of 0.90 returns the 5 tests if the lower bound exceeds null_ratio. Effectively, cred_interval=0.90 conducts the well-known EB05 test.

init_prior specifies the initial guess for the 5 parameters of the prior gamma mixture distribution as described in DuMouchel (1999, Eqs. 4, 7) in the sequence: alpha1, beta1, alpha2, beta2, p. gamma_lower specifies the optimization lower bound for the two alphas and betas. gamma_upper specifies similarly the upper bound. The initial guess, upper and lower bounds are fed into PORT optimization using the stats::nlminb() routine.

cont_adj provides the option to allow gps() to proceed running, however this is done at the user's discretion because there are adverse effects of adding a positive integer to every cell of the contingency table. By default, gps() runs with 0 in the C cell only, but not in A, B, or D. It has been suggested that 0.5 may be an appropriate value. However, values <1 have been shown to be unstable using box-constrained PORT optimization, which is the only optimization considered in this release. Overall, posterior distribution estimates have been shown to be unstable with very low or 0 count cells.

For parameter ts_event, in the uncommon case where the device-event count (Cell A) variable is not "nA", the name of the variable may be specified here. Note that the remaining 3 cells of the 2x2 contingency table (Cells B, C, D) must be the variables "nB", "nC", and "nD" respectively in df. A named character vector may be used where the name is the English description of what was analyzed. Note that if the parameter analysis_of is specified, it will override this name. Example: ts_event=c("Count of Bone Cement Leakages"="event_count")

A named list of class mdsstat_test object, as follows:

test_name: Name of the test run
analysis_of: English description of what was analyzed
status: Named boolean of whether the test was run. The name contains the run status.
result: A standardized list of test run results: statistic for the test statistic, lcl and ucl for the set confidence bounds, p for the p-value, signal status, and signal_threshold.
params: The test parameters
data: The data on which the test was run

mds_ts: GPS on mds_ts data
default: GPS on general data

DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician, 53(3):177-190, August 1999.

Ahmed I, Poncet A. PhViD: PharmacoVigilance Signal Detection, 2016. R package version 1.0.8.

Ihrie J, Canida T. openEBGM: EBGM Scores for Mining Large Contingency Tables, 2018. R package version 0.7.0.

# Basic Example
data <- data.frame(time=c(1:25),
                   nA=as.integer(stats::rnorm(25, 25, 5)),
                   nB=as.integer(stats::rnorm(25, 50, 5)),
                   nC=as.integer(stats::rnorm(25, 100, 25)),
                   nD=as.integer(stats::rnorm(25, 200, 25)))
a1 <- gps(data)
# Example using an mds_ts object
a2 <- gps(mds_ts[[3]])