gps: Empirical Bayes Gamma-Poisson Shrinker

Description Usage Arguments Details Value Methods (by class) References Examples

Description

Test on device-events using William DuMouchel's Empirical Bayes Gamma-Poisson Shrinker. From the family of disproportionality analyses (DPA) used to generate signals of disproportionate reporting (SDRs).

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
gps(df, ...)

## S3 method for class 'mds_ts'
gps(df, ts_event = c(Count = "nA"), analysis_of = NA, ...)

## Default S3 method:
gps(
  df,
  analysis_of = NA,
  eval_period = 1,
  null_ratio = 1,
  cred_interval = 0.9,
  init_prior = c(0.2, 0.02, 2, 4, 1/3),
  gamma_lower = 1e-05,
  gamma_upper = 20,
  quantiles = c(0.05, 0.95),
  cont_adj = 0,
  ...
)

Arguments

df

Required input data frame of class mds_ts or, for generic usage, any data frame with the following columns:

time

Unique times of class Date

nA

Cell A count (class numeric) of the 2x2 table: device/event of interest.

nB

Cell B count (class numeric) of the 2x2 table: device/non-event of interest.

nC

Cell C count (class numeric) of the 2x2 table: non-device/event of interest.

nD

Cell D count (class numeric) of the 2x2 table: non-device/non-event of interest.

...

Further arguments passed onto gps methods

ts_event

Required if df is of class mds_ts. Named string indicating the variable corresponding to the event count (cell A in the 2x2 contingency table). In most cases, the default is the appropriate setting. See details for alternative options.

Default: c("Count"="nA") corresponding to the event count column in mds_ts objects. Name is generated from mds_ts metadata.

analysis_of

Optional string indicating the English description of what was analyzed. If specified, this will override the name of the ts_event string parameter.

Default: NA indicates no English description for plain df data frames, or ts_event English description for df data frames of class mds_ts.

Example: "Count of bone cement leakages"

eval_period

Required positive integer indicating the number of unique times counting in reverse chronological order to sum over to create the 2x2 contingency table.

Default: 1 considers only the most recent time in df.

Example: 12 sums over the last 12 time periods to create the 2x2 contingency table.

null_ratio

Numeric value representing the null relative reporting ratio (RR), used with cred_interval to establish the signal status. This null_ratio is saved in the output as the signal threshold. See details for more.

Default: 1 indicates a null RR of 1 and tests if the lower bound of the cred_interval exceeds 1.

cred_interval

Numeric value between 0 and 1 representing the width of the Bayesian posterior credible interval, where the lower bound of the interval is assessed against the null_ratio. The interval bounds are returned as the lcl and ucl. See details for more.

Default: 0.90 indicates a 90% credible interval with bounds at 5% and 95%. The signal test is against the lower 5% bound, effectively creating the EB05 test.

init_prior

A numeric vector of length 5 representing the initialization parameters for the prior gamma mixture distribution in this order: alpha1, beta1, alpha2, beta2, p. See details for more.

Default: c(.2, .02, 2, 4, 1/3) as suggested in openEBGM package v0.7.0.

gamma_lower

Positive mumeric value representing the lower bound for the two alphas and betas of the prior during PORT optimization.

Default: 1e-5 is a value suggested in openEBGM package v0.7.0.

gamma_upper

Positive mumeric value representing the upper bound for the two alphas and betas of the prior during PORT optimization.

Default: 20 is a value suggested in openEBGM package v0.7.0.

quantiles

Vector of quantiles between 0 and 1. gps() will return an equal length vector of estimated empirical Bayes quantiles from the posterior distribution. Specify quantiles=NULL if no quantiles are desired.

Default: c(.05, .95) corresponds to the 5% (EB05) and 95% (EB95) quantiles.

cont_adj

Positive integer representing the continuity adjustment to be added to each cell of the 2x2 contingency table. A value greater than 0 allows for contingency tables with 0 cells to run the algorithm. Adding a continuity adjustment will adversely affect the algorithm estimates, user discretion is advised. See details for more.

Default: 0 adds zero to each cell, thus an unadjusted table.

Details

null_ratio and cred_interval are used together to establish the signal criteria. The null_ratio is conceptually similar to the relative reporting ratio under a null hypothesis of no signal. Common values are 1 and, more conservatively (fewer false signals), 2. The cred_interval is the posterior credibility interval used to test for a signal. A value of 0.90 returns the 5 tests if the lower bound exceeds null_ratio. Effectively, cred_interval=0.90 conducts the well-known EB05 test.

init_prior specifies the initial guess for the 5 parameters of the prior gamma mixture distribution as described in DuMouchel (1999, Eqs. 4, 7) in the sequence: alpha1, beta1, alpha2, beta2, p. gamma_lower specifies the optimization lower bound for the two alphas and betas. gamma_upper specifies similarly the upper bound. The initial guess, upper and lower bounds are fed into PORT optimization using the stats::nlminb() routine.

cont_adj provides the option to allow gps() to proceed running, however this is done at the user's discretion because there are adverse effects of adding a positive integer to every cell of the contingency table. By default, gps() runs with 0 in the C cell only, but not in A, B, or D. It has been suggested that 0.5 may be an appropriate value. However, values <1 have been shown to be unstable using box-constrained PORT optimization, which is the only optimization considered in this release. Overall, posterior distribution estimates have been shown to be unstable with very low or 0 count cells.

For parameter ts_event, in the uncommon case where the device-event count (Cell A) variable is not "nA", the name of the variable may be specified here. Note that the remaining 3 cells of the 2x2 contingency table (Cells B, C, D) must be the variables "nB", "nC", and "nD" respectively in df. A named character vector may be used where the name is the English description of what was analyzed. Note that if the parameter analysis_of is specified, it will override this name. Example: ts_event=c("Count of Bone Cement Leakages"="event_count")

Value

A named list of class mdsstat_test object, as follows:

test_name

Name of the test run

analysis_of

English description of what was analyzed

status

Named boolean of whether the test was run. The name contains the run status.

result

A standardized list of test run results: statistic for the test statistic, lcl and ucl for the set confidence bounds, p for the p-value, signal status, and signal_threshold.

params

The test parameters

data

The data on which the test was run

Methods (by class)

References

DuMouchel W. Bayesian data mining in large frequency tables, with an application to the FDA spontaneous reporting system. The American Statistician, 53(3):177-190, August 1999.

Ahmed I, Poncet A. PhViD: PharmacoVigilance Signal Detection, 2016. R package version 1.0.8.

Ihrie J, Canida T. openEBGM: EBGM Scores for Mining Large Contingency Tables, 2018. R package version 0.7.0.

Examples

1
2
3
4
5
6
7
8
9
# Basic Example
data <- data.frame(time=c(1:25),
                   nA=as.integer(stats::rnorm(25, 25, 5)),
                   nB=as.integer(stats::rnorm(25, 50, 5)),
                   nC=as.integer(stats::rnorm(25, 100, 25)),
                   nD=as.integer(stats::rnorm(25, 200, 25)))
a1 <- gps(data)
# Example using an mds_ts object
a2 <- gps(mds_ts[[3]])

mdsstat documentation built on March 13, 2020, 2:58 a.m.