
openEBGM is a Bayesian data mining package for calculating Empirical
Bayes scores based on the *Gamma-Poisson Shrinker* (*GPS*) model
for large, sparse contingency (frequency) tables. openEBGM includes
functions implementing DuMouchel's (1999, 2001) methods for
calculating the EBGM (Empirical Bayes Geometric Mean) score and the quantile
scores used to create credibility intervals. Two simple disproportionality
scores, the relative reporting ratio (RR) and the proportional reporting
ratio (PRR), are also included. Adverse event report data are used as an
example application. Much of openEBGM's code is derived from the PhViD and
mederrRank packages.

The data preparation function, `processRaw`, converts raw data
into actual and expected counts for product/event pairs.
`processRaw` also adds the relative reporting ratio (RR) and
proportional reporting ratio (PRR). The data squashing function,
`squashData`, implements the simple version of data squashing
described in DuMouchel and Pregibon (2001). Data squashing can be used to
reduce the computational burden of hyperparameter estimation.
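As a rough illustration of the two disproportionality scores, the base R sketch below computes expected counts, RR, and PRR from a small made-up count table. It is not the code used by `processRaw`; the table, the variable names, and the absence of stratification are assumptions made for the example.

```r
# Hypothetical product-by-event count table (rows: products, columns: events).
counts <- matrix(c(20,  5,
                   10, 65),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("A", "B"), c("rash", "nausea")))

n_total   <- sum(counts)       # total count over all pairs
n_product <- rowSums(counts)   # marginal total for each product
n_event   <- colSums(counts)   # marginal total for each event

# Expected count under independence of product and event.
expected <- outer(n_product, n_event) / n_total

# RR: actual count divided by expected count.
rr <- counts / expected

# PRR: event rate for the product divided by the event rate
# among all other products.
event_totals <- matrix(n_event, nrow = nrow(counts), ncol = ncol(counts),
                       byrow = TRUE)
prr <- (counts / n_product) /
  ((event_totals - counts) / (n_total - n_product))

print(round(rr, 2))
print(round(prr, 2))
```

Both scores exceed 1 when a pair is reported more often than expected, but neither accounts for the sampling variability of small counts, which is what the shrinkage in the GPS model addresses.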

The negative log-likelihood functions (`negLL`, `negLLsquash`,
`negLLzero`, and `negLLzeroSquash`) calculate the negative
log-likelihoods described in the DuMouchel papers. DuMouchel
uses the likelihood function, based on the marginal distributions of the
counts, to estimate the hyperparameters of the prior distribution.
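To make the likelihood concrete, here is a minimal base R sketch of a negative log-likelihood built from a two-component negative binomial mixture. It assumes the hyperparameter order (α_1, β_1, α_2, β_2, P), treats every count (including zeros) as observed, and ignores squashing weights; the function names and example values are hypothetical and this is not the package's own implementation.

```r
# Marginal density of a count N with expected count E: a mixture of two
# negative binomials with mixing fraction P.
mixture_density <- function(theta, N, E) {
  a1 <- theta[1]; b1 <- theta[2]
  a2 <- theta[3]; b2 <- theta[4]
  P  <- theta[5]
  f1 <- dnbinom(N, size = a1, prob = b1 / (b1 + E))
  f2 <- dnbinom(N, size = a2, prob = b2 / (b2 + E))
  P * f1 + (1 - P) * f2
}

# Negative log-likelihood over all product/event pairs.
neg_loglik <- function(theta, N, E) {
  -sum(log(mixture_density(theta, N, E)))
}

# Hypothetical hyperparameters and data.
theta <- c(0.2, 0.1, 2, 4, 1 / 3)
N <- c(0, 1, 3, 12)
E <- c(0.5, 1.2, 0.8, 2.0)
print(neg_loglik(theta, N, E))
```

Smaller values indicate hyperparameters that explain the observed counts better, which is why the estimation step minimizes this quantity.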

The hyperparameter estimation functions (`exploreHypers` and
`autoHyper`) use gradient-based approaches to estimate the
hyperparameters, *θ*, of the prior distribution (gamma mixture)
using the negative log-likelihood functions from the marginal distributions
of the counts (negative binomial). *θ* is a vector containing five
parameters (*α_1*, *β_1*, *α_2*, *β_2*,
and *P*). `hyperEM` estimates *θ* using a version
of the EM algorithm.
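The estimation idea can be sketched in base R by simulating counts from a gamma mixture prior and minimizing the mixture negative log-likelihood with `nlminb`. Everything here is an assumption made for illustration: the simulated data, the starting values, the box constraints, and the helper name `neg_loglik`. It does not reproduce the checks or defaults of `exploreHypers`, `autoHyper`, or `hyperEM`.

```r
set.seed(1)

# Simulate counts from a known gamma mixture prior (hypothetical values).
theta_true <- c(0.2, 0.1, 2, 4, 1 / 3)
n <- 2000
E <- rexp(n, rate = 1) + 0.05            # expected counts
from_first <- runif(n) < theta_true[5]   # mixture component membership
lambda <- ifelse(from_first,
                 rgamma(n, shape = theta_true[1], rate = theta_true[2]),
                 rgamma(n, shape = theta_true[3], rate = theta_true[4]))
N <- rpois(n, lambda * E)                # observed counts, zeros included

# Negative log-likelihood of the negative binomial mixture, computed on the
# log scale so that tiny densities do not underflow to zero.
neg_loglik <- function(theta, N, E) {
  l1 <- log(theta[5]) +
    dnbinom(N, size = theta[1], prob = theta[2] / (theta[2] + E), log = TRUE)
  l2 <- log1p(-theta[5]) +
    dnbinom(N, size = theta[3], prob = theta[4] / (theta[4] + E), log = TRUE)
  m <- pmax(l1, l2)
  -sum(m + log(exp(l1 - m) + exp(l2 - m)))
}

# Box constraints keep the shape and rate parameters positive and P in (0, 1).
theta_init <- c(0.5, 0.5, 1, 1, 0.5)
lower <- rep(1e-3, 5)
upper <- c(rep(1e3, 4), 1 - 1e-3)

fit <- nlminb(start = theta_init, objective = neg_loglik,
              N = N, E = E, lower = lower, upper = upper)

print(fit$par)          # estimated (alpha1, beta1, alpha2, beta2, P)
print(fit$objective)    # minimized negative log-likelihood
print(fit$convergence)  # 0 indicates successful convergence
```

Mixture likelihoods can have several local minima, and the two components can swap labels, so in practice it is worth trying multiple starting points and comparing the minimized values.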

The posterior distribution functions calculate the mixture fraction
(`Qn`), geometric mean (`ebgm`), and quantiles
(`quantBisect`) of the posterior distribution. Alternatively,
`ebScores` can be used to create an object of class openEBGM
that contains the EBGM and quantile scores. Appropriate methods exist for
the generic functions `print`, `summary`, and `plot` for openEBGM
objects.
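The three posterior quantities can be sketched in base R as follows. The sketch assumes the posterior is a mixture of two gamma distributions with shapes α_k + N and rates β_k + E, takes the geometric mean as exp(E[log λ]), and finds quantiles with `uniroot` rather than a hand-written bisection. The function names and hyperparameter values are hypothetical, and the code is not the implementation behind `Qn`, `ebgm`, or `quantBisect`.

```r
# Posterior probability that the rate came from the first mixture component.
posterior_weight <- function(theta, N, E) {
  f1 <- dnbinom(N, size = theta[1], prob = theta[2] / (theta[2] + E))
  f2 <- dnbinom(N, size = theta[3], prob = theta[4] / (theta[4] + E))
  theta[5] * f1 / (theta[5] * f1 + (1 - theta[5]) * f2)
}

# Geometric mean of the posterior: exp of the posterior mean of log(lambda).
geometric_mean <- function(theta, N, E) {
  q <- posterior_weight(theta, N, E)
  e_log <- q * (digamma(theta[1] + N) - log(theta[2] + E)) +
    (1 - q) * (digamma(theta[3] + N) - log(theta[4] + E))
  exp(e_log)
}

# Posterior quantile for a single (N, E) pair, found by root-finding on the
# mixture CDF. The search interval is an assumption suited to this example.
posterior_quantile <- function(p, theta, N, E) {
  q <- posterior_weight(theta, N, E)
  cdf <- function(x) {
    q * pgamma(x, shape = theta[1] + N, rate = theta[2] + E) +
      (1 - q) * pgamma(x, shape = theta[3] + N, rate = theta[4] + E)
  }
  uniroot(function(x) cdf(x) - p, interval = c(1e-10, 1e5), tol = 1e-10)$root
}

# Hypothetical hyperparameter estimates and one product/event pair.
theta_hat <- c(0.2, 0.1, 2, 4, 1 / 3)
print(posterior_weight(theta_hat, N = 10, E = 2))
print(geometric_mean(theta_hat, N = 10, E = 2))
print(posterior_quantile(0.05, theta_hat, N = 10, E = 2))
print(posterior_quantile(0.95, theta_hat, N = 10, E = 2))
```

The 5th and 95th percentiles together give a 90% interval for the rate ratio, and the lower bound is often used as a conservative screening score.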

Ahmed I, Poncet A (2016). PhViD: an R package for
PharmacoVigilance signal Detection. *R package version 1.0.8*.

Venturini S, Myers J (2015). mederrRank: Bayesian Methods
for Identifying the Most Harmful Medication Errors. *R package version
0.0.8*.

DuMouchel W (1999). "Bayesian Data Mining in Large Frequency
Tables, With an Application to the FDA Spontaneous Reporting System."
*The American Statistician*, 53(3), 177-190.

DuMouchel W, Pregibon D (2001). "Empirical Bayes Screening for
Multi-item Associations." In *Proceedings of the Seventh ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining*, KDD '01,
pp. 67-76. ACM, New York, NY, USA. ISBN 1-58113-391-X.

Evans SJW, Waller P, Davis S (2001). "Use of Proportional
Reporting Ratios (PRRs) for Signal Generation from Spontaneous Adverse Drug
Reaction Reports." *Pharmacoepidemiology and Drug Safety*, 10(6),
483-486.

FDA (2017). "CFSAN Adverse Event Reporting System (CAERS)." URL https://www.fda.gov/Food/ComplianceEnforcement/ucm494015.htm.
