openEBGM: EBGM Scores for Mining Large Contingency Tables


Description

openEBGM is a Bayesian data mining package for calculating Empirical Bayes scores based on the Gamma-Poisson Shrinker (GPS) model for large, sparse contingency (frequency) tables. openEBGM includes functions implementing the methods of DuMouchel (1999) and DuMouchel and Pregibon (2001) for calculating the EBGM (Empirical Bayes Geometric Mean) score and the quantile scores used to create credibility intervals. Some simple disproportionality scores (the relative reporting ratio and the proportional reporting ratio) are also included. Adverse event report data are used as an example application. Much of openEBGM's code is derived from the PhViD and mederrRank packages.

Data preparation & squashing functions

The data preparation function, processRaw, converts raw data into actual and expected counts for product/event pairs and also adds the relative reporting ratio (RR) and the proportional reporting ratio (PRR). The data squashing function, squashData, implements the simple version of data squashing described in DuMouchel and Pregibon (2001): points that share the same actual count are grouped into bins of similar expected counts, and each bin is replaced by a single weighted point. Squashing reduces the computational burden of hyperparameter estimation on large tables.
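A minimal Python sketch of the kind of counting processRaw performs, and of simple squashing, is given below. It is not the package's R code: process_raw and squash are hypothetical names, the input is assumed to be (report ID, product, event) rows, each report is counted at most once per pair and per margin, and no stratification is used, so E = N_i. × N_.j / N_.., RR = N / E, and PRR = [a / (a + b)] / [c / (c + d)] from the 2×2 table for the pair. Pairs that never occur (N = 0) are omitted. squash bins points with the same N by sorted E and keeps each bin's mean E with a weight equal to the bin size; the package's options for which counts to squash and how many bins to keep are left out.

```python
from collections import defaultdict
import math


def process_raw(rows):
    """Hypothetical sketch (not the openEBGM API).

    rows: iterable of (report_id, product, event) tuples.
    Returns {(product, event): {"N", "E", "RR", "PRR"}} for observed pairs.
    """
    pair_ids = defaultdict(set)   # reports mentioning each product/event pair
    prod_ids = defaultdict(set)   # reports mentioning each product
    event_ids = defaultdict(set)  # reports mentioning each event
    all_ids = set()
    for rid, prod, event in rows:
        pair_ids[(prod, event)].add(rid)
        prod_ids[prod].add(rid)
        event_ids[event].add(rid)
        all_ids.add(rid)

    n_total = len(all_ids)
    out = {}
    for (prod, event), ids in pair_ids.items():
        n = len(ids)                      # actual count N_ij (cell a)
        n_i = len(prod_ids[prod])         # a + b
        n_j = len(event_ids[event])       # a + c
        expected = n_i * n_j / n_total    # E_ij, no stratification
        c = n_j - n                       # event reported without the product
        c_plus_d = n_total - n_i          # reports without the product
        if c_plus_d == 0:
            prr = math.nan                # no comparison group
        elif c == 0:
            prr = math.inf
        else:
            prr = (n / n_i) / (c / c_plus_d)
        out[(prod, event)] = {"N": n, "E": expected, "RR": n / expected, "PRR": prr}
    return out


def squash(points, bin_size=50):
    """Hypothetical simplified squashing: points are (N, E) pairs.

    Returns a list of (N, mean E of bin, weight = bin size).
    """
    by_count = defaultdict(list)
    for n, e in points:
        by_count[n].append(e)
    squashed = []
    for n in sorted(by_count):
        es = sorted(by_count[n])          # neighbours in E share a bin
        for start in range(0, len(es), bin_size):
            chunk = es[start:start + bin_size]
            squashed.append((n, sum(chunk) / len(chunk), len(chunk)))
    return squashed
```

With the six rows ("r1", "A", "x"), ("r1", "A", "y"), ("r2", "A", "x"), ("r3", "B", "x"), ("r4", "B", "y"), ("r5", "B", "y"), there are 5 reports, product A appears on 2 and event x on 3, so the pair (A, x) gets N = 2, E = 2 × 3 / 5 = 1.2, RR = 2 / 1.2 ≈ 1.67, and PRR = (2 / 2) / (1 / 3) = 3.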

Negative log-likelihood functions

The negative log-likelihood functions (negLL, negLLsquash, negLLzero, and negLLzeroSquash) compute the negative log-likelihoods described in the DuMouchel papers. DuMouchel estimates the hyperparameters of the prior distribution by maximizing a likelihood based on the marginal distribution of the counts; minimizing the negative log-likelihood is equivalent.
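As a sketch of what such a function evaluates, assume (following DuMouchel 1999) that N | λ ~ Poisson(λE) and that λ follows a mixture of two gamma distributions in shape/rate form, so each count is marginally a two-component negative binomial mixture. The hypothetical neg_ll below sums −w·log P(N = n) over the points: the optional weights stand in for squashed data, and n_star > 0 conditions on N ≥ n_star, which is our reading of the truncated likelihood in DuMouchel and Pregibon (2001). This is illustrative Python, not the signatures or exact behaviour of negLL and its variants.

```python
import math

# Hypothetical illustration of the GPS marginal likelihood; not the openEBGM API.


def log_nb(n, alpha, beta, E):
    """log P(N = n) when N | lam ~ Poisson(lam * E), lam ~ Gamma(alpha, rate=beta)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + E))
            + n * math.log(E / (beta + E)))


def log_marginal(n, E, theta):
    """log of the negative binomial mixture; theta = (a1, b1, a2, b2, P), 0 < P < 1."""
    a1, b1, a2, b2, p = theta
    l1 = math.log(p) + log_nb(n, a1, b1, E)
    l2 = math.log1p(-p) + log_nb(n, a2, b2, E)
    m = max(l1, l2)                       # log-sum-exp for numerical stability
    return m + math.log(math.exp(l1 - m) + math.exp(l2 - m))


def neg_ll(theta, counts, expected, weights=None, n_star=0):
    """Weighted negative log-likelihood; n_star > 0 conditions on N >= n_star."""
    if weights is None:
        weights = [1.0] * len(counts)
    total = 0.0
    for n, E, w in zip(counts, expected, weights):
        ll = log_marginal(n, E, theta)
        if n_star > 0:
            # divide by P(N >= n_star) under the same mixture
            below = sum(math.exp(log_marginal(k, E, theta)) for k in range(n_star))
            ll -= math.log1p(-below)
        total -= w * ll
    return total
```

For example, with θ = (1, 1, 1, 1, 0.5) and E = 1 the marginal is geometric with P(N = n) = 0.5^(n + 1), so a single zero count contributes log 2 to the negative log-likelihood.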

Hyperparameter estimation functions

The hyperparameter estimation functions (exploreHypers and autoHyper) use gradient-based approaches to estimate the hyperparameters, θ, of the prior distribution (a mixture of two gamma distributions) by minimizing the negative log-likelihood of the marginal distribution of the counts (a mixture of two negative binomial distributions). θ = (α_1, β_1, α_2, β_2, P) is a vector of five parameters. Alternatively, hyperEM estimates θ using a version of the EM algorithm.
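For reference, the model that θ parameterizes can be written as follows, following DuMouchel (1999) with each gamma in shape/rate form; the labels g and f are notation chosen here. The marginal f is the negative binomial referred to above, and the estimate of θ minimizes the resulting negative log-likelihood over the observed pairs.

```latex
\begin{aligned}
N \mid \lambda &\sim \operatorname{Poisson}(\lambda E), \\
\pi(\lambda;\theta) &= P\,g(\lambda;\alpha_1,\beta_1) + (1-P)\,g(\lambda;\alpha_2,\beta_2),
\qquad g(\lambda;\alpha,\beta) = \frac{\beta^{\alpha}\lambda^{\alpha-1}e^{-\beta\lambda}}{\Gamma(\alpha)}, \\
\Pr(N = n \mid E, \theta) &= P\,f(n;\alpha_1,\beta_1,E) + (1-P)\,f(n;\alpha_2,\beta_2,E), \\
f(n;\alpha,\beta,E) &= \frac{\Gamma(\alpha+n)}{\Gamma(\alpha)\,n!}
\left(\frac{\beta}{\beta+E}\right)^{\alpha}\left(\frac{E}{\beta+E}\right)^{n}, \\
\hat\theta &= \arg\min_{\theta}\; -\sum_{ij}\log\Pr(N = N_{ij} \mid E_{ij}, \theta).
\end{aligned}
```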

Posterior distribution functions

The posterior distribution functions calculate the mixture fraction (Qn), the geometric mean (ebgm), and quantiles (quantBisect) of the posterior distribution. Alternatively, ebScores creates an object of class openEBGM that contains the EBGM and quantile scores; openEBGM objects have methods for the generic functions print, summary, and plot.
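The three posterior quantities can be sketched in Python under the two-gamma-mixture model of DuMouchel (1999). Given N = n and E, the posterior of λ is a mixture of Gamma(α_1 + n, β_1 + E) and Gamma(α_2 + n, β_2 + E) with weight Q_n on the first component; EBGM is exp(E[log λ]); and a quantile is found by bisection on the posterior CDF, which is the approach the name quantBisect suggests (the package's bracketing and tolerances are not reproduced). The names qn, ebgm, and quant_bisect are hypothetical, probabilities are given as fractions such as 0.05, and 0 < P < 1 is assumed. Digamma and the regularized incomplete gamma function are hand-rolled (asymptotic series, and series plus continued fraction) so the block needs only the standard library.

```python
import math

# Hypothetical illustration of GPS posterior scores; not the openEBGM API.

def digamma(x):
    """psi(x) for x > 0: recurrence up to x >= 6, then the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    return (result + math.log(x) - 0.5 / x
            - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252
                      - inv2 * (1 / 240 - inv2 / 132)))))

def gamma_p(a, x, max_iter=10000):
    """Regularized lower incomplete gamma P(a, x) (series / continued fraction)."""
    if x <= 0.0:
        return 0.0
    log_prefix = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:                        # series converges quickly here
        term = total = 1.0 / a
        for k in range(1, max_iter):
            term *= x / (a + k)
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefix)
    tiny = 1e-300                          # Lentz's method for Q(a, x)
    b = x + 1.0 - a
    c, d = 1.0 / tiny, 1.0 / b
    h = d
    for i in range(1, max_iter):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-14:
            break
    return 1.0 - math.exp(log_prefix) * h

def _log_nb(n, alpha, beta, E):
    """log P(N = n) when N | lam ~ Poisson(lam * E), lam ~ Gamma(alpha, rate=beta)."""
    return (math.lgamma(alpha + n) - math.lgamma(alpha) - math.lgamma(n + 1)
            + alpha * math.log(beta / (beta + E)) + n * math.log(E / (beta + E)))

def qn(n, E, theta):
    """Posterior weight Q_n of the first gamma component."""
    a1, b1, a2, b2, p = theta
    diff = (math.log1p(-p) + _log_nb(n, a2, b2, E)) - (math.log(p) + _log_nb(n, a1, b1, E))
    if diff > 0:                           # Q_n = 1 / (1 + exp(diff)), overflow-safe
        e = math.exp(-diff)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(diff))

def ebgm(n, E, theta):
    """exp(E[log lambda | N = n]) under the posterior gamma mixture."""
    a1, b1, a2, b2, _ = theta
    q = qn(n, E, theta)
    eb_log = (q * (digamma(a1 + n) - math.log(b1 + E))
              + (1.0 - q) * (digamma(a2 + n) - math.log(b2 + E)))
    return math.exp(eb_log)

def posterior_cdf(lam, n, E, theta):
    a1, b1, a2, b2, _ = theta
    q = qn(n, E, theta)
    return (q * gamma_p(a1 + n, (b1 + E) * lam)
            + (1.0 - q) * gamma_p(a2 + n, (b2 + E) * lam))

def quant_bisect(prob, n, E, theta, tol=1e-10, max_iter=200):
    """Posterior quantile for 0 < prob < 1 by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while posterior_cdf(hi, n, E, theta) < prob:
        lo, hi = hi, 2.0 * hi              # widen the bracket
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if posterior_cdf(mid, n, E, theta) < prob:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For example, quant_bisect(0.05, n, E, theta) and quant_bisect(0.95, n, E, theta) bracket a 90% credibility interval for λ.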

References

Ahmed I, Poncet A (2016). PhViD: an R package for PharmacoVigilance signal Detection. R package version 1.0.8.

Venturini S, Myers J (2015). mederrRank: Bayesian Methods for Identifying the Most Harmful Medication Errors. R package version 0.0.8.

DuMouchel W (1999). "Bayesian Data Mining in Large Frequency Tables, With an Application to the FDA Spontaneous Reporting System." The American Statistician, 53(3), 177-190.

DuMouchel W, Pregibon D (2001). "Empirical Bayes Screening for Multi-item Associations." In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp. 67-76. ACM, New York, NY, USA. ISBN 1-58113-391-X.

Evans SJW, Waller P, Davis S (2001). "Use of Proportional Reporting Ratios (PRRs) for Signal Generation from Spontaneous Adverse Drug Reaction Reports." Pharmacoepidemiology and Drug Safety, 10(6), 483-486.

FDA (2017). "CFSAN Adverse Event Reporting System (CAERS)."

openEBGM documentation built on Aug. 17, 2018, 1:05 a.m.