openEBGM-package | R Documentation |
An implementation of DuMouchel's (1999) <doi:10.1080/00031305.1999.10474456> Bayesian data mining method for the market basket problem. Calculates Empirical Bayes Geometric Mean (EBGM) and posterior quantile scores using the Gamma-Poisson Shrinker (GPS) model to find unusually large cell counts in large, sparse contingency tables. Can be used to find unusually high reporting rates of adverse events associated with products. In general, can be used to mine any database where the co-occurrence of two variables or items is of interest. Also calculates relative and proportional reporting ratios. Builds on the work of the 'PhViD' package, from which much of the code is derived. Some of the added features include stratification to adjust for confounding variables and data squashing to improve computational efficiency. Includes an implementation of the EM algorithm for hyperparameter estimation loosely derived from the 'mederrRank' package.
The data preparation function, processRaw, converts raw data into actual and expected counts for product/event pairs. processRaw also adds the relative reporting ratio (RR) and proportional reporting ratio (PRR). The data squashing function, squashData, implements the simple version of data squashing described in DuMouchel and Pregibon (2001). Data squashing can be used to reduce the computational burden.
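As a rough illustration of the quantities named above, the following base R sketch computes actual counts (N), expected counts (E), RR, and PRR for a toy data set. It does not call processRaw. The data frame raw, its column names, and the unstratified formula E = (row total x column total) / grand total are assumptions made for illustration; the package's own calculation may differ in details such as stratification and the handling of reports that mention several products or events.

```r
# Hypothetical raw data: one row per report, each naming one product and one event.
raw <- data.frame(
  id      = 1:10,
  product = c("A", "A", "A", "A", "B", "B", "B", "B", "B", "B"),
  event   = c("rash", "rash", "rash", "nausea", "rash",
              "nausea", "nausea", "nausea", "nausea", "nausea")
)

# Actual counts N for every product/event pair.
N <- unclass(table(raw$product, raw$event))

row_tot <- rowSums(N)   # reports per product
col_tot <- colSums(N)   # reports per event
total   <- sum(N)       # all reports

# Expected counts under independence of product and event (no stratification).
E <- outer(row_tot, col_tot) / total

# Relative reporting ratio: observed count over expected count.
RR <- N / E

# Proportional reporting ratio: the event's share of reports for this product
# divided by the event's share of reports for all other products.
N_other <- matrix(col_tot, nrow(N), ncol(N), byrow = TRUE) - N
PRR <- (N / row_tot) / (N_other / (total - row_tot))

round(RR, 3)
round(PRR, 3)
```

In this toy table, RR and PRR point in the same direction for each pair, but they use different baselines: RR compares against the independence expectation, while PRR compares against the other products only.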
The negative log-likelihood functions (negLL, negLLsquash, negLLzero, and negLLzeroSquash) provide the means of calculating the negative log-likelihoods as mentioned in the DuMouchel papers. DuMouchel uses the likelihood function, based on the marginal distributions of the counts, to estimate the hyperparameters of the prior distribution.
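The marginal likelihood referred to above can be sketched in base R. In DuMouchel's (1999) model, each count is Poisson with mean \lambda E, and \lambda has a two-component gamma mixture prior, so the marginal distribution of a count is a mixture of two negative binomials. The function name mix_nll and the toy values are hypothetical, and the sketch covers only unsquashed data with zero counts retained; it is not the package's own code.

```r
# Negative log-likelihood of counts N with expected counts E under a
# two-component gamma mixture prior (marginal: negative binomial mixture).
# theta = c(alpha1, beta1, alpha2, beta2, P); mix_nll is a hypothetical name.
mix_nll <- function(theta, N, E) {
  a1 <- theta[1]; b1 <- theta[2]
  a2 <- theta[3]; b2 <- theta[4]
  P  <- theta[5]
  # A Gamma(shape = a, rate = b) prior combined with a Poisson(lambda * E)
  # count gives a negative binomial marginal with size = a and
  # prob = b / (b + E).
  f1 <- dnbinom(N, size = a1, prob = b1 / (b1 + E))
  f2 <- dnbinom(N, size = a2, prob = b2 / (b2 + E))
  -sum(log(P * f1 + (1 - P) * f2))
}

# Toy counts and expected counts (made up for illustration).
N <- c(0, 1, 3, 12)
E <- c(0.5, 1.2, 0.8, 2.0)
mix_nll(c(0.2, 0.1, 2, 4, 1 / 3), N, E)
```

When zero counts are excluded from the data, the likelihood has to be conditioned on the count being at least one; that truncation is not shown here.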
The hyperparameter estimation functions (exploreHypers and autoHyper) use gradient-based approaches to estimate the hyperparameters, \theta, of the prior distribution (a mixture of two gamma distributions) by minimizing the negative log-likelihood functions from the marginal distributions of the counts (a mixture of two negative binomial distributions). \theta is a vector containing five parameters (\alpha_1, \beta_1, \alpha_2, \beta_2, and P). hyperEM estimates \theta using a version of the EM algorithm.
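A minimal sketch of the likelihood-based estimation described above, using simulated data and base R only. It minimizes a negative binomial mixture negative log-likelihood with nlminb(), a quasi-Newton optimizer that approximates gradients numerically when none are supplied. The simulated data, starting values, bounds, and the helper name nll are assumptions for illustration; the package functions are not called here and may differ in optimizer settings and safeguards.

```r
set.seed(1)

# Simulate counts from the gamma mixture / Poisson model (hypothetical values).
n      <- 2000
E      <- rexp(n) + 0.1                 # expected counts
in_1   <- rbinom(n, 1, 0.3) == 1        # component membership, P = 0.3
lambda <- ifelse(in_1,
                 rgamma(n, shape = 2, rate = 0.5),
                 rgamma(n, shape = 1, rate = 2))
N      <- rpois(n, lambda * E)

# Negative log-likelihood; theta = c(alpha1, beta1, alpha2, beta2, P).
nll <- function(theta, N, E) {
  f1 <- dnbinom(N, size = theta[1], prob = theta[2] / (theta[2] + E))
  f2 <- dnbinom(N, size = theta[3], prob = theta[4] / (theta[4] + E))
  -sum(log(theta[5] * f1 + (1 - theta[5]) * f2))
}

# Box constraints keep the gamma parameters positive and P inside (0, 1).
theta_init <- c(1, 1, 1, 2, 0.5)
fit <- nlminb(theta_init, nll, N = N, E = E,
              lower = c(rep(1e-4, 4), 1e-4),
              upper = c(rep(Inf, 4), 1 - 1e-4))

fit$par          # estimated (alpha1, beta1, alpha2, beta2, P)
fit$convergence  # 0 means nlminb reports successful convergence
```

Mixture likelihoods can have several local optima, so in practice it is worth trying more than one starting point and comparing the resulting objective values.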
The posterior distribution functions calculate the mixture fraction (Qn), geometric mean (ebgm), and quantiles (quantBisect) of the posterior distribution. Alternatively, ebScores can be used to create an object of class openEBGM that contains the EBGM and quantile scores. Appropriate methods exist for the generic functions print, summary, and plot for openEBGM objects.
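The posterior quantities named above can be sketched in base R for a single product/event pair. Given hyperparameters \theta, the posterior of \lambda is a two-component gamma mixture with weight Qn on the first component; the EBGM is exp(E[log \lambda | N]), and quantiles are found by solving the posterior CDF for a target probability. The function names below (post_qn, post_ebgm, post_quantile), the hyperparameter values, and the use of uniroot() are assumptions for illustration; the package's Qn, ebgm, and quantBisect are not reproduced here.

```r
# theta = c(alpha1, beta1, alpha2, beta2, P); all function names are hypothetical.

# Posterior probability that lambda came from the first mixture component.
post_qn <- function(theta, N, E) {
  f1 <- dnbinom(N, size = theta[1], prob = theta[2] / (theta[2] + E))
  f2 <- dnbinom(N, size = theta[3], prob = theta[4] / (theta[4] + E))
  theta[5] * f1 / (theta[5] * f1 + (1 - theta[5]) * f2)
}

# Geometric mean of the posterior: exp of the posterior mean of log(lambda).
# For Gamma(shape = a, rate = b), E[log(lambda)] = digamma(a) - log(b).
post_ebgm <- function(theta, N, E) {
  qn <- post_qn(theta, N, E)
  e_log <- qn * (digamma(theta[1] + N) - log(theta[2] + E)) +
    (1 - qn) * (digamma(theta[3] + N) - log(theta[4] + E))
  exp(e_log)
}

# Posterior quantile for one pair, by solving CDF(x) = p numerically.
post_quantile <- function(p, theta, N, E, upper = 1e4) {
  qn <- post_qn(theta, N, E)
  cdf_minus_p <- function(x) {
    qn * pgamma(x, shape = theta[1] + N, rate = theta[2] + E) +
      (1 - qn) * pgamma(x, shape = theta[3] + N, rate = theta[4] + E) - p
  }
  uniroot(cdf_minus_p, interval = c(0, upper), tol = 1e-10)$root
}

theta <- c(0.2, 0.1, 2, 4, 1 / 3)            # made-up hyperparameters
post_qn(theta, N = 5, E = 1.5)
post_ebgm(theta, N = 5, E = 1.5)
post_quantile(0.05, theta, N = 5, E = 1.5)   # 5th percentile of the posterior
```

The sketch handles one pair at a time; post_quantile in particular is not vectorized because uniroot() solves a single equation per call.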
Maintainer: John Ihrie <John.Ihrie@fda.hhs.gov>
Authors:
Travis Canida <Travis.Canida@fda.hhs.gov>
Other contributors:
Ismaïl Ahmed (author of 'PhViD' package (derived code)) [contributor]
Antoine Poncet (author of 'PhViD') [contributor]
Sergio Venturini (author of 'mederrRank' package (derived code)) [contributor]
Jessica Myers (author of 'mederrRank') [contributor]
Ahmed I, Poncet A (2016). PhViD: an R package for PharmacoVigilance signal Detection. R package version 1.0.8.
Venturini S, Myers J (2015). mederrRank: Bayesian Methods for Identifying the Most Harmful Medication Errors. R package version 0.0.8.
DuMouchel W (1999). "Bayesian Data Mining in Large Frequency Tables, With an Application to the FDA Spontaneous Reporting System." The American Statistician, 53(3), 177-190.
DuMouchel W, Pregibon D (2001). "Empirical Bayes Screening for Multi-item Associations." In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '01, pp. 67-76. ACM, New York, NY, USA. ISBN 1-58113-391-X.
Evans SJW, Waller P, Davis S (2001). "Use of Proportional Reporting Ratios (PRRs) for Signal Generation from Spontaneous Adverse Drug Reaction Reports." Pharmacoepidemiology and Drug Safety, 10(6), 483-486.
FDA (2017). "CFSAN Adverse Event Reporting System (CAERS)." URL https://www.fda.gov/food/compliance-enforcement-food/cfsan-adverse-event-reporting-system-caers.