knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
processRaw()
The processRaw()
function calculates actual counts $(N)$ of each
product-symptom combination, expected counts $(E)$ under the row/column
independence assumption, relative reporting ratio $(RR)$, and proportional
reporting ratio $(PRR)$. processRaw()
has various parameters, some of which
are shown below.
library(openEBGM) set.seed(5629404) dat <- data.frame( var1 = c(sample(c("product_A", "product_B"), 16, replace = TRUE), "product_C"), var2 = c(sample(c("event_1", "event_2"), 16, replace = TRUE), "event_1"), stringsAsFactors = FALSE ) dat$id <- 1:nrow(dat)
Suppose the data look as so:
dat
We can calculate $N$, $E$, $RR$, and $PRR$ for the product-symptom pairs:
processRaw(data = dat, stratify = FALSE, zeroes = FALSE)
Stratification can help control for confounding variables. For instance, food, cosmetics, and dietary supplements are often consumed at different rates by different genders and age groups. Similarly, adverse events associated with these products occur with varying rates. Therefore, we might wish to control for these variables when we examine the CAERS data.
set.seed(5629404) dat <- data.frame( var1 = c(sample(c("product_A", "product_B"), 16, replace = TRUE), "product_C"), var2 = c(sample(c("event_1", "event_2"), 16, replace = TRUE), "event_1"), strat1 = sample(c("M", "F"), 17, replace = TRUE), strat2 = sample(c("age_cat1", "age_cat2"), 17, replace = TRUE), stringsAsFactors = FALSE ) dat$id <- 1:nrow(dat)
Now assume the data look as so:
dat
Notice that now we have stratifications variables ('strat' substring) present. We can use these stratification variables to get adjusted estimates for the $EBGM$ scores. Stratification will affect $E$ and $RR$, but not $PRR$. The $E$s are calculated by summing the expected counts from every stratum. Ideally, each stratum should contain several unique CAERS reports to insure good estimates of $E$.
processRaw(data = dat, stratify = TRUE, zeroes = FALSE)
Notice that we use stratify = TRUE
to accomodate the new stratification
variables. The calculations for $E$ and $RR$ are adjusted.
Finally, in some cases one may wish to calculate the $E$s for product-symptom
combinations that do not occur in the data. These can be calculated by using the
zeroes = TRUE
argument in the processRaw()
function. It is typically not
required to perform such calculations for zero counts, and doing so can lead to
much longer execution times when estimating hyperparameters. For this reason,
zero counts are only recommended for hyperparameter estimation when convergence
of optimization routines cannot be reached otherwise. If zero counts are used,
data squashing should typically follow. Even if zero counts are used for
hyperparameter estimation, $EBGM$ scores for zero counts never add value to an
analysis. For this reason, rows with zero counts should be removed after
estimating hyperparameters but before calculating $EBGM$ and quantile scores.
processRaw(data = dat, stratify = FALSE, zeroes = TRUE)
Next, the Hyperparameter Estimation with openEBGM vignette will demonstrate how to estimate the hyperparameters of the prior distribution.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.