The calcOE function is a convenience function that calculates confidence
intervals for o and o/e estimates using one of three methods: binomial,
Poisson, or bootstrap. It also makes working with data.frames easier by
allowing the use of formulas and grouping variables.
calcOE(x, ...)

## S3 method for class 'formula'
calcOE(
  formula,
  data,
  ...,
  prob = 0.75,
  prefix = "",
  df = TRUE,
  method = c("binconf", "poisson", "bootstrap"),
  num_boot = 1000
)

## S3 method for class 'numeric'
calcOE(
  o_vect,
  e_vect = NA,
  prob = 0.75,
  prefix = "",
  df = TRUE,
  method = c("binconf", "poisson", "bootstrap"),
  num_boot = 1000
)

## S3 method for class 'logical'
calcOE(
  o_vect,
  e_vect = NA,
  prob = 0.75,
  prefix = "",
  df = TRUE,
  method = c("binconf", "poisson", "bootstrap"),
  num_boot = 1000
)
... | an optional, unquoted list of grouping variables that o's and o/e's should be calculated on.
formula | a formula where the LHS contains the observed value (o) and the RHS contains the expected value (e). If RHS = '1' then o values are not adjusted.
data | a dataframe with the columns specified in the groupings and formula.
prob | the width of the confidence interval.
prefix | should a prefix be added to the column names?
df | should the output be a dataframe or a named vector? If
method | which method should be used for CI estimation? See details for information on the three methods.
num_boot | if
o_vect | a vector of outcomes. Should be either logical or numeric with 0 and 1 as values. Factors are not supported.
e_vect | a vector of probabilities. Can also be
Either a dataframe or a named vector with the following:
The point estimate of the o/e or o
The low CI based on prob
The high CI based on prob
The number of observations
formula: main method for the formula interface.
numeric: basic method for calcOE; contains all of the measure logic.
logical: basic method for calcOE that converts logical values to integers.
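The class-based dispatch listed above can be sketched as follows. This is a minimal illustration rather than the package's source: `oe_sketch` is a hypothetical generic, and the point estimate used here (`sum(o) / sum(e)`, or the mean of `o` when no expected values are supplied) is an assumption.

```r
# Hypothetical generic; calcOE's real implementation may differ.
oe_sketch <- function(x, ...) UseMethod("oe_sketch")

# Numeric method: holds the actual calculation.
oe_sketch.numeric <- function(x, e = NA, ...) {
  if (all(is.na(e))) {
    mean(x)          # assumed: observed rate when no expected values are given
  } else {
    sum(x) / sum(e)  # assumed: observed events over expected events
  }
}

# Logical method: convert TRUE/FALSE to 1/0, then re-dispatch to the
# numeric method (integer vectors dispatch to "numeric" methods in S3).
oe_sketch.logical <- function(x, ...) {
  oe_sketch(as.integer(x), ...)
}

oe_sketch(c(TRUE, FALSE, TRUE, TRUE))
oe_sketch(c(1, 0, 1, 1), e = c(0.5, 0.5, 0.5, 0.5))
```

Keeping the calculation in a single method means the logical and formula entry points only need to reshape their inputs.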
binconf
This method leverages the binconf function to
calculate the confidence interval around the observed proportion of events.
This is the default method and benefits from returning reasonable intervals
when there are no events. Specifically, when there are no events in a group,
the point estimate of the o/e is always 0. However, the high estimate of the
o/e should be related to the number of cases in the group. For example, we
are more confident that the actual o/e is close to 0 if there are 100
cases than if there are 10 cases.
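The effect of group size when there are no events can be illustrated with base R. The sketch below uses `binom.test()` (an exact Clopper-Pearson interval) as a stand-in, so the numbers will not necessarily match those produced by binconf; the 95% level and the group sizes are arbitrary choices made for the illustration.

```r
# Upper confidence limit on the event proportion when 0 events are observed.
# binom.test() is a stand-in here, not the function calcOE itself calls.
upper_limit <- function(n_cases, conf_level = 0.95) {
  binom.test(x = 0, n = n_cases, conf.level = conf_level)$conf.int[2]
}

upper_10  <- upper_limit(10)
upper_100 <- upper_limit(100)

# The point estimate is 0 in both groups, but the upper limit is much
# smaller for the larger group.
c(n_10 = upper_10, n_100 = upper_100)
```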
poisson
This is a commonly used method for calculating the confidence interval. It works by assuming that the total number of observed events comes from a Poisson distribution and calculates the interval based on that. However, this method does not take the number of cases into account. Specifically, the CI around 10 events is the same whether they came from 100 or 1,000 cases.
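To illustrate why the number of cases drops out, the sketch below computes an exact Poisson interval for 10 events with base R's `poisson.test()`. Whether calcOE uses this exact interval or an approximation is not stated here, so treat the choice as an assumption; the 95% level is also arbitrary.

```r
# Exact Poisson confidence interval for a count of 10 observed events.
# Only the event count enters the calculation; the number of cases does not.
ci_events <- poisson.test(x = 10, conf.level = 0.95)$conf.int

# Expressed as a proportion of cases, both limits scale by the same factor,
# so the relative width of the interval is identical for 100 and 1,000 cases.
ci_100  <- ci_events / 100
ci_1000 <- ci_events / 1000

ci_100[2] / ci_100[1]
ci_1000[2] / ci_1000[1]
```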
bootstrap
This method uses bootstrapping to resample the original data and create a distribution of o/e's that can be used to directly calculate the quantiles. While this method benefits from not assuming a distribution for the o's as the methods above do, it breaks down when there are no events or all events. Every resample then contains only 0's (or only 1's), so the resampled observed rate is always 0 (or 1) at both the low and high CI.
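A percentile bootstrap of this kind can be sketched in base R. The helper `boot_oe` is hypothetical; its `prob` and `num_boot` arguments only mirror the names used by calcOE, and both the definition of o/e as `sum(o) / sum(e)` and the symmetric mapping from `prob` to quantiles are assumptions.

```r
set.seed(42)  # make the resampling reproducible

# Percentile bootstrap of the o/e ratio (hypothetical helper).
boot_oe <- function(o, e, prob = 0.95, num_boot = 1000) {
  n <- length(o)
  ratios <- replicate(num_boot, {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    sum(o[idx]) / sum(e[idx])                # assumed definition of o/e
  })
  alpha <- (1 - prob) / 2
  quantile(ratios, probs = c(alpha, 1 - alpha), names = FALSE)
}

e <- runif(200)
o <- rbinom(200, size = 1, prob = e)

ci_events    <- boot_oe(o, e)            # interval with non-zero width
ci_no_events <- boot_oe(rep(0, 200), e)  # no events: both limits are 0
```

With no events, every resampled numerator is 0, which is why the interval collapses regardless of how many cases are in the group.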
library(dplyr)
library(purrr)

df_size <- 1000

# Create some unbalanced groups
groups <- sample(c("A", "B", "C"), df_size, replace = TRUE, prob = c(.05, .35, .6))
d_example <- data.frame(groups = groups,
                        e = runif(df_size)) %>%
  mutate(o = map_int(e, ~rbinom(1, 1, .)),
         o2 = 0)

# Estimates from the three methods are similar when n is reasonably sized and
# there is a reasonable number of events
methods <- c("binconf", "poisson", "bootstrap")
names(methods) <- methods
map_df(methods,
       ~calcOE(o ~ e, data = d_example, method = .),
       .id = "method")

# However, there are large differences when comparing groups that have no events;
# only the binomial method provides CIs that get smaller with a larger denominator
map_df(methods,
       ~calcOE(o2 ~ e, data = d_example, method = ., groups),
       .id = "method") %>%
  arrange(groups)

# It's also possible to create CIs on just the o, ignoring the expected value:
calcOE(o ~ 1, data = d_example, method = "binconf")
calcOE(o ~ 1, data = d_example, method = "poisson")
calcOE(o ~ 1, data = d_example, method = "bootstrap")