calcOE: Calculate CIs for O's or O/E's

View source: R/calcOE.R

Description

The calcOE function is a convenience function that calculates confidence intervals (CIs) using one of three methods: binomial, Poisson, or bootstrap. It also makes working with data frames easier by accepting formulas and grouping variables.

Usage

calcOE(x, ...)

## S3 method for class 'formula'
calcOE(
  formula,
  data,
  ...,
  prob = 0.75,
  prefix = "",
  df = TRUE,
  method = c("binconf", "poisson", "bootstrap"),
  num_boot = 1000
)

## S3 method for class 'numeric'
calcOE(
  o_vect,
  e_vect = NA,
  prob = 0.75,
  prefix = "",
  df = TRUE,
  method = c("binconf", "poisson", "bootstrap"),
  num_boot = 1000
)

## S3 method for class 'logical'
calcOE(
  o_vect,
  e_vect = NA,
  prob = 0.75,
  prefix = "",
  df = TRUE,
  method = c("binconf", "poisson", "bootstrap"),
  num_boot = 1000
)

Arguments

...

an unquoted list of grouping variables (optional) that o's and o/e's should be calculated on.

formula

a formula where the LHS contains the observed value (o) and the RHS contains the expected value (e), e.g. o ~ e. If the RHS is 1 (o ~ 1), the o values are not adjusted.

data

a dataframe with the columns specified in the grouping variables and the formula.

prob

the width of the confidence interval.

prefix

should a prefix be added to the column names?

df

should the output be a dataframe or a named vector? If ... is supplied then it must be TRUE.

method

which method should be used for CI estimation? See the Methods section for information on the three methods.

num_boot

if method = "bootstrap" then how many bootstraps should be used?

o_vect

a vector of outcomes. Should be either logical or numeric with 0 and 1 as values. Factors are not supported.

e_vect

a vector of probabilities. Can also be NA if a CI on the o's alone is desired.

Value

either a dataframe or named vector with the following:

o_e/o

The point estimate of the o/e or o

low_o_e/low_o

the low CI based on prob

high_o_e/high_o

the high CI based on prob

n

The number of observations

Methods (by class)

binconf

This method uses the binconf function to calculate the confidence interval around the observed proportion of events. This is the default method, and it has the benefit of returning reasonable results when there are no events. Specifically, when there are no events in a group, the point estimate of the o/e is always 0. However, the high estimate of the o/e should depend on the number of cases in the group. For example, we are more confident that the actual o/e is close to 0 if there are 100 cases than if there are 10 cases.
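The dependence on the number of cases can be illustrated with a small base R sketch. It assumes a Wilson score interval, which is the default of Hmisc's binconf; the helper name `wilson_ci` is hypothetical and is not part of calcOE, and how calcOE converts the proportion bounds into o/e bounds is not shown here.

```r
# Wilson score interval for a binomial proportion, base R only.
# `wilson_ci` is a hypothetical helper used for illustration.
wilson_ci <- function(events, n, prob = 0.75) {
  z <- qnorm(1 - (1 - prob) / 2)   # two-sided normal quantile
  p_hat <- events / n              # observed proportion
  denom <- 1 + z^2 / n
  centre <- (p_hat + z^2 / (2 * n)) / denom
  half <- z * sqrt(p_hat * (1 - p_hat) / n + z^2 / (4 * n^2)) / denom
  c(low = centre - half, high = centre + half)
}

# Zero events in both groups: same point estimate (0),
# but the upper bound shrinks as the number of cases grows.
ci_10  <- wilson_ci(events = 0, n = 10)
ci_100 <- wilson_ci(events = 0, n = 100)
print(rbind(n_10 = ci_10, n_100 = ci_100))
```

With no events the lower bound is 0 (up to floating-point error) and the upper bound reduces to z^2 / (n + z^2), so it decreases as n increases.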

poisson

This is a commonly used method for calculating the confidence interval. It works by assuming that the total number of observed events comes from a Poisson distribution and calculates the interval based on that. However, this method does not take the number of cases into account. Specifically, the CI around 10 events is the same whether they came from 100 or 1,000 cases.
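As a rough illustration of why the number of cases does not matter, the sketch below computes an exact (chi-square based) Poisson interval for an event count in base R. This particular interval is an assumption: the source does not say which Poisson interval calcOE uses, and `poisson_ci` is a hypothetical helper.

```r
# Exact Poisson interval for an event count via chi-square quantiles.
# `poisson_ci` is a hypothetical helper; note that it has no argument
# for the number of cases, only the number of events.
poisson_ci <- function(events, prob = 0.75) {
  alpha <- 1 - prob
  low <- if (events == 0) 0 else qchisq(alpha / 2, 2 * events) / 2
  high <- qchisq(1 - alpha / 2, 2 * (events + 1)) / 2
  c(low = low, high = high)
}

# The interval around 10 events is the same whether the events came
# from 100 or 1,000 cases, because only the count enters the formula.
ci_events <- poisson_ci(10)
print(ci_events)

# With zero events the upper bound is a constant, -log(alpha / 2),
# regardless of how many cases were observed.
ci_zero <- poisson_ci(0)
print(ci_zero)
```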

bootstrap

This method uses bootstrapping to resample the original data and create a distribution of o/e's, from which the quantiles can be calculated directly. While this method has the benefit of not assuming a distribution for the o's, unlike the methods above, it breaks down when a group has no events or consists entirely of events. In those cases every resample contains only 0's or only 1's, so the low and high CI bounds collapse to the same value.
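A minimal base R sketch of a percentile bootstrap for the o/e follows. It assumes the o/e is computed as sum(o) / sum(e) within each resample, which the source does not state; `boot_oe` is a hypothetical helper and not the package's implementation.

```r
set.seed(1)

# Percentile bootstrap for o/e. `boot_oe` is a hypothetical helper.
boot_oe <- function(o, e, prob = 0.75, num_boot = 1000) {
  n <- length(o)
  ratios <- replicate(num_boot, {
    idx <- sample.int(n, n, replace = TRUE)  # resample rows with replacement
    sum(o[idx]) / sum(e[idx])                # o/e for this resample
  })
  alpha <- 1 - prob
  quantile(ratios, c(alpha / 2, 1 - alpha / 2), names = FALSE)
}

e <- runif(50, 0.1, 0.9)   # expected probabilities
o <- rbinom(50, 1, e)      # simulated 0/1 outcomes

ci <- boot_oe(o, e)
print(ci)

# With no events every resample has sum(o) == 0, so both bounds are 0.
zero_ci <- boot_oe(rep(0, 50), e)
print(zero_ci)
```

The zero-event case shows the breakdown described above: the interval has no width, regardless of how many cases are in the group.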

Examples

library(dplyr)
library(purrr)
df_size <- 1000
# Create some unbalanced groups
groups <- sample(c("A", "B", "C"), df_size, replace = TRUE, prob = c(.05, .35, .6))

d_example <- data.frame(groups = groups,
                        e = runif(df_size)) %>%
  mutate(o = map_int(e, ~rbinom(1,1, .)),
         o2 = 0)

# Estimates from the three methods are similar when n is reasonably large and
# there is a reasonable number of events
methods <- c("binconf", "poisson", "bootstrap")
names(methods) <- methods

map_df(methods,
       ~calcOE(o ~ e, data = d_example, method = .),
       .id = "method")

# However, there are large differences when comparing groups that have no events:
# only the binomial method provides CIs that get narrower with a larger denominator

map_df(methods,
       ~calcOE(o2 ~ e, data = d_example, method = ., groups),
       .id = "method") %>%
  arrange(groups)

# It's also possible to create CI's on just the O, ignoring the expected value:
calcOE(o ~ 1, data = d_example, method = "binconf")
calcOE(o ~ 1, data = d_example, method = "poisson")
calcOE(o ~ 1, data = d_example, method = "bootstrap")

West-End-Statistics/r-library-vakdr documentation built on Dec. 18, 2021, 7:16 p.m.