bootSummary: 'bootSummary' calculates the empirical distribution of a...
In West-End-Statistics/r-library-vakdr: Miscellanious helper functions


The bootSummary function uses bootstrap resampling to estimate the median, mean, and confidence interval of a user-defined summary function across bootstrap replicates. Its purpose is to work seamlessly within the dplyr framework and the pipe, and it allows the use of bare column names.

bootSummary(data, var, ..., .funs = median, n = 100, ci = 0.95, na.rm = FALSE)

`data`	data.frame or tibble.
`var`	bare column name to summarise over.
`...`	grouping variables for summary statistic.
`.funs`	summarising function. It can be a bare function name or follow the usage of `funs`.
`n`	number of bootstrap replicates to generate.
`ci`	width of the quantile interval for the final summary (e.g. 0.95).
`na.rm`	should the final summarisation across bootstraps remove `NA`s?

The user provides the name of the column to summarise along with the summarising function.

The example shows that a t-test and a bootstrap give similar results when the data are normally distributed (cohort A). The bootstrap can also estimate other statistics, such as the median.
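The idea described above can be sketched in base R for a single, ungrouped vector. This is only an illustration of a percentile bootstrap, not the package's actual implementation: the helper name `boot_sketch` is hypothetical, and the way it maps `ci` to quantiles is an assumption.

```r
# Hypothetical helper (not part of the package): percentile bootstrap of a statistic.
boot_sketch <- function(x, fun = median, n = 100, ci = 0.95, na.rm = FALSE) {
  # Each replicate resamples x with replacement and applies the statistic.
  reps <- replicate(n, fun(sample(x, size = length(x), replace = TRUE)))
  # Assumed mapping from interval width to tail probability.
  alpha <- (1 - ci) / 2
  c(
    stat_mean = mean(reps, na.rm = na.rm),
    stat_mid  = median(reps, na.rm = na.rm),
    stat_low  = unname(quantile(reps, probs = alpha, na.rm = na.rm)),
    stat_high = unname(quantile(reps, probs = 1 - alpha, na.rm = na.rm))
  )
}

set.seed(5)
x <- rnorm(1000, mean = 5, sd = 10)

# Bootstrap interval for the mean, to set beside the t-test interval.
boot_mean <- boot_sketch(x, fun = mean, n = 500)
t_ci <- t.test(x)$conf.int

boot_mean
t_ci
```

For normally distributed data such as `x`, the interval from `stat_low` to `stat_high` would be expected to land close to the t-test confidence interval. Passing `fun = median` gives an interval for a statistic the t-test does not cover.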

A tibble containing the grouping variables and the following columns:

stat_mean: The mean across bootstraps
stat_mid: The median across bootstraps
stat_low: The low quantile (e.g. 2.5% when ci = .95)
stat_high: The high quantile (e.g. 97.5% when ci = .95)
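The relationship between `ci` and the two quantiles can be made concrete with a small sketch. The helper `ci_to_probs` is hypothetical; it assumes a symmetric interval, which is consistent with the 2.5% and 97.5% values quoted above for `ci = .95`.

```r
# Hypothetical helper: convert an interval width into lower and upper
# quantile probabilities, assuming equal mass in each tail.
ci_to_probs <- function(ci) {
  alpha <- (1 - ci) / 2
  c(low = alpha, high = 1 - alpha)
}

probs_95 <- ci_to_probs(0.95)  # low = 0.025, high = 0.975
probs_80 <- ci_to_probs(0.80)  # low = 0.10,  high = 0.90
```

These probabilities could then be passed to `quantile()` over the bootstrap replicates to obtain the low and high bounds.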

library(dplyr)
# Simulate some data
set.seed(5)
size <- 1000
test_data <- data.frame(cohort = rep(c("A", "B"), each = size),
                        stat = c(rnorm(size, 5, 10), exp(rnorm(size, mean = 0.1))))
# T Tests
test_data %>%
  filter(cohort == "A") %>%
  pull(stat) %>% t.test()
test_data %>%
  filter(cohort == "B") %>%
  pull(stat) %>% t.test()

# Bootstrap the median
test_data %>% bootSummary(stat, cohort)