int_pctl: Bootstrap confidence intervals
In rsample: General Resampling Infrastructure

Description

Calculate bootstrap confidence intervals using the percentile, Student-t, or bias-corrected and accelerated (BCa) method.

Usage

int_pctl(.data, ...)

## Default S3 method:
int_pctl(.data, ...)

## S3 method for class 'bootstraps'
int_pctl(.data, statistics, alpha = 0.05, ...)

int_t(.data, ...)

## Default S3 method:
int_t(.data, ...)

## S3 method for class 'bootstraps'
int_t(.data, statistics, alpha = 0.05, ...)

int_bca(.data, ...)

## Default S3 method:
int_bca(.data, ...)

## S3 method for class 'bootstraps'
int_bca(.data, statistics, alpha = 0.05, .fn, ...)

Arguments

`.data`	An object containing the bootstrap resamples, created using `bootstraps()`. For t- and BCa-intervals, the `apparent` argument should be set to `TRUE`. Even if the `apparent` argument is set to `TRUE` for the percentile method, the apparent data is never used in calculating the percentile confidence interval.
`...`	Arguments to pass to `.fn` (`int_bca()` only).
`statistics`	An unquoted column name or `dplyr` selector that identifies a single column in the data set containing the individual bootstrap estimates. This must be a list column of tidy tibbles (with columns `term` and `estimate`). Optionally, users can include columns whose names begin with a period and the intervals will be created for each combination of these variables and the `term` column. For t-intervals, a standard tidy column (usually called `std.error`) is required. See the examples below.
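`alpha`	Level of significance. The intervals have nominal coverage `1 - alpha`, so the default of 0.05 gives 95% intervals.
`.fn`	A function to calculate the statistic of interest. The function should take an `rsplit` as its first argument and must include `...` in its signature.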
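
As a rough illustration of how `statistics` and `alpha` fit together, the sketch below builds a list column of tidy tibbles and requests a 90% t-interval. The helper `mean_est()` and the column name `stats` are hypothetical names chosen for this example; the helper takes an `rsplit` first and accepts `...`, which is the shape described for `.fn`.

```r
library(rsample)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper: the mean of mpg with its analytical standard error.
# It takes an rsplit as the first argument and accepts `...`.
mean_est <- function(split, ...) {
  dat <- analysis(split)
  tibble(
    term = "mean_mpg",
    estimate = mean(dat$mpg),
    std.error = sd(dat$mpg) / sqrt(nrow(dat))
  )
}

set.seed(101)
mpg_rs <-
  bootstraps(mtcars, times = 1000, apparent = TRUE) %>%
  mutate(stats = map(splits, mean_est))

# `stats` is the list column of tidy tibbles; alpha = 0.10 asks for a 90% interval
t_90 <- int_t(mpg_rs, stats, alpha = 0.10)
t_90
```

Because the tibbles carry a `std.error` column, the same `mpg_rs` object can be passed to `int_t()` as well as `int_pctl()`.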

Details

Percentile intervals are the standard method of obtaining confidence intervals but require thousands of resamples to be accurate. t-intervals may need fewer resamples but require a corresponding variance estimate for each resample. Bias-corrected and accelerated (BCa) intervals require the original function that was used to create the statistics of interest and are computationally taxing.
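
To make the difference between the first two methods concrete, here is a minimal base R sketch of the textbook percentile and studentized (t) bootstrap intervals for a sample mean. It is not the package's implementation; the variable names, the choice of statistic, and the use of `quantile()` with its default settings are assumptions made for illustration.

```r
set.seed(2024)
x <- mtcars$mpg
n <- length(x)
alpha <- 0.05
B <- 2000

# Estimate and standard error on the original (apparent) data
theta_hat <- mean(x)
se_hat <- sd(x) / sqrt(n)

# Each column holds the estimate and its standard error for one resample
boot <- replicate(B, {
  xb <- sample(x, size = n, replace = TRUE)
  c(estimate = mean(xb), std.error = sd(xb) / sqrt(n))
})

probs <- c(alpha / 2, 1 - alpha / 2)

# Percentile interval: quantiles of the bootstrap estimates themselves
pctl_ci <- quantile(boot["estimate", ], probs = probs)

# Studentized interval: quantiles of the standardized statistics, rescaled
# by the apparent standard error (this is why a variance estimate is needed)
z_star <- (boot["estimate", ] - theta_hat) / boot["std.error", ]
z_q <- quantile(z_star, probs = probs)
t_ci <- c(
  lower = theta_hat - z_q[[2]] * se_hat,
  upper = theta_hat - z_q[[1]] * se_hat
)

pctl_ci
t_ci
```

The percentile interval only needs the estimates, while the studentized interval also uses the apparent estimate and a per-resample standard error, which matches the requirement that `apparent = TRUE` and a `std.error` column be present for `int_t()`.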

Value

Each function returns a tibble with columns `.lower`, `.estimate`, `.upper`, `.alpha`, `.method`, and `term`. `.method` is the type of interval (e.g., "percentile", "student-t", or "BCa"). `term` is the name of the estimate. Note that the `.estimate` returned from `int_pctl()` is the mean of the estimates from the bootstrap resamples, not the estimate from the apparent model.
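
The note about `.estimate` can be checked directly. The sketch below compares the `.estimate` from `int_pctl()` with the mean of the bootstrap estimates and with the estimate on the original data. The helper `mean_est()` and the column name `stats` are hypothetical; the filter on `id != "Apparent"` assumes the apparent resample carries that id, which is how `bootstraps()` labels it.

```r
library(rsample)
library(dplyr)
library(purrr)
library(tibble)

# Hypothetical helper returning a one-row tidy tibble
mean_est <- function(split, ...) {
  tibble(term = "mean_mpg", estimate = mean(analysis(split)$mpg))
}

set.seed(202)
mpg_rs <-
  bootstraps(mtcars, times = 1000, apparent = TRUE) %>%
  mutate(stats = map(splits, mean_est))

pctl <- int_pctl(mpg_rs, stats)

# Estimates from the bootstrap resamples only (apparent row excluded)
boot_est <- bind_rows(mpg_rs$stats[mpg_rs$id != "Apparent"])

boot_mean <- mean(boot_est$estimate)  # mean over bootstrap resamples
apparent_est <- mean(mtcars$mpg)      # estimate on the original data

c(int_pctl = pctl$.estimate, boot_mean = boot_mean, apparent = apparent_est)
```

The first two values should agree, while the apparent estimate will generally differ slightly from both.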

References

https://rsample.tidymodels.org/articles/Applications/Intervals.html

Davison, A., & Hinkley, D. (1997). Bootstrap Methods and their Application. Cambridge: Cambridge University Press. doi:10.1017/CBO9780511802843

Examples

library(rsample)  # bootstraps(), analysis(), and the int_*() functions
library(broom)
library(dplyr)
library(purrr)
library(tibble)
library(tidyr)

# ------------------------------------------------------------------------------

lm_est <- function(split, ...) {
  lm(mpg ~ disp + hp, data = analysis(split)) %>%
    tidy()
}

set.seed(52156)
car_rs <-
  bootstraps(mtcars, 500, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(car_rs, results)
int_t(car_rs, results)
int_bca(car_rs, results, .fn = lm_est)

# ------------------------------------------------------------------------------

# putting results into a tidy format
rank_corr <- function(split) {
  dat <- analysis(split)
  tibble(
    term = "corr",
    estimate = cor(dat$sqft, dat$price, method = "spearman"),
    # don't know the analytical std.error so no t-intervals
    std.error = NA_real_
  )
}

set.seed(69325)
data(Sacramento, package = "modeldata")
bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(correlations = map(splits, rank_corr)) %>%
  int_pctl(correlations)

# ------------------------------------------------------------------------------
# An example of computing the interval for each value of a custom grouping
# factor (type of house in this example)

# Get regression estimates for each house type
lm_est <- function(split, ...) {
  analysis(split) %>%
    tidyr::nest(.by = c(type)) %>%
    # Compute regression estimates for each house type
    mutate(
      betas = purrr::map(data, ~ lm(log10(price) ~ sqft, data = .x) %>% tidy())
    ) %>%
    # Convert the column name to begin with a period
    rename(.type = type) %>%
    select(.type, betas) %>%
    unnest(cols = betas)
}

set.seed(52156)
house_rs <-
  bootstraps(Sacramento, 1000, apparent = TRUE) %>%
  mutate(results = map(splits, lm_est))

int_pctl(house_rs, results)

rsample documentation built on April 11, 2025, 5:54 p.m.