strapvizr is an R package that bootstraps a sample and produces plots and statistics suitable for final reports and documents. This notebook shows how to use the strapvizr package within a project.

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

Import library

library(strapvizr)

Example data

We will use the toy dataset mtcars to demonstrate usage. This dataset was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. Only the feature mpg (miles per US gallon) is used in this demonstration.

ex_data <- mtcars |> 
  dplyr::pull(mpg)

ex_data

1. Bootstrap

There are two functions in the bootstrap module, bootstrap and calculate_boot_stats. These two functions perform the bootstrapping and calculate the relevant statistics.

1.1 bootstrap

Functionality

This function performs the bootstrap and returns the bootstrapped estimates (sample means by default) as a numeric vector.

Function inputs

# returns 50 sample means via bootstrapping
boot_1 <- strapvizr::bootstrap(ex_data, 50, seed = 123)
boot_1
# returns 75 sample variances via bootstrapping
boot_2 <- strapvizr::bootstrap(ex_data, 75, estimator = var, seed = 123)
boot_2
class(boot_1)
class(boot_2)
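
To make the idea concrete, here is a minimal base-R sketch of bootstrapping an estimator. It is not the package's implementation: the helper name `manual_bootstrap` and its arguments are hypothetical, and it assumes each replicate resamples the data with replacement at the original sample size.

```r
# Hypothetical helper (not part of strapvizr): draw `rep` resamples with
# replacement and apply `estimator` to each one.
manual_bootstrap <- function(x, rep, estimator = mean, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  n <- length(x)
  # vapply guarantees a numeric vector of length `rep`
  vapply(
    seq_len(rep),
    function(i) estimator(x[sample.int(n, n, replace = TRUE)]),
    numeric(1)
  )
}

manual_means <- manual_bootstrap(mtcars$mpg, rep = 50, seed = 123)
length(manual_means)
class(manual_means)
```

Because the random draws are generated differently, the values will generally not match those returned by strapvizr::bootstrap, even with the same seed.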

1.2 calculate_boot_stats

Functionality

This function performs bootstrapping and returns a named list of the sampling distribution statistics.

Function inputs

# Get 100 sample means via bootstrapping and calculate statistics at a
# 95% confidence level
stat_list_1 <- strapvizr::calculate_boot_stats(ex_data, 100, level = 0.95, 
                                               seed = 123)
stat_list_1
# Get 50 sample variances via bootstrapping at a 90% confidence level
# and return the bootstrap distribution along with the statistics
stat_list_2 <- strapvizr::calculate_boot_stats(ex_data, 50, level = 0.90, 
                                               seed = 123, estimator = "var",
                                               pass_dist = TRUE)
stat_list_2
class(stat_list_1)
class(stat_list_2)
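
One common way to turn a bootstrap distribution into a confidence interval is the percentile method, sketched below in base R. This is an illustration under assumptions, not a description of how calculate_boot_stats computes its output: the helper `percentile_ci` and the element names `lower`, `upper`, and `std_err` are hypothetical.

```r
# Hypothetical helper (not part of strapvizr): percentile interval and
# standard error from a vector of bootstrap replicates.
percentile_ci <- function(dist, level = 0.95) {
  alpha <- 1 - level
  bounds <- stats::quantile(dist, probs = c(alpha / 2, 1 - alpha / 2),
                            names = FALSE)
  list(lower = bounds[1], upper = bounds[2], std_err = stats::sd(dist))
}

set.seed(123)
x <- mtcars$mpg
# 100 bootstrap sample means
boot_means <- replicate(100, mean(sample(x, replace = TRUE)))
ci <- percentile_ci(boot_means, level = 0.95)
ci
```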

2. Display

There are two functions in the display module, plot_ci and tabulate_stats. These use the bootstrapping statistics to create report-ready visualizations and tables of the sampling distribution.

2.1 plot_ci

Functionality

This function creates a histogram of a sampling distribution with its confidence interval and sample mean.

Function inputs

# Plot sampling distribution of 1000 sample means at a 95% confidence level
plot_1 <- strapvizr::plot_ci(ex_data, rep = 1000, level = 0.95, seed = 123)
plot_1
# Plot sampling distribution of 1000 sample variances at a 99% confidence level
# with a unique title and a bin size of 50
title <- "Bootstrapped miles/(US) gallon"
plot_2 <- strapvizr::plot_ci(ex_data, rep = 1000, bin_size = 50, level = 0.99, 
                             title = title, seed = 123, estimator = "var")
plot_2
class(plot_2)
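
For comparison, a similar picture can be drawn by hand with base graphics. The sketch below is not how plot_ci is built; it assumes a percentile interval and uses only base R functions, with the number of breaks and the labels chosen arbitrarily.

```r
set.seed(123)
x <- mtcars$mpg
# 1000 bootstrap sample means and a 95% percentile interval
boot_means <- replicate(1000, mean(sample(x, replace = TRUE)))
bounds <- quantile(boot_means, probs = c(0.025, 0.975), names = FALSE)

# hist() draws the plot and invisibly returns the binned counts
h <- hist(boot_means, breaks = 30,
          main = "Bootstrapped mean of mpg", xlab = "Sample mean")
abline(v = bounds, lty = 2)   # dashed lines: interval bounds
abline(v = mean(x), lwd = 2)  # solid line: mean of the original sample
```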

2.2 tabulate_stats

Functionality

This function creates a list of two tibble objects, one summarizing the sampling distribution and one listing the parameters used to create the bootstrapped samples, and saves them as LaTeX files.

Function inputs

stat <- strapvizr::calculate_boot_stats(ex_data, 1000, level = 0.95, seed = 123)
result <- strapvizr::tabulate_stats(stat, precision = 2)
stats_table <- result[[1]] # stats table
parameters_table <- result[[2]] # parameter table

stats_table
parameters_table
class(stats_table)
class(parameters_table)
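
The two-table layout can also be mimicked by hand. The following sketch uses plain data frames rather than tibbles and does not write any files; the column names and the percentile interval are assumptions made for illustration and may differ from what tabulate_stats returns.

```r
set.seed(123)
x <- mtcars$mpg
n_rep <- 1000
level <- 0.95
boot_means <- replicate(n_rep, mean(sample(x, replace = TRUE)))
alpha <- 1 - level
bounds <- quantile(boot_means, probs = c(alpha / 2, 1 - alpha / 2),
                   names = FALSE)

# Hypothetical column names; values rounded to 2 decimal places
stats_df <- data.frame(
  lower = round(bounds[1], 2),
  upper = round(bounds[2], 2),
  std_err = round(sd(boot_means), 2)
)
params_df <- data.frame(
  sample_size = length(x),
  repetitions = n_rep,
  level = level
)

list(stats_df, params_df)
```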


UBC-MDS/strapvizr documentation built on March 22, 2022, 6:39 p.m.