strapvizr is a package that bootstraps a sample to produce plots and statistics for use in final reports and documents. This notebook shows how to use the strapvizr package within a project.
knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(strapvizr)
We will use the toy dataset mtcars to demonstrate the usage. This dataset was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles. We will use only the feature mpg for demonstration purposes.
ex_data <- mtcars |> dplyr::pull(mpg)
ex_data
There are two functions in the bootstrap module, bootstrap and calculate_boot_stats. These two functions perform the bootstrapping and calculate the relevant statistics.
bootstrap
Functionality
This function performs the bootstrap and returns a numeric vector as the result.
Function inputs
auto, which means the distribution will be the same size as the original sample

# returns 50 sample means via bootstrapping
boot_1 <- strapvizr::bootstrap(ex_data, 50, seed = 123)
boot_1

# returns 75 sample variances via bootstrapping
boot_2 <- strapvizr::bootstrap(ex_data, 75, estimator = var, seed = 123)
boot_2

class(boot_1)
class(boot_2)
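To make the resampling step concrete, the following base R sketch draws samples with replacement and applies an estimator to each one. It illustrates the general bootstrap technique only and is not strapvizr's implementation; the helper name `bootstrap_sketch` and its arguments are hypothetical.

```r
# Hypothetical helper: a minimal bootstrap in base R (not strapvizr's code).
bootstrap_sketch <- function(x, rep, n = length(x), estimator = mean, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Draw `rep` resamples of size `n` with replacement and
  # apply the estimator to each resample.
  vapply(
    seq_len(rep),
    function(i) estimator(sample(x, size = n, replace = TRUE)),
    numeric(1)
  )
}

boot_means <- bootstrap_sketch(mtcars$mpg, rep = 50, seed = 123)
length(boot_means)  # one estimate per resample, so 50 values
```

Passing a different function, such as `var` or `median`, as `estimator` yields the bootstrap distribution of that statistic instead.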
calculate_boot_stats
Functionality
This function performs bootstrapping and returns a named list of the sampling distribution statistics.
Function inputs
# Get 100 sample means via bootstrapping and calculate statistics at the
# 95% confidence level
stat_list_1 <- strapvizr::calculate_boot_stats(ex_data, 100, level = 0.95, seed = 123)
stat_list_1

# Get 50 sample variances via bootstrapping at a 90% confidence level
# and return the bootstrap distribution along with the statistics
stat_list_2 <- strapvizr::calculate_boot_stats(ex_data, 50, level = 0.90, seed = 123, estimator = "var", pass_dist = TRUE)
stat_list_2

class(stat_list_1)
class(stat_list_2)
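As a rough illustration of how such statistics can be derived from a bootstrap distribution, the sketch below computes a percentile confidence interval and a standard error in base R. The percentile method is an assumption made for this example; strapvizr may compute its interval differently. The helper name `boot_stats_sketch` and the names in the returned list are hypothetical.

```r
# Hypothetical helper: percentile bootstrap interval in base R.
# The list names below are illustrative, not strapvizr's output names.
boot_stats_sketch <- function(x, rep, level = 0.95, estimator = mean, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  # Bootstrap distribution of the estimator
  dist <- replicate(rep, estimator(sample(x, replace = TRUE)))
  # Cut off alpha / 2 in each tail to get the interval bounds
  alpha <- 1 - level
  ci <- quantile(dist, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)
  list(
    lower = ci[1],
    upper = ci[2],
    std_err = sd(dist),              # spread of the bootstrap distribution
    sample_estimate = estimator(x),  # estimator applied to the original sample
    level = level,
    rep = rep
  )
}

stats_sketch <- boot_stats_sketch(mtcars$mpg, rep = 100, level = 0.95, seed = 123)
str(stats_sketch)
```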
There are two functions in the display module, plot_ci and tabulate_stats. These use the bootstrapping statistics to create report-ready visualizations and tables of the sampling distribution.
plot_ci
Functionality
This function creates a histogram of a sampling distribution, marked with its confidence interval and the sample mean.
Function Inputs
# Plot sampling distribution of 1000 sample means with a 95% confidence interval
plot_1 <- strapvizr::plot_ci(ex_data, rep = 1000, level = 0.95, seed = 123)
plot_1

# Plot sampling distribution of 1000 sample variances with a 99% confidence interval,
# a unique title, and a bin size of 50
title <- "Bootstrapped miles/(US) gallon"
plot_2 <- strapvizr::plot_ci(ex_data, rep = 1000, bin_size = 50, level = 0.99, title = title, seed = 123, estimator = "var")
plot_2
class(plot_2)
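For readers who want to see what such a plot contains, here is a base-graphics sketch that draws a bootstrap distribution with its interval bounds and the sample mean. It is not strapvizr's plotting code; the percentile interval, the choice of 30 breaks, and the labels are assumptions made for illustration.

```r
# Base-graphics sketch (not strapvizr's plotting code).
set.seed(123)
x <- mtcars$mpg

# Bootstrap distribution of the mean and its 95% percentile interval
boot_dist <- replicate(1000, mean(sample(x, replace = TRUE)))
ci <- quantile(boot_dist, probs = c(0.025, 0.975), names = FALSE)

h <- hist(
  boot_dist,
  breaks = 30,
  main = "Bootstrap distribution of mean mpg",
  xlab = "Sample mean (mpg)"
)
abline(v = ci, lty = 2)       # dashed lines: interval bounds
abline(v = mean(x), lwd = 2)  # solid line: mean of the original sample
```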
tabulate_stats
Functionality
This function creates a list of two tibble objects and saves them as LaTeX files. The first tibble summarizes the sampling distribution, and the second lists the parameters used to create the bootstrapped samples.
Function Inputs
calculate_boot_stats() function

stat <- calculate_boot_stats(ex_data, 1000, level = 0.95, seed = 123)
result <- strapvizr::tabulate_stats(stat, precision = 2)
stats_table <- result[[1]] # stats table
parameters_table <- result[[2]] # parameter table
stats_table
parameters_table

class(stats_table)
class(parameters_table)
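The idea of splitting the output into a statistics table and a parameters table can be sketched with base R data frames. The column names, the percentile interval, and the rounding step below are assumptions for illustration and may not match what tabulate_stats returns; exporting to LaTeX is not shown.

```r
# Hypothetical summary tables built with base R; column names are illustrative.
set.seed(123)
x <- mtcars$mpg
level <- 0.95
precision <- 2

boot_dist <- replicate(1000, mean(sample(x, replace = TRUE)))
alpha <- 1 - level
ci <- quantile(boot_dist, probs = c(alpha / 2, 1 - alpha / 2), names = FALSE)

# Table 1: statistics of the sampling distribution, rounded for reporting
stats_tbl <- data.frame(
  lower = round(ci[1], precision),
  sample_mean = round(mean(x), precision),
  upper = round(ci[2], precision),
  std_err = round(sd(boot_dist), precision)
)

# Table 2: parameters used to generate the bootstrap samples
params_tbl <- data.frame(
  sample_size = length(x),
  repetitions = length(boot_dist),
  significance_level = alpha
)

stats_tbl
params_tbl
```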