Bootstrap resampling is a powerful statistical technique for estimating confidence intervals without assuming a specific distribution. This example demonstrates parallelizing bootstrap analysis for A/B test results.
Use Case: A/B testing, hypothesis testing, statistical inference, conversion rate analysis
Computational Pattern: Embarrassingly parallel resampling with aggregation
You've run an A/B test on your website with two variants:

- Variant A (Control): 10,000 visitors, 850 conversions (8.5% conversion rate)
- Variant B (Treatment): 10,000 visitors, 920 conversions (9.2% conversion rate)
You need to:

1. Estimate the confidence interval for the difference in conversion rates
2. Calculate the probability that B is better than A
3. Determine if the result is statistically significant
Traditional parametric tests assume normal distributions. Bootstrap resampling makes no such assumptions and provides more robust estimates.
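For reference, the conventional parametric route for this problem is a two-proportion test. A minimal sketch using base R's `prop.test` (shown only as a baseline for comparison, not part of the starburst workflow):

```r
# Parametric baseline: two-sample test for equality of proportions,
# using the observed conversion counts from the A/B test above
prop.test(x = c(850, 920), n = c(10000, 10000))
```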
```r
library(starburst)
```
Create synthetic A/B test data:
```r
set.seed(42)

# Variant A (control)
n_a <- 10000
conversions_a <- 850
variant_a <- c(rep(1, conversions_a), rep(0, n_a - conversions_a))

# Variant B (treatment)
n_b <- 10000
conversions_b <- 920
variant_b <- c(rep(1, conversions_b), rep(0, n_b - conversions_b))

# Observed difference
observed_diff <- mean(variant_b) - mean(variant_a)

cat(sprintf("Observed conversion rates:\n"))
cat(sprintf("  Variant A: %.2f%%\n", mean(variant_a) * 100))
cat(sprintf("  Variant B: %.2f%%\n", mean(variant_b) * 100))
cat(sprintf("  Difference: %.2f%% (%.1f%% relative lift)\n",
            observed_diff * 100, (observed_diff / mean(variant_a)) * 100))
```
Output:
```
Observed conversion rates:
  Variant A: 8.50%
  Variant B: 9.20%
  Difference: 0.70% (8.2% relative lift)
```
Define a function that performs one bootstrap iteration:
```r
bootstrap_iteration <- function(iter, data_a, data_b) {
  # Resample with replacement
  n_a <- length(data_a)
  n_b <- length(data_b)
  sample_a <- sample(data_a, n_a, replace = TRUE)
  sample_b <- sample(data_b, n_b, replace = TRUE)

  # Calculate metrics
  rate_a <- mean(sample_a)
  rate_b <- mean(sample_b)
  diff <- rate_b - rate_a
  relative_lift <- diff / rate_a

  list(
    iteration = iter,
    rate_a = rate_a,
    rate_b = rate_b,
    diff = diff,
    relative_lift = relative_lift,
    b_wins = diff > 0
  )
}
```
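A quick sanity check of a single draw before scaling up (purely illustrative; the iteration index is arbitrary):

```r
# One resampled estimate of the rates and their difference
str(bootstrap_iteration(1, variant_a, variant_b))
```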
Run a smaller bootstrap locally:
```r
n_bootstrap_local <- 1000

cat(sprintf("Running %d bootstrap iterations locally...\n", n_bootstrap_local))

local_start <- Sys.time()
local_results <- lapply(
  1:n_bootstrap_local,
  bootstrap_iteration,
  data_a = variant_a,
  data_b = variant_b
)
local_time <- as.numeric(difftime(Sys.time(), local_start, units = "secs"))

cat(sprintf("✓ Completed in %.2f seconds\n\n", local_time))
```
Typical output:
```
Running 1000 bootstrap iterations locally...
✓ Completed in 2.1 seconds
```
Run 10,000 bootstrap iterations on AWS:
```r
n_bootstrap <- 10000

cat(sprintf("Running %d bootstrap iterations on AWS...\n", n_bootstrap))

results <- starburst_map(
  1:n_bootstrap,
  bootstrap_iteration,
  data_a = variant_a,
  data_b = variant_b,
  workers = 25,
  cpu = 1,
  memory = "2GB"
)
```
Typical output:
```
Starting starburst cluster with 25 workers
Estimated cost: ~$1.00/hour
Processing 10000 items with 25 workers
Created 25 chunks (avg 400 items per chunk)
Submitting tasks...
✓ Submitted 25 tasks
Progress: 25/25 tasks (0.3 minutes elapsed)
✓ Completed in 0.3 minutes
Actual cost: $0.01
```
Extract and analyze the bootstrap distribution:
```r
# Extract metrics
diffs <- sapply(results, function(x) x$diff)
relative_lifts <- sapply(results, function(x) x$relative_lift)
b_wins <- sapply(results, function(x) x$b_wins)

# Calculate confidence intervals
ci_95 <- quantile(diffs, c(0.025, 0.975))
ci_99 <- quantile(diffs, c(0.005, 0.995))

# Probability that B is better than A
prob_b_wins <- mean(b_wins) * 100

# Print results
cat("\n=== Bootstrap Results (10,000 iterations) ===\n\n")
cat(sprintf("Observed difference: %.2f%%\n", observed_diff * 100))
cat(sprintf("\n95%% Confidence Interval: [%.2f%%, %.2f%%]\n",
            ci_95[1] * 100, ci_95[2] * 100))
cat(sprintf("99%% Confidence Interval: [%.2f%%, %.2f%%]\n",
            ci_99[1] * 100, ci_99[2] * 100))
cat(sprintf("\nProbability that B > A: %.1f%%\n", prob_b_wins))

# Statistical significance
if (ci_95[1] > 0) {
  cat("\n✓ Result is statistically significant at 95% confidence level\n")
  cat("  (95% CI does not include zero)\n")
} else {
  cat("\n✗ Result is NOT statistically significant at 95% confidence level\n")
  cat("  (95% CI includes zero)\n")
}

# Relative lift analysis
cat(sprintf("\nRelative lift: %.1f%%\n", median(relative_lifts) * 100))
cat(sprintf("95%% CI for relative lift: [%.1f%%, %.1f%%]\n",
            quantile(relative_lifts, 0.025) * 100,
            quantile(relative_lifts, 0.975) * 100))
```
Typical output:
```
=== Bootstrap Results (10,000 iterations) ===

Observed difference: 0.70%

95% Confidence Interval: [0.21%, 1.19%]
99% Confidence Interval: [0.08%, 1.32%]

Probability that B > A: 99.7%

✓ Result is statistically significant at 95% confidence level
  (95% CI does not include zero)

Relative lift: 8.2%
95% CI for relative lift: [2.5%, 14.0%]
```
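If you also want an approximate p-value from the same draws, one common approach is to invert the percentile interval. A minimal sketch using the `diffs` vector computed above (this is the standard CI-inversion approximation, and its resolution is limited by the number of bootstrap iterations):

```r
# Two-sided bootstrap p-value by confidence-interval inversion:
# the smallest alpha at which the percentile CI would exclude zero
p_boot <- 2 * min(mean(diffs <= 0), mean(diffs >= 0))
cat(sprintf("Approximate bootstrap p-value: %.4f\n", p_boot))
```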
Plot the bootstrap distribution:
```r
# Create histogram
hist(diffs * 100,
     breaks = 50,
     main = "Bootstrap Distribution of Conversion Rate Difference",
     xlab = "Difference in Conversion Rate (percentage points)",
     col = "lightblue",
     border = "white")

# Add reference lines
abline(v = 0, col = "red", lwd = 2, lty = 2)
abline(v = ci_95 * 100, col = "darkblue", lwd = 2, lty = 2)
abline(v = observed_diff * 100, col = "darkgreen", lwd = 2)

# Add legend
legend("topright",
       c("Observed difference", "Zero (no effect)", "95% CI"),
       col = c("darkgreen", "red", "darkblue"),
       lwd = 2,
       lty = c(1, 2, 2))
```
| Method       | Iterations | Time    | Cost  | Speedup |
|--------------|------------|---------|-------|---------|
| Local        | 1,000      | 2.1 sec | $0    | 1x      |
| Local (est.) | 10,000     | 21 sec  | $0    | 1x      |
| staRburst    | 10,000     | 18 sec  | $0.01 | 6.9x    |
Key Insights:

- Bootstrap is highly parallelizable
- Fast iterations still benefit from cloud parallelization
- Minimal cost even with 10,000 iterations
- Can easily scale to 100,000+ iterations for more precision (see the sketch below)
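Scaling up only requires changing the number of iterations and, optionally, the worker count. A sketch following the same `starburst_map()` call as above; the worker count and memory here are illustrative values, not recommendations:

```r
# 100,000 iterations for tighter confidence intervals
# (worker count and memory are illustrative; tune for your budget)
results_100k <- starburst_map(
  1:100000,
  bootstrap_iteration,
  data_a = variant_a,
  data_b = variant_b,
  workers = 100,
  cpu = 1,
  memory = "2GB"
)
```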
Bootstrap multiple metrics simultaneously:
```r
bootstrap_all_metrics <- function(iter, data_a, data_b) {
  n_a <- length(data_a)
  n_b <- length(data_b)
  sample_a <- sample(data_a, n_a, replace = TRUE)
  sample_b <- sample(data_b, n_b, replace = TRUE)

  # Multiple metrics
  rate_a <- mean(sample_a)
  rate_b <- mean(sample_b)
  se_a <- sd(sample_a) / sqrt(n_a)
  se_b <- sd(sample_b) / sqrt(n_b)

  list(
    diff_rate = rate_b - rate_a,
    relative_lift = (rate_b - rate_a) / rate_a,
    z_score = (rate_b - rate_a) / sqrt(se_a^2 + se_b^2),
    effect_size = (rate_b - rate_a) / sqrt((var(sample_a) + var(sample_b)) / 2)
  )
}

# Run multi-metric bootstrap
multi_results <- starburst_map(
  1:10000,
  bootstrap_all_metrics,
  data_a = variant_a,
  data_b = variant_b,
  workers = 25
)
```
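A minimal sketch for summarizing the multi-metric output, assuming `multi_results` has the list structure returned above:

```r
# Median and 95% bootstrap interval for each metric
metric_names <- c("diff_rate", "relative_lift", "z_score", "effect_size")
summary_table <- sapply(metric_names, function(metric) {
  values <- sapply(multi_results, function(x) x[[metric]])
  c(median = median(values), quantile(values, c(0.025, 0.975)))
})
round(t(summary_table), 4)
```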
Good fit:

- Need robust confidence intervals
- Non-normal distributions
- Small sample sizes
- Complex metrics (e.g., ratios, quantiles)
- Multiple hypothesis testing
Not ideal:

- Very large datasets (> 1M rows per group)
- Simple metrics with known distributions
- Real-time analysis requirements
The complete runnable script is available at:
```r
system.file("examples/bootstrap.R", package = "starburst")
```
Run it with:
```r
source(system.file("examples/bootstrap.R", package = "starburst"))
```
Related examples:

- Monte Carlo Simulation - Similar resampling pattern
- Risk Modeling - Advanced statistical analysis