knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(jeksterslabRboot)
set.seed(42) n <- 1000 df <- n - 1 B <- 1000 mu <- 100 theta <- mu sigma2 <- 225 sigma <- sqrt(sigma2) var_thetahat <- sigma2 / n se_thetahat <- sqrt(var_thetahat)
Let $X$ be a random variable
from a normal distribution with $\mu = r mu
$
and $\sigma^2 = r sigma2
$.
\begin{equation}
X
\sim
\mathcal{N}
\left(
\mu = r mu
,
\sigma^2 = r sigma2
\right)
(#eq:dist-norm-X-r)
\end{equation}
This data generating function is referred to as $F$.
Variable <- c( "`mu`", "`sigma2`", "`sigma`" ) Description <- c( "Population mean.", "Population variance.", "Population standard deviation." ) Notation <- c( "$\\mu$", "$\\sigma^2$", "$\\sigma$" ) Value <- c( mu, sigma2, sigma ) knitr::kable( x = data.frame( Variable, Description, Notation, Value ), row.names = FALSE, caption = "Population Parameters" )
For a sample size of
$n = r n
$,
if the parameter of interest
$\theta$
is the mean $\mu$,
using the know parameters,
the variance of the sampling distribution of the mean is
\begin{equation}
\mathrm{Var}
\left(
\hat{\mu}
\right)
=
\frac{
\sigma^2
}
{
n
}
=
r var_thetahat
\end{equation}
\noindent and the standard error of the mean is
\begin{equation}
\mathrm{se}
\left(
\hat{\mu}
\right)
=
\frac{
\sigma
}
{
\sqrt{n}
}
=
r se_thetahat
\end{equation}
Variable <- c( "`n`", "`theta`", "`var_thetahat`", "`se_thetahat`" ) Description <- c( "Sample size.", "Population mean.", "Variance of the sampling distribution of the mean.", "Standard error of the mean." ) Notation <- c( "$n$", "$\\theta = \\mu$", "$\\mathrm{Var} \\left( \\hat{\\theta} \\right) = \\frac{ \\sigma^2 }{n}$", "$\\mathrm{se} \\left( \\hat{\\theta} \\right) = \\frac{ \\sigma }{\\sqrt{n}}$" ) Value <- c( n, theta, var_thetahat, se_thetahat ) knitr::kable( x = data.frame( Variable, Description, Notation, Value ), row.names = FALSE, caption = "Sampling Distribution of $\\hat{\\theta}$ with Known Parameters" )
Let $\mathbf{x}$ be a vector of length $n = r n
$
of realizations of the random variable $X$.
Each observation $i$
is independently sampled from the normal distribution
with $\mu = r mu
$ and $\sigma^2 = r sigma2
$.
\begin{equation} x = X \left( \omega \right) (#eq:dist-random-variable-1) \end{equation}
\begin{equation} \mathbf{x} = \begin{bmatrix} x_1 = X \left( \omega_1 \right) \ x_2 = X \left( \omega_2 \right) \ x_3 = X \left( \omega_3 \right) \ x_i = X \left( \omega_i \right) \ \vdots \ x_n = X \left( \omega_n \right) \end{bmatrix}, \ i = \left{ 1, 2, 3, \dots, n \right} (#eq:dist-random-variable-2) \end{equation}
This is accommplished in R
using the
rnorm()
function
with the following arguments
n =
r n
,
mean =
r mu
,
and
sd =
r sigma
.
rnorm()
returns a vector of length n
.
x <- rnorm( n = n, mean = mu, sd = sigma ) str(x)
We can calculate sample statistics using the sample data $\mathbf{x}$. Note that a hat (^) indicates that the quantity is an estimate of the parameter using sample data.
muhat <- mean(x) thetahat <- muhat sigma2hat <- var(x) sigmahat <- sqrt(sigma2hat) varhat <- sigma2hat / n sehat <- sqrt(varhat) Variable <- c( "`n`", "`thetahat`", "`sigma2hat`", "`sigmahat`", "`varhat`", "`sehat`" ) Description <- c( "Sample size.", "Sample mean.", "Sample variance.", "Sample standard deviation.", "Estimate of the variance of the sampling distribution of the mean.", "Estimate of the standard error of the mean." ) Notation <- c( "$n$", "$\\hat{\\theta} = \\hat{\\mu} = \\frac{1}{n} \\sum_{i = 1}^{n} x_i$", "$\\hat{\\sigma}^2 = \\frac{1}{n - 1} \\sum_{i = 1}^{n} \\left( x_i - \\hat{\\mu} \\right)^2$", "$\\hat{\\sigma} = \\sqrt{\\frac{1}{n - 1} \\sum_{i = 1}^{n} \\left( x_i - \\hat{\\mu} \\right)^2}$", "$\\widehat{\\mathrm{Var}} \\left( \\hat{\\theta} \\right) = \\frac{ \\hat{\\sigma}^2 }{n}$", "$\\widehat{\\mathrm{se}} \\left( \\hat{\\theta} \\right) = \\frac{ \\hat{\\sigma} }{\\sqrt{n}}$" ) Value <- c( n, thetahat, sigma2hat, sigmahat, varhat, sehat ) knitr::kable( x = data.frame( Variable, Description, Notation, Value ), row.names = FALSE, caption = "Sample Statistics (Parameter Estimates)" )
hist(x) qqnorm(x) qqline(x)
$\mathbf{x}$ which is a vector of realizations of the random variable $X$ is used in bootstrapping. This sample data is referred to as the empirical distribution $\hat{F}$.
In parametric bootstrapping we make a parametric assumption
about the distribution of $\hat{F}$,
that is, $\hat{F} = F_{\hat{\theta}}$ .
In the current example,
we assume a normal distribution $\mathcal{N} \left( \mu, \sigma^2 \right)$.
Data is generated $B$ number of times (r B
in the current example)
from the empirical distribution
$\hat{F}_{\mathcal{N} \left( \hat{\mu}, \hat{\sigma^2} \right)}$
using parameter estimates
($\hat{\mu} = r muhat
$ and $\hat{\sigma}^2 = r sigma2hat
$).
$\mathbf{x}^{*}$ is a set of $B$ bootstrap samples.
\begin{equation}
\mathbf{x}^{}
=
\begin{bmatrix}
\mathbf{x}^{1} = \left{ x^{1}_{1}, x^{1}{2}, x^{1}_{3}, x^{1}{i}, \dots, x^{1}_{n} \right} \
\mathbf{x}^{2} = \left{ x^{2}_{1}, x^{2}{2}, x^{2}_{3}, x^{2}{i}, \dots, x^{2}_{n} \right} \
\mathbf{x}^{3} = \left{ x^{3}_{1}, x^{3}{2}, x^{3}_{3}, x^{3}{i}, \dots, x^{3}_{n} \right} \
\mathbf{x}^{b} = \left{ x^{b}_{1}, x^{b}{2}, x^{b}_{3}, x^{b}{i}, \dots, x^{b}_{n} \right} \
\vdots \
\mathbf{x}^{B} = \left{ x^{B}_{1}, x^{B}{2}, x^{B}_{3}, x^{B}{i}, \dots, x^{*B}_{n} \right}
\end{bmatrix}, \
b
=
\left{
1, 2, 3, \dots, B
\right}, \
i
=
\left{
1, 2, 3, \dots, n
\right}
\end{equation}
The function pbuniv()
from the jeksterslabRboot
package
can be used to generate B
parametric bootstrap samples from a univariate probability distribution
such as the current example.
The argument rFUN
takes in a function to use
in the random data generation process.
In the current example,
assuming a univariate normal distribution,
we use rnorm
.
The argument n
is the sample size
of the original sample data x
.
B
is the number of bootstrap samples
B =
r B
in our case.
The ...
arguments is a way to pass arguments to rFUN
.
In the current example,
we pass
mean =
r thetahat
and
sd =
r sigmahat
to rnorm
.
The pbuniv()
function returns a list
of length B
bootstrap samples.
$x^{*}$ is saved as xstar
.
xstar <- pbuniv( n = n, rFUN = rnorm, B = B, mean = thetahat, sd = sigmahat ) str(xstar, list.len = 6)
The parameter is estimated for each of the $B$ bootstrap samples. For example, if we are interested in making inferences about the mean, we calculate the sample statistic ($\mathrm{s}$), that is, the sample mean, for each element of $\mathbf{x}^{}$. The $B$ estimated parameters form the bootstrap sampling distribution $\boldsymbol{\hat{\theta}^{}}$.
\begin{equation} \boldsymbol{\hat{\theta}^{}} = \begin{bmatrix} \hat{\theta}^{} \left( 1 \right) = \mathrm{s} \left( \mathbf{x}^{1} \right) \ \hat{\theta}^{} \left( 2 \right) = \mathrm{s} \left( \mathbf{x}^{2} \right) \ \hat{\theta}^{} \left( 3 \right) = \mathrm{s} \left( \mathbf{x}^{3} \right) \ \hat{\theta}^{} \left( b \right) = \mathrm{s} \left( \mathbf{x}^{b} \right) \ \vdots \ \hat{\theta}^{} \left( B \right) = \mathrm{s} \left( \mathbf{x}^{*B} \right) \end{bmatrix}, \ b = \left{ 1, 2, 3, \dots, B \right} \end{equation}
sapply
is a useful tool in R
to apply a function
to each element of a list.
In our case,
we apply the function mean()
to each element of the list xstar
.
sapply
simplifies the output when possible.
In our case,
sapply
returns a vector,
thetahatstar
.
The *apply
functions
as well as their parallel counterparts
(e.g., parSapply
)
can be used to simplify repetitive tasks.
thetahatstar <- sapply( X = xstar, FUN = mean ) str(thetahatstar)
hist( thetahatstar, main = expression( paste( "Histogram of ", hat(theta), "*" ) ), xlab = expression( paste( hat(theta), "*" ) ) ) qqnorm(thetahatstar) qqline(thetahatstar)
mean_thetahatstar <- mean(thetahatstar) var_thetahatstar <- var(thetahatstar) sd_thetahatstar <- sqrt(var_thetahatstar)
The estimated bootstrap standard error is given by
\begin{equation}
\widehat{\mathrm{se}}{\mathrm{B}}
\left(
\hat{\theta}
\right)
=
\sqrt{
\frac{1}{B - 1}
\sum{b = 1}^{B}
\left[
\hat{\theta}^{} \left( b \right)
-
\hat{\theta}^{} \left( \cdot \right)
\right]^2
}
=
r sd_thetahatstar
\end{equation}
\noindent where
\begin{equation}
\hat{\theta}^{}
\left(
\cdot
\right)
=
\frac{1}{B}
\sum_{b = 1}^{B}
\hat{\theta}^{} \left( b \right)
=
r mean_thetahatstar
.
\end{equation}
Note that $\widehat{\mathrm{se}}_{\mathrm{B}} \left( \hat{\theta} \right)$ is the standard deviation of $\boldsymbol{\hat{\theta}^{}}$ and $\hat{\theta}^{} \left( \cdot \right)$ is the mean of $\boldsymbol{\hat{\theta}^{*}}$ .
Variable <- c( "`B`", "`mean_thetahatstar`", "`var_thetahatstar`", "`sd_thetahatstar`" ) Description <- c( "Number of bootstrap samples.", "Mean of $B$ sample means.", "Variance of $B$ sample means.", "Standard deviation of $B$ sample means." ) Notation <- c( "$B$", "$\\hat{\\theta}^{*} \\left( \\cdot \\right) = \\frac{1}{B} \\sum_{b = 1}^{B} \\hat{\\theta}^{*} \\left( b \\right)$", "$\\widehat{\\mathrm{Var}}_{\\mathrm{B}} \\left( \\hat{\\theta} \\right) = \\frac{1}{B - 1} \\sum_{b = 1}^{B} \\left[ \\hat{\\theta}^{*} \\left( b \\right) - \\hat{\\theta}^{*} \\left( \\cdot \\right) \\right]^2$", "$\\widehat{\\mathrm{se}}_{\\mathrm{B}} \\left( \\hat{\\theta} \\right) = \\sqrt{ \\frac{1}{B - 1} \\sum_{b = 1}^{B} \\left[ \\hat{\\theta}^{*} \\left( b \\right) - \\hat{\\theta}^{*} \\left( \\cdot \\right) \\right]^2 }$" ) Value <- c( B, mean_thetahatstar, var_thetahatstar, sd_thetahatstar ) knitr::kable( x = data.frame( Variable, Description, Notation, Value ), row.names = FALSE, caption = "Parametric Bootstrapping Results" )
Confidence intervals can be constructed
around
$\hat{\theta}$ .
The functions
pc()
,
bc()
,
and
bca()
from
the jeksterslabRboot
package
can be used to construct
confidence intervals
using the default alphas
of
0.001,
0.01,
and
0.05.
Wald confidence intervals (wald()
)
are added for comparison.
The confidence intervals can also be evaluated.
Since we know the population parameter theta
$\left(\theta = \mu = r theta
\right)$,
we can check if our confidence intervals contain
the population parameter.
See documentation for
wald()
,
pc()
,
bc()
,
and
bca()
from
the jeksterslabRboot
package
on how confidence intervals are constructed.
See documentation for
zero_hit()
,
theta_hit()
,
len()
,
and
shape()
from
the jeksterslabRboot
package
on how confidence intervals are evaluated.
wald_out <- wald( thetahat = thetahat, sehat = sehat, eval = TRUE, theta = theta ) wald_out_t <- wald( thetahat = thetahat, sehat = sehat, dist = "t", df = df, eval = TRUE, theta = theta ) pc_out <- pc( thetahatstar = thetahatstar, thetahat = thetahat, wald = TRUE, eval = TRUE, theta = theta ) bc_out <- bc( thetahatstar = thetahatstar, thetahat = thetahat, wald = TRUE, eval = TRUE, theta = theta ) bca_out <- bca( thetahatstar = thetahatstar, thetahat = thetahat, data = x, fitFUN = mean, wald = TRUE, eval = TRUE, theta = theta )
knitr::kable( x = as.data.frame( rbind( wald = wald_out, wald_t = wald_out_t, pc = pc_out, bc = bc_out, bca = bca_out ) ), caption = "Confidence Intervals" )
The estimated bootstrap standard error is similar to results from the closed form solution. The estimated bootstrap confidence intervals are also close to the Wald confidence intervals. While a closed form solution for the standard error is available in this simple example, bootstrap standard errors and confidence intervals can be used in situations when standard errors are not easy to calculate.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.