knitr::opts_chunk$set( collapse = TRUE, comment = "ws#>", fig.align = "center" ) # directory to Lab dirdl <- system.file("Lab9",package = "Intro2R") # create rmd link library(Intro2R)
We have already learnt the basics of sampling from a population using the rstem
functions.
These are good for verifying, reproducing and sometimes investigating ideas where theory is intractable.
In practice, however, we often cannot assume a distribution for the population and must use distributional information latent within the sample. This information can be extracted using the bootstrap
.
There are a number of very cool packages on CRAN
that you can use to perform basic bootstrap as EFRON
designed. Plus there are many more algorithms which implement different sampling schemes.
We will look at the R package boot
and in the Lab we will design our own bootstrap functions.
install.packages("boot")
In classical statistics we are confronted with the need to make estimates of parameters. These estimates come in two forms:-
- Point $\hat{\theta}$
- Interval $(L,U)$
Generally we make our point estimate directly from the sample and the interval estimate we can make in many ways. If we implement analytic confidence intervals (using a formula such as $\bar{Y} \pm t_{\alpha/2}s/\sqrt{n}$) then we use the originating sample. We can however use bootstrap
methods which invoke resampling techniques.
{ width=70% }
Suppose we wish to make bootstrap interval estimates of the mean LENGTH
by each of the three fish, Catfish, Bass, and Buffalo fish in the Tennessee river
data(ddt) library(ggplot2) g <- ggplot(ddt, aes(x = SPECIES, y = LENGTH, fill = SPECIES)) + geom_boxplot() g
To resample the data set we need to make a function that is suitable for the R package boot
.
The function must have at least two arguments:
- data
- indices
The data is typically a data frame which will be given as the first argument of the boot
function, and the indices will be supplied by the boot
function.
Notice that you need to choose appropriate statistics. If you need to estimate the mean then the stat will be the sample mean.
The function must return a vector. Each will be referenced in the output by an index. For example the catfish will be referenced by a 1
and the Buffalo fish by a 3
.
library(boot) myfun <- function(data,indices){ d<-data[indices,] dbarcat <- mean(with(d, d[SPECIES == "CCATFISH","LENGTH"])) dbarbas <- mean(with(d, d[SPECIES == "LMBASS", "LENGTH"])) dbarbuf <- mean(with(d, d[SPECIES == "SMBUFFALO", "LENGTH"])) return(c(dbarcat = dbarcat, dbarbas = dbarbas, dbarbuf = dbarbuf)) }
Notice that the boot function has three arguments (you can use more -- see ?boot
)
- data
- the user defined function as above
- the number of iterations,
R
out <-boot(ddt,myfun, R = 10000)
The output can then be used to calculate confidence intervals by using the bootci()
function. Notice that the type of interval is perc=percentage
. This will put $\alpha/2*100/1$% of the distribution in each of the tails.
outci <- boot.ci(out, conf = 0.95, index = 1,type = "perc") outci
There is a lot more in the output than you might think.
names(out)
Also from the boot.ci
function
names(outci)
This information can be used for making graphical output.
To see the simulation output we can use the plot function. Note that since the class
of out
is boot
there is the possibility for the boot
programmer to make an S3
method. In fact there is a plot method. When the function plot
inspects out
it will see that its class is boot
and then look for a method plot.boot
if it finds it then it will call it.
The plot uses the matrix $t$ which contains the simulation.
methods(class = "boot") plot(out, index = 1) # this will call plot.boot
To view the confdence interval we can use the following
ci = outci$percent[1,c(4,5)] ci
hist(out$t[,1], main = "CCATFISH", col = "green", prob = TRUE, xlab = "sample mean") lines(density(out$t[,1]), lwd = 2, col = "red") abline(v = ci, col = "Blue", lwd = 3)
The lab will take you further and give you some excellent code to stimulate your understanding of the bootstrap
.
Place the following files in their correct folders and complete the lab.
r rmdfiles("Lab9", "Intro2R")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.