bayesboot: The Bayesian bootstrap



Description

Performs a Bayesian bootstrap and returns a data.frame with a sample of size R representing the posterior distribution of the (possibly multivariate) summary statistic.

Usage

bayesboot(data, statistic, R = 4000, R2 = 4000, use.weights = FALSE,
  .progress = "none", .parallel = FALSE, ...)

Arguments

data

Either a vector or a list, or a matrix or a data.frame with one data point per row. The format of data should be compatible with the first argument of statistic.

statistic

A function implementing the summary statistic of interest, whose first argument takes the data. If use.weights = TRUE, its second argument should take a vector of weights.

R

The size of the posterior sample from the Bayesian bootstrap.

R2

When use.weights = FALSE, this is the size of the resample of the data used to approximate the weighted statistic.

use.weights

When TRUE, the data will be reweighted, as in the original Bayesian bootstrap. When FALSE (the default), the reweighting will be approximated by resampling the data.

.progress

The type of progress bar ("none", "text", "tk", and "win"). See the .progress argument to adply in the plyr package.

.parallel

If TRUE enables parallel processing. See the .parallel argument to adply in the plyr package.

...

Other arguments passed on to statistic.
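To illustrate the two-argument form that statistic takes when use.weights = TRUE, here is a minimal sketch of a weighted Pearson correlation. The function name weighted.cor is hypothetical (it is not part of bayesboot or base R); the only assumption is the contract described above: data first, one non-negative weight per data point second.

```r
# Hypothetical weighted statistic following the (data, weights) contract
# used when use.weights = TRUE: `d` has one data point per row and `w`
# holds one non-negative weight per row.
weighted.cor <- function(d, w) {
  w <- w / sum(w)                      # normalise so the weights sum to one
  x <- d[[1]]
  y <- d[[2]]
  mx <- sum(w * x)                     # weighted means
  my <- sum(w * y)
  cxy <- sum(w * (x - mx) * (y - my))  # weighted covariance
  sx <- sqrt(sum(w * (x - mx)^2))      # weighted standard deviations
  sy <- sqrt(sum(w * (y - my)^2))
  cxy / (sx * sy)
}

blood.flow <- data.frame(
  dye = c(1.15, 1.7, 1.42, 1.38, 2.80, 4.7, 4.8, 1.41, 3.9),
  efp = c(1.38, 1.72, 1.59, 1.47, 1.66, 3.45, 3.87, 1.31, 3.75))

# With equal weights the function reduces to the ordinary Pearson correlation.
equal.w <- rep(1, nrow(blood.flow))
all.equal(weighted.cor(blood.flow, equal.w),
          cor(blood.flow$dye, blood.flow$efp))
```

A function written this way could then be passed as bayesboot(blood.flow, weighted.cor, use.weights = TRUE). Normalising the weights inside the function makes it insensitive to how the caller scales them.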

Details

The summary statistic is a function of the data that represents a feature of interest, where a typical statistic is the mean. In bayesboot it is most efficient to define the statistic as a function taking the data as the first argument and a vector of weights as the second argument. An example of such a function is weighted.mean. Indicate that you are using a statistic defined in this way by setting use.weights = TRUE.
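The weighted approach can be sketched in a few lines of base R, assuming Rubin's (1981) model in which each posterior draw puts flat Dirichlet(1, ..., 1) weights on the n data points. The helpers rdirichlet1 and bb_weighted are hypothetical illustrations, not functions exported by bayesboot; the Dirichlet draw uses the standard trick of normalising independent Exp(1) variates.

```r
# Minimal sketch of the weighted Bayesian bootstrap (not the package code).
# rdirichlet1 and bb_weighted are hypothetical helper names.
rdirichlet1 <- function(n) {
  g <- rexp(n)   # independent Exp(1) = Gamma(1, 1) draws
  g / sum(g)     # normalising them gives one Dirichlet(1, ..., 1) draw
}

bb_weighted <- function(x, statistic, R = 4000) {
  n <- NROW(x)
  # Each posterior draw evaluates the weighted statistic under fresh weights.
  replicate(R, statistic(x, rdirichlet1(n)))
}

set.seed(1)
heights <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
draws <- bb_weighted(heights, weighted.mean, R = 4000)
mean(draws)                       # posterior mean of the population mean
quantile(draws, c(0.025, 0.975))  # 95% credible interval
```

Because weighted.mean already accepts weights as its second argument, no resampling is needed, which is why this route is the cheaper one.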

It is also possible to define the statistic as a function taking only the data (and no weights) by setting use.weights = FALSE (the default). In that case, for each of the R Bayesian bootstrap draws, statistic is given a resampled version of the data of size R2. This is much slower than using use.weights = TRUE but works with a wider range of statistics (the median, for example).
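The resampling approximation can be sketched as follows, assuming the resample for each draw picks data points with probability equal to their Dirichlet weights. bb_resample is a hypothetical helper written for vector data only; it is not the package's implementation, and a data.frame version would resample rows instead.

```r
# Sketch of approximating the weighted statistic by resampling
# (the use.weights = FALSE route). bb_resample is a hypothetical helper.
bb_resample <- function(x, statistic, R = 4000, R2 = 4000) {
  n <- length(x)
  replicate(R, {
    g <- rexp(n)
    w <- g / sum(g)                                     # one Dirichlet(1, ..., 1) draw
    idx <- sample.int(n, R2, replace = TRUE, prob = w)  # resample of size R2
    statistic(x[idx])                                  # statistic sees only data
  })
}

set.seed(1)
heights <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
med.draws <- bb_resample(heights, median, R = 1000, R2 = 1000)
quantile(med.draws, c(0.025, 0.5, 0.975))
```

The cost is roughly R times R2 element draws plus R evaluations on length-R2 vectors, which is why this route is slower than passing weights directly.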

For more information regarding this implementation of the Bayesian bootstrap see the blog post Easy Bayesian Bootstrap in R. For more information about the model behind the Bayesian bootstrap see the blog post The Non-parametric Bootstrap as a Bayesian Model and, of course, the original Bayesian bootstrap paper by Rubin (1981).

Value

A data.frame with R rows, each row being a draw from the posterior distribution of the Bayesian bootstrap. The number of columns is determined by the length of the output from statistic. If statistic does not return a vector or data frame with named values, the columns will be named V1, V2, V3, etc. While the output is a data.frame, it has the subclass bayesboot, which enables specialized summary and plot methods for the result of a bayesboot call.
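The following base-R illustration shows how R draws of a vector-valued statistic map to a data.frame with one row per draw and one column per output element, and how unnamed output ends up with V1, V2 column names. It mimics the described shape of the result; it is an assumption-labelled sketch, not the package's internal code, and one_draw is a hypothetical helper.

```r
# Illustration of the output shape: one row per draw, one column per
# element returned by the statistic. one_draw is a hypothetical helper.
set.seed(1)
heights <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)

one_draw <- function(x, named = TRUE) {
  g <- rexp(length(x)); w <- g / sum(g)  # Dirichlet(1, ..., 1) weights
  m <- sum(w * x)                        # weighted mean
  s <- sqrt(sum(w * (x - m)^2))          # weighted population SD
  if (named) c(mean = m, sd = s) else c(m, s)
}

named.df   <- as.data.frame(t(replicate(500, one_draw(heights))))
unnamed.df <- as.data.frame(t(replicate(500, one_draw(heights, named = FALSE))))
names(named.df)    # "mean" "sd"
names(unnamed.df)  # "V1" "V2"

# Post-processing works like on any data.frame, e.g. P(mean > 182):
mean(named.df$mean > 182)
```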

Note

References

Miller, R. G. (1974). The jackknife: a review. Biometrika, 61(1), 1–15.

Rubin, D. B. (1981). The Bayesian bootstrap. The Annals of Statistics, 9(1), 130–134.

Examples

### A Bayesian bootstrap analysis of a mean ###

library(bayesboot)

# Heights of the last ten American presidents in cm (Kennedy to Obama).
heights <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
b1 <- bayesboot(heights, mean)
# But it's more efficient to use a weighted statistic.
b2 <- bayesboot(heights, weighted.mean, use.weights = TRUE)

# The result of bayesboot can be plotted and summarized
plot(b2)
summary(b2)

# It can also be easily post-processed.
# Here, the probability that the mean is > 182 cm.
mean(b2[, 1] > 182)

### A Bayesian bootstrap analysis of a SD ###

# When use.weights = FALSE it is important that the summary statistic
# does not change as a function of sample size. The sample standard
# deviation, sd(), does change with sample size (it divides by n - 1),
# so here we implement a function calculating the population standard
# deviation instead.
pop.sd <- function(x) {
  n <- length(x)
  sd(x) * sqrt((n - 1) / n)
}

b3 <- bayesboot(heights, pop.sd)
summary(b3)

### A Bayesian bootstrap analysis of a correlation coefficient ###

# Data comparing two methods of measuring blood flow.
# From Table 1 in Miller (1974) and used in an example
# by Rubin (1981, p. 132).
blood.flow <- data.frame(
  dye = c(1.15, 1.7, 1.42, 1.38, 2.80, 4.7, 4.8, 1.41, 3.9),
  efp = c(1.38, 1.72, 1.59, 1.47, 1.66, 3.45, 3.87, 1.31, 3.75))

# Using the weighted correlation (corr) from the boot package.
library(boot)
b4 <- bayesboot(blood.flow, corr, R = 1000, use.weights = TRUE)
hist(b4[,1])

### A Bayesian bootstrap analysis of lm coefficients ###

# A custom function that returns the coefficients of
# a weighted linear regression on the blood.flow data
lm.coefs <- function(d, w) {
  coef( lm(efp ~ dye, data = d, weights = w) )
}

b5 <- bayesboot(blood.flow, lm.coefs, R = 1000, use.weights = TRUE)

# Plotting the marginal posteriors
plot(b5)

# Plotting a scatter of regression lines from the posterior
plot(blood.flow)
for (i in sample(nrow(b5), size = 20)) {
  # unlist() turns the one-row data.frame into the numeric vector abline() expects
  abline(coef = unlist(b5[i, ]), col = "grey")
}
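The SD example relies on the statistic depending only on the empirical distribution of the data, since with use.weights = FALSE it is applied to resamples of size R2 rather than the original n. A quick base-R check of that property: replicating every data point k times leaves the population SD unchanged, while sd() moves with n. This is a sanity check under that reading, not part of the package.

```r
# Replicating the data keeps the empirical distribution fixed but changes n.
heights <- c(183, 192, 182, 183, 177, 185, 188, 188, 182, 185)
pop.sd <- function(x) {
  n <- length(x)
  sd(x) * sqrt((n - 1) / n)
}

rep10 <- rep(heights, times = 10)  # same empirical distribution, n = 100
c(pop.sd = pop.sd(heights), pop.sd.rep = pop.sd(rep10))  # equal
c(sd = sd(heights), sd.rep = sd(rep10))                  # not equal
```

The discrepancy for sd() shrinks as n grows, but making the statistic a plug-in quantity avoids relying on that.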

rasmusab/bayesboot documentation built on May 27, 2019, 2:03 a.m.