np.boot: Nonparametric Bootstrap Resampling
In nptest: Nonparametric Bootstrap and Permutation Tests

View source: R/np.boot.R

np.boot

R Documentation

Nonparametric Bootstrap Resampling

Description

Nonparametric bootstrap resampling for univariate and multivariate statistics. Computes bootstrap estimates of the standard error, bias, and covariance. Also computes five different types of bootstrap confidence intervals: normal approximation interval, basic (reverse percentile) interval, percentile interval, studentized (bootstrap-t) interval, and bias-corrected and accelerated (BCa) interval.

Usage

np.boot(x, statistic, ..., R = 9999, level = c(0.9, 0.95, 0.99),
        method = c("norm", "basic", "perc", "stud", "bca")[-4], 
        sdfun = NULL, sdrep = 99, jackknife = NULL, 
        parallel = FALSE, cl = NULL, boot.dist = TRUE)

Arguments

`x`	vector of data (for univariate data) or vector of row indices (for multivariate data). See examples for bootstrapping multivariate data.
`statistic`	function that takes in `x` (and possibly additional arguments passed using `...`) and returns a vector containing the statistic(s). See examples.
`...`	additional named arguments for the `statistic` function.
`R`	number of bootstrap replicates
`level`	desired confidence level(s) for the computed intervals. Default computes 90%, 95%, and 99% confidence intervals.
`method`	method(s) for computing confidence intervals. Partial matching is allowed. Any subset of allowable methods is permitted (default computes all intervals except studentized). Set `method = NULL` to produce no confidence intervals.
`sdfun`	function for computing the standard deviation of `statistic`. Should produce a vector the same length as the output of `statistic`. Only applicable if `"stud" %in% method`. If `NULL`, an inner bootstrap is used to estimate the standard deviation.
`sdrep`	number of bootstrap replicates for the inner bootstrap used to estimate the standard deviation of `statistic`. Only applicable if `"stud" %in% method` and `sdfun = NULL`. Larger values produce more accurate estimates (see Note).
`jackknife`	function that takes in `x` (and possibly additional arguments passed using `...`) and returns a vector containing the jackknife statistic(s). Should produce a vector the same length as the output of `statistic`. Only applicable if `"bca" %in% method`. If `NULL`, the jackknife function is defined as the statistic function (default). See the last example for a case when `statistic` and `jackknife` are different.
`parallel`	Logical indicating if the `parallel` package should be used for parallel computing (of the bootstrap distribution). Defaults to `FALSE`, which implements sequential computing.
`cl`	Cluster for parallel computing, which is used when `parallel = TRUE`. Note that if `parallel = TRUE` and `cl = NULL`, then the cluster is defined as `makeCluster(2L)` to use two cores. To make use of all available cores, use the code `cl = makeCluster(detectCores())`.
`boot.dist`	Logical indicating if the bootstrap distribution should be returned (see Note).

Details

The first three intervals (normal, basic, and percentile) are only first-order accurate, whereas the last two intervals (studentized and BCa) are both second-order accurate. Thus, the results from the studentized and BCa intervals tend to provide more accurate coverage rates.

Unless the standard deviation function for the studentized interval is input via the sdfun argument, the studentized interval can be quite computationally costly. This is because an inner bootstrap is needed to estimate the standard deviation of the statistic for each (outer) bootstrap replicate—and you may want to increase the default number of inner bootstrap replicates (see Note).

The efficiency of the BCa interval will depend on the sample size n and the computational complexity of the (jackknife) statistic estimate. Assuming that n is not too large and the jackknife statistic is not too difficult to compute, the BCa interval can be computed reasonably quickly—especially in comparison the studentized interval with an inner bootstrap.

Computational details of the various confidence intervals are described in Efron and Tibshirani (1994) and in Davison and Hinkley (1997). For a useful and concise discussion of the various intervals, see Carpenter and Bithell (2000).

Value

`t0`	Observed statistic, computed using `statistic(x, ...)`
`se`	Bootstrap estimate of the standard error.
`bias`	Bootstrap estimate of the bias.
`cov`	Bootstrap estimate of the covariance (for multivariate statistics).
`normal`	Normal approximation confidence interval(s).
`basic`	Basic (reverse percentile) confidence interval(s).
`percent`	Percentile confidence interval(s).
`student`	Studentized (bootstrap-t) confidence interval(s).
`bca`	Bias-corrected and accelerated (BCa) confidence interval(s).
`z0`	Bias-correction factor(s). Only provided if `bca %in% method`.
`acc`	Acceleration factor(s). Only provided if `bca %in% method`.
`boot.dist`	Bootstrap distribution of statistic(s). Only provided if `boot.dist = TRUE`.
`R`	Number of bootstrap replicates (same as input).
`level`	Confidence level (same as input).
`sdfun`	Standard deviation function for statistic (same as input).
`sdrep`	Number of inner bootstrap replicates (same as input).
`jackknife`	Jackknife function (same as input).

Note

If boot.dist = TRUE, the output boot.dist will be a matrix of dimension R by length(statistic(x, ...)) if the statistic is multivariate. Otherwise the bootstrap distribution will be a vector of length R.

For the "stud" method, the default of sdrep = 99 may produce a crude estimate of the standard deviation of the statistic(s). For more accurate estimates, the value of sdrep may need to be set substantially larger, e.g., sdrep = 999.

Author(s)

Nathaniel E. Helwig <helwig@umn.edu>

References

Carpenter, J., & Bithell, J. (2000). Bootstrap confidence intervals: when, which, what? A practical guide for medical statisticians. Statistics in Medicine, 19(9), 1141-1164. doi: 10.1002/(SICI)1097-0258(20000515)19:9%3C1141::AID-SIM479%3E3.0.CO;2-F

Davison, A. C., & Hinkley, D. V. (1997). Bootstrap Methods and Their Application. Cambridge University Press. doi: 10.1017/CBO9780511802843

Efron, B., & Tibshirani, R. J. (1994). An Introduction to the Boostrap. Chapman & Hall/CRC. doi: 10.1201/9780429246593

Examples

######***######   UNIVARIATE DATA   ######***######

### Example 1: univariate statistic (median)

# generate 100 standard normal observations
set.seed(1)
n <- 100
x <- rnorm(n)

# nonparametric bootstrap
npbs <- np.boot(x = x, statistic = median)
npbs


### Example 2: multivariate statistic (quartiles)

# generate 100 standard normal observations
set.seed(1)
n <- 100
x <- rnorm(n)

# nonparametric bootstrap
npbs <- np.boot(x = x, statistic = quantile, 
                probs = c(0.25, 0.5, 0.75))
npbs



######***######   MULTIVARIATE DATA   ######***######

### Example 1: univariate statistic (correlation)

# correlation matrix square root (with rho = 0.5)
rho <- 0.5
val <- c(sqrt(1 + rho), sqrt(1 - rho))
corsqrt <- matrix(c(val[1], -val[2], val), 2, 2) / sqrt(2)

# generate 100 bivariate observations (with rho = 0.5)
n <- 100
set.seed(1)
data <- cbind(rnorm(n), rnorm(n)) %*% corsqrt

# define statistic function
statfun <- function(x, data) cor(data[x,1], data[x,2])

# nonparametric bootstrap
npbs <- np.boot(x = 1:n, statistic = statfun, data = data)
npbs


### Example 2: multivariate statistic (variances and covariance)

# correlation matrix square root (with rho = 0.5)
rho <- 0.5
val <- c(sqrt(1 + rho), sqrt(1 - rho))
corsqrt <- matrix(c(val[1], -val[2], val), 2, 2) / sqrt(2)

# generate 100 bivariate observations (with rho = 0.5)
n <- 100
set.seed(1)
data <- cbind(rnorm(n), rnorm(n)) %*% corsqrt

# define statistic function
statfun <- function(x, data) {
  cmat <- cov(data[x,])
  ltri <- lower.tri(cmat, diag = TRUE)
  cvec <- cmat[ltri]
  names(cvec) <- c("var(x1)", "cov(x1,x2)", "var(x2)")
  cvec
}

# nonparametric bootstrap
npbs <- np.boot(x = 1:n, statistic = statfun, data = data)
npbs



## Not run: 

######***######   REGRESSION   ######***######

### Example 1: bootstrap cases

# generate 100 observations
n <- 100
set.seed(1)
x <- seq(0, 1, length.out = n)
y <- 1 + 2 * x + rnorm(n)
data <- data.frame(x = x, y = y)

# define statistic function
statfun <- function(x, data) {
  xmat <- cbind(1, data$x[x])
  xinv <- solve(crossprod(xmat)) %*% t(xmat)
  coef <- as.numeric(xinv %*% data$y[x])
  names(coef) <- c("(Intercept)", "x")
  coef
}

# nonparametric bootstrap
npbs <- np.boot(x = 1:n, statistic = statfun, data = data)
npbs


### Example 2: bootstrap residuals

# generate 100 observations
n <- 100
set.seed(1)
x <- seq(0, 1, length.out = n)
y <- 1 + 2 * x + rnorm(n)

# prepare data
xmat <- cbind(1, x)
xinv <- solve(crossprod(xmat)) %*% t(xmat)
fit <- xmat %*% xinv %*% y
data <- list(fit = fit, resid = y - fit, xinv = xinv, x = x)

# define statistic function
statfun <- function(x, data) {
  ynew <- data$fit + data$resid[x]
  coef <- as.numeric(data$xinv %*% ynew)
  names(coef) <- c("(Intercept)", "x")
  coef
}

# define jackknife function
jackfun <- function(x, data){
  ynew <- data$fit[x] + data$resid[x]
  xmat <- cbind(1, data$x[x])
  xinv <- solve(crossprod(xmat)) %*% t(xmat)
  coef <- as.numeric(xinv %*% ynew)
  names(coef) <- c("(Intercept)", "x")
  coef
}

# nonparametric bootstrap
npbs <- np.boot(x = 1:n, statistic = statfun, data = data, 
                jackknife = jackfun)
npbs

## End(Not run)

nptest documentation built on April 15, 2023, 1:08 a.m.