bootstrap.trendfilter: Bootstrap the optimized trend filtering estimator to obtain...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/bootstrap.trendfilter.R

Description

\loadmathjax

bootstrap.trendfilter implements any of three possible bootstrap algorithms to obtain pointwise variability bands with a specified certainty to accompany an optimized trend filtering point estimate of a signal.

Usage

1
2
3
4
5
6
7
8
9
bootstrap.trendfilter(
  obj,
  bootstrap.algorithm = c("nonparametric", "parametric", "wild"),
  alpha = 0.05,
  B = 100L,
  full.ensemble = FALSE,
  prune = TRUE,
  mc.cores = detectCores()
)

Arguments

obj

An object of class 'SURE.trendfilter' or 'cv.trendfilter'.

bootstrap.algorithm

A string specifying the bootstrap algorithm to be used. One of c("nonparametric","parametric","wild"). See Details section below for suggested use. Defaults to "nonparametric".

alpha

Specifies the width of the 1-alpha pointwise variability bands. Defaults to alpha = 0.05.

B

The number of bootstrap samples used to estimate the pointwise variability bands. Defaults to B = 100. Increase this for more precise bands (e.g. the final analysis you intend to publish).

full.ensemble

logical. If TRUE, the full trend filtering bootstrap ensemble is returned as an \mjeqnn \times Bascii matrix, less any columns potentially pruned post-hoc (see prune below). Defaults to full.ensemble = FALSE.

prune

logical. If TRUE, then the trend filtering bootstrap ensemble is examined for rare instances in which the optimization has stopped at zero knots (most likely erroneously), and removes them from the ensemble. Defaults to TRUE. Do not change this unless you really know what you are doing!

mc.cores

Multi-core computing (for speedups): The number of cores to utilize. Defaults to the number of cores detected.

Details

This will contain a very detailed description... See Politsch et al. (2020a) for the full algorithms. The bootstrap algorithm should generally be chosen according to the following criteria:

Value

An object of class 'bootstrap.trendfilter'. This is a list with the following elements:

x.eval

The grid of inputs the trend filtering estimate and variability bands were evaluated on.

tf.estimate

The trend filtering estimate of the signal, evaluated on x.eval.

bootstrap.lower.band

Vector of lower bounds for the 1-alpha pointwise variability band, evaluated on x.eval.

bootstrap.upper.band

Vector of upper bounds for the 1-alpha pointwise variability band, evaluated on x.eval.

bootstrap.algorithm

The string specifying the bootstrap algorithms that was used.

alpha

The 'level' of the variability bands, i.e. alpha produces a 100*(1-alpha)% pointwise variability band.

B

The number of bootstrap samples used to estimate the pointwise variability bands.

tf.bootstrap.ensemble

(Optional) If full.ensemble = TRUE, the full trend filtering bootstrap ensemble as an \mjeqnn \times Bascii matrix, less any columns potentially pruned post-hoc (if prune = TRUE). If full.ensemble = FALSE, then this will return NULL.

edf.boots

An integer vector of the estimated number of effective degrees of freedom of each trend filtering bootstrap estimate. These should all be relatively close to edf.min (below).

prune

logical. If TRUE, then the trend filtering bootstrap ensemble is examined for rare instances in which the optimization has stopped at zero knots (most likely erroneously), and removes them from the ensemble.

n.pruned

The number of badly-converged bootstrap trend filtering estimates pruned from the ensemble.

x

The vector of the observed inputs.

y

The vector of the observed outputs.

weights

A vector of weights for the observed outputs. These are defined as weights = 1 / sigma^2, where sigma is a vector of standard errors of the uncertainty in the output measurements.

residuals

residuals = y - fitted.values.

k

The degree of the trend filtering estimator.

gammas

Vector of hyperparameter values tested during validation.

gammas.min

Hyperparameter value that minimizes the validation error curve.

edf

Integer vector of effective degrees of freedom for trend filtering estimators fit during validation.

edf.min

The effective degrees of freedom of the optimally-tuned trend filtering estimator.

i.min

The index of gammas that minimizes the validation error.

validation.method

Either "SURE" or "V-fold CV".

error

Vector of hyperparameter validation errors, inherited from obj (either class 'SURE.trendfilter' or 'cv.trendfilter')

thinning

logical. If TRUE, then the data are preprocessed so that a smaller, better conditioned data set is used for fitting.

optimization.params

a list of parameters that control the trend filtering convex optimization.

n.iter

Vector of the number of iterations needed for the ADMM algorithm to converge within the given tolerance, for each hyperparameter value. If many of these are exactly equal to max_iter, then their solutions have not converged with the tolerance specified by obj_tol. In which case, it is often prudent to increase max_iter.

n.iter.boots

Vector of the number of iterations needed for the ADMM algorithm to converge within the given tolerance, for each bootstrap trend filtering estimate.

x.scale, y.scale, data.scaled

for internal use.

Author(s)

Collin A. Politsch, collinpolitsch@gmail.com

References

Trend filtering and the various bootstraps in practice

  1. Politsch et al. (2020a). Trend filtering – I. A modern statistical tool for time-domain astronomy and astronomical spectroscopy. Monthly Notices of the Royal Astronomical Society, 492(3), p. 4005-4018. [Link]

  2. Politsch et al. (2020b). Trend Filtering – II. Denoising astronomical signals with varying degrees of smoothness. Monthly Notices of the Royal Astronomical Society, 492(3), p. 4019-4032. [Link]

The Bootstrap (and variations)

  1. Hastie, Tibshirani, and Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edition. Springer Series in Statistics. [Online print #12]

  2. Mammen (1993). Bootstrap and Wild Bootstrap for High Dimensional Linear Models. The Annals of Statistics, 21(1), p. 255-285. [Link]

  3. Efron and Tibshirani (1986). Bootstrap Methods for Standard Errors, Confidence Intervals, and Other Measures of Statistical Accuracy. Statistical Science, 1(1), p. 54-75. [Link]

  4. Efron (1979). Bootstrap Methods: Another Look at the Jackknife. The Annals of Statistics, 7(1), p. 1-26. [Link]

Trend filtering optimization algorithm

  1. Ramdas and Tibshirani (2016). Fast and Flexible ADMM Algorithms for Trend Filtering. Journal of Computational and Graphical Statistics, 25(3), p. 839-858. [Link]

  2. Arnold, Sadhanala, and Tibshirani (2014). Fast algorithms for generalized lasso problems. R package glmgen. Version 0.0.3. [Link] (Implementation of Ramdas and Tibshirani algorithm)

See Also

SURE.trendfilter, cv.trendfilter

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
#############################################################################
##                    Quasar Lyman-alpha forest example                    ##
#############################################################################
# A quasar is an extremely luminous galaxy with an active supermassive black 
# hole at its center. Absorptions in the spectra of quasars at vast 
# cosmological distances from our galaxy reveal the presence of a gaseous 
# medium permeating the entirety of intergalactic space -- appropriately 
# named the 'intergalactic medium'. These absorptions allow astronomers to 
# study the structure of the Universe using the distribution of these 
# absorptions in quasar spectra. Particularly important is the 'forest' of 
# absorptions that arise from the Lyman-alpha spectral line, which traces 
# the presence of electrically neutral hydrogen in the intergalactic medium.
#
# Here, we are interested in denoising the Lyman-alpha forest of a quasar 
# spectroscopically measured by the Sloan Digital Sky Survey. SDSS spectra 
# are equally spaced in log10 wavelength space, aside from some instances of 
# masked pixels.

data("quasar_spec")
data("plotting_utilities")

# head(data)
#
# | log10.wavelength|       flux|   weights|
# |----------------:|----------:|---------:|
# |           3.5529|  0.4235348| 0.0417015|
# |           3.5530| -2.1143005| 0.1247811|
# |           3.5531| -3.7832341| 0.1284383|

SURE.out <- SURE.trendfilter(x = data$log10.wavelength, 
                             y = data$flux, 
                             weights = data$weights)


# Extract the estimated hyperparameter error curve and optimized trend 
# filtering estimate from the `SURE.trendfilter` output, and transform the 
# input grid to wavelength space (in Angstroms).

log.gammas <- log(SURE.out$gammas)
errors <- SURE.out$errors
log.gamma.min <- log(SURE.out$gamma.min)

wavelength <- 10 ^ (SURE.out$x)
wavelength.eval <- 10 ^ (SURE.out$x.eval)
tf.estimate <- SURE.out$tf.estimate


# Run a parametric bootstrap on the optimized trend filtering estimator to 
# obtain uncertainty bands

boot.out <- bootstrap.trendfilter(obj = SURE.out, 
                                  bootstrap.algorithm = "parametric")


# Plot the results

par(mfrow = c(2,1), mar = c(5,4,2.5,1) + 0.1)
plot(x = log.gammas, y = errors, main = "SURE error curve", 
     xlab = "log(gamma)", ylab = "SURE error")
abline(v = log.gamma.min, lty = 2, col = "blue3")
text(x = log.gamma.min, y = par("usr")[4], 
     labels = "optimal gamma", pos = 1, col = "blue3")

plot(x = wavelength, y = SURE.out$y, type = "l", 
     main = "Quasar Lyman-alpha forest", 
     xlab = "Observed wavelength (Angstroms)", ylab = "Flux")
polygon(c(wavelength.eval, rev(wavelength.eval)), 
        c(boot.out$bootstrap.lower.band, 
        rev(boot.out$bootstrap.upper.band)),
        col = transparency("orange", 90), border = NA)
lines(wavelength.eval, boot.out$bootstrap.lower.band, 
      col = "orange", lwd = 0.5)
lines(wavelength.eval, boot.out$bootstrap.upper.band, 
      col = "orange", lwd = 0.5)
lines(wavelength.eval, tf.estimate, col = "orange", lwd = 2.5)
legend(x = "topleft", lwd = c(1,2,8), lty = 1, cex = 0.75,
       col = c("black","orange", transparency("orange", 90)), 
       legend = c("Noisy quasar spectrum",
                  "Trend filtering estimate",
                  "95% variability band"))

capolitsch/trendfilteringSupp documentation built on Oct. 15, 2021, 11:04 a.m.