cv.trendfilter: Optimize the trend filtering hyperparameter (by V-fold cross...

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/cv.trendfilter.R

Description

\loadmathjax

cv.trendfilter performs V-fold cross validation to estimate the random-input squared error of a trend filtering estimator on a grid of values for the hyperparameter gamma, and returns the full error curve and the optimized trend filtering estimate within a larger list with useful ancillary information.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
cv.trendfilter(
  x,
  y,
  weights = NULL,
  k = 2L,
  V = 5L,
  ngammas = 250L,
  gammas = NULL,
  gamma.choice = c("gamma.min", "gamma.1se"),
  validation.error.type = c("WMAE", "WMSE", "MAE", "MSE"),
  nx.eval = 1500L,
  x.eval = NULL,
  thinning = NULL,
  optimization.params = trendfilter.control.list(max_iter = 600L, obj_tol = 1e-10),
  mc.cores = detectCores()
)

Arguments

x

The vector of observed values of the input variable (a.k.a. the predictor, covariate, explanatory variable, regressor, independent variable, control variable, etc.)

y

The vector of observed values of the output variable (a.k.a. the response, target, outcome, regressand, dependent variable, etc.).

weights

A vector of weights for the observed outputs. These are defined as weights = 1 / sigma^2, where sigma is a vector of standard errors of the uncertainty in the output measurements. weights should either have length equal to 1 (corresponding to observations with a constant (scalar) variance of sigma = 1/sqrt(weights)) or length equal to length(y) (i.e. general heteroskedastic noise).

k

The degree of the trend filtering estimator. Defaults to k=2 (quadratic trend filtering). Must be one of k = 0,1,2,3, although k=3 is discouraged due to algorithmic instability (and is visually indistinguishable from k=2 anyway).

V

The number of folds the data are split into for the V-fold cross validation. Defaults to V=5 (recommended).

ngammas

Integer. The number of trend filtering hyperparameter values to run the grid search over.

gammas

Overrides ngammas if passed. A user-supplied vector of trend filtering hyperparameter values to run the grid search over. It is advisable to let the vector be equally-spaced in log-space and provided in descending order. The function output will contain the sorted hyperparameter vector regardless of the input ordering, and all related output objects (e.g. the errors vector) will correspond to the sorted ordering. Unless you know what you are doing, it is best to leave this NULL. validation.

gamma.choice

One of c("gamma.min","gamma.1se"). The choice of hyperparameter that is used for optimized trend filtering estimate.

gamma.min: the hyperparameter value that minimizes the cross validation error curve.

gamma.1se: the largest hyperparameter value with a cross validation error within 1 standard error of the minimum cross validation error. This choice therefore favors simpler (i.e. smoother) trend filtering estimates. The motivation here is essentially Occam's razor: the two models yield results that are quantitatively very close, so we favor the simpler model.

validation.error.type

Type of error to optimize during cross validation. One of c("WMAE","WMSE","MAE","MSE"), i.e. mean-absolute deviations error, mean-squared error, and their weighted counterparts. If weights = NULL, then the weighted and unweighted counterparts are equivalent. In short, weighting helps combat heteroskedasticity and absolute error decreases sensitivity to outliers. Defaults to "WMAE".

nx.eval

The length of the equally-spaced input grid to evaluate the evaluate the optimized trend filtering estimate on.

x.eval

Overrides nx.eval if passed. A user-supplied grid of inputs to evaluate the optimized trend filtering estimate on.

thinning

logical. If TRUE, then the data are preprocessed so that a smaller, better conditioned data set is used for fitting. When left NULL, the default, the optimization will automatically detect whether thinning should be applied (i.e., cases in which the numerical fitting algorithm will struggle to converge). This preprocessing procedure is controlled by the x_tol argument of trendfilter.control.list.

optimization.params

a named list of parameters produced by the glmgen function trendfilter.control.list that contains all parameter choices (user-supplied or defaults) to be passed to the trend filtering ADMM algorithm (Ramdas and Tibshirani 2016). See the linked function documentation for full details. No technical understanding of the ADMM algorithm is needed and the default parameter choices will almost always suffice. However, the following three parameters may require some adjustments to ensure that your trend filtering estimate has sufficiently converged:

  1. max_iter: Maximum iterations allowed for the trend filtering convex optimization. Defaults to max_iter = 600L. Increase this if the trend filtering estimate does not appear to have fully converged to a reasonable estimate of the signal.

  2. obj_tol: The tolerance used in the convex optimization stopping criterion; when the relative change in the objective function is less than this value, the algorithm terminates. Decrease this if the trend filtering estimate does not appear to have fully converged to a reasonable estimate of the signal.

  3. x_tol: defines uniqueness or sameness of x's. If we make bins of size x_tol and find at least two x's which fall into the same bin, then we thin the data.

mc.cores

Multi-core computing (for speedups): The number of cores to utilize. Defaults to the number of cores detected.

Details

This will be a very detailed description...

\mjeqnWMAE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|\frac\sqrtw_i\sum_j\sqrtw_jascii
\mjeqnWMSE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|^2\fracw_i\sum_jw_jascii
\mjeqnMAE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|ascii
\mjeqnMSE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|^2ascii

where \mjeqn\widehatf(x_i; \gamma)ascii is the trend filtering estimate with hyperparameter γ, evaluated at \mjeqnx_iascii.

Value

An object of class 'cv.trendfilter'. This is a list with the following elements:

x.eval

The grid of inputs the optimized trend filtering estimate was evaluated on.

tf.estimate

The optimized trend filtering estimate of the signal, evaluated on x.eval.

validation.method

paste0(V,"-fold CV")

V

The number of folds the data are split into for the V-fold cross validation.

validation.error.type

Type of error that validation was performed on. One of c("WMAE","WMSE","MAE","MSE").

gammas

Vector of hyperparameter values tested during validation. This vector will always be returned in descending order, regardless of the ordering provided by the user. The indices i.min and i.1se correspond to this descending ordering.

gamma.min

Hyperparameter value that minimizes the SURE error curve.

gamma.1se

The largest hyperparameter value that is still within one standard error of the minimum hyperparameter's cross validation error.

gamma.choice

One of c("gamma.min","gamma.1se"). The choice of hyperparameter that is used for optimized trend filtering estimate.

edfs

Vector of effective degrees of freedom for trend filtering estimators fit during validation.

edf.min

The effective degrees of freedom of the optimally-tuned trend filtering estimator.

edf.1se

The effective degrees of freedom of the 1-stand-error rule trend filtering estimator.

i.min

The index of gammas that minimizes the cross validation error.

i.1se

The index of gammas that gives the largest hyperparameter value that has a cross validation error within 1 standard error of the minimum of the cross validation error curves.

errors

Vector of cross validation errors for the given hyperparameter values.

se.errors

The standard errors of the cross validation errors. These are particularly useful for implementing the “1-standard-error rule”. The 1-SE rule favors a smoother trend filtering estimate by, instead of using the hyperparameter that minimizes the CV error, instead uses the largest hyperparameter that has a CV error within 1 standard error of the smallest CV error.

x

The vector of the observed inputs.

y

The vector of the observed outputs.

weights

A vector of weights for the observed outputs. These are defined as weights = 1/sigma^2, where sigma is a vector of standard errors of the uncertainty in the measured outputs.

fitted.values

The trend filtering estimate of the signal, evaluated at the observed inputs x.

residuals

residuals = y - fitted.values.

k

The degree of the trend filtering estimator.

thinning

logical. If TRUE, then the data are preprocessed so that a smaller, better conditioned data set is used for fitting.

optimization.params

a list of parameters that control the trend filtering convex optimization.

n.iter

Vector of the number of iterations needed for the ADMM algorithm to converge within the given tolerance, for each hyperparameter value. If many of these are exactly equal to max_iter, then their solutions have not converged with the tolerance specified by obj_tol. In which case, it is often prudent to increase max_iter.

x.scale, y.scale, data.scaled

for internal use.

Author(s)

Collin A. Politsch, collinpolitsch@gmail.com

References

Cross validation

  1. Hastie, Tibshirani, and Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edition. Springer Series in Statistics. [Online print #12]. (See Sections 7.10 and 7.12)

  2. James, Witten, Hastie, and Tibshirani (2013). An Introduction to Statistical Learning : with Applications in R. Springer. [Most recent online print] (See Section 5.1). Less technical than the above reference.

  3. Tibshirani (2013). Model selection and validation 2: Model assessment, more cross-validation. 36-462: Data Mining course notes (Carnegie Mellon). [Link]

Trend filtering optimization algorithm

  1. Ramdas and Tibshirani (2016). Fast and Flexible ADMM Algorithms for Trend Filtering. Journal of Computational and Graphical Statistics, 25(3), p. 839-858. [Link]

  2. Arnold, Sadhanala, and Tibshirani (2014). Fast algorithms for generalized lasso problems. R package glmgen. Version 0.0.3. [Link]
    (Software implementation of Ramdas and Tibshirani algorithm)

See Also

SURE.trendfilter, bootstrap.trendfilter

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
#######################################################################
###  Phase-folded light curve of an eclipsing binary star system   ####
#######################################################################
# A binary star system is a pair of closely-separated stars that move
# in an orbit around a common center of mass. When a binary star system 
# is oriented in such a way that the stars eclipse one another from our 
# vantage point on Earth, we call it an 'eclipsing binary (EB) star 
# system'. From our perspective, the total brightness of an EB dips 
# periodically over time due to the stars eclipsing one another. And 
# the shape of the brightness curve is consistent within each period
# of the orbit. In order to learn about the physics of these EBs,
# astronomers 'phase-fold' the brightness curve so that all the orbital 
# periods are stacked on top of one another in a plot of the EB's phase 
# vs. its apparent brightness, and then find a 'best-fitting' model
# for the phase-folded curve. Here, we use trend filtering to fit an
# optimal phase-folded model for an EB.

data(eclipsing_binary)

# head(df)
#
# |      phase|      flux|  std.err|
# |----------:|---------:|--------:|
# | -0.4986308| 0.9384845| 0.010160|
# | -0.4978067| 0.9295757| 0.010162|
# | -0.4957892| 0.9438493| 0.010162|

# I did not think up this specific choice of grid a priori
# It required some empirical honing
gamma.grid <- exp( seq(7, 16, length = 150) )

cv.out <- cv.trendfilter(x = df$phase, 
                         y = df$flux, 
                         weights = 1 / df$std.err ^ 2,
                         gammas = gamma.grid,
                         validation.error.type = "MAE",
                         thinning = TRUE, 
                         optimization.params = glmgen::trendfilter.control.list(max_iter = 5e3,
                                                                                obj_tol = 1e-6)
                         )

# Plot the results

par(mfrow = c(2,1), mar = c(5,4,2.5,1) + 0.1)
plot(log(cv.out$gammas), cv.out$errors, main = "CV error curve", 
     xlab = "log(gamma)", ylab = "CV error")
segments(x0 = log(cv.out$gammas), x1 = log(cv.out$gammas), 
         y0 = cv.out$errors - cv.out$se.errors, 
         y1 = cv.out$errors + cv.out$se.errors)
abline(v = log(cv.out$gamma.min), lty = 2, col = "blue3")
text(x = log(cv.out$gamma.min), y = par("usr")[4], 
     labels = "optimal gamma", pos = 1, col = "blue3")
plot(df$phase, df$flux, cex = 0.15, xlab = "Phase", ylab = "Flux",
     main = "Eclipsing binary phase-folded light curve")
segments(x0 = df$phase, x1 = df$phase, 
         y0 = df$flux - df$std.err, y1 = df$flux + df$std.err, 
         lwd = 0.25)
lines(cv.out$x.eval, cv.out$tf.estimate, col = "orange", lwd = 2.5)

capolitsch/trendfilteringSupp documentation built on Oct. 15, 2021, 11:04 a.m.