cv.trendfilter: Optimize the trend filtering hyperparameter (by V-fold cross...
In capolitsch/trendfilteringSupp: Optimal one-dimensional data analysis with trend filtering

Description Usage Arguments Details Value Author(s) References See Also Examples

View source: R/cv.trendfilter.R

\loadmathjax

cv.trendfilter performs V-fold cross validation to estimate the random-input squared error of a trend filtering estimator on a grid of values for the hyperparameter gamma, and returns the full error curve and the optimized trend filtering estimate within a larger list with useful ancillary information.

cv.trendfilter(
  x,
  y,
  weights = NULL,
  k = 2L,
  V = 5L,
  ngammas = 250L,
  gammas = NULL,
  gamma.choice = c("gamma.min", "gamma.1se"),
  validation.error.type = c("WMAE", "WMSE", "MAE", "MSE"),
  nx.eval = 1500L,
  x.eval = NULL,
  thinning = NULL,
  optimization.params = trendfilter.control.list(max_iter = 600L, obj_tol = 1e-10),
  mc.cores = detectCores()
)

`x`	The vector of observed values of the input variable (a.k.a. the predictor, covariate, explanatory variable, regressor, independent variable, control variable, etc.)
`y`	The vector of observed values of the output variable (a.k.a. the response, target, outcome, regressand, dependent variable, etc.).
`weights`	A vector of weights for the observed outputs. These are defined as `weights = 1 / sigma^2`, where `sigma` is a vector of standard errors of the uncertainty in the output measurements. `weights` should either have length equal to 1 (corresponding to observations with a constant (scalar) variance of `sigma = 1/sqrt(weights)`) or length equal to `length(y)` (i.e. general heteroskedastic noise).
`k`	The degree of the trend filtering estimator. Defaults to `k=2` (quadratic trend filtering). Must be one of `k = 0,1,2,3`, although `k=3` is discouraged due to algorithmic instability (and is visually indistinguishable from `k=2` anyway).
`V`	The number of folds the data are split into for the V-fold cross validation. Defaults to `V=5` (recommended).
`ngammas`	Integer. The number of trend filtering hyperparameter values to run the grid search over.
`gammas`	Overrides `ngammas` if passed. A user-supplied vector of trend filtering hyperparameter values to run the grid search over. It is advisable to let the vector be equally-spaced in log-space and provided in descending order. The function output will contain the sorted hyperparameter vector regardless of the input ordering, and all related output objects (e.g. the `errors` vector) will correspond to the sorted ordering. Unless you know what you are doing, it is best to leave this `NULL`. validation.
`gamma.choice`	One of `c("gamma.min","gamma.1se")`. The choice of hyperparameter that is used for optimized trend filtering estimate. `gamma.min`: the hyperparameter value that minimizes the cross validation error curve. `gamma.1se`: the largest hyperparameter value with a cross validation error within 1 standard error of the minimum cross validation error. This choice therefore favors simpler (i.e. smoother) trend filtering estimates. The motivation here is essentially Occam's razor: the two models yield results that are quantitatively very close, so we favor the simpler model.
`validation.error.type`	Type of error to optimize during cross validation. One of `c("WMAE","WMSE","MAE","MSE")`, i.e. mean-absolute deviations error, mean-squared error, and their weighted counterparts. If `weights = NULL`, then the weighted and unweighted counterparts are equivalent. In short, weighting helps combat heteroskedasticity and absolute error decreases sensitivity to outliers. Defaults to `"WMAE"`.
`nx.eval`	The length of the equally-spaced input grid to evaluate the evaluate the optimized trend filtering estimate on.
`x.eval`	Overrides `nx.eval` if passed. A user-supplied grid of inputs to evaluate the optimized trend filtering estimate on.
`thinning`	logical. If `TRUE`, then the data are preprocessed so that a smaller, better conditioned data set is used for fitting. When left `NULL`, the default, the optimization will automatically detect whether thinning should be applied (i.e., cases in which the numerical fitting algorithm will struggle to converge). This preprocessing procedure is controlled by the `x_tol` argument of `trendfilter.control.list`.
`optimization.params`	a named list of parameters produced by the glmgen function `trendfilter.control.list` that contains all parameter choices (user-supplied or defaults) to be passed to the trend filtering ADMM algorithm (Ramdas and Tibshirani 2016). See the linked function documentation for full details. No technical understanding of the ADMM algorithm is needed and the default parameter choices will almost always suffice. However, the following three parameters may require some adjustments to ensure that your trend filtering estimate has sufficiently converged: `max_iter`: Maximum iterations allowed for the trend filtering convex optimization. Defaults to `max_iter = 600L`. Increase this if the trend filtering estimate does not appear to have fully converged to a reasonable estimate of the signal. `obj_tol`: The tolerance used in the convex optimization stopping criterion; when the relative change in the objective function is less than this value, the algorithm terminates. Decrease this if the trend filtering estimate does not appear to have fully converged to a reasonable estimate of the signal. x_tol: defines uniqueness or sameness of `x`'s. If we make bins of size `x_tol` and find at least two `x`'s which fall into the same bin, then we thin the data.
`mc.cores`	Multi-core computing (for speedups): The number of cores to utilize. Defaults to the number of cores detected.

This will be a very detailed description...

\mjeqnWMAE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|\frac\sqrtw_i\sum_j\sqrtw_jascii
\mjeqnWMSE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|^2\fracw_i\sum_jw_jascii
\mjeqnMAE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|ascii
\mjeqnMSE(\gamma) = \frac1n\sum_i=1^n |Y_i - \widehatf(x_i; \gamma)|^2ascii

where \mjeqn\widehatf(x_i; \gamma)ascii is the trend filtering estimate with hyperparameter γ, evaluated at \mjeqnx_iascii.

An object of class 'cv.trendfilter'. This is a list with the following elements:

`x.eval`	The grid of inputs the optimized trend filtering estimate was evaluated on.
`tf.estimate`	The optimized trend filtering estimate of the signal, evaluated on `x.eval`.
`validation.method`	`paste0(V,"-fold CV")`
`V`	The number of folds the data are split into for the V-fold cross validation.
`validation.error.type`	Type of error that validation was performed on. One of `c("WMAE","WMSE","MAE","MSE")`.
`gammas`	Vector of hyperparameter values tested during validation. This vector will always be returned in descending order, regardless of the ordering provided by the user. The indices `i.min` and `i.1se` correspond to this descending ordering.
`gamma.min`	Hyperparameter value that minimizes the SURE error curve.
`gamma.1se`	The largest hyperparameter value that is still within one standard error of the minimum hyperparameter's cross validation error.
`gamma.choice`	One of `c("gamma.min","gamma.1se")`. The choice of hyperparameter that is used for optimized trend filtering estimate.
`edfs`	Vector of effective degrees of freedom for trend filtering estimators fit during validation.
`edf.min`	The effective degrees of freedom of the optimally-tuned trend filtering estimator.
`edf.1se`	The effective degrees of freedom of the 1-stand-error rule trend filtering estimator.
`i.min`	The index of `gammas` that minimizes the cross validation error.
`i.1se`	The index of `gammas` that gives the largest hyperparameter value that has a cross validation error within 1 standard error of the minimum of the cross validation error curves.
`errors`	Vector of cross validation errors for the given hyperparameter values.
`se.errors`	The standard errors of the cross validation errors. These are particularly useful for implementing the “1-standard-error rule”. The 1-SE rule favors a smoother trend filtering estimate by, instead of using the hyperparameter that minimizes the CV error, instead uses the largest hyperparameter that has a CV error within 1 standard error of the smallest CV error.
`x`	The vector of the observed inputs.
`y`	The vector of the observed outputs.
`weights`	A vector of weights for the observed outputs. These are defined as `weights = 1/sigma^2`, where `sigma` is a vector of standard errors of the uncertainty in the measured outputs.
`fitted.values`	The trend filtering estimate of the signal, evaluated at the observed inputs `x`.
`residuals`	`residuals = y - fitted.values`.
`k`	The degree of the trend filtering estimator.
`thinning`	logical. If `TRUE`, then the data are preprocessed so that a smaller, better conditioned data set is used for fitting.
`optimization.params`	a list of parameters that control the trend filtering convex optimization.
`n.iter`	Vector of the number of iterations needed for the ADMM algorithm to converge within the given tolerance, for each hyperparameter value. If many of these are exactly equal to `max_iter`, then their solutions have not converged with the tolerance specified by `obj_tol`. In which case, it is often prudent to increase `max_iter`.
`x.scale, y.scale, data.scaled`	for internal use.

Collin A. Politsch, collinpolitsch@gmail.com

Cross validation

Hastie, Tibshirani, and Friedman (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd edition. Springer Series in Statistics. [Online print #12]. (See Sections 7.10 and 7.12)
James, Witten, Hastie, and Tibshirani (2013). An Introduction to Statistical Learning : with Applications in R. Springer. [Most recent online print] (See Section 5.1). Less technical than the above reference.
Tibshirani (2013). Model selection and validation 2: Model assessment, more cross-validation. 36-462: Data Mining course notes (Carnegie Mellon). [Link]

Trend filtering optimization algorithm

Ramdas and Tibshirani (2016). Fast and Flexible ADMM Algorithms for Trend Filtering. Journal of Computational and Graphical Statistics, 25(3), p. 839-858. [Link]
Arnold, Sadhanala, and Tibshirani (2014). Fast algorithms for generalized lasso problems. R package glmgen. Version 0.0.3. [Link]
(Software implementation of Ramdas and Tibshirani algorithm)

SURE.trendfilter, bootstrap.trendfilter

#######################################################################
###  Phase-folded light curve of an eclipsing binary star system   ####
#######################################################################
# A binary star system is a pair of closely-separated stars that move
# in an orbit around a common center of mass. When a binary star system 
# is oriented in such a way that the stars eclipse one another from our 
# vantage point on Earth, we call it an 'eclipsing binary (EB) star 
# system'. From our perspective, the total brightness of an EB dips 
# periodically over time due to the stars eclipsing one another. And 
# the shape of the brightness curve is consistent within each period
# of the orbit. In order to learn about the physics of these EBs,
# astronomers 'phase-fold' the brightness curve so that all the orbital 
# periods are stacked on top of one another in a plot of the EB's phase 
# vs. its apparent brightness, and then find a 'best-fitting' model
# for the phase-folded curve. Here, we use trend filtering to fit an
# optimal phase-folded model for an EB.

data(eclipsing_binary)

# head(df)
#
# |      phase|      flux|  std.err|
# |----------:|---------:|--------:|
# | -0.4986308| 0.9384845| 0.010160|
# | -0.4978067| 0.9295757| 0.010162|
# | -0.4957892| 0.9438493| 0.010162|

# I did not think up this specific choice of grid a priori
# It required some empirical honing
gamma.grid <- exp( seq(7, 16, length = 150) )

cv.out <- cv.trendfilter(x = df$phase, 
                         y = df$flux, 
                         weights = 1 / df$std.err ^ 2,
                         gammas = gamma.grid,
                         validation.error.type = "MAE",
                         thinning = TRUE, 
                         optimization.params = glmgen::trendfilter.control.list(max_iter = 5e3,
                                                                                obj_tol = 1e-6)
                         )

# Plot the results

par(mfrow = c(2,1), mar = c(5,4,2.5,1) + 0.1)
plot(log(cv.out$gammas), cv.out$errors, main = "CV error curve", 
     xlab = "log(gamma)", ylab = "CV error")
segments(x0 = log(cv.out$gammas), x1 = log(cv.out$gammas), 
         y0 = cv.out$errors - cv.out$se.errors, 
         y1 = cv.out$errors + cv.out$se.errors)
abline(v = log(cv.out$gamma.min), lty = 2, col = "blue3")
text(x = log(cv.out$gamma.min), y = par("usr")[4], 
     labels = "optimal gamma", pos = 1, col = "blue3")
plot(df$phase, df$flux, cex = 0.15, xlab = "Phase", ylab = "Flux",
     main = "Eclipsing binary phase-folded light curve")
segments(x0 = df$phase, x1 = df$phase, 
         y0 = df$flux - df$std.err, y1 = df$flux + df$std.err, 
         lwd = 0.25)
lines(cv.out$x.eval, cv.out$tf.estimate, col = "orange", lwd = 2.5)