View source: R/cv.trendfilter.R
cv.trendfilter performs V-fold cross
validation to estimate the random-input squared error of a trend filtering
estimator on a grid of values for the hyperparameter gamma, and
returns the full error curve and the optimized trend filtering estimate
within a larger list with useful ancillary information.
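The V-fold procedure described above can be sketched in base R as follows. This is a minimal illustration, not the package's implementation: `vfold_cv_sketch` is a hypothetical helper, `stats::smooth.spline()` is swapped in as a stand-in smoother so the code runs without extra packages, and computing the standard error as the across-fold standard deviation divided by `sqrt(V)` is an assumption.

```r
# Generic V-fold cross validation over a hyperparameter grid.
# NOTE: smooth.spline() is only a stand-in smoother; cv.trendfilter fits
# trend filtering estimators instead. All names here are hypothetical.
vfold_cv_sketch <- function(x, y, lambdas, V = 5L) {
  n <- length(x)
  fold <- sample(rep_len(seq_len(V), n))  # random fold label for each point
  fold.errors <- matrix(NA_real_, nrow = V, ncol = length(lambdas))
  for (v in seq_len(V)) {
    train <- fold != v
    for (j in seq_along(lambdas)) {
      # Fit on the V - 1 training folds, predict on the held-out fold
      fit <- stats::smooth.spline(x[train], y[train], lambda = lambdas[j])
      pred <- stats::predict(fit, x[!train])$y
      fold.errors[v, j] <- mean(abs(y[!train] - pred))  # held-out MAE
    }
  }
  list(
    errors = colMeans(fold.errors),                      # CV error curve
    se.errors = apply(fold.errors, 2, sd) / sqrt(V)      # assumed SE formula
  )
}

# Simulated data, used only to exercise the sketch
set.seed(1)
x <- sort(runif(200))
y <- sin(2 * pi * x) + rnorm(200, sd = 0.2)
cv <- vfold_cv_sketch(x, y, lambdas = exp(seq(-2, -16, length.out = 20)))
```

The result holds one averaged error and one standard error per grid value, which mirrors the roles of the `errors` and `se.errors` elements returned by `cv.trendfilter`.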
cv.trendfilter(
  x,
  y,
  weights = NULL,
  k = 2L,
  V = 5L,
  ngammas = 250L,
  gammas = NULL,
  gamma.choice = c("gamma.min", "gamma.1se"),
  validation.error.type = c("WMAE", "WMSE", "MAE", "MSE"),
  nx.eval = 1500L,
  x.eval = NULL,
  thinning = NULL,
  optimization.params = trendfilter.control.list(max_iter = 600L, obj_tol = 1e-10),
  mc.cores = detectCores()
)
x |
The vector of observed values of the input variable (a.k.a. the predictor, covariate, explanatory variable, regressor, independent variable, control variable, etc.). |
y |
The vector of observed values of the output variable (a.k.a. the response, target, outcome, regressand, dependent variable, etc.). |
weights |
A vector of weights for the observed outputs. These are
defined as |
k |
The degree of the trend filtering estimator. Defaults to 2. |
V |
The number of folds the data are split into for the V-fold cross
validation. Defaults to 5. |
ngammas |
Integer. The number of trend filtering hyperparameter values to run the grid search over. |
gammas |
Overrides |
gamma.choice |
One of "gamma.min" or "gamma.1se". |
validation.error.type |
Type of error to optimize during cross
validation. One of "WMAE", "WMSE", "MAE", or "MSE". |
nx.eval |
The length of the equally-spaced input grid on which to evaluate the optimized trend filtering estimate. |
x.eval |
Overrides |
thinning |
logical. If |
optimization.params |
A named list of parameters produced by the
glmgen function trendfilter.control.list. |
mc.cores |
Multi-core computing (for speedups): The number of cores to utilize. Defaults to the number of cores detected. |
This will be a very detailed description...
$$\mathrm{WMAE}(\gamma) = \frac{1}{n}\sum_{i=1}^{n} |Y_i - \widehat{f}(x_i; \gamma)| \frac{\sqrt{w_i}}{\sum_j \sqrt{w_j}}$$
$$\mathrm{WMSE}(\gamma) = \frac{1}{n}\sum_{i=1}^{n} |Y_i - \widehat{f}(x_i; \gamma)|^2 \frac{w_i}{\sum_j w_j}$$
$$\mathrm{MAE}(\gamma) = \frac{1}{n}\sum_{i=1}^{n} |Y_i - \widehat{f}(x_i; \gamma)|$$
$$\mathrm{MSE}(\gamma) = \frac{1}{n}\sum_{i=1}^{n} |Y_i - \widehat{f}(x_i; \gamma)|^2$$
where $\widehat{f}(x_i; \gamma)$ is the trend filtering
estimate with hyperparameter $\gamma$, evaluated at
$x_i$.
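As a concrete reading of the four definitions, the sketch below evaluates each one for a single fitted curve. `validation_error` is a hypothetical helper written for illustration only; it is not a function exported by the package, and it simply transcribes the formulas term by term.

```r
# Hypothetical helper: evaluates one of the four validation error types
# for observed outputs y, fitted values f.hat, and weights w.
validation_error <- function(y, f.hat, w = NULL,
                             type = c("WMAE", "WMSE", "MAE", "MSE")) {
  type <- match.arg(type)
  if (type %in% c("WMAE", "WMSE") && is.null(w)) {
    stop("weights are required for the weighted error types")
  }
  n <- length(y)
  abs.res <- abs(y - f.hat)  # |Y_i - f.hat(x_i; gamma)|
  switch(type,
    WMAE = sum(abs.res * sqrt(w) / sum(sqrt(w))) / n,
    WMSE = sum(abs.res^2 * w / sum(w)) / n,
    MAE  = sum(abs.res) / n,
    MSE  = sum(abs.res^2) / n
  )
}

# Toy inputs: absolute residuals are 0, 1, 2 and the weights are equal
y <- c(1, 2, 3)
f.hat <- c(1, 1, 1)
w <- c(1, 1, 1)
validation_error(y, f.hat, type = "MAE")
validation_error(y, f.hat, w, type = "WMSE")
```

With the formulas as written, the weighted versions carry both the `1/n` factor and the normalized weights, so with equal weights they come out a factor of `n` smaller than their unweighted counterparts.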
An object of class 'cv.trendfilter'. This is a list with the following elements:
x.eval |
The grid of inputs the optimized trend filtering estimate was evaluated on. |
tf.estimate |
The optimized trend filtering estimate of the signal,
evaluated on |
validation.method |
|
V |
The number of folds the data are split into for the V-fold cross validation. |
validation.error.type |
Type of error that validation was performed on.
One of "WMAE", "WMSE", "MAE", or "MSE". |
gammas |
Vector of hyperparameter values tested during validation. This
vector will always be returned in descending order, regardless of the
ordering provided by the user. The indices |
gamma.min |
Hyperparameter value that minimizes the cross validation error curve. |
gamma.1se |
The largest hyperparameter value whose cross validation error is still within one standard error of the minimum cross validation error. |
gamma.choice |
One of "gamma.min" or "gamma.1se". |
edfs |
Vector of effective degrees of freedom for trend filtering estimators fit during validation. |
edf.min |
The effective degrees of freedom of the optimally-tuned trend filtering estimator. |
edf.1se |
The effective degrees of freedom of the 1-standard-error rule trend filtering estimator. |
i.min |
The index of |
i.1se |
The index of |
errors |
Vector of cross validation errors for the given hyperparameter values. |
se.errors |
The standard errors of the cross validation errors. These are particularly useful for implementing the “1-standard-error rule”. Instead of using the hyperparameter that minimizes the CV error, the 1-SE rule uses the largest hyperparameter whose CV error is within 1 standard error of the smallest CV error, which favors a smoother trend filtering estimate. |
x |
The vector of the observed inputs. |
y |
The vector of the observed outputs. |
weights |
A vector of weights for the observed outputs. These are
defined as |
fitted.values |
The trend filtering estimate of the signal, evaluated at
the observed inputs |
residuals |
|
k |
The degree of the trend filtering estimator. |
thinning |
logical. If |
optimization.params |
a list of parameters that control the trend filtering convex optimization. |
n.iter |
Vector of the number of iterations needed for the ADMM
algorithm to converge within the given tolerance, for each hyperparameter
value. If many of these are exactly equal to |
x.scale, y.scale, data.scaled |
for internal use. |
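The relationship between gammas, errors, se.errors, and the two hyperparameter choices can be sketched as follows. `select_gammas` is a hypothetical helper, and taking the threshold to be the minimum CV error plus the standard error at that minimum is an assumption (it is the usual form of the 1-SE rule), not a statement about the package internals.

```r
# Hypothetical helper: picks the error-minimizing hyperparameter and the
# 1-standard-error-rule hyperparameter from a CV error curve.
select_gammas <- function(gammas, errors, se.errors) {
  i.min <- which.min(errors)
  # Assumed threshold: minimum CV error plus its own standard error
  threshold <- errors[i.min] + se.errors[i.min]
  candidates <- which(errors <= threshold)
  # Largest hyperparameter among the candidates (smoothest estimate)
  i.1se <- candidates[which.max(gammas[candidates])]
  list(i.min = i.min, gamma.min = gammas[i.min],
       i.1se = i.1se, gamma.1se = gammas[i.1se])
}

# Toy error curve on a descending grid, made up for illustration
gammas    <- c(100, 10, 1, 0.1)
errors    <- c(5.0, 2.2, 2.0, 3.0)
se.errors <- c(0.5, 0.3, 0.3, 0.4)
sel <- select_gammas(gammas, errors, se.errors)
```

In this toy curve the minimum sits at gamma = 1, while gamma = 10 is the largest value whose error (2.2) stays under the threshold of 2.3, so it is the 1-SE choice.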
Collin A. Politsch, collinpolitsch@gmail.com
Cross validation
Hastie, Tibshirani, and Friedman (2009). The Elements of Statistical
Learning: Data Mining, Inference, and Prediction. 2nd edition. Springer
Series in Statistics. Online print #12. (See Sections 7.10 and 7.12.)
James, Witten, Hastie, and Tibshirani (2013). An Introduction to
Statistical Learning: with Applications in R. Springer. (See
Section 5.1.) Less technical than the above reference.
Tibshirani (2013). Model selection and validation 2: Model assessment, more cross-validation. 36-462: Data Mining course notes (Carnegie Mellon).
Trend filtering optimization algorithm
Ramdas and Tibshirani (2016). Fast and Flexible ADMM Algorithms
for Trend Filtering. Journal of Computational and Graphical
Statistics, 25(3), pp. 839-858.
Arnold, Sadhanala, and Tibshirani (2014). Fast algorithms for
generalized lasso problems. R package glmgen. Version 0.0.3.
(Software implementation of the Ramdas and Tibshirani algorithm.)
SURE.trendfilter, bootstrap.trendfilter
#######################################################################
### Phase-folded light curve of an eclipsing binary star system ####
#######################################################################
# A binary star system is a pair of closely-separated stars that move
# in an orbit around a common center of mass. When a binary star system
# is oriented in such a way that the stars eclipse one another from our
# vantage point on Earth, we call it an 'eclipsing binary (EB) star
# system'. From our perspective, the total brightness of an EB dips
# periodically over time due to the stars eclipsing one another. And
# the shape of the brightness curve is consistent within each period
# of the orbit. In order to learn about the physics of these EBs,
# astronomers 'phase-fold' the brightness curve so that all the orbital
# periods are stacked on top of one another in a plot of the EB's phase
# vs. its apparent brightness, and then find a 'best-fitting' model
# for the phase-folded curve. Here, we use trend filtering to fit an
# optimal phase-folded model for an EB.
data(eclipsing_binary)
# head(df)
#
# | phase| flux| std.err|
# |----------:|---------:|--------:|
# | -0.4986308| 0.9384845| 0.010160|
# | -0.4978067| 0.9295757| 0.010162|
# | -0.4957892| 0.9438493| 0.010162|
# I did not think up this specific choice of grid a priori
# It required some empirical honing
gamma.grid <- exp(seq(7, 16, length.out = 150))
cv.out <- cv.trendfilter(
  x = df$phase,
  y = df$flux,
  weights = 1 / df$std.err^2,
  gammas = gamma.grid,
  validation.error.type = "MAE",
  thinning = TRUE,
  optimization.params = glmgen::trendfilter.control.list(max_iter = 5e3,
                                                         obj_tol = 1e-6)
)
# Plot the results
par(mfrow = c(2, 1), mar = c(5, 4, 2.5, 1) + 0.1)
plot(log(cv.out$gammas), cv.out$errors, main = "CV error curve",
     xlab = "log(gamma)", ylab = "CV error")
segments(x0 = log(cv.out$gammas), x1 = log(cv.out$gammas),
         y0 = cv.out$errors - cv.out$se.errors,
         y1 = cv.out$errors + cv.out$se.errors)
abline(v = log(cv.out$gamma.min), lty = 2, col = "blue3")
text(x = log(cv.out$gamma.min), y = par("usr")[4],
     labels = "optimal gamma", pos = 1, col = "blue3")
plot(df$phase, df$flux, cex = 0.15, xlab = "Phase", ylab = "Flux",
     main = "Eclipsing binary phase-folded light curve")
segments(x0 = df$phase, x1 = df$phase,
         y0 = df$flux - df$std.err, y1 = df$flux + df$std.err,
         lwd = 0.25)
lines(cv.out$x.eval, cv.out$tf.estimate, col = "orange", lwd = 2.5)