rss | R Documentation |
Fits a sequence of regression models using robust subset selection.
rss(
  x,
  y,
  k = 0:min(nrow(x) - 1, ncol(x), 20),
  h = round(seq(0.75, 1, 0.05) * nrow(x)),
  k.mio = NULL,
  h.mio = NULL,
  params = list(TimeLimit = 60, OutputFlag = 0),
  tau = 1.5,
  warm.start = TRUE,
  robust = TRUE,
  max.ns.iter = 100,
  max.gd.iter = 1e+05,
  eps = 1e-04
)
x |
a predictor matrix |
y |
a response vector |
k |
the number of predictors to minimise sum of squares over; by default a sequence from 0 to min(nrow(x) - 1, ncol(x), 20), i.e. at most 20 |
h |
the number of observations to minimise sum of squares over; by default a sequence from 75 to 100 percent of sample size (in increments of 5 percent) |
k.mio |
the subset of k on which to run the mixed-integer solver |
h.mio |
the subset of h on which to run the mixed-integer solver |
params |
a list of parameters (settings) to pass to the mixed-integer solver (Gurobi) |
tau |
a positive number greater than or equal to 1 used to tighten coefficient bounds in the
mixed-integer solver; small values give quicker run times but can also exclude the optimal
solution; can be |
warm.start |
a logical indicating whether to warm start the mixed-integer solver using the heuristics |
robust |
a logical indicating whether to standardise the data robustly; median/mad for
|
max.ns.iter |
the maximum number of neighbourhood search iterations allowed |
max.gd.iter |
the maximum number of gradient descent iterations allowed per value of
|
eps |
a numerical tolerance parameter used to declare convergence |
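To make the roles of k and h concrete, the following minimal sketch evaluates a trimmed sum of squares for a fixed coefficient vector: the sum of the h smallest squared residuals, for a coefficient vector with at most k nonzero entries. The helper trimmed_ss is hypothetical and not part of the package, and because rss standardises the data internally, its values need not match the objval component.

```r
# Hypothetical helper (not part of the package): sum of the h smallest
# squared residuals for a given coefficient vector beta
trimmed_ss <- function(x, y, beta, h) {
  r2 <- as.vector(y - x %*% beta)^2
  sum(sort(r2)[seq_len(h)])
}

set.seed(1)
n <- 20
x <- matrix(rnorm(n * 3), n, 3)
beta <- c(1, 0, 0)                # k = 1 nonzero coefficient
y <- as.vector(x %*% beta)        # noiseless response
y[1:2] <- y[1:2] + 10             # two contaminated observations

ss_all <- trimmed_ss(x, y, beta, h = n)       # outliers included
ss_trim <- trimmed_ss(x, y, beta, h = n - 2)  # outliers trimmed away
```

With h = n the two shifted observations dominate the objective; with h = n - 2 they are excluded and the objective drops to essentially zero.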
The function first computes solutions over all combinations of k and h using heuristics. The
heuristics include projected block-coordinate gradient descent and neighbourhood search (see
arXiv). The solutions produced by the heuristics can be refined further using the mixed-integer
solver. The tuning parameters that the solver operates on are specified by the k.mio and h.mio
parameters, which must be subsets of k and h.
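As a rough illustration of the subset requirement, the sketch below rebuilds the default k and h grids from the usage signature for a hypothetical 100 x 10 predictor matrix and checks that candidate k.mio and h.mio values lie on those grids. The check itself is an assumption about how one might validate inputs before calling rss, not the package's own code.

```r
# Placeholder predictor matrix; only its dimensions matter here
n <- 100
p <- 10
x <- matrix(0, n, p)

# Default grids, copied from the usage signature
k <- 0:min(nrow(x) - 1, ncol(x), 20)
h <- round(seq(0.75, 1, 0.05) * nrow(x))

# Candidate values to hand to the mixed-integer solver
k.mio <- 5
h.mio <- 90

# Both must be subsets of the corresponding grids
ok <- all(k.mio %in% k) && all(h.mio %in% h)
```

Here k runs from 0 to 10 because ncol(x) is smaller than 20, and h covers 75 to 100 observations in steps of 5.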
By default, the mixed-integer optimisation problem is formulated with SOS constraints and
bound constraints. The bound constraints are estimated as \tau\|\hat{\beta}\|_\infty, where
\hat{\beta} is output from the heuristics. For finite values of tau, the mixed-integer solver
automatically converts the SOS constraints to Big-M constraints, which are more numerically
efficient to optimise.
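The bound can be computed by hand as a sanity check. The sketch below uses a made-up coefficient vector beta_hat (an assumption standing in for heuristic output) and the default tau = 1.5, then verifies the textbook Big-M condition |beta_j| <= M * z_j with a binary indicator z_j. This is the standard form of such constraints and is shown only for intuition; the exact formulation the solver uses internally is not documented here.

```r
# Made-up heuristic output (assumption, for illustration only)
beta_hat <- c(1.2, -0.8, 0, 0.5)
tau <- 1.5

# Coefficient bound: tau times the infinity norm of beta_hat (about 1.8)
M <- tau * max(abs(beta_hat))

# Textbook Big-M form: |beta_j| <= M * z_j, with z_j = 1 if predictor j is selected
z <- as.numeric(beta_hat != 0)
feasible <- all(abs(beta_hat) <= M * z)
```

Because tau is at least 1, the heuristic solution itself always satisfies the bound; smaller values of tau tighten the search region around it.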
An object of class rss; a list with the following components:
beta |
an array of estimated regression coefficients; columns correspond to |
weights |
an array of binary weights; weights equal to one correspond to good observations
selected for inclusion in the least squares fit; columns correspond to |
objval |
a matrix with the objective function values; rows correspond to |
mipgap |
a matrix with the optimality gap values; rows correspond to |
k |
a vector containing the values of k |
h |
a vector containing the values of h |
Ryan Thompson
Thompson, R. (2022). 'Robust subset selection'. Computational Statistics and Data Analysis 169, p. 107415.
# Generate training data with mixture error
set.seed(0)
n <- 100
p <- 10
p0 <- 5
ncontam <- 10
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n, c(rep(10, ncontam), rep(0, n - ncontam)))
y <- x %*% beta + e
# Robust subset selection
fit <- rss(x, y, k.mio = p0, h.mio = n - ncontam, params = list(OutputFlag = 1))
# Extract model coefficients, generate predictions, and plot cross-validation results
coef(fit, k = p0, h = n - ncontam)
predict(fit, x[1:3, ], k = p0, h = n - ncontam)
plot(fit)