rss: Robust subset selection

rss    R Documentation

Robust subset selection

Description

Fits a sequence of regression models using robust subset selection.

Usage

rss(
  x,
  y,
  k = 0:min(nrow(x) - 1, ncol(x), 20),
  h = round(seq(0.75, 1, 0.05) * nrow(x)),
  k.mio = NULL,
  h.mio = NULL,
  params = list(TimeLimit = 60, OutputFlag = 0),
  tau = 1.5,
  warm.start = TRUE,
  robust = TRUE,
  max.ns.iter = 100,
  max.gd.iter = 1e+05,
  eps = 1e-04
)

Arguments

x

a predictor matrix

y

a response vector

k

the number of predictors to minimise sum of squares over; by default a sequence from 0 to min(nrow(x) - 1, ncol(x), 20)

h

the number of observations to minimise sum of squares over; by default a sequence from 75 to 100 percent of the sample size (in increments of 5 percent), rounded to the nearest integer

k.mio

the subset of k for which the mixed-integer solver should be run

h.mio

the subset of h for which the mixed-integer solver should be run

params

a list of parameters (settings) to pass to the mixed-integer solver (Gurobi)

tau

a number greater than or equal to 1 used to tighten coefficient bounds in the mixed-integer solver; smaller values give quicker run times but can exclude the optimal solution; can be Inf

warm.start

a logical indicating whether to warm start the mixed-integer solver using the heuristics

robust

a logical indicating whether to standardise the data robustly; the median and MAD are used if TRUE, and the mean and standard deviation if FALSE

max.ns.iter

the maximum number of neighbourhood search iterations allowed

max.gd.iter

the maximum number of gradient descent iterations allowed per value of k and h

eps

a numerical tolerance parameter used to declare convergence
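The default grids for k and h follow directly from the expressions in Usage. The sketch below evaluates them in base R for a hypothetical 100 x 10 predictor matrix and checks that an illustrative choice of k.mio and h.mio lies inside those grids; the data dimensions and the chosen k.mio and h.mio values are assumptions made only for illustration.

```r
# Hypothetical data dimensions, chosen only for illustration
n <- 100
p <- 10
x <- matrix(0, n, p)

# Default k: every model size from 0 up to min(n - 1, p, 20)
k <- 0:min(nrow(x) - 1, ncol(x), 20)

# Default h: 75% to 100% of the sample size in 5% steps, rounded
h <- round(seq(0.75, 1, 0.05) * nrow(x))

# Illustrative values for the solver; both must be drawn from the grids above
k.mio <- 5
h.mio <- 90
stopifnot(all(k.mio %in% k), all(h.mio %in% h))

print(k)
print(h)
```

With these dimensions the number of predictors is the binding limit on k, so the grid stops at 10 rather than 20.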

Details

The function first computes solutions over all combinations of k and h using heuristics. The heuristics are projected block-coordinate gradient descent and neighbourhood search (see Thompson, 2022, in References). The solutions produced by the heuristics can be refined further using the mixed-integer solver. The values of k and h on which the solver is run are specified by the k.mio and h.mio arguments, which must be subsets of k and h.
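To make the roles of k and h concrete, the following base R sketch evaluates a trimmed least squares objective for a candidate coefficient vector. It assumes the objective is the sum of the h smallest squared residuals, with no intercept, scaling constant, or standardisation; the package's exact objective may differ in those details. The helper rss_objective is hypothetical and not part of the package.

```r
# Hypothetical helper: sum of the h smallest squared residuals for a given beta
rss_objective <- function(x, y, beta, h) {
  r2 <- as.numeric(y - x %*% beta)^2   # squared residuals
  sum(sort(r2)[seq_len(h)])            # keep only the h smallest
}

set.seed(1)
n <- 50
x <- matrix(rnorm(n * 3), n, 3)
beta <- c(1, 0, -1)                    # two nonzero coefficients, so k = 2
y <- as.numeric(x %*% beta + rnorm(n))
y[1:5] <- y[1:5] + 20                  # contaminate five observations

obj.full <- rss_objective(x, y, beta, h = n)      # uses all observations
obj.trim <- rss_objective(x, y, beta, h = n - 5)  # drops the five largest residuals
print(c(full = obj.full, trimmed = obj.trim))
```

Setting h below n lets the fit ignore the observations with the largest residuals, which is why the trimmed value can be far smaller than the full sum of squares when outliers are present.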

By default, the mixed-integer optimisation problem is formulated with SOS constraints and bound constraints. The bounds are estimated as \tau\|\hat{\beta}\|_\infty, where \hat{\beta} is the output of the heuristics. For finite values of tau, the mixed-integer solver automatically converts the SOS constraints to Big-M constraints, which are more numerically efficient to optimise.
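The bound described above is simple to compute by hand. The sketch below uses a made-up coefficient vector in place of real heuristic output; beta.hat and M are illustrative names, not objects exposed by the package.

```r
# Hypothetical heuristic solution; in practice this comes from the heuristics
beta.hat <- c(1.2, 0, -0.8, 0, 2.5)

# Bound for the default tau: tau times the largest absolute coefficient
tau <- 1.5
M <- tau * max(abs(beta.hat))
print(M)

# With tau = Inf the bound is infinite, so the coefficients are left unbounded
M.inf <- Inf * max(abs(beta.hat))
print(M.inf)
```

A value of tau close to 1 keeps the bound near the heuristic solution, which speeds up the solver but risks cutting off a better solution with larger coefficients.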

Value

An object of class rss; a list with the following components:

beta

an array of estimated regression coefficients; columns correspond to k and matrices to h

weights

an array of binary weights; weights equal to one correspond to good observations selected for inclusion in the least squares fit; columns correspond to k and matrices to h

objval

a matrix with the objective function values; rows correspond to k and columns to h

mipgap

a matrix with the optimality gap values; rows correspond to k and columns to h

k

a vector containing the values of k used in the fit

h

a vector containing the values of h used in the fit
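The layout of the components above can be illustrated with mock objects. The sketch below builds placeholder arrays with the documented orientation and looks up the entries for one (k, h) pair; it assumes the first dimension of beta indexes the coefficients, and all values are placeholders rather than real output.

```r
# Mock grids and components laid out as described above (placeholder values)
k <- 0:3
h <- c(75, 100)
p <- 4
objval <- matrix(1:8, nrow = length(k), ncol = length(h))  # rows: k, columns: h
beta <- array(0, dim = c(p, length(k), length(h)))         # columns: k, matrices: h

# Positions of k = 2 and h = 100 in the grids
ik <- match(2, k)
ih <- match(100, h)

obj.kh <- objval[ik, ih]    # objective value for this (k, h) pair
beta.kh <- beta[, ik, ih]   # coefficient vector for this (k, h) pair
print(obj.kh)
print(beta.kh)
```

The coef and predict calls in the Examples accept k and h directly, so a manual lookup like this is only needed when working with the raw components.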

Author(s)

Ryan Thompson

References

Thompson, R. (2022). 'Robust subset selection'. Computational Statistics and Data Analysis 169, p. 107415.

Examples

# Generate training data with mixture error
set.seed(0)
n <- 100
p <- 10
p0 <- 5
ncontam <- 10
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n, c(rep(10, ncontam), rep(0, n - ncontam)))
y <- x %*% beta + e

# Robust subset selection
fit <- rss(x, y, k.mio = p0, h.mio = n - ncontam, params = list(OutputFlag = 1))

# Extract model coefficients, generate predictions, and plot the fit
coef(fit, k = p0, h = n - ncontam)
predict(fit, x[1:3, ], k = p0, h = n - ncontam)
plot(fit)

ryan-thompson/robustsubsets documentation built on Dec. 14, 2024, 6:25 a.m.