cv.rss: Cross-validated robust subset selection

cv.rss    R Documentation

Cross-validated robust subset selection

Description

Fits a sequence of regression models using robust subset selection over a grid of values of k and h, then cross-validates these models to select k and h.

Usage

cv.rss(
  x,
  y,
  k = 0:min(nrow(x) - 1, ncol(x), 20),
  h = function(n) round(seq(0.75, 1, 0.05) * n),
  mio = "min",
  nfold = 10,
  cv.loss = tmspe,
  cluster = NULL,
  ...
)

Arguments

x

a predictor matrix

y

a response vector

k

the number of predictors to minimise the sum of squares over; by default a sequence from 0 to min(n - 1, p, 20), where n = nrow(x) and p = ncol(x), so never longer than 0 to 20

h

a function that takes the sample size and returns the number of observations to minimise the sum of squares over; by default produces a sequence from 75 to 100 percent of the sample size (in increments of 5 percent), rounded to whole observations; a function is used here so that h adapts to the smaller training-set sizes that arise in cross-validation

mio

one of 'min', 'all', or 'none', indicating whether to run the mixed-integer solver only on the k and h that minimise the cross-validation error ('min'), on all k and h ('all'), or not at all ('none')

nfold

the number of folds to use in cross-validation

cv.loss

an optional cross-validation loss function; should accept a vector of errors and return a single value; by default tmspe, the trimmed mean square prediction error with 25% trimming

cluster

an optional cluster for running cross-validation in parallel; must be set up using parallel::makeCluster

...

any other arguments
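The default grids for k and h both depend on the data. The sketch below re-evaluates the two defaults from the Usage section in plain base R so their behaviour is easy to check; default_k() and default_h() are hypothetical helper names, not package functions, and the training-set size of 90 assumes ordinary 10-fold cross-validation on n = 100 observations, where each model is fit with one fold held out.

```r
# Default grid of subset sizes, copied from the Usage section:
# 0 up to the smallest of n - 1, p and 20.
default_k <- function(x) 0:min(nrow(x) - 1, ncol(x), 20)

# Default h function, also copied from the Usage section:
# 75% to 100% of the sample size in steps of 5%, rounded to integers.
default_h <- function(n) round(seq(0.75, 1, 0.05) * n)

set.seed(0)
x_narrow <- matrix(rnorm(100 * 10), 100, 10)  # n = 100, p = 10
x_wide <- matrix(rnorm(50 * 200), 50, 200)    # n = 50, p = 200
x_tiny <- matrix(rnorm(5 * 10), 5, 10)        # n = 5, p = 10

default_k(x_narrow)  # capped by p: 0, 1, ..., 10
default_k(x_wide)    # capped by 20: 0, 1, ..., 20
default_k(x_tiny)    # capped by n - 1: 0, 1, ..., 4

default_h(100)       # full sample: 75, 80, 85, 90, 95, 100
default_h(90)        # a 10-fold training set: same proportions, of 90
```

Passing h as a function rather than a fixed vector is what lets each fold use values of h that are valid for its own training-set size.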

Value

An object of class cv.rss; a list with the following components:

cv

a matrix with the cross-validated values of cv.loss; rows correspond to k and columns to h

k

a vector containing the values of k used in the fit

h

a vector containing the values of h used in the fit

k.min

the k yielding the lowest cross-validated cv.loss

h.min

the h yielding the lowest cross-validated cv.loss

fit

the fit from running rss() on the full data
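The cv, k.min and h.min components are linked: cv holds one cv.loss value per (k, h) pair, and k.min and h.min are the values of k and h at its smallest entry. The base-R sketch below illustrates the default loss and that selection step. It is not the package's code: trimmed_mspe() is a hypothetical stand-in for tmspe that assumes "25% trimming" means discarding the largest 25% of squared errors, and the error vector and cv matrix are made-up numbers for illustration, not output from cv.rss().

```r
# Hypothetical stand-in for the default loss: the mean of the squared
# errors after discarding the largest 25% of them. The package's tmspe()
# may differ in detail (e.g. in how the retained count is rounded).
trimmed_mspe <- function(e, trim = 0.25) {
  sq <- sort(e^2)                           # squared errors, smallest first
  keep <- ceiling((1 - trim) * length(sq))  # number of errors retained
  mean(sq[seq_len(keep)])
}

# Two gross errors dominate the ordinary mean square error but fall
# inside the trimmed 25%, so they do not affect the trimmed version.
e <- c(0.5, -1, 0.2, 0.1, -0.3, 0.4, -0.2, 0.6, 10, -12)
mean(e^2)
trimmed_mspe(e)

# Illustrative cross-validation matrix: rows index k, columns index h
# (filled column by column).
k <- 0:3
h <- c(75, 80, 85)
cv_mat <- matrix(c(4.0, 2.5, 1.2, 1.6,
                   3.8, 2.2, 0.9, 1.4,
                   3.9, 2.4, 1.1, 1.5),
                 nrow = length(k), dimnames = list(k = k, h = h))

# The selected pair is the row and column of the smallest entry.
pos <- arrayInd(which.min(cv_mat), dim(cv_mat))
k_min <- k[pos[1, 1]]  # 2
h_min <- h[pos[1, 2]]  # 80
```

Any function that maps a vector of errors to a single number, such as function(e) mean(abs(e)), could be supplied as cv.loss in place of the default.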

Author(s)

Ryan Thompson

Examples

library(robustsubsets)

# Generate training data with mixture error
set.seed(0)
n <- 100
p <- 10
p0 <- 5
ncontam <- 10
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n, c(rep(10, ncontam), rep(0, n - ncontam)))
y <- x %*% beta + e

# Robust subset selection with cross-validation
cl <- parallel::makeCluster(2)
fit <- cv.rss(x, y, cluster = cl)
parallel::stopCluster(cl)

# Extract model coefficients, generate predictions, and plot cross-validation results
coef(fit)
predict(fit, x[1:3, ])
plot(fit)

ryan-thompson/robustsubsets documentation built on Dec. 14, 2024, 6:25 a.m.