cv.bss: Cross-validated best subset selection

cv.bss    R Documentation

Cross-validated best subset selection

Description

Fits a sequence of regression models using best subset selection and then cross-validates them. This function is a wrapper for the cv.rss function: it solves the robust subset selection problem with h = n (no observations are trimmed), using nonrobust measures of location and scale to standardise, and a nonrobust measure of prediction error in cross-validation.
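
As a rough illustration of what best subset selection computes for a single subset size k, the sketch below enumerates every size-k set of predictors in base R and keeps the one with the smallest residual sum of squares. This is not how the package works internally: it swaps the mixed-integer solver for exhaustive enumeration, which is only feasible for small p. The helper name best_subset_bruteforce and the inclusion of an intercept are assumptions made for this example.

```r
# Brute-force best subset selection for one subset size k (illustration only).
# best_subset_bruteforce is a hypothetical helper, not part of robustsubsets.
best_subset_bruteforce <- function(x, y, k) {
  # All candidate predictor sets of size k
  subsets <- combn(ncol(x), k, simplify = FALSE)
  # Residual sum of squares of the least-squares fit on each candidate set
  rss <- vapply(subsets, function(s) {
    fit <- lm.fit(cbind(1, x[, s, drop = FALSE]), y)  # intercept assumed
    sum(fit$residuals^2)
  }, numeric(1))
  list(subset = subsets[[which.min(rss)]], rss = min(rss))
}

# Small synthetic problem: only the first two predictors carry signal
set.seed(0)
x <- matrix(rnorm(100 * 6), 100, 6)
y <- as.numeric(x[, 1:2] %*% c(3, 3) + rnorm(100))

best <- best_subset_bruteforce(x, y, k = 2)
best$subset  # indices of the selected predictors
best$rss     # residual sum of squares of the selected model
```

Enumeration scales as choose(p, k), which is why larger problems call for a dedicated solver.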

Usage

cv.bss(
  x,
  y,
  k = 0:min(nrow(x) - 1, ncol(x), 20),
  mio = "min",
  nfold = 10,
  cv.loss = mspe,
  ...
)

Arguments

x

a predictor matrix

y

a response vector

k

the number of predictors to minimise sum of squares over; by default a sequence from 0 to min(nrow(x) - 1, ncol(x), 20)

mio

one of 'min', 'all', or 'none', indicating whether to run the mixed-integer solver on the k that minimises the cross-validation error, on all k, or on none

nfold

the number of folds to use in cross-validation

cv.loss

an optional cross-validation loss function; should accept a vector of errors; by default the mean squared prediction error

...

any other arguments (for instance, the call under Examples supplies cluster this way)
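
The default for k is the expression shown under Usage, so the upper end of the grid depends on the dimensions of x. The base R sketch below evaluates that same expression for three matrix shapes; default_k is a hypothetical helper name used only for this illustration.

```r
# Reproduces the default expression for k from the Usage section.
# default_k is a hypothetical helper, not part of robustsubsets.
default_k <- function(x) 0:min(nrow(x) - 1, ncol(x), 20)

k_narrow <- default_k(matrix(0, 100, 10))  # upper end set by ncol(x)
k_short  <- default_k(matrix(0, 15, 50))   # upper end set by nrow(x) - 1
k_wide   <- default_k(matrix(0, 100, 50))  # upper end set by the cap of 20

range(k_narrow)
range(k_short)
range(k_wide)
```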

Value

See documentation for the cv.rss function.

Author(s)

Ryan Thompson

Examples

# Generate training data
set.seed(0)
n <- 100
p <- 10
p0 <- 5
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n)
y <- x %*% beta + e

# Best subset selection with cross-validation
cl <- parallel::makeCluster(2)
fit <- cv.bss(x, y, cluster = cl)
parallel::stopCluster(cl)

# Extract model coefficients, generate predictions, and plot cross-validation results
coef(fit)
predict(fit, x[1:3, ])
plot(fit)
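
# Supplying a custom cross-validation loss

The cv.loss argument is described as a function that accepts a vector of errors. The following base R sketch defines one possible alternative, the mean absolute error, and compares it with a mean squared error on made-up values. The names mae and mspe_sketch are hypothetical, mspe_sketch is only a guess at the form of the default loss, and returning a single number is assumed to be what the cross-validation routine expects.

```r
# Hypothetical alternative loss: mean absolute prediction error.
# It takes a vector of errors, as the cv.loss argument description requires.
mae <- function(e) mean(abs(e))

# Presumed form of a mean squared prediction error (an assumption,
# not copied from the package source).
mspe_sketch <- function(e) mean(e^2)

# Made-up prediction errors containing one large value
errors <- c(-0.5, 0.2, 3.0, -0.1)
mae(errors)          # grows linearly with the large error
mspe_sketch(errors)  # grows quadratically with the large error

# Assumed usage; requires robustsubsets and its solver to be installed:
# fit <- cv.bss(x, y, cv.loss = mae)
```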

ryan-thompson/robustsubsets documentation built on Dec. 14, 2024, 6:25 a.m.