cv.rss: Cross-validated robust subset selection
In ryan-thompson/robustsubsets: Robust Subset Selection

Description

Fits a sequence of regression models using robust subset selection and then cross-validates these models.

Usage

cv.rss(
  x,
  y,
  k = 0:min(nrow(x) - 1, ncol(x), 20),
  h = function(n) round(seq(0.75, 1, 0.05) * n),
  mio = "min",
  nfold = 10,
  cv.loss = tmspe,
  cluster = NULL,
  ...
)
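
The default grids in the signature above can be evaluated directly in base R to see what will be searched. The following is a minimal sketch that does not call the package; the dimensions `n = 100` and `p = 10` and the placeholder matrix are assumptions chosen only for illustration.

```r
# Hypothetical dimensions, chosen only for illustration
n <- 100
p <- 10
x <- matrix(0, n, p)  # placeholder predictor matrix; only its shape matters

# Default grid of subset sizes, copied from the usage signature
k <- 0:min(nrow(x) - 1, ncol(x), 20)

# Default h, copied from the usage signature: a function of the sample size
h <- function(n) round(seq(0.75, 1, 0.05) * n)

k      # 0 1 2 ... 10, since ncol(x) = 10 is the binding limit
h(n)   # 75 80 85 90 95 100
h(90)  # the same proportions applied to a 90-observation training set
```

Because `h` is a function rather than a fixed vector, the same proportions can be re-evaluated for whatever sample size is passed in.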

Arguments

`x`	a predictor matrix
`y`	a response vector
`k`	the number of predictors to minimise sum of squares over; by default a sequence from 0 to the smallest of `nrow(x) - 1`, `ncol(x)`, and 20
`h`	a function that takes the sample size and returns the number of observations to minimise sum of squares over; by default it produces a sequence from 75 to 100 percent of the sample size (in increments of 5 percent); a function is used here because the sample size varies in cross-validation
`mio`	one of 'min', 'all', or 'none', indicating whether to run the mixed-integer solver on only the `k` and `h` that minimise the cv error, on all `k` and `h`, or on none at all
`nfold`	the number of folds to use in cross-validation
`cv.loss`	an optional cross-validation loss function; it should accept a vector of errors; by default the trimmed mean square prediction error with 25% trimming
`cluster`	an optional cluster for running cross-validation in parallel; must be set up using `parallel::makeCluster`
`...`	any other arguments
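
Since `cv.loss` only needs to accept a vector of errors, a custom loss can be written as an ordinary R function. The sketch below is a hypothetical stand-in, not the package's own `tmspe`: the name `trimmed_mspe`, its `trim` argument, and the reading of "25% trimming" as discarding the largest 25% of squared errors are all assumptions made for illustration.

```r
# Hypothetical trimmed loss; the package's tmspe may differ in detail
trimmed_mspe <- function(error, trim = 0.25) {
  sq <- sort(error^2)                       # squared errors, smallest first
  keep <- ceiling((1 - trim) * length(sq))  # number of squared errors retained
  mean(sq[seq_len(keep)])                   # average over the retained errors
}

errors <- c(0.1, -0.2, 0.3, 10)  # three small errors and one gross outlier

trimmed_mspe(errors)  # the outlier falls in the trimmed 25%
mean(errors^2)        # the untrimmed mean is dominated by the outlier
```

A function of this shape could then be supplied as `cv.loss = trimmed_mspe`.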

Value

An object of class cv.rss; a list with the following components:

`cv`	a matrix with the cross-validated values of `cv.loss`; rows correspond to `k` and columns to `h`
`k`	a vector containing the values of `k` used in the fit
`h`	a vector containing the values of `h` used in the fit
`k.min`	the `k` yielding the lowest cross-validated `cv.loss`
`h.min`	the `h` yielding the lowest cross-validated `cv.loss`
`fit`	the fit from running `rss()` on the full data
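
To illustrate how `k.min` and `h.min` relate to the `cv` matrix, the sketch below finds the row and column of the smallest entry in a small matrix laid out as described above (rows for `k`, columns for `h`). The grids and loss values are made up, and this is not necessarily how the package computes these components internally.

```r
# Hypothetical grids and made-up loss values, for illustration only
k <- 0:3
h <- c(75, 100)
cv <- matrix(c(5.0, 3.2, 1.1, 1.4,   # first column:  h = 75
               6.0, 4.0, 2.5, 2.6),  # second column: h = 100
             nrow = length(k), ncol = length(h))

# Convert the linear index of the minimum into a (row, column) pair
idx <- arrayInd(which.min(cv), dim(cv))

k.min <- k[idx[1]]  # row index maps to k
h.min <- h[idx[2]]  # column index maps to h

k.min
h.min
```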

Author(s)

Ryan Thompson

Examples

# Load the package
library(robustsubsets)

# Generate training data with mixture error: the first ncontam errors have mean 10, the rest mean 0
set.seed(0)
n <- 100
p <- 10
p0 <- 5
ncontam <- 10
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n, c(rep(10, ncontam), rep(0, n - ncontam)))
y <- x %*% beta + e

# Robust subset selection with cross-validation
cl <- parallel::makeCluster(2)
fit <- cv.rss(x, y, cluster = cl)
parallel::stopCluster(cl)

# Extract model coefficients, generate predictions, and plot cross-validation results
coef(fit)
predict(fit, x[1:3, ])
plot(fit)

ryan-thompson/robustsubsets documentation built on Dec. 14, 2024, 6:25 a.m.