knitr::opts_chunk$set(echo = T, fig.path = 'man/figures/README-')

robustsubsets

Overview

An R implementation of robust subset selection.

Robust subset selection is a robust adaption of the classic best subset selection estimator, and is defined by the constrained least squares problem:

$$ \min_{\beta, I}\,\frac{1}{2}\sum_{i\in I}(y_i-x_i^T\beta)^2\quad\,\operatorname{s.t.}\|\beta\|_0\leq k,\,I\subseteq{1,\ldots,n},\,|I|\geq h $$

Robust subsets seeks out the best subset of predictors and observations and performs a least squares fit on this subset. The number of predictors used in the fit is controlled by the parameter k and the observations by the parameter h.

Installation

You should install Gurobi and the associated R package gurobi before installing robustsubsets. Gurobi is available for free under academic license at https://www.gurobi.com/.

To install robustsubsets from GitHub, run the following code:

``` {r, eval = F} devtools::install_github('ryan-thompson/robustsubsets')

## Usage

The `rss()` function fits a robust subset regression model for a grid of `k` and `h`. The `cv.rss()` function provides a convenient way to automatically cross-validate these parameters.

```r 
library(robustsubsets)

# Generate training data with contaminated predictor matrix
set.seed(0)
n <- 100 # Number of observations
p <- 10 # Number of predictors
p0 <- 5 # Number of relevant predictors
ncontam <- 10 # Number of contaminated observations
beta <- c(rep(1, p0), rep(0, p - p0))
x <- matrix(rnorm(n * p), n, p)
e <- rnorm(n, c(rep(10, ncontam), rep(0, n - ncontam)))
y <- x %*% beta + e

# Fit using robust subset selection
fit <- rss(x, y)
coef(fit, k = p0, h = n - ncontam)

# Cross-validate using robust subset selection
cl <- parallel::makeCluster(2)
fit <- cv.rss(x, y, cluster = cl)
parallel::stopCluster(cl)
coef(fit)

Documentation

See the package reference manual.



ryan-thompson/robustsubsets documentation built on Dec. 14, 2024, 6:25 a.m.