# An introduction to bestridge

*bestridge: A Comprehensive R Package for Best Subset Selection*

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>", warning = FALSE, message = FALSE)
```

## 1 Introduction

One of the main tasks of statistical modeling is to exploit the association between a response variable and multiple predictors. The linear model (LM), a simple parametric regression model, is often used to capture linear dependence between the response and the predictors. Its two common extensions, the generalized linear model (GLM) and Cox's proportional hazards (CoxPH) model, are chosen according to the type of response and model the mean as a linear function of the inputs. When the number of predictors is large, parameter estimation in these models can be computationally burdensome. At the same time, a widely accepted heuristic for statistical modeling is Occam's razor, which asks modelers to keep a good balance between goodness of fit and model complexity, leading to a relatively small subset of important predictors.

bestridge is a toolkit for best subset ridge regression (BSRR) problems. Under many sparse learning regimes, $L_0$ regularization can outperform commonly used feature selection methods (e.g., $L_1$ and MCP). We state the problem as
$$
\min_\beta \; -2 \log L(\beta) + \lambda \Vert\beta\Vert_2^2 \quad \text{s.t.} \quad \Vert\beta\Vert_0 \leq s, \tag{BSRR}
$$
where $\log L(\beta)$ is the log-likelihood function in the GLM case or the log partial likelihood function in the Cox model, and $\Vert\beta\Vert_0$ denotes the $L_0$ norm of $\beta$, i.e., the number of non-zero entries of $\beta$. Vanilla $L_0$ regularization yields the minimum-variance unbiased estimator on the active set. Restricting the fit introduces a positive estimation bias, which is worthwhile whenever the decrease in variance exceeds the increase in bias. Here we introduce best subset ridge regression as an option for this bias-variance tradeoff: it adds an $L_2$ norm shrinkage to the coefficients, the strength of which is controlled by the parameter $\lambda$.
The fitting is done over a grid of sparsity levels $s$ and $\lambda$ values to generate a regularization path. For each candidate model size and $\lambda$, the best subset ridge regression is solved by the $L_2$ penalized primal-dual active set (PDAS) algorithm. This algorithm updates the active set via the primal and dual variables and fits the sub-model by exploiting the fact that their support sets are non-overlapping and complementary. When method = "sequential" and warm.start = TRUE, the algorithm is run over a sequential list of combinations of model size and $\lambda$, using the estimate from the previous combination as a warm start. When method = "psequential" or method = "pgsection", the Powell conjugate direction method is used instead. This method finds the optimal tuning parameters by a bi-directional search along each search vector in turn; the line search along each search vector is carried out either on a sequential search path (method = "psequential") or by golden section search (method = "pgsection").

## 2 Quick start

This quick start guide walks you through fitting best subset ridge regression for linear and logistic models.

### 2.1 Regression: linear model

We apply the method to the data set reported in @scheetz2006regulation, which contains expression levels of 18975 probes obtained from 120 rats. We are interested in finding probes whose expression is related to that of gene TRIM32 using linear regression. This gene is known to cause Bardet-Biedl syndrome, a genetic disease affecting multiple organ systems including the retina. For simplicity, the number of probes is reduced to 500 by selecting the 500 probes with the largest marginal correlation with the TRIM32 gene.
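The tuning strategies described above can be selected through the arguments of bsrr. The sketch below is illustrative only: the argument names method, warm.start, lambda.min, lambda.max, and nlambda follow this vignette's description of the interface and should be checked against `?bsrr` for your installed version.

```r
library(bestridge)
data("trim32", package = "bestridge")
y <- trim32[, 1]
x <- as.matrix(trim32[, -1])

# Sequential search over the whole (s, lambda) grid, with warm starts
# carried from one grid point to the next
fit.seq <- bsrr(x, y, method = "sequential", warm.start = TRUE)

# Powell's conjugate direction method with a golden section line search,
# over 100 lambda values between 0.001 and 100
fit.pgs <- bsrr(x, y, method = "pgsection",
                lambda.min = 0.001, lambda.max = 100, nlambda = 100)
```

The Powell-based searches visit far fewer grid points than the full sequential sweep, which is why they are attractive when both $s$ and $\lambda$ ranges are wide.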
```r
library(bestridge)
data("trim32", package = "bestridge")
```

Suppose we want to fit a best subset ridge regression model with $\lambda$ taking values at 100 grid points from 100 to 0.001 and the model size ranging between 1 and $\min\{p, n/\log(n)\}$ (rounded to an integer), which is the default model size range, where $p$ denotes the number of variables and $n$ the sample size, tuned via the generalized information criterion (GIC). We call the bsrr function as follows:

```r
y <- trim32[, 1]
x <- as.matrix(trim32[, -1])
lm.bsrr <- bsrr(x, y)
```

The fitted coefficients can be extracted by running the coef function:

```r
coef(lm.bsrr, sparse = TRUE)
```

To make predictions on new data, the predict function can be used as follows:

```r
predict.bsrr <- predict(lm.bsrr, newx = x)
```

If the option newx is omitted, the fitted linear predictors are used.

### 2.2 Classification: logistic regression

We now turn to the classification task. We use the data set duke [@west2001predicting], which contains microarray measurements for 86 breast cancer patients. We are interested in classifying the patients into estrogen receptor-positive (Status = 0) and estrogen receptor-negative (Status = 1) based on the expression levels of the considered genes.

```r
data("duke")
y <- duke$y
x <- as.matrix(duke[, -1])
```
Suppose we want to perform this classification with a BSRR logistic regression model. This can be done by calling the bsrr function with family = "binomial" as follows:

```r
logi.bsrr <- bsrr(x, y, family = "binomial")
```


Calling the plot routine on a bsrr object fitted with type = "bsrr" provides a heatmap of the GIC over the fitted path.

```r
plot(logi.bsrr)
```


### 2.3 Censored response: Cox proportional hazards model

The bestridge package also supports studying the relationship between predictor variables and survival time via the Cox proportional hazards model. We demonstrate the usage on a real data set from Alizadeh et al. (2000) [@alizadeh2000distinct]: gene-expression data in lymphoma patients. We illustrate the usage on a subset of the full data set, containing 50 patients with measurements on 1000 probes.

```r
data(patient.data)
x <- patient.data$x
y <- patient.data$time
```
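With the predictors and survival times in hand, the model can then be fitted with family = "cox". The sketch below is an assumption-laden illustration: it supposes that patient.data also carries a censoring indicator named status and that bsrr accepts the survival response as a two-column (time, status) matrix; verify both against `?bsrr` and the data set's documentation before running.

```r
# Hypothetical censoring indicator; the component name `status` is an
# assumption about patient.data, not confirmed by this vignette
surv.y <- cbind(time = patient.data$time, status = patient.data$status)

# Fit BSRR under the Cox proportional hazards model
cox.bsrr <- bsrr(x, surv.y, family = "cox")

# Selected coefficients can be inspected as in the linear case
coef(cox.bsrr, sparse = TRUE)
```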


bestridge documentation built on April 4, 2021, 9:06 a.m.