In grpreg: Regularization Paths for Regression Models with Grouped Covariates

library(grpreg)
knitr::opts_knit$set(aliases=c(h = 'fig.height', w = 'fig.width'))
knitr::opts_chunk$set(comment="#", collapse=TRUE, cache=FALSE, tidy=FALSE)
knitr::knit_hooks$set(small.mar = function(before, options, envir) {
  if (before) par(mar = c(4, 4, .1, .1))
})

grpreg fits models that fall into the penalized likelihood framework. Rather than estimating $\bb$ by maximizing the likelihood, in this framework we estimate $\bb$ by minimizing the objective function $$ Q(\bb|\X, \y) = \frac{1}{n}L(\bb|\X,\y) + P_\lam(\bb), $$ where the loss function $L(\bb|\X,\y)$ is the negative log-likelihood ($2$ times this quantity is known as the deviance), $P_\lam(\bb)$ is the penalty, and $\lam$ is a regularization parameter that controls the tradeoff between the two components. This article describes the different loss models available in grpreg; see penalties for more information on the different penalties available.

Gaussian (linear regression)

In linear regression, the loss function is simply the squared error loss: $$ L(\bb|\X,\y) = \frac{1}{2} \norm{\y-\X\bb}_2^2; $$ this is proportional to the negative log-likelihood for a model where $y$ follows a Gaussian distribution with constant variance and mean equal to $\X\bb$.

To fit a penalized linear regression model with grpreg:

fit <- grpreg(X, y, group)

Binomial (logistic regression)

In logistic regression, the loss function is: $$ L(\bb|\X,\y) = -\sum_{i:y_i=1}\log\ph_i - \sum_{i:y_i=0}\log(1-\ph_i); $$ this is the negative log-likelihood for a binomial distribution with probabilities $P(Y_i=1)=\ph_i$ given by: $$ \ph_i = \frac{\exp(\eta_i)}{1+\eta_i}, $$ where $\be = \X\bb$ denotes the linear predictors.

To fit a penalized logistic regression model with grpreg:

fit <- grpreg(X, y, group, family='binomial')

Poisson

In Poisson regression, the loss function is: $$ L(\bb|\X,\y) = 2\sum_i \left{y_i\log y_i - y_i\log \mu_i + mu_i - y_i\right}; $$ note that some of these terms are constant with respect to $\mu_i$ and can therefore be ignored during optimization. Twice this loss is the deviance for a Poisson distribution $Y_i \sim \text{Pois}(\mh_i)$ with rate parameter given by: $$ \mh_i = \exp(\eta_i). $$

To fit a penalized Poisson regression model with grpreg:

fit <- grpreg(X, y, group, family='poisson')

Cox proportional hazards

The above models all fall into the category of distributions known as exponential families (hence the family) argument. grpreg also allows users to fit Cox proportional hazards models, although these models fall outside this framework and are therefore fit using a different function, grpsurv. In Cox regression, the negative log of the partial likelihood is $$ L(\bb|\X,\y) = -2\sum_{j=1}^{m} d_j \eta_j + 2\sum_{j=1}^{m} d_j \log\left{\sum_{i \in R_j} \exp(\eta_i)\right}, $$ where $t_1 < t_2 < \ldots < t_m$ denotes an increasing list of unique failure times indexed by $j$ and $R_j$ denotes the set of observations still at risk at time $t_j$, known as the risk set.

The Lung data (see ?Lung for more details) provides an example of time-to-event data that can be used with Cox regression. Loading this data set into R,

data(Lung)
X <- Lung$X
y <- Lung$y
group <- Lung$group

To fit a penalized Cox regression model,

fit <- grpsurv(X, y, group)

As before, you can call plot, coef, predict, etc. on fit:

coef(fit, lambda=0.1)

Cross-validation is similar:

set.seed(1)

cvfit <- cv.grpsurv(X, y, group)
par(mfrow=c(1,2))
plot(cvfit, type='cve')
plot(cvfit, type='rsq')

In addition to the quantities like coefficients and number of nonzero coefficients that predict returns for other types of models, predict() for an grpsurv object can also estimate the baseline hazard (using the Kalbfleish-Prentice method) and therefore, the survival function. A method to plot the resulting function is also available:

S <- predict(fit, X[1,], type='survival', lambda=0.02)
S(365)   # Estiamted survival at 1 year
plot(S, xlim=c(0,200))

When multiple subjects are involved in the prediction:

S <- predict(fit, X, type='survival', lambda=0.02)
S[[1]](365)  # Estimated survival at 1 year for subject 1
S[[2]](365)  # Estimated survival at 1 year for subject 2
plot(S, xlim=c(0,200))