knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

sail is a package that fits a linear model with non-linear interactions via penalized maximum likelihood. The regularization path is computed at a grid of values for the regularization parameter $\lambda$ and a fixed value of the second regularization parameter $\alpha$. The method enforces the strong heredity property, i.e., an interaction is selected only if its corresponding main effects are also included in the model. Interactions are limited to a single exposure variable, i.e., $y \sim e + x_1 + x_2 + e x_1 + e x_2 + \epsilon$. Furthermore, the package allows a user-defined basis expansion of the $x$ variables to capture non-linear effects; the default is B-splines (e.g. splines::bs(x, 5)). It currently fits only linear models (binomial models are planned for the next release).

Model

Let $Y=(Y_1, \ldots, Y_n) \in \mathbb{R}^n$ be a continuous outcome variable, $X_E=(E_1, \ldots, E_n) \in \mathbb{R}^n$ a binary or continuous environment vector, $\mathbf{X} = (X_{1}, \ldots, X_{p}) \in \mathbb{R}^{n\times p}$ a matrix of predictors, and $\varepsilon = (\varepsilon_1, \ldots, \varepsilon_n) \in \mathbb{R}^n$ a vector of i.i.d. random variables with mean 0. Furthermore, let $f_j: \mathbb{R} \rightarrow \mathbb{R}$ be a smooth function of the variable $X_j$, represented by a projection onto a set of basis functions: \begin{equation} f_j(X_j) = \sum_{\ell = 1}^{m_j} \psi_{j\ell}(X_j) \beta_{j\ell} \label{eq:smooth} \end{equation} Here, the $\left\lbrace \psi_{j\ell} \right\rbrace_{\ell=1}^{m_j}$ are a family of basis functions in $X_j$ (Hastie et al., 2015). Let $\boldsymbol{\Psi}_j$ be the $n \times m_j$ matrix of evaluations of the $\psi_{j\ell}$, and let $\boldsymbol{\theta}_j = (\beta_{j1}, \ldots, \beta_{jm_j}) \in \mathbb{R}^{m_j}$ for $j = 1, \ldots, p$, i.e., $\boldsymbol{\theta}_j$ is the $m_j$-dimensional column vector of basis coefficients for the $j$th main effect. In this article we consider an additive interaction regression model of the form \begin{equation} Y = \beta_0 \cdot \boldsymbol{1} + \sum_{j=1}^p \boldsymbol{\Psi}_j \boldsymbol{\theta}_j + \beta_E X_E + \sum_{j=1}^p (X_E \circ \boldsymbol{\Psi}_j) \boldsymbol{\alpha}_{j} + \varepsilon \label{eq:linpred} \end{equation} where $\beta_0$ is the intercept, $\beta_E$ is the coefficient for the environment variable, $\boldsymbol{\alpha}_j = (\alpha_{j1}, \ldots, \alpha_{jm_j}) \in \mathbb{R}^{m_j}$ are the basis coefficients for the $j$th interaction term, and $(X_E \circ \boldsymbol{\Psi}_j)$ is the $n \times m_j$ matrix formed by the component-wise multiplication of the column vector $X_E$ with each column of $\boldsymbol{\Psi}_j$. To enforce the strong heredity property, we reparametrize the coefficients of the interaction terms in \eqref{eq:linpred} as $\boldsymbol{\alpha}_{j} = \gamma_{j} \beta_E \boldsymbol{\theta}_j$: \begin{equation} Y = \beta_0 \cdot \boldsymbol{1} + \sum_{j=1}^p \boldsymbol{\Psi}_j \boldsymbol{\theta}_j + \beta_E X_E + \sum_{j=1}^p \gamma_{j} \beta_E (X_E \circ \boldsymbol{\Psi}_j) \boldsymbol{\theta}_j + \varepsilon \label{eq:linpred2} \end{equation} Under this reparametrization, the $j$th interaction term can be non-zero only if $\beta_E \neq 0$ and $\boldsymbol{\theta}_j \neq \boldsymbol{0}$, which is exactly the strong heredity property. For a continuous response, we use the squared-error loss: \begin{equation} \mathcal{L}(Y;\boldsymbol{\theta}) = \frac{1}{2n} \left\lVert Y - \beta_0 \cdot \boldsymbol{1} - \sum_{j=1}^p \boldsymbol{\Psi}_j \boldsymbol{\theta}_j - \beta_E X_E - \sum_{j=1}^p \gamma_{j} \beta_E (X_E \circ \boldsymbol{\Psi}_j) \boldsymbol{\theta}_j \right\rVert_2^2 \end{equation} where $\boldsymbol{\theta} \equiv (\beta_0, \beta_E, \boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_p, \gamma_1, \ldots, \gamma_p)$.

We consider the following penalized least squares criterion for this problem: \begin{equation} \arg\min_{\boldsymbol{\theta}} \mathcal{L}(Y;\boldsymbol{\theta}) + \lambda (1-\alpha) \left( w_E |\beta_E| + \sum_{j=1}^{p} w_j \lVert \boldsymbol{\theta}_j \rVert_2 \right) + \lambda\alpha \sum_{j=1}^{p} w_{jE} |\gamma_{j}| \label{eq:lassolikelihood3} \end{equation} where $\lambda > 0$ and $\alpha \in (0,1)$ are tuning parameters and $w_E, w_j, w_{jE}$ are adaptive weights for $j = 1, \ldots, p$. These weights serve as a way of allowing parameters to be penalized differently.

Installation

The package can be installed from GitHub via

install.packages("pacman")
pacman::p_load_gh('sahirbhatnagar/sail')
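
Alternatively, if you already use the remotes package, the development version can be installed with:

remotes::install_github("sahirbhatnagar/sail")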

Quick Start

We give a quick overview of the main functions here and go into more detail in the other vignettes. We will use the simulated data that ships with the package, which can be loaded via:

library(sail)
data("sailsim")
names(sailsim)
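
A quick look at the dimensions serves as a sanity check (sailsim also contains the true functions used to simulate the response, which we use later for plotting):

dim(sailsim$x)    # design matrix of predictors
length(sailsim$e) # exposure variable
length(sailsim$y) # continuous response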

We first define a basis expansion. In this example we use B-splines of degree 5, so each column of $\mathbf{X}$ is expanded into five basis columns.

library(splines)
f.basis <- function(x) splines::bs(x, degree = 5)
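
To see what this expansion produces, we can apply it to a single column of the design matrix; with degree = 5 and no interior knots, bs returns five basis columns per variable:

# each predictor is expanded into 5 basis columns
dim(f.basis(sailsim$x[, 1]))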

Next, we fit the model using the most basic call to sail:

fit <- sail(x = sailsim$x, y = sailsim$y, e = sailsim$e, basis = f.basis)

fit is an object of class sail that contains all the relevant information of the fitted model, including the estimated coefficients at each value of $\lambda$ (by default, the program chooses its own decreasing sequence of 100 $\lambda$ values). There are print, plot, coef and predict methods for objects of class sail. The print method outputs the following:

fit

When expand = TRUE (i.e. the user did not provide their own design matrix), the df_main and df_interaction columns give the number of unique variables with at least one non-zero coefficient, counted before basis expansion; this is not the same as the number of non-zero coefficients. In this example each column of $\mathbf{X}$ is expanded into five columns, so if df_main = 4, df_interaction = 2 and df_environment = 1, the total number of non-zero coefficients is $5 \times (4+2) + 1 = 31$.
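
As a sanity check, you can count the non-zero coefficients directly from the coefficient matrix (the column index below is arbitrary and chosen only for illustration):

# non-zero coefficients at the 30th value of lambda, excluding the intercept
sum(coef(fit)[-1, 30] != 0)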

The entire solution path can be plotted via the plot method for objects of class sail. The y-axis shows the coefficient values and the x-axis shows $\log(\lambda)$. Each line represents a coefficient in the model, and each color represents a variable (so in this example a given variable has 5 lines when it is non-zero). The numbers at the top of the plot give the number of non-zero variables in the model: top panel (df_main + df_environment), bottom panel (df_interaction). The black line is the coefficient path for the environment variable.

plot(fit)

The estimated coefficients at each value of lambda are given by (the matrix is partially printed here for brevity):

coef(fit)[1:6,50:55]

The predicted response at each value of lambda:

predict(fit)[1:5,50:55]

The predicted response at a specific value of lambda can be obtained via the s argument:

predict(fit, s = 0.8)

You can specify more than one value for s:

predict(fit, s = c(0.8, 0.2))

You can also extract a list of active variables (i.e. variables with a non-zero estimated coefficient) for each value of lambda:

fit[["active"]]

Cross-Validation

cv.sail is the main function for cross-validation; plot, predict, and coef methods are available for objects of class cv.sail. We can also run the cross-validation in parallel:

set.seed(432) # to reproduce results (randomness due to CV folds)
library(doParallel) 
registerDoParallel(cores = 2) 
cvfit <- cv.sail(x = sailsim$x, y = sailsim$y, e = sailsim$e, basis = f.basis,
                 dfmax = 10, # to speed up vignette build time
                 nfolds = 5, parallel = TRUE, nlambda = 50)
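
When you are done, the implicit cluster registered above can be stopped:

doParallel::stopImplicitCluster()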

We plot the cross-validation error curve, which has the mean-squared error on the y-axis and $\log(\lambda)$ on the x-axis. It includes the cross-validation curve (red dotted line), and upper and lower standard deviation curves along the $\lambda$ sequence (error bars). Two selected values of $\lambda$ are indicated by the vertical dotted lines (see below). The numbers at the top of the plot give the total number of non-zero variables at each value of $\lambda$ (df_main + df_environment + df_interaction):

plot(cvfit)

lambda.min is the value of $\lambda$ that gives the minimum mean cross-validated error. The other $\lambda$ saved is lambda.1se, which gives the most regularized model whose error is within one standard error of the minimum. We can view the selected $\lambda$'s and the corresponding coefficients:

cvfit[["lambda.min"]]
cvfit[["lambda.1se"]]

The estimated coefficients at lambda.1se and lambda.min:

cbind(coef(cvfit, s = "lambda.1se"), # lambda.1se is the default
      coef(cvfit, s = "lambda.min"))

Estimated non-zero coefficients at lambda.1se:

predict(cvfit, type = "nonzero")
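
If you want the non-zero coefficients at lambda.min instead, the s argument should also be accepted here (assuming the same convention as in coef):

predict(cvfit, type = "nonzero", s = "lambda.min")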

Visualizing the Effect of the Non-linear Terms

B-splines are difficult to interpret directly. We therefore provide plotting functions to visualize the estimated effect of the non-linear terms on the response.

Main Effects

Since we are using simulated data, we also plot the true curve:

plotMain(cvfit$sail.fit, x = sailsim$x, xvar = "X3",
         legend.position = "topright",
         s = cvfit$lambda.min, f.truth = sailsim$f3)

Interaction Effects

Again, since we are using simulated data, we also plot the true interaction:

plotInter(cvfit$sail.fit, x = sailsim$x, xvar = "X4",
          f.truth = sailsim$f4.inter,
          s = cvfit$lambda.min,
          title_z = "Estimated")

Linear Interactions

The basis argument in the sail function is very flexible: it allows you to apply any basis expansion to the columns of $\mathbf{X}$. Of course, there might be situations where you do not expect any non-linear main effects or interactions to be present in your data. You can still use sail to search for linear main effects and interactions. This can be accomplished by specifying an identity map:

f.identity <- function(i) i

We then pass this function to the basis argument of cv.sail:

cvfit_linear <- cv.sail(x = sailsim$x, y = sailsim$y, e = sailsim$e,
                        basis = f.identity, nfolds = 5, parallel = TRUE,
                        dfmax = 10, # to speed up vignette build time
                        nlambda = 50)

Next we plot the cross-validated curve:

plot(cvfit_linear)

And extract the model at lambda.min:

coef(cvfit_linear, s = "lambda.min")

Applying a different penalty to each predictor

Recall that we consider the following penalized least squares criterion for this problem:

\begin{equation} \arg\min_{\boldsymbol{\theta}} \mathcal{L}(Y;\boldsymbol{\theta}) + \lambda (1-\alpha) \left( w_E |\beta_E| + \sum_{j=1}^{p} w_j \lVert \boldsymbol{\theta}_j \rVert_2 \right) + \lambda\alpha \sum_{j=1}^{p} w_{jE} |\gamma_{j}| \end{equation}

The weights $w_E, w_j, w_{jE}$ are set to 1 by default, as specified by the penalty.factor argument. This argument allows users to apply separate penalty factors to each coefficient. In particular, any variable with penalty.factor equal to zero is not penalized at all. This feature is useful mainly for two reasons:

  1. You have prior knowledge about the importance of certain variables: larger weights penalize a variable more, while smaller weights penalize it less.
  2. It allows users to fit an adaptive sail, similar to the adaptive lasso (a minimal sketch follows this list).
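
Purely as an illustration of the second point, here is a minimal sketch of one way to construct data-driven weights. The adaptive lasso usually derives weights from initial coefficient estimates; to keep the sketch short we instead use inverse absolute marginal correlations with the response for the main-effect weights. The weight construction and normalization are our own assumptions, not a recipe prescribed by the package; the ordering of penalty.factor (E first, then main effects, then interactions) matches the comment in the example that follows.

# sketch: adaptive-style weights from marginal correlations (illustrative only)
p <- ncol(sailsim$x)
w_main <- 1 / (abs(cor(sailsim$x, sailsim$y)) + 1e-4) # one weight per X_j
w_main <- as.numeric(w_main / mean(w_main))           # normalize around 1
p.fac.adapt <- c(1, w_main, rep(1, p))                # order: E, mains, interactions
fit_adapt <- sail(x = sailsim$x, y = sailsim$y, e = sailsim$e, basis = f.basis,
                  nlambda = 50, dfmax = 10, # to speed up build time
                  penalty.factor = p.fac.adapt)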

In the following example, we want the environment variable to always be included, so we set the first element of p.fac to zero. We also apply a smaller penalty to the main effects of $X_2, X_3, X_4$:

# the weights correspond to E, X1, X2, X3, ... X_p, X1:E, X2:E, ... X_p:E
p.fac <- c(0, 1, 0.4, 0.6, 0.7, rep(1, 2*ncol(sailsim$x) - 4))
fit_pf <- sail(x = sailsim$x, y = sailsim$y, e = sailsim$e, basis = f.basis,
               nlambda = 50,
               dfmax = 10, # to speed up vignette build time
               penalty.factor = p.fac)
plot(fit_pf)

We see from the plot above that the black line (corresponding to the $E$ variable with penalty.factor equal to zero) is always included in the model.


