sail: Fit Sparse Additive Interaction Model with Strong Heredity
In sahirbhatnagar/sail: Sparse Additive Interaction Learning

Description Usage Arguments Details Value Author(s) References See Also Examples

Function to fit the Sparse Additive Interaction Model with strong heredity for a sequence of tuning parameters. This is a penalized regression method that ensures the interaction term is non-zero only if its corresponding main-effects are non-zero. This model only considers the interactions between a single exposure (E) variable and a high-dimensional matrix (X). Additive (non-linear) main effects and interactions can be specified by the user. This can also be seen as a varying-coefficient model.

sail(x, y, e, basis = function(i) splines::bs(i, df = 5),
  strong = TRUE, group.penalty = c("gglasso", "grMCP", "grSCAD"),
  family = c("gaussian", "binomial"), center.x = TRUE,
  center.e = TRUE, expand = TRUE, group, weights,
  penalty.factor = rep(1, 1 + 2 * nvars), lambda.factor = ifelse(nobs <
  (1 + 2 * bscols * nvars), 0.01, 1e-04), lambda = NULL, alpha = 0.5,
  nlambda = 100, thresh = 1e-04, fdev = 1e-05, maxit = 1000,
  dfmax = 2 * nvars + 1, verbose = 0)

`x`	input matrix of dimension `n x p`, where `n` is the number of subjects and p is number of X variables. Each row is an observation vector. Can be a high-dimensional (n < p) matrix. Can be a user defined design matrix of main effects only (without intercept) if `expand=FALSE`
`y`	response variable. For `family="gaussian"` should be a 1 column matrix or numeric vector. For `family="binomial"`, should be a 1 column matrix or numeric vector with -1 for failure and 1 for success.
`e`	exposure or environment vector. Must be a numeric vector. Factors must be converted to numeric.
`basis`	user defined basis expansion function. This function will be applied to every column in `x`. Specify `function(i) i` if no expansion is desired. Default: `function(i) splines::bs(i, df = 5)`.
`strong`	Flag specifying strong hierarchy (TRUE) or weak hierarchy (FALSE). Default FALSE.
`group.penalty`	group lasso penalty. Can be one of `"gglasso"` (group lasso), `"grMCP"` (group MCP) or `"grSCAD"` (group SCAD). See references for details. Default: `"gglasso"`.
`family`	response type. See `y` for details. Currently only `family = "gaussian"` is implemented. Default: `"gaussian"`.
`center.x`	should the columns of `x` (after basis expansion) be centered (logical). Default: `TRUE`.
`center.e`	should exposure variable `e` be centered. Default: `TRUE`.
`expand`	should `basis` be applied to every column of `x` (logical). Set to `FALSE` if you want a user defined main effects design matrix. If `FALSE` the `group` membership argument must also be supplied. Default: `TRUE`.
`group`	a vector of consecutive integers, starting from 1, describing the grouping of the coefficients. Only required when `expand=FALSE`.
`weights`	observation weights. Default is 1 for each observation. Currently NOT IMPLEMENTED.
`penalty.factor`	separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables. Must be of length `1 + 2*ncol(x)` where the first entry corresponds to the penalty.factor for `e`, the next `ncol(x)` entries correspond to main effects, and the following `ncol(x)` entries correspond to the interactions.
`lambda.factor`	the factor for getting the minimal lambda in the lambda sequence, where `min(lambda) = lambda.factor * max(lambda)`. `max(lambda)` is the smallest value of lambda for which all coefficients are zero. The default depends on the relationship between `N` (the number of rows in the matrix of predictors) and `q` (the total number of predictors in the design matrix - including interactions). If `N > q`, the default is `1e-4`, close to zero. If `N < p`, the default is `0.01`. A very small value of lambda.factor will lead to a saturated fit.
`lambda`	a user supplied lambda sequence. Typically, by leaving this option unspecified users can have the program compute its own lambda sequence based on `nlambda` and `lambda.factor`. Supplying a value of lambda overrides this. It is better to supply a decreasing sequence of lambda values than a single (small) value, if not, the program will sort user-defined lambda sequence in decreasing order automatically. Default: `NULL`.
`alpha`	the mixing tuning parameter, with 0<α<1. It controls the penalization strength between the main effects and the interactions. The penalty is defined as λ(1-α)(w_e\|β_e\|+ ∑ w_j \|\|β_j\|\|_2) + λα(∑ w_{je} \|γ_j\|) Larger values of `alpha` will favor selection of main effects over interactions. Smaller values of `alpha` will allow more interactions to enter the final model. Default: `0.5`
`nlambda`	the number of lambda values. Default: 100
`thresh`	convergence threshold for coordinate descent. Each coordinate-descent loop continues until the change in the objective function after all coefficient updates is less than `thresh`. Default: `1e-04`.
`fdev`	minimum fractional change in deviance for stopping path. Default: `1e-5`.
`maxit`	maximum number of outer-loop iterations allowed at fixed lambda value. If models do not converge, consider increasing `maxit`. Default: 1000.
`dfmax`	limit the maximum number of variables in the model. Useful for very large `q` (the total number of predictors in the design matrix - including interactions), if a partial path is desired. Default: `2 * p + 1` where p is the number of columns in `x`.
`verbose`	display progress. Can be either 0,1 or 2. 0 will not display any progress, 2 will display very detailed progress and 1 is somewhere in between. Default: 1.

The objective function for family="gaussian" is

RSS/2n + λ(1-α)(w_e|β_e|+ ∑ w_j ||β_j||_2) + λα(∑ w_{je} |γ_j|)

where RSS is the residual sum of squares and n is the number of observations. See Bhatnagar et al. (2018+) for details.

It is highly recommended to specify center.x = TRUE and center.e = TRUE for both convergence and interpretation reasons. If centered, the final estimates can be interpreted as the effect of the predictor on the response while holding all other predictors at their mean value. For computing speed reasons, if models are not converging or running slow, consider increasing thresh, decreasing nlambda, or increasing lambda.factor before increasing maxit. Then try increasing the value of alpha (which translates to more penalization on the interactions).

By default, sail uses the group lasso penalty on the basis expansions of x. To use the group MCP and group SCAD penalties (see Breheny and Huang 2015), the grpreg package must be installed.

an object with S3 class "sail", "*", where "*" is "lspath" or "logitreg". Results are provided for converged values of lambda only.

call: the call that produced this object
a0: intercept sequence of length nlambda
beta: a (# main effects after basis expansion x nlambda) matrix of main effects coefficients, stored in sparse column format ("dgCMatrix")
alpha: a (# interaction effects after basis expansion x nlambda) matrix of interaction effects coefficients, stored in sparse column format ("dgCMatrix")
gamma: A p x nlambda matrix of (γ) coefficients, stored in sparse column format ("dgCMatrix")
bE: exposure effect estimates of length nlambda
active: list of length nlambda containing character vector of selected variables
lambda: the actual sequence of lambda values used
lambda2: value for the mixing tuning parameter 0<α<1
dfbeta: the number of nonzero main effect coefficients for each value of lambda
dfalpha: the number of nonzero interaction coefficients for each value of lambda
dfenviron: the number of nonzero exposure (e) coefficients for each value of lambda
dev.ratio: the fraction of (null) deviance explained (for "lspath", this is the R-square). The deviance calculations incorporate weights if present in the model. The deviance is defined to be 2*(loglike_sat - loglike), where loglike_sat is the log-likelihood for the saturated model (a model with a free parameter per observation). Hence dev.ratio=1-dev/nulldev.
converged: vector of logicals of length nlambda indicating if the algorithm converged
nlambda: number of converged lambdas
design: design matrix (X, E, X:E), of dimension n x (2*ncols*p+1) if expand=TRUE. This is used in the predict method.
nobs: number of observations
nvars: number of main effect variables
vnames: character of variable names for main effects (without expansion)
ncols: an integer of basis for each column of x if expand=TRUE, or an integer vector of basis for each variable if expand=FALSE
center.x: were the columns of x (after expansion) centered?
center.e: was e centered?
basis: user defined basis expansion function
expand: was the basis function applied to each column of x?
group: a vector of consecutive integers describing the grouping of the coefficients. Only if expand=FALSE
interaction.names: character vector of names of interaction variables
main.effect.names: character vector of names of main effect variables (with expansion)

Sahir Bhatnagar

Maintainer: Sahir Bhatnagar sahir.bhatnagar@gmail.com

Jerome Friedman, Trevor Hastie, Robert Tibshirani (2010). Regularization Paths for Generalized Linear Models via Coordinate Descent. Journal of Statistical Software, 33(1), 1-22. http://www.jstatsoft.org/v33/i01/.

Breheny P and Huang J (2015). Group descent algorithms for nonconvex penalized linear and logistic regression models with grouped predictors. Statistics and Computing, 25: 173-187.

Yang Y, Zou H (2015). A fast unified algorithm for solving group-lasso penalize learning problems. Statistics and Computing. Nov 1;25(6):1129-41. http://www.math.mcgill.ca/yyang/resources/papers/STCO_gglasso.pdf

Bhatnagar SR, Yang Y, Greenwood CMT. Sparse additive interaction models with the strong heredity property (2018+). Preprint.

bs cv.sail

f.basis <- function(i) splines::bs(i, degree = 3)
# we specify dfmax to early stop the solution path to
# limit the execution time of the example
fit <- sail(x = sailsim$x, y = sailsim$y, e = sailsim$e,
            basis = f.basis, nlambda = 10, dfmax = 10,
            maxit = 100)

# estimated coefficients at each value of lambda
coef(fit)

# predicted response at each value of lambda
predict(fit)

#predicted response at a specific value of lambda
predict(fit, s = 0.5)
# plot solution path for main effects and interactions
plot(fit)
# plot solution path only for main effects
plot(fit, type = "main")
# plot solution path only for interactions
plot(fit, type = "interaction")