intsel: Logistic regression with two-way interaction screening
In intsel: Interaction Selection in Logistic Regression

intsel

R Documentation

Description

Fit a logistic regression model that includes all two-way interaction terms among a user-specified set of variables. The method uses an overlapping group lasso penalty that respects the commonly recognized selection rule: when an interaction term is selected into the model, both of its main effect terms must be in the model too. The regularization path is computed over a grid of values of the regularization parameter lambda.
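
The selection rule above can be stated as a simple check on the set of selected terms. The sketch below is illustrative only: `respects_hierarchy` is a hypothetical helper, not a function of this package, and it assumes interaction terms are named in the form `a:b`.

```r
# Hypothetical helper (not part of intsel): return TRUE when every selected
# interaction "a:b" has both of its main effects "a" and "b" selected too.
respects_hierarchy <- function(selected) {
  interactions <- grep(":", selected, fixed = TRUE, value = TRUE)
  parents <- unlist(strsplit(interactions, ":", fixed = TRUE))
  all(parents %in% selected)
}

respects_hierarchy(c("x1", "x2", "x8", "x1:x2"))  # TRUE: x1 and x2 are present
respects_hierarchy(c("x1", "x8", "x1:x2"))        # FALSE: x2 is missing
```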

Usage

intsel(
  x,
  y,
  weights,
  intercept = TRUE,
  p.screen,
  lambda,
  par_init,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

`x`	Predictor matrix with dimension `n * p`, where `n` is the number of subjects, and `p` is the number of predictors.
`y`	Binary outcome, a vector of length `n`.
`weights`	Optional, observation weights. Default is 1 for all observations.
`intercept`	Logical, indicating whether an intercept term should be included in the model. The intercept term will not be penalized. The default is `TRUE`.
`p.screen`	Number of variables whose pairwise two-way interactions are screened. These variables must occupy the `p.screen` left-most columns of matrix `x`.
`lambda`	Sequence of regularization coefficients `\lambda`. The sequence will be sorted in decreasing order.
`par_init`	Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all `p` variables.
`stepsize_init`	Initial value of the stepsize of the optimization algorithm. Default is 1.0.
`stepsize_shrink`	Factor in `(0,1)` by which the stepsize shrinks in the backtracking linesearch. Default is 0.8.
`tol`	Convergence criterion. Algorithm stops when the `l_2` norm of the parameter update is smaller than `tol`. Default is `1e-5`.
`maxit`	Maximum number of iterations allowed. Default is `1000L`.
`verbose`	Logical, whether progress is printed. Default is `FALSE`.
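
To show how `stepsize_init`, `stepsize_shrink`, `tol` and `maxit` typically interact, here is a minimal sketch of gradient descent with a backtracking line search on the unpenalized logistic loss. It is not the package's solver, which also has to handle the overlapping group lasso penalty; the function names and the sufficient-decrease constant 0.5 are assumptions made for illustration.

```r
# Sketch only: plain gradient descent with backtracking line search on the
# unpenalized logistic loss. This is NOT the solver used by intsel.
logistic_loss <- function(beta, x, y) {
  eta <- drop(x %*% beta)
  mean(log1p(exp(eta)) - y * eta)
}

logistic_grad <- function(beta, x, y) {
  eta <- drop(x %*% beta)
  drop(crossprod(x, plogis(eta) - y)) / nrow(x)
}

backtracking_gd <- function(x, y, par_init = rep(0, ncol(x)),
                            stepsize_init = 1, stepsize_shrink = 0.8,
                            tol = 1e-5, maxit = 1000L) {
  beta <- par_init
  for (it in seq_len(maxit)) {
    g <- logistic_grad(beta, x, y)
    f0 <- logistic_loss(beta, x, y)
    step <- stepsize_init
    # Shrink the step until a sufficient-decrease condition holds.
    repeat {
      beta_new <- beta - step * g
      if (logistic_loss(beta_new, x, y) <= f0 - 0.5 * step * sum(g^2)) break
      step <- step * stepsize_shrink
    }
    # Stop when the l_2 norm of the parameter update falls below tol.
    converged <- sqrt(sum((beta_new - beta)^2)) < tol
    beta <- beta_new
    if (converged) break
  }
  list(par = beta, iterations = it)
}

set.seed(1)
x_demo <- cbind(1, matrix(rnorm(200 * 2), ncol = 2))
y_demo <- rbinom(200, 1, plogis(drop(x_demo %*% c(0.1, 0.3, -0.3))))
fit_gd <- backtracking_gd(x_demo, y_demo)
fit_glm <- glm.fit(x_demo, y_demo, family = binomial())
# Should be small if the descent converged to the maximum likelihood fit.
max(abs(fit_gd$par - fit_glm$coefficients))
```

A smaller `stepsize_shrink` reaches an acceptable step in fewer trials but may settle on an overly conservative step; a tighter `tol` trades extra iterations for accuracy.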

Value

A list.

`lambdas`	The user-specified regularization coefficients `lambda` sorted in decreasing order.
`estimates`	A matrix in which each column holds the coefficient estimates at the corresponding `\lambda` in `lambdas`.
`iterations`	A vector giving the number of iterations needed to converge at each `\lambda` in `lambdas`.
`x.original`	The input matrix `x`.
`x`	The expanded predictor matrix: the input `x` augmented with the `p.screen * (p.screen - 1)/2` two-way interaction terms.
`y`	The input `y`.
`p.screen`	The input `p.screen`.
`intercept`	The input `intercept`.
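
As a rough illustration of how the expanded predictor matrix relates to the input, the sketch below appends all pairwise products of the first `p.screen` columns. `expand_interactions` is a hypothetical helper, and the column order and names it produces are assumptions; the matrix returned by `intsel` may be arranged differently.

```r
# Hypothetical helper (not part of intsel): append all pairwise products of
# the first p.screen columns of x. Column order and names are assumptions.
expand_interactions <- function(x, p.screen) {
  pairs <- utils::combn(p.screen, 2)  # 2 x choose(p.screen, 2) index matrix
  inter <- x[, pairs[1, ], drop = FALSE] * x[, pairs[2, ], drop = FALSE]
  colnames(inter) <- paste0("x", pairs[1, ], ":x", pairs[2, ])
  main <- x
  colnames(main) <- paste0("x", seq_len(ncol(x)))
  cbind(main, inter)
}

x_small <- matrix(rnorm(10 * 8), nrow = 10)
x_expanded <- expand_interactions(x_small, p.screen = 5)
ncol(x_expanded)  # 8 main effects + 5 * 4 / 2 = 10 interactions = 18
```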

Examples

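library(intsel)

n <- 1000          # number of subjects
p.int <- 5         # variables whose two-way interactions are considered
p.noint <- 3       # variables that enter as main effects only
intercept <- TRUE
p.screen <- 5      # interactions are screened among the first p.screen columns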

# number of pairwise interactions among the p.int variables
p.int.expand <- p.int*(p.int-1)/2
p.main <- p.int + p.noint
x <- matrix(rnorm(n * p.main), nrow = n, ncol = p.main)

# true model
# logit(p) = 0.1 + 0.3 x1 + 0.3 x2 + 0.3 x8 + 0.2 * x1 * x2

beta.true <- rep(0, p.main)
beta.true[c(1, 2, p.main)] <- 0.3
eta <- x %*% beta.true + 0.2 * x[, 1] * x[, 2]

if (intercept) eta <- eta + 0.1

py <- 1/(1 + exp(-eta))

y <- rbinom(n, 1, py)

nlam <- 30
lambdas <- exp(seq(log(0.1), log(0.00005), length.out = nlam))

# All the pairwise two-way interactions for the first p.screen variables 
# are included in the model and screened in a data-driven manner.
fit <- intsel(x = x,
              y = y,
              p.screen = p.screen,
              intercept = intercept,
              lambda = lambdas)
fit$iterations
fit$estimates[, 1]
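
One way to inspect the path is to count the non-zero coefficients at each value of lambda. The sketch below uses a hypothetical helper, `path_summary`, and a small made-up matrix as a stand-in for `fit$estimates`, so that it runs without the package; the numbers are not results from any fit.

```r
# Hypothetical helper: count the non-zero coefficients in each column of an
# estimates matrix (one column per lambda).
path_summary <- function(lambdas, estimates, eps = 0) {
  data.frame(lambda = lambdas,
             n_nonzero = colSums(abs(estimates) > eps))
}

# Toy stand-in for fit$lambdas and fit$estimates; the numbers are made up.
toy_lambdas <- c(0.1, 0.01, 0.001)
toy_estimates <- cbind(c(0, 0, 0), c(0.2, 0, 0.1), c(0.3, 0.05, 0.2))
toy_summary <- path_summary(toy_lambdas, toy_estimates)
toy_summary
```

With a fitted object, the analogous call would be `path_summary(fit$lambdas, fit$estimates)`. Whether the count includes an intercept depends on how `estimates` is laid out, which this sketch does not assume.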

intsel documentation built on April 12, 2025, 1:33 a.m.