intsel: Logistic regression with two-way interaction screening

Description

Fit a logistic regression model that includes all pairwise (two-way) interaction terms among a user-specified set of variables. The method uses an overlapping group lasso penalty that enforces the commonly used strong hierarchy rule: an interaction term can enter the model only if both of its main effects are also in the model. The regularization path is computed over a grid of values of the regularization parameter lambda.
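The hierarchy rule above can be written as a check on a set of fitted coefficients. The sketch below is illustrative only. It does not reproduce how intsel stores its coefficients. The function name satisfies_hierarchy, the zero threshold tol, and the layout (a main-effect vector plus a symmetric matrix of interaction coefficients) are all hypothetical.

```r
# Illustrative check of the strong hierarchy rule (not part of intsel).
# Assumed layout: 'main' holds the p.screen main-effect coefficients and
# 'inter' is a symmetric p.screen x p.screen matrix of interaction coefficients.
satisfies_hierarchy <- function(main, inter, tol = 1e-8) {
  active_main <- abs(main) > tol
  active_int <- abs(inter) > tol
  # Each selected interaction (j, k) with j < k, one row per index pair
  pairs <- which(active_int & upper.tri(active_int), arr.ind = TRUE)
  # TRUE when both main effects of every selected interaction are selected
  all(active_main[pairs[, "row"]] & active_main[pairs[, "col"]])
}

main <- c(0.3, 0.3, 0, 0, 0)
inter <- matrix(0, 5, 5)
inter[1, 2] <- inter[2, 1] <- 0.2
satisfies_hierarchy(main, inter)  # TRUE
inter[1, 3] <- inter[3, 1] <- 0.1
satisfies_hierarchy(main, inter)  # FALSE: main effect 3 is zero
```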

Usage

intsel(
  x,
  y,
  weights,
  intercept = TRUE,
  p.screen,
  lambda,
  par_init,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

x

Predictor matrix with dimension n * p, where n is the number of subjects, and p is the number of predictors.

y

Binary outcome, a vector of length n.

weights

Optional observation weights, one per observation. Default is 1 for all observations.

intercept

Logical, indicating whether an intercept term should be included in the model. The intercept term will not be penalized. The default is TRUE.

p.screen

Number of variables whose pairwise two-way interactions are screened. These variables must occupy the p.screen left-most columns of x.

lambda

Sequence of values of the regularization parameter lambda. The sequence is sorted into decreasing order.

par_init

Optional vector of initial values for the optimization algorithm. By default, all coefficients are initialized at zero.

stepsize_init

Initial step size of the optimization algorithm. Default is 1.

stepsize_shrink

Factor in (0, 1) by which the step size is shrunk in the backtracking line search. Default is 0.8.

tol

Convergence tolerance. The algorithm stops when the L2 (Euclidean) norm of the parameter update is smaller than tol. Default is 1e-5.

maxit

Maximum number of iterations allowed. Default is 1000L.

verbose

Logical, whether progress is printed. Default is FALSE.
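The step-size and stopping arguments above describe a backtracking line search combined with an L2-norm stopping rule. The sketch below shows how stepsize_init, stepsize_shrink, tol, and maxit typically interact. It uses plain gradient descent on the unpenalized logistic loss. This is an assumption-laden illustration, not intsel's algorithm, which must also handle the overlapping group lasso penalty. The function name logistic_gd and the sufficient-decrease condition used to accept a step are hypothetical choices.

```r
# Illustrative sketch (not intsel's solver): gradient descent with a
# backtracking line search on the mean logistic loss, showing how
# stepsize_init, stepsize_shrink, tol and maxit interact.
# The overlapping group lasso penalty is deliberately omitted.
logistic_gd <- function(x, y, stepsize_init = 1, stepsize_shrink = 0.8,
                        tol = 1e-5, maxit = 1000L) {
  loss <- function(b) {
    eta <- drop(x %*% b)
    mean(log1p(exp(eta)) - y * eta)
  }
  grad <- function(b) {
    p <- 1 / (1 + exp(-drop(x %*% b)))
    drop(crossprod(x, p - y)) / nrow(x)
  }
  b <- rep(0, ncol(x))  # start at zero, like the documented par_init default
  for (iter in seq_len(maxit)) {
    g <- grad(b)
    step <- stepsize_init
    # Shrink the step until the sufficient-decrease condition holds
    repeat {
      b_new <- b - step * g
      if (loss(b_new) <= loss(b) - step / 2 * sum(g^2)) break
      step <- step * stepsize_shrink
    }
    # Stop once the L2 norm of the parameter update is below tol
    if (sqrt(sum((b_new - b)^2)) < tol) {
      return(list(coef = b_new, iterations = iter))
    }
    b <- b_new
  }
  list(coef = b, iterations = maxit)
}

set.seed(1)
x <- cbind(1, matrix(rnorm(400), nrow = 200, ncol = 2))
y <- rbinom(200, 1, 1 / (1 + exp(-drop(x %*% c(0.2, 0.5, -0.5)))))
fit <- logistic_gd(x, y)
fit$iterations
```

Each iteration starts the search at stepsize_init and multiplies the step by stepsize_shrink until the decrease condition is met. A smaller shrink factor therefore needs fewer trials, but it may settle on a step that is smaller than necessary.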

Value

A list with the following components:

lambdas

The user-specified values of lambda, sorted in decreasing order.

estimates

A matrix whose columns hold the coefficient estimates at the corresponding values of lambda in lambdas.

iterations

A vector giving the number of iterations needed to converge at each value of lambda in lambdas.

x.original

The input matrix x.

x

The expanded predictor matrix. It contains the columns of x plus the p.screen * (p.screen - 1)/2 pairwise interaction terms among the first p.screen columns.

y

The input y.

p.screen

The input p.screen.

intercept

The input intercept.
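The size of the returned x component follows from forming every pairwise product of the first p.screen columns. The sketch below builds a matrix of that shape for illustration. It is not intsel's code. The function name expand_interactions, the combn ordering of the interaction columns, and their names are assumptions.

```r
# Illustrative sketch (not intsel's code): append all pairwise products of the
# first p.screen columns of x, matching the size of the returned x component.
# The order and names of the interaction columns are assumptions.
expand_interactions <- function(x, p.screen) {
  if (p.screen < 2) return(x)  # no pairs to form
  stopifnot(p.screen <= ncol(x))
  pairs <- combn(p.screen, 2)  # 2 x choose(p.screen, 2) matrix of index pairs
  inter <- apply(pairs, 2, function(jk) x[, jk[1]] * x[, jk[2]])
  colnames(inter) <- paste0("x", pairs[1, ], ":x", pairs[2, ])
  cbind(x, inter)
}

x <- matrix(rnorm(1000 * 8), nrow = 1000, ncol = 8)
ncol(expand_interactions(x, p.screen = 5))  # 8 + 5 * 4 / 2 = 18
```

Only the left-most p.screen columns take part in the interactions. If the variables to be screened are located elsewhere, move them to the front first, for example with x[, c(idx, setdiff(seq_len(ncol(x)), idx))], where idx is a hypothetical vector of their column indices.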

Examples

library(intsel)
set.seed(1)  # for reproducibility

n <- 1000
p.int <- 5
p.noint <- 3
intercept <- TRUE
p.screen <- 5

p.int.expand <- p.int * (p.int - 1) / 2  # number of pairwise interactions among the first p.int variables
p.main <- p.int + p.noint
x <- matrix(rnorm(n * p.main), nrow = n, ncol = p.main)

# true model
# logit(p) = 0.1 + 0.3 x1 + 0.3 x2 + 0.3 x8 + 0.2 * x1 * x2

beta.true <- rep(0, p.main)
beta.true[c(1, 2, p.main)] <- 0.3
eta <- x %*% beta.true + 0.2 * x[, 1] * x[, 2]

if (intercept) eta <- eta + 0.1

py <- 1/(1 + exp(-eta))

y <- rbinom(n, 1, py)

nlam <- 30
lambdas <- exp(seq(log(0.1), log(0.00005), length.out = nlam))

# All the pairwise two-way interactions for the first p.screen variables 
# are included in the model and screened in a data-driven manner.
fit <- intsel(x = x,
              y = y,
              p.screen = p.screen,
              intercept = intercept,
              lambda = lambdas)
fit$iterations      # iterations needed to converge at each lambda
fit$estimates[, 1]  # estimates at the largest lambda (first column)

intsel documentation built on April 12, 2025, 1:33 a.m.