intsel_cv: Cross-validation for logistic regression with two-way...

View source: R/intsel_cv.R

intsel_cv    R Documentation

Cross-validation for logistic regression with two-way interaction screening

Description

Cross-validation function for intsel

Usage

intsel_cv(
  x,
  y,
  weights,
  intercept = TRUE,
  p.screen,
  lambda,
  par_init,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  nfolds = 10,
  foldid = NULL,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

x

Predictor matrix with dimension n * p, where n is the number of subjects, and p is the number of predictors.

y

Binary outcome, a vector of length n.

weights

Optional, observation weights. Default is 1 for all observations.

intercept

Logical, indicating whether an intercept term should be included in the model. The intercept term will not be penalized. The default is TRUE.

p.screen

Number of variables whose pairwise (two-way) interactions are all screened. These variables must occupy the p.screen left-most columns of the matrix x.
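To make the size of the screened interaction set concrete, here is a minimal sketch of how the pairwise products of the first p.screen columns could be formed. The helper expand_interactions is hypothetical and is not part of intsel; the column order and names it produces are assumptions, and the package may build its expanded matrix differently.

```r
# Hypothetical helper (not an intsel function): append all pairwise products
# of the first p.screen columns of x to x.
expand_interactions <- function(x, p.screen) {
  pairs <- combn(p.screen, 2)  # 2 x choose(p.screen, 2) matrix of column index pairs
  inter <- x[, pairs[1, ], drop = FALSE] * x[, pairs[2, ], drop = FALSE]
  colnames(inter) <- paste0("x", pairs[1, ], ":x", pairs[2, ])
  cbind(x, inter)
}

set.seed(1)
x_demo <- matrix(rnorm(20 * 8), nrow = 20, ncol = 8)
x_expanded <- expand_interactions(x_demo, p.screen = 5)
dim(x_expanded)  # 20 rows, 8 + 5 * 4 / 2 = 18 columns
```

With p.screen = 5 there are 10 candidate interaction terms, so the number of screened terms grows quadratically in p.screen.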

lambda

Sequence of regularization coefficients (values of \lambda). The sequence will be sorted in decreasing order.

par_init

Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all p variables.

stepsize_init

Initial value of the stepsize of the optimization algorithm. Default is 1.0.

stepsize_shrink

Factor in (0,1) by which the stepsize shrinks in the backtracking linesearch. Default is 0.8.

nfolds

Optional, the number of cross-validation folds. Default is 10.

foldid

Optional, user-specified vector indicating the cross-validation fold to which each observation is assigned. Values in this vector should range from 1 to nfolds. If left unspecified, intsel will randomly assign observations to folds.
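A fixed foldid makes cross-validation results reproducible across calls. The following is one common way to build a balanced random assignment in base R; the seed and sizes are arbitrary illustrative choices, and this is not necessarily how intsel assigns folds internally.

```r
set.seed(2025)
n <- 1000
nfolds <- 5

# Repeat the labels 1..nfolds to length n, then shuffle, so that fold sizes
# differ by at most one observation.
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)
```

The resulting vector can then be supplied through the foldid argument together with the matching nfolds.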

tol

Convergence criterion. Algorithm stops when the l_2 norm of the parameter update is smaller than tol. Default is 1e-5.

maxit

Maximum number of iterations allowed. Default is 1000L.
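The roles of stepsize_init, stepsize_shrink, tol and maxit can be illustrated with a generic backtracking gradient descent for unpenalized logistic regression. This is only a sketch under simplifying assumptions: it omits the penalty and interaction screening entirely, the function names are hypothetical, and it is not the algorithm implemented in intsel.

```r
# Mean logistic negative log-likelihood and its gradient (no penalty term).
logistic_loss <- function(beta, x, y) {
  eta <- drop(x %*% beta)
  mean(log1p(exp(eta)) - y * eta)
}
logistic_grad <- function(beta, x, y) {
  eta <- drop(x %*% beta)
  drop(crossprod(x, plogis(eta) - y)) / length(y)
}

# Hypothetical fitting routine: gradient descent with backtracking line search.
fit_backtracking <- function(x, y, stepsize_init = 1, stepsize_shrink = 0.8,
                             tol = 1e-5, maxit = 1000L) {
  beta <- rep(0, ncol(x))  # start from zero
  for (it in seq_len(maxit)) {
    g <- logistic_grad(beta, x, y)
    f_old <- logistic_loss(beta, x, y)
    step <- stepsize_init
    # Shrink the step until the sufficient-decrease condition holds.
    while (logistic_loss(beta - step * g, x, y) > f_old - 0.5 * step * sum(g^2)) {
      step <- step * stepsize_shrink
    }
    update <- -step * g
    beta <- beta + update
    # Stop when the l_2 norm of the parameter update falls below tol.
    if (sqrt(sum(update^2)) < tol) break
  }
  list(beta = beta, iterations = it)
}

set.seed(1)
x_demo <- cbind(1, matrix(rnorm(200 * 2), 200, 2))  # first column is the intercept
y_demo <- rbinom(200, 1, plogis(drop(x_demo %*% c(0.1, 0.5, -0.5))))
fit <- fit_backtracking(x_demo, y_demo)
fit$beta
```

A smaller stepsize_shrink reaches an acceptable step in fewer trials but may settle on a shorter step than necessary; a looser tol stops earlier at the cost of accuracy.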

verbose

Logical, whether progress is printed. Default is FALSE.

Value

A list.

lambdas

A vector of the lambda values used in cross-validation.

cvm

The cv error averaged across all folds for each lambda.

cvsd

The standard error of the cv error for each lambda.

cvup

The cv error plus its standard error for each lambda.

cvlo

The cv error minus its standard error for each lambda.
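The relationship among cvm, cvsd, cvup and cvlo can be sketched from a fold-by-lambda matrix of errors. The numbers below are illustrative only, and the standard error formula (standard deviation across folds divided by the square root of the number of folds) is an assumption about the convention used; intsel may weight folds differently.

```r
# Illustrative per-fold errors: rows are folds, columns are lambda values.
fold_err <- rbind(
  c(0.66, 0.60, 0.58),
  c(0.70, 0.62, 0.57),
  c(0.68, 0.61, 0.60),
  c(0.72, 0.63, 0.59)
)
nfolds <- nrow(fold_err)

cvm  <- colMeans(fold_err)                      # mean error per lambda
cvsd <- apply(fold_err, 2, sd) / sqrt(nfolds)   # assumed standard error convention
cvup <- cvm + cvsd                              # upper band
cvlo <- cvm - cvsd                              # lower band
```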

nzero

The number of non-zero coefficients at each lambda.

intsel.fit

A fitted model for the full data at all lambdas of class "intsel".

lambda.min

The lambda at which cvm reaches its minimum.

lambda.1se

The largest lambda such that cvm is less than the cvup at lambda.min (that is, the minimum of cvm plus its standard error).
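As a worked illustration of the two selection rules, the following sketch picks lambda.min and lambda.1se from made-up cross-validation curves. The values are not output from intsel_cv, and the variable names are chosen for this example only.

```r
# Illustrative values only, not output from intsel_cv.
lambdas <- c(0.1, 0.05, 0.01, 0.005, 0.001)  # sorted in decreasing order
cvm     <- c(0.70, 0.60, 0.58, 0.59, 0.63)
cvsd    <- rep(0.03, length(lambdas))
cvup    <- cvm + cvsd

idx_min    <- which.min(cvm)
lambda_min <- lambdas[idx_min]

# Largest lambda whose cv error is below the minimum cv error plus its standard error.
lambda_1se <- max(lambdas[cvm < cvup[idx_min]])

c(lambda.min = lambda_min, lambda.1se = lambda_1se)
```

Here the minimum is at lambda = 0.01, while the one-standard-error rule selects the larger value 0.05, which corresponds to a more heavily penalized and typically sparser model.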

foldid

The fold assignments used.

index

A one-column matrix with the indices of lambda.min and lambda.1se.

iterations

A vector of the number of iterations needed to converge at each \lambda in lambdas.

x.original

The input matrix x.

x

The expanded predictor matrix: the columns of x plus the p.screen * (p.screen - 1)/2 interaction terms.

y

The input y.

p.screen

The input p.screen.

intercept

The input intercept.

Examples

n <- 1000
p.int <- 5
p.noint <- 3
intercept <- TRUE
p.screen <- 5

p.int.expand <- p.int*(p.int-1)/2
p.main <- p.int + p.noint
x <- matrix(rnorm(n * p.main), nrow = n, ncol = p.main)

# true model
# logit(p) = 0.1 + 0.3 x1 + 0.3 x2 + 0.3 x8 + 0.2 * x1 * x2

beta.true <- rep(0, p.main)
beta.true[c(1, 2, p.main)] <- 0.3
eta <- x %*% beta.true + 0.2 * x[, 1] * x[, 2]

if (intercept) eta <- eta + 0.1

py <- 1/(1 + exp(-eta))

y <- rbinom(n, 1, py)

nlam <- 30
lambdas <- exp(seq(log(0.1), log(0.00005), length.out = nlam))

# All the pairwise two-way interactions for the first p.screen variables 
# are included in the model and screened in a data-driven manner.
cv <- intsel_cv(x = x,
                y = y,
                p.screen = p.screen,
                intercept = intercept,
                stepsize_init = 1,
                lambda = lambdas,
                nfolds = 5,
                foldid = NULL)
cv$index

intsel documentation built on April 12, 2025, 1:33 a.m.