intsel_cv: Cross-validation for logistic regression with two-way...
In intsel: Interaction Selection in Logistic Regression

intsel_cv

R Documentation

Cross-validation for logistic regression with two-way interaction screening

Description

Runs k-fold cross-validation for `intsel` over a sequence of regularization coefficients `lambda`.

Usage

intsel_cv(
  x,
  y,
  weights,
  intercept = TRUE,
  p.screen,
  lambda,
  par_init,
  stepsize_init = 1,
  stepsize_shrink = 0.8,
  nfolds = 10,
  foldid = NULL,
  tol = 1e-05,
  maxit = 1000L,
  verbose = FALSE
)

Arguments

`x`	Predictor matrix with dimension `n * p`, where `n` is the number of subjects, and `p` is the number of predictors.
`y`	Binary outcome, a vector of length `n`.
`weights`	Optional, observation weights. Default is 1 for all observations.
`intercept`	Logical, indicating whether an intercept term should be included in the model. The intercept term will not be penalized. The default is `TRUE`.
`p.screen`	Number of variables of which all two-way interactions are screened. These variables should be placed in the `p.screen` left-most columns of matrix `x`.
`lambda`	Sequence of regularization coefficients `\lambda`. The sequence will be sorted in decreasing order.
`par_init`	Optional, vector of initial values of the optimization algorithm. Default initial value is zero for all `p` variables.
`stepsize_init`	Initial value of the stepsize of the optimization algorithm. Default is 1.0.
`stepsize_shrink`	Factor in `(0,1)` by which the stepsize shrinks in the backtracking linesearch. Default is 0.8.
`nfolds`	Optional, the number of cross-validation folds. Default is 10.
`foldid`	Optional, user-specified vector indicating the cross-validation fold to which each observation is assigned. Values in this vector should range from 1 to `nfolds`. If left unspecified, `intsel` will randomly assign observations to folds.
`tol`	Convergence criterion. Algorithm stops when the `l_2` norm of the parameter update is smaller than `tol`. Default is `1e-5`.
`maxit`	Maximum number of iterations allowed. Default is `1000L`.
`verbose`	Logical, whether progress is printed. Default is `FALSE`.
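
The `foldid` argument expects one fold label per observation, with values from 1 to `nfolds`. A minimal base R sketch of building such a vector by hand is given below; the sizes are hypothetical, and this is not necessarily how `intsel_cv` assigns folds internally when `foldid` is left as `NULL`.

```r
# Hypothetical sizes, for illustration only
set.seed(1)
n <- 23
nfolds <- 5

# Repeat the labels 1..nfolds until there are n of them,
# then shuffle so that the assignment is random
foldid <- sample(rep(seq_len(nfolds), length.out = n))

table(foldid)  # fold sizes differ by at most one
```

Passing the same `foldid` to repeated calls keeps the fold assignment fixed, which makes cross-validation results comparable across runs.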

Value

A list.

`lambdas`	The vector of lambda values used in cross-validation.
`cvm`	The cv error averaged across all folds for each lambda.
`cvsd`	The standard error of the cv error for each lambda.
`cvup`	The cv error plus its standard error for each lambda.
`cvlo`	The cv error minus its standard error for each lambda.
`nzero`	The number of non-zero coefficients at each lambda.
`intsel.fit`	A fitted model for the full data at all lambdas of class "`intsel`".
`lambda.min`	The lambda at which `cvm` reaches its minimum.
`lambda.1se`	The largest lambda such that `cvm` is less than the minimum of `cvup` (the minimum of `cvm` plus its standard error).
`foldid`	The fold assignments used.
`index`	A one-column matrix with the indices of `lambda.min` and `lambda.1se`.
`iterations`	A vector of the number of iterations it takes to converge at each `\lambda` in `lambdas`.
`x.original`	The input matrix `x`.
`x`	The predictor matrix consisting of `x` plus the `p.screen * (p.screen - 1)/2` interaction terms.
`y`	The input `y`.
`p.screen`	The input `p.screen`.
`intercept`	The input `intercept`.
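
To make the size of the returned `x` concrete, the following base R sketch appends all pairwise products of the first `p.screen` columns to a toy matrix. The data and sizes are hypothetical, and the column order used here (that of `combn`) is an assumption; the ordering used inside `intsel` may differ.

```r
# Toy data; sizes are hypothetical
set.seed(1)
n <- 6
p <- 4
p.screen <- 3
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# All unordered pairs among the first p.screen columns
pairs <- combn(p.screen, 2)  # 2 x choose(p.screen, 2) matrix

# Elementwise products, one column per pair
x.int <- apply(pairs, 2, function(jk) x[, jk[1]] * x[, jk[2]])

# Main effects followed by the interaction columns
x.expanded <- cbind(x, x.int)
dim(x.expanded)  # n rows, p + p.screen * (p.screen - 1) / 2 columns
```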

Examples

n <- 1000
p.int <- 5
p.noint <- 3
intercept <- TRUE
p.screen <- 5

p.int.expand <- p.int*(p.int-1)/2
p.main <- p.int + p.noint
x <- matrix(rnorm(n * p.main), nrow = n, ncol = p.main)

# true model
# logit(p) = 0.1 + 0.3 x1 + 0.3 x2 + 0.3 x8 + 0.2 * x1 * x2

beta.true <- rep(0, p.main)
beta.true[c(1, 2, p.main)] <- 0.3
eta <- x %*% beta.true + 0.2 * x[, 1] * x[, 2]

if (intercept) eta <- eta + 0.1

py <- 1/(1 + exp(-eta))

y <- rbinom(n, 1, py)

nlam <- 30
lambdas <- exp(seq(log(0.1), log(0.00005), length.out = nlam))

# All the pairwise two-way interactions for the first p.screen variables 
# are included in the model and screened in a data-driven manner.
cv <- intsel_cv(x = x,
                y = y,
                p.screen = 5,
                intercept = intercept,
                stepsize_init = 1,
                lambda = lambdas,
                nfolds = 5,
                foldid = NULL)
cv$index
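
The two indices in `cv$index` refer to `lambda.min` and `lambda.1se`. As a rough illustration of how such values can be chosen from cross-validation summaries, the sketch below applies the usual one-standard-error rule to made-up numbers. The vectors `lambda`, `cvm`, and `cvsd` are hypothetical stand-ins for `cv$lambdas`, `cv$cvm`, and `cv$cvsd`, and the exact rule implemented in `intsel_cv` may differ in detail.

```r
# Hypothetical cross-validation summaries, sorted by decreasing lambda
lambda <- c(0.1, 0.05, 0.01, 0.005, 0.001)
cvm    <- c(0.70, 0.62, 0.58, 0.57, 0.60)
cvsd   <- c(0.02, 0.02, 0.02, 0.02, 0.02)

# lambda.min: the lambda with the smallest mean CV error
i.min <- which.min(cvm)
lambda.min <- lambda[i.min]

# One-standard-error rule: the largest lambda whose mean CV error
# is below the minimum error plus its standard error
threshold <- cvm[i.min] + cvsd[i.min]
lambda.1se <- max(lambda[cvm < threshold])

c(lambda.min = lambda.min, lambda.1se = lambda.1se)
```

Choosing `lambda.1se` over `lambda.min` trades a small increase in estimated error for a sparser model, since a larger penalty keeps fewer non-zero coefficients.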

intsel documentation built on April 12, 2025, 1:33 a.m.