
View source: R/stable.clr.g.R

stable.clr.g {penalizedclr}	R Documentation

Stability selection based on penalized conditional logistic regression

Description

Performs stability selection for conditional logistic regression models with L1 and L2 (elastic net) penalties, allowing different penalties for different blocks (groups) of covariates, such as covariates coming from different data sources.

Usage

stable.clr.g(
  response,
  stratum,
  penalized,
  unpenalized = NULL,
  p = NULL,
  lambda.list,
  alpha = 1,
  B = 100,
  parallel = TRUE,
  standardize = TRUE,
  event
)

Arguments

response

The response variable, either a 0/1 vector or a factor with two levels.

stratum

A numeric vector with stratum membership of each observation.

penalized

A matrix of penalized covariates.

unpenalized

A matrix of additional unpenalized covariates.

p

A numeric vector of block sizes: its length equals the number of blocks, and its elements sum to the number of penalized covariates (the number of columns of penalized). If missing, all covariates are treated the same and a single penalty is applied.

lambda.list

A list of vectors of penalties to be applied to the different blocks of covariates. Each vector should have length equal to the number of blocks.

alpha

The elastic net mixing parameter, a number in (0, 1]. alpha = 1 gives the lasso penalty, while alpha = 0 would give a pure ridge penalty; pure ridge is never obtained in this implementation, since alpha must be positive.

B

A single positive integer giving the number of complementary pairs of subsamples; 2B subsamples are drawn in total (see Details).

parallel

Logical. Should the computation be parallelized?

standardize

Logical. Should the covariates be standardized?

event

If response is a factor, the level that should be considered a success in the logistic regression.
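To make the roles of p and lambda.list concrete, the base-R sketch below expands each vector of per-block penalties into one penalty per penalized covariate. It assumes that the columns of penalized are ordered block by block (which the block sizes in p suggest, but this page does not state) and that a block's penalty applies equally to every covariate in that block. expand_penalties is a hypothetical helper, not a penalizedclr function.

```r
# Hypothetical helper (not part of penalizedclr): expand each vector of
# per-block penalties in `lambda.list` to one penalty per penalized column.
# ASSUMPTION: the columns of `penalized` are ordered block by block, so that
# block g occupies the next p[g] columns.
expand_penalties <- function(lambda.list, p, n_cov) {
  if (is.null(p)) p <- n_cov            # no blocks: one block with all covariates
  stopifnot(is.numeric(p), all(p > 0), sum(p) == n_cov)
  lapply(lambda.list, function(lambda) {
    # Each vector must supply exactly one penalty per block.
    if (length(lambda) != length(p)) {
      stop("each element of lambda.list must have length(p) entries")
    }
    rep(lambda, times = p)              # repeat block g's penalty p[g] times
  })
}

# Blocks of 20 and 80 covariates with the penalties used in the Examples.
pen <- expand_penalties(list(c(0.5, 1), c(2, 0.9)), p = c(20, 80), n_cov = 100)
lengths(pen)                            # one penalty per covariate in each vector
```

The length check mirrors the requirement that every vector in lambda.list has one entry per block.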

Details

This function implements stability selection (Meinshausen and Bühlmann, 2010) for conditional logistic regression. The implementation is based on the modification of Shah and Samworth (2013), which uses complementary pairs of subsamples; as a result, the number of subsamples is 2B rather than B. For each vector of per-block penalties in lambda.list, the subsampling procedure is repeated 2B times, yielding a vector of selection frequencies (the proportion of subsamples in which each covariate has a non-zero coefficient estimate). The final selection probability Pistab of each covariate is obtained by taking the maximum of its selection frequencies over all considered vectors of penalties.
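The complementary-pairs scheme can be sketched in base R. The function complementary_subsamples is hypothetical, and it assumes that whole strata (matched sets) are sampled so that no matched set is split between subsamples; this page does not say how penalizedclr draws its subsamples. Each of the B iterations splits the strata into two disjoint halves, giving 2B subsamples in total.

```r
# Hypothetical sketch (not penalizedclr code): draw B complementary pairs of
# subsamples, i.e. 2B subsamples in total.
# ASSUMPTION: whole strata (matched sets) are sampled, never single rows.
complementary_subsamples <- function(stratum, B) {
  ids <- unique(stratum)
  half <- floor(length(ids) / 2)
  subsamples <- vector("list", 2 * B)
  for (b in seq_len(B)) {
    shuffled <- ids[sample.int(length(ids))]   # random order of strata
    # Two disjoint halves of the strata form one complementary pair.
    first  <- shuffled[seq_len(half)]
    second <- shuffled[half + seq_len(half)]
    # Store the row indices of the observations in each half.
    subsamples[[2 * b - 1]] <- which(stratum %in% first)
    subsamples[[2 * b]]     <- which(stratum %in% second)
  }
  subsamples
}

set.seed(1)
stratum <- sort(rep(1:100, 2))   # 100 matched pairs, as in the Examples
subs <- complementary_subsamples(stratum, B = 3)
length(subs)                     # 2B = 6 subsamples
```

Sampling strata rather than rows keeps each case together with its matched controls, which conditional logistic regression needs in order to form the conditional likelihood within each stratum.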

Value

A list containing Pistab, a numeric vector of selection probabilities for all penalized covariates, together with lambda.list and p as provided in the input arguments.
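The aggregation that produces Pistab (see Details) reduces to two steps: average the selection indicators over the 2B subsamples for each penalty vector, then take the maximum over penalty vectors. In the hypothetical sketch below, a toy logical array stands in for the fitted models; aggregate_pistab is not a penalizedclr function.

```r
# Hypothetical sketch (not penalizedclr code) of the aggregation step.
# `selected` is a logical array with dimensions
#   [subsample, covariate, penalty vector],
# where TRUE means the covariate had a non-zero coefficient estimate in the
# fit on that subsample with that vector of per-block penalties.
aggregate_pistab <- function(selected) {
  # Selection frequency of each covariate for each penalty vector:
  # average over the subsamples (first dimension).
  freq <- apply(selected, c(2, 3), mean)
  # Pistab: maximum frequency over all considered penalty vectors.
  apply(freq, 1, max)
}

# Toy input: 2B = 4 subsamples, 3 covariates, 2 penalty vectors.
selected <- array(FALSE, dim = c(4, 3, 2))
selected[, 1, 1] <- TRUE      # covariate 1: selected in every subsample under penalty vector 1
selected[1:2, 2, 2] <- TRUE   # covariate 2: selected in half the subsamples under penalty vector 2
aggregate_pistab(selected)    # c(1, 0.5, 0)
```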

References

  1. Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.

  2. Shah, R. D., & Samworth, R. J. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 55-80.

Examples

library(penalizedclr)

set.seed(123)

# simulate covariates (pure noise in two blocks of 20 and 80 variables)
X <- cbind(matrix(rnorm(4000, 0, 1), ncol = 20), matrix(rnorm(16000, 2, 0.6), ncol = 80))
p <- c(20,80)

# stratum membership
stratum <- sort(rep(1:100, 2))

# the response
Y <- rep(c(1, 0), 100)

# list of L1 penalty vectors, one penalty per block (alpha = 1 by default)
lambda.list <- list(c(0.5, 1), c(2, 0.9))

# perform stability selection

stable.g1 <- stable.clr.g(response = Y, penalized = X, stratum = stratum,
                         p = p, lambda.list = lambda.list)



penalizedclr documentation built on July 26, 2023, 5:18 p.m.