# <tt>penalizedclr</tt>: an R package for penalized conditional logistic regression In penalizedclr: Integrative Penalized Conditional Logistic Regression



### Penalized conditional logistic regression: model fit

The function penalized.clr fits a penalized conditional logistic regression model with different penalties for different blocks of covariates. The L1 penalty parameter lambda (a numeric vector of the length equal to the number of blocks) can be specified by the user or computed internally. In the latter case, cross-validation is performed internally for each data layer separately. It is also possible to apply elastic net penalty through parameter alpha.

#### Case 1: Penalty parameters provided by the user

This code illustrates how to fit penalized.clr with penalty parameters specified by the user. Here Y is the response vector, X is the multi-source matrix of covariates, stratum is the vector of ids of the matching pairs, and p is the vector of block dimensions. It has the same dimension as the vector of penalty parameters lambda. Penalty parameters are thus specified for each data source, i.e., a block of covariates. It is possible to standardize the variables by setting standardize = TRUE (FALSE by default).

fit1 <- penalized.clr(response = Y, penalized = X, stratum = stratum,
lambda = c(6,7), p = p, standardize = TRUE)


The function returns a list with elements:

1. penalized - Regression coefficients for the penalized covariates.

2. unpenalized - Regression coefficients for the unpenalized covariates.

3. converged - Whether the fitting process was judged to be converged.

4. lambda - The tuning parameter for L1 used.

5. alpha - The elastic net mixing parameter used.

For instance, fit1$penalized is a numeric vector containing the regression coefficients for the penalized covariates. We can compare estimated coefficients with the true coefficients (the ones that generated our data), by constructing a 2 x 2 contingency table cross-tabulating true non-zero and estimated non-zero coefficients: nonzero_index <- (beta != 0) * 1 #index of nonzero coefficients table(fitted = (fit1$penalized != 0) * 1, nonzero_index)


#### Case 2: Penalty parameters computed internally

When the user does not specify the penalty parameters, they are computed internally for each data source separately based on cross-validation.

fit2 <- penalized.clr(response = Y, penalized = X, stratum = stratum,
p = p,
standardize = TRUE)


The selected penalty coefficients are:



#### Case 4: Unpenalized covariates

Somethimes, a subset of covariates should be excluded from penalization. This can be achieved by specifying the unpenalized argument. In what follows, we penalize the first block of covariates, and leave the remaining block unpenalized.

X1 <- X[, 1:p[1]]
X2 <- X[, (p[1]+1):(p[1]+p[2])]
fit4 <- penalized.clr(response = Y, penalized = X1, unpenalized = X2,
stratum = stratum, p = p[1],
standardize = TRUE)


This can be particularly useful when performing variable selection on omics variables (penalized) while adjusting for clinical covariates that should not be penalized.

#### Case 5: Elastic net penalty

The package penalizedclr is not limited to L1, i.e. lasso, penalty. While L1 penalty is more suited for variable selection, in the presence of highly correlated covariates, it can be useful to add small amount of L2 penalty. The two can be combined in an elastic net framework by specifying the mixing parameter alpha that can assume values between 0 and 1. Default alpha=1 gives lasso.

fit2 <- penalized.clr(response = Y, penalized = X, stratum = stratum,
p = p,
standardize = TRUE, alpha = 0.6)


### Penalized conditional logistic regression: stability selection

The function stable.clr performs stability selection (Meinshausen and Bühlmann 2010 and Shah and Samworth 2013) for variable selection in penalized conditional logistic regression. For details on stability selection, we refer to the original publications. Briefly, a set of L1 penalties is considered, and for each considered value of the penalty, $2B$ random subsamples of $\lfloor K/2 \rfloor$ matched pairs are taken and a penalized model is estimated. For each covariate and penalty value, selection probability is estimated as the proportion of estimated models in which the associated coefficient estimate is different from zero. Finally, the overall selection probability of a variable is obtained by taking the maximum selection probability over all considered penalty values.

Parameter $B$ is set to 100 by default, but can be changed by the user. Note that this choice will have an impact on the computation time, and higher values of $B$ will lead to a slower computation.

The function returns a list containing the selection probabilities for each covariate, i.e. the proportion of estimated models in which the associated coefficient estimate is different from zero. The user can then set a threshold for selection probability (values ranging from 0.55 to 0.9 are recommended) and obtain the set of selected covariates.

#### Case 1: Penalty parameters specified by the user

The following code performs stability selection when all covariates are considered as a single block (a single data source) and a sequence of L1 penalties contains only 2 values.

stable1 <- stable.clr(response = Y,
penalized = X, stratum = stratum,
lambda.seq = c(10,20))


The function returns a list with two elements:

1. Pilambda a numeric vector giving estimates of selection probabilities for each penalized covariate.

2. lambda.seq a sequence of L1 penalties used.

To inspect the results, we can, for instance, print the covariates with selection probability higher than 0.6:



#### Case 3: Penalty parameters computed internally

If we do not specify the penalty parameters as in case 2, stable.clr will compute them by default by using find.default.lambda with default options. We only need to run:

stable3 <- stable.clr(response = Y,
penalized = X,
stratum = stratum)


The selected sequence of lambda is:

stable3$lambda.seq  Covariates with selection probability higher than 0.6: which(stable3$P>0.6)


#### Case 4: Covariates divided into blocks

This code implements stability selection while taking into account the block structure of covariates ($p$). To achieve this, we use the function stable.clr.g which extends stable.clr to allow for blocks (groups) of covariates. In this case, the function takes as an argument lambda.list, a list of sequences of L1 penalties, one for each block. Note that the computation time depends on the length of the block-specific sequences: for each combination of penalties, $2B$ subsamples are taken and used to estimate a penalized conditional logistic regression model.

lambda.list <- find.default.lambda(response = Y,
penalized = X, stratum = stratum,
p = p)

lambda.list

stable.g1 <- stable.clr.g(response = Y,
penalized = X,
stratum = stratum,
p = p,
lambda.list = lambda.list)

which(stable.g1$P>0.6)  Note that if$p\$ is not specified, the package will automatically run stable.clr and apply equal penalties for all covariates.

# References

1. Boulesteix, A. L., De Bin, R., Jiang, X., & Fuchs, M. (2017). IPF-LASSO: Integrative-penalized regression with penalty factors for prediction based on multi-omics data. Computational and mathematical methods in medicine, 2017.

2. Meinshausen, N., & Bühlmann, P. (2010). Stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(4), 417-473.

3. Shah, R. D., & Samworth, R. J. (2013). Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 75(1), 55-80.

## Try the penalizedclr package in your browser

Any scripts or data that you put into this service are public.

penalizedclr documentation built on April 22, 2021, 9:07 a.m.