# glm.regu: Regularized M-estimation for fitting generalized linear models

In RCAL: Regularized Calibrated Estimation

## Description

This function implements regularized M-estimation for fitting generalized linear models with continuous or binary responses for a fixed choice of tuning parameters.

## Usage

```
glm.regu(y, x, iw = NULL, loss = "cal", init = NULL, rhos, test = NULL,
         offs = NULL, id = NULL, Wmat = NULL, Rmat = NULL, zzs = NULL,
         xxs = NULL, n.iter = 100, eps = 1e-06, bt.lim = 3, nz.lab = NULL,
         pos = 10000)
```

## Arguments

- `y`: An n x 1 response vector.
- `x`: An n x p matrix of covariates, excluding a constant.
- `iw`: An n x 1 weight vector.
- `loss`: A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses.
- `init`: A (p+1) x 1 vector of initial values (the intercept and coefficients).
- `rhos`: A p x 1 vector of Lasso tuning parameters, usually a constant vector, associated with the p coefficients.
- `test`: A vector giving the indices of observations, between 1 and n, which are included in the test set.
- `offs`: An n x 1 vector of offset values, as in `glm`.
- `id`, `Wmat`, `Rmat`, `zzs`, `xxs`: Arguments which can be used to speed up computation; they can be taken from the output of a previous call on the same data (see Value).
- `n.iter`: The maximum number of iterations allowed. An iteration is defined by computing a quadratic approximation and solving a least-squares Lasso problem.
- `eps`: The tolerance at which the difference in the objective (loss plus penalty) values is considered close enough to 0 to declare convergence.
- `bt.lim`: The maximum number of backtracking steps allowed.
- `nz.lab`: A p x 1 logical vector (useful for simulations), indicating which covariates are included when calculating the number of nonzero coefficients. If `nz.lab = NULL`, then `nz.lab` is reset to a vector of 0s.
- `pos`: A value which can be used to facilitate recording the numbers of nonzero coefficients with or without the restriction by `nz.lab`. If `nz.lab = NULL`, then `pos` is reset to 1.
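For illustration, here is a minimal sketch (not from the package documentation) of how these arguments fit together, using simulated data and a held-out test set; the simulated data, the choice of `rhos`, and the names `test.id` and `out` are assumptions for the example:

```
library(RCAL)

set.seed(1)
n <- 200; p <- 20
x <- matrix(rnorm(n * p), n, p)                    # covariates, no constant column
y <- rbinom(n, 1, plogis(x[, 1] - 0.5 * x[, 2]))   # binary response (assumed data)
test.id <- sample(n, 50)                           # test-set indices between 1 and n

# likelihood loss, unit weights, and a constant vector of tuning parameters
out <- glm.regu(y, x, iw = rep(1, n), loss = "ml",
                rhos = rep(0.05, p), test = test.id)
out$obj        # average training loss plus Lasso penalty
out$obj.test   # average loss in the test set
```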

## Details

For continuous responses, this function uses an active-set descent algorithm (Osborne et al. 2000; Yang and Tan 2018) to solve the least-squares Lasso problem. For binary responses, regularized calibrated estimation is implemented using the Fisher scoring descent algorithm in Tan (2020), whereas regularized maximum likelihood estimation is implemented in a similar manner, based on quadratic approximation as in the R package glmnet.
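The iteration controls `n.iter`, `eps`, and `bt.lim` can be checked against the returned `conv`, `iter`, and `tau` components (described under Value). A minimal sketch, assuming `out.rgaus` is the fit from Example 1 below:

```
# convergence status and iteration count for a fitted object
out.rgaus$conv   # 1 if converged within n.iter iterations
out.rgaus$iter   # number of iterations actually performed

# generalized signs: tau[j] should equal sign(bet[j]) for a nonzero estimate
# and lie in [-1, 1] for a zero estimate (the Lasso stationarity condition)
nonzero <- out.rgaus$bet != 0
all(out.rgaus$tau[nonzero] == sign(out.rgaus$bet[nonzero]))
all(abs(out.rgaus$tau[!nonzero]) <= 1)
```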

## Value

- `iter`: The number of iterations performed, up to `n.iter`.
- `conv`: 1 if convergence is obtained, 0 if exceeding the maximum number of iterations, or -1 if exceeding the maximum number of backtracking steps.
- `nz`: A value defined as (nz0 * `pos` + nz1) to record the numbers of nonzero coefficients without or with the restriction (denoted as nz0 and nz1) by `nz.lab`. If `nz.lab = NULL`, then nz1 is 0, `pos` is 1, and hence `nz` is nz0.
- `inter`: The estimated intercept.
- `bet`: The p x 1 vector of estimated coefficients, excluding the intercept.
- `fit`: The vector of fitted values in the training set.
- `eta`: The vector of linear predictors in the training set.
- `tau`: The p x 1 vector of generalized signs, which should be -1 or 1 for a negative or positive estimate and between -1 and 1 for a zero estimate.
- `obj.train`: The average loss in the training set.
- `pen`: The Lasso penalty of the estimates.
- `obj`: The average loss plus the Lasso penalty.
- `fit.test`: The vector of fitted values in the test set.
- `eta.test`: The vector of linear predictors in the test set.
- `obj.test`: The average loss in the test set.
- `id`, `Wmat`, `Rmat`, `zzs`, `xxs`: Objects which can be re-used to speed up computation in subsequent calls, as in the sketch below.
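The last five components can be passed back as the arguments of the same names when refitting on the same data, for example over a sequence of tuning parameters. A minimal sketch, assuming a binary response `y` and covariate matrix `x` as above; the warm start through `init` is an assumption for the example:

```
# first fit: the workspace objects are computed from scratch
out1 <- glm.regu(y, x, rhos = rep(0.10, p), loss = "cal")

# refit at a smaller tuning parameter, re-using the stored objects;
# the warm start via `init` is an assumption for this sketch
out2 <- glm.regu(y, x, rhos = rep(0.05, p), loss = "cal",
                 init = c(out1$inter, out1$bet),
                 id = out1$id, Wmat = out1$Wmat, Rmat = out1$Rmat,
                 zzs = out1$zzs, xxs = out1$xxs)
```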

## References

Osborne, M., Presnell, B., and Turlach, B. (2000) A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20, 389-404.

Yang, T. and Tan, Z. (2018) Backfitting algorithms for total-variation and empirical-norm penalized additive modeling with high-dimensional data, Stat, 7, e198.

Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Ser. B, 58, 267-288.

Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137-158.

## Examples

```
data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2] - 2

y <- simu.data[, 1]
tr <- simu.data[, 2]
x <- simu.data[, 2 + 1:p]
x <- scale(x)

### Example 1: linear regression
# rhos should be a vector of length p, even though a constant vector
out.rgaus <- glm.regu(y[tr == 1], x[tr == 1, ], rhos = rep(.05, p), loss = "gaus")

# the intercept
out.rgaus$inter
# the estimated coefficients and generalized signs; the first 10 are shown
cbind(out.rgaus$bet, out.rgaus$tau)[1:10, ]
# the number of nonzero coefficients
out.rgaus$nz

### Example 2: logistic regression using likelihood loss
out.rml <- glm.regu(tr, x, rhos = rep(.01, p), loss = "ml")
out.rml$inter
cbind(out.rml$bet, out.rml$tau)[1:10, ]
out.rml$nz

### Example 3: logistic regression using calibration loss
out.rcal <- glm.regu(tr, x, rhos = rep(.05, p), loss = "cal")
out.rcal$inter
cbind(out.rcal$bet, out.rcal$tau)[1:10, ]
out.rcal$nz
```
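Beyond the examples above, the `test` argument can be used to compare tuning parameters by test-set loss. A minimal sketch, not from the package documentation, assuming that observations indexed by `test` are held out of the fit; the grid `rho.seq` and the names `test.id` and `obj.test` are assumptions:

```
### Sketch: comparing tuning parameters by test-set loss
test.id <- sample(sum(tr == 1), 50)   # hold out 50 of the tr == 1 observations
rho.seq <- c(0.02, 0.05, 0.10)
obj.test <- sapply(rho.seq, function(r)
  glm.regu(y[tr == 1], x[tr == 1, ], rhos = rep(r, p),
           loss = "gaus", test = test.id)$obj.test)
cbind(rho = rho.seq, obj.test)
```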
