glm.regu: Regularized M-estimation for fitting generalized linear models. In RCAL: Regularized Calibrated Estimation

Description

This function implements regularized M-estimation for fitting generalized linear models with continuous or binary responses for a fixed choice of tuning parameters.

Usage

glm.regu(y, x, iw = NULL, loss = "cal", init = NULL, rhos,
         test = NULL, offs = NULL, id = NULL, Wmat = NULL, Rmat = NULL,
         zzs = NULL, xxs = NULL, n.iter = 100, eps = 1e-06,
         bt.lim = 3, nz.lab = NULL, pos = 10000)

Arguments

y: An n x 1 response vector.

x: An n x p matrix of covariates, excluding a constant.

iw: An n x 1 weight vector.

loss: A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses.

init: A (p+1) x 1 vector of initial values (the intercept and coefficients).

rhos: A p x 1 vector of Lasso tuning parameters, usually a constant vector, associated with the p coefficients.

test: A vector giving the indices of observations between 1 and n which are included in the test set.

offs: An n x 1 vector of offset values, similarly as in glm.

id: An argument which can be used to speed up computation.

Wmat: An argument which can be used to speed up computation.

Rmat: An argument which can be used to speed up computation.

zzs: An argument which can be used to speed up computation.

xxs: An argument which can be used to speed up computation.

n.iter: The maximum number of iterations allowed. An iteration is defined by computing a quadratic approximation and solving a least-squares Lasso problem.

eps: The tolerance at which the difference in the objective (loss plus penalty) values is considered close enough to 0 to declare convergence.

bt.lim: The maximum number of backtracking steps allowed.

nz.lab: A p x 1 logical vector (useful for simulations), indicating which covariates are included when calculating the number of nonzero coefficients. If nz.lab=NULL, then nz.lab is reset to a vector of 0s.

pos: A value which can be used to facilitate recording the numbers of nonzero coefficients with or without the restriction by nz.lab. If nz.lab=NULL, then pos is reset to 1.

Details

For continuous responses, this function uses an active-set descent algorithm (Osborne et al. 2000; Yang and Tan 2018) to solve the least-squares Lasso problem. For binary responses, regularized calibrated estimation is implemented using the Fisher scoring descent algorithm in Tan (2020), whereas regularized maximum likelihood estimation is implemented in a similar manner based on quadratic approximation as in the R package glmnet.
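To make the quadratic-approximation idea concrete, the following is a minimal sketch, not RCAL's actual implementation: for the likelihood loss, one iteration linearizes the logistic loss around the current fit (IRLS working responses and weights, as in glmnet) and then runs coordinate descent with soft-thresholding on the resulting weighted least-squares Lasso problem. The function `irls_lasso_step` and its arguments are hypothetical names introduced here for illustration; the Fisher scoring descent algorithm used for the calibration loss in Tan (2020) differs in how the quadratic approximation is formed.

```r
# Soft-thresholding operator used in the Lasso coordinate update.
soft <- function(z, g) sign(z) * pmax(abs(z) - g, 0)

# One quadratic-approximation (IRLS) step for lasso-penalized logistic
# regression: a sketch of the generic idea, not RCAL's implementation.
irls_lasso_step <- function(y, x, inter, bet, rhos, n.cd = 50) {
  n <- nrow(x)
  p <- ncol(x)
  eta <- inter + drop(x %*% bet)     # current linear predictors
  mu  <- 1 / (1 + exp(-eta))        # fitted probabilities
  w   <- mu * (1 - mu)              # IRLS weights
  z   <- eta + (y - mu) / w         # working responses
  # Coordinate descent on the weighted least-squares Lasso problem
  # (1/(2n)) sum_i w_i (z_i - inter - x_i' bet)^2 + sum_j rhos_j |bet_j|.
  for (it in seq_len(n.cd)) {
    inter <- sum(w * (z - drop(x %*% bet))) / sum(w)
    for (j in seq_len(p)) {
      r <- z - inter - drop(x[, -j, drop = FALSE] %*% bet[-j])
      bet[j] <- soft(sum(w * x[, j] * r) / n, rhos[j]) /
                (sum(w * x[, j]^2) / n)
    }
  }
  list(inter = inter, bet = bet)
}
```

A full solver would repeat such steps until the decrease in the penalized objective falls below a tolerance (the role of eps), with backtracking (bt.lim) to guard against steps that increase the objective.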

Value

iter: The number of iterations performed up to n.iter.

conv: 1 if convergence is obtained, 0 if the maximum number of iterations is exceeded, or -1 if the maximum number of backtracking steps is exceeded.

nz: A value defined as (nz0 * pos + nz1) to record the numbers of nonzero coefficients without or with the restriction (denoted as nz0 and nz1) by nz.lab. If nz.lab=NULL, then nz1 is 0, pos is 1, and hence nz is nz0.

inter: The estimated intercept.

bet: The p x 1 vector of estimated coefficients, excluding the intercept.

fit: The vector of fitted values in the training set.

eta: The vector of linear predictors in the training set.

tau: The p x 1 vector of generalized signs, which should be -1 or 1 for a negative or positive estimate and between -1 and 1 for a zero estimate.

obj.train: The average loss in the training set.

pen: The Lasso penalty of the estimates.

obj: The average loss plus the Lasso penalty.

fit.test: The vector of fitted values in the test set.

eta.test: The vector of linear predictors in the test set.

obj.test: The average loss in the test set.

id: This can be re-used to speed up computation.

Wmat: This can be re-used to speed up computation.

Rmat: This can be re-used to speed up computation.

zzs: This can be re-used to speed up computation.

xxs: This can be re-used to speed up computation.
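Because nz packs two counts into one number as nz0 * pos + nz1, the two counts can be recovered by integer division and remainder. The values below are hypothetical, chosen only to illustrate the encoding with the default pos = 10000:

```r
pos <- 10000            # default value of the pos argument
nz  <- 3 * pos + 2      # hypothetical: nz0 = 3 unrestricted, nz1 = 2 under nz.lab
nz0 <- nz %/% pos       # number of nonzero coefficients without the restriction
nz1 <- nz %%  pos       # number of nonzero coefficients with the restriction by nz.lab
```

When nz.lab=NULL, pos is reset to 1 and nz1 is 0, so nz is simply nz0.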

References

Osborne, M., Presnell, B., and Turlach, B. (2000) A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20, 389-404.

Yang, T. and Tan, Z. (2018) Backfitting algorithms for total-variation and empirical-norm penalized additive modeling with high-dimensional data, Stat, 7, e198.

Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Ser. B, 58, 267-288.

Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137-158.

Examples

data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2] - 2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

### Example 1: linear regression
# rhos should be a vector of length p, even though a constant vector
out.rgaus <- glm.regu(y[tr==1], x[tr==1,], rhos=rep(.05,p), loss="gaus")

# the intercept
out.rgaus$inter
# the estimated coefficients and generalized signs; the first 10 are shown
cbind(out.rgaus$bet, out.rgaus$tau)[1:10,]
# the number of nonzero coefficients
out.rgaus$nz

### Example 2: logistic regression using likelihood loss
out.rml <- glm.regu(tr, x, rhos=rep(.01,p), loss="ml")
out.rml$inter
cbind(out.rml$bet, out.rml$tau)[1:10,]
out.rml$nz

### Example 3: logistic regression using calibration loss
out.rcal <- glm.regu(tr, x, rhos=rep(.05,p), loss="cal")
out.rcal$inter
cbind(out.rcal$bet, out.rcal$tau)[1:10,]
out.rcal$nz

RCAL documentation built on Nov. 8, 2020, 4:22 p.m.