glm.regu: Regularized M-estimation for fitting generalized linear models...

Description Usage Arguments Details Value References Examples

View source: R/regu-est-c.r

Description

This function implements regularized M-estimation for fitting generalized linear models with continuous or binary responses for a fixed choice of tuning parameters.

Usage

glm.regu(y, x, iw = NULL, loss = "cal", init = NULL, rhos, test = NULL,
  offs = NULL, id = NULL, Wmat = NULL, Rmat = NULL, zzs = NULL,
  xxs = NULL, n.iter = 100, eps = 1e-06, bt.lim = 3, nz.lab = NULL,
  pos = 10000)

Arguments

y

An n x 1 response vector.

x

An n x p matrix of covariates, excluding a constant.

iw

An n x 1 weight vector.

loss

A loss function, which can be specified as "gaus" for continuous responses, or "ml" or "cal" for binary responses.

init

A (p+1) x 1 vector of initial values (the intercept and coefficients).

rhos

A p x 1 vector of Lasso tuning parameters, usually a constant vector, associated with the p coefficients.

test

A vector of indices between 1 and n, indicating the observations included in the test set.

offs

An n x 1 vector of offset values, used in the same way as the offset argument in glm.

id

A quantity saved from a previous fit, which can be supplied to speed up computation (see the sketch following this argument list).

Wmat

A quantity saved from a previous fit, which can be supplied to speed up computation.

Rmat

A quantity saved from a previous fit, which can be supplied to speed up computation.

zzs

A quantity saved from a previous fit, which can be supplied to speed up computation.

xxs

A quantity saved from a previous fit, which can be supplied to speed up computation.

n.iter

The maximum number of iterations allowed. An iteration is defined by computing a quadratic approximation and solving a least-squares Lasso problem.

eps

The tolerance at which the difference in the objective (loss plus penalty) values is considered close enough to 0 to declare convergence.

bt.lim

The maximum number of backtracking steps allowed.

nz.lab

A p x 1 logical vector (useful for simulations), indicating which covariates are included when calculating the number of nonzero coefficients. If nz.lab=NULL, then nz.lab is reset to a vector of 0s.

pos

A value which can be used to facilitate recording the numbers of nonzero coefficients with or without the restriction by nz.lab. If nz.lab=NULL, then pos is reset to 1.
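
The five speed-up arguments (id, Wmat, Rmat, zzs, xxs) mirror components of the returned value. The following is a hedged sketch of the assumed re-use pattern, inferred only from the matching argument and component names and not verified against the package source: fit once, then pass the saved quantities back when refitting the same data with different tuning parameters.

# assumed re-use pattern; y is binary here since loss="cal"
out1 <- glm.regu(y, x, rhos=rep(.05,p), loss="cal")
out2 <- glm.regu(y, x, rhos=rep(.02,p), loss="cal",
                 id=out1$id, Wmat=out1$Wmat, Rmat=out1$Rmat,
                 zzs=out1$zzs, xxs=out1$xxs)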

Details

For continuous responses, this function uses an active-set descent algorithm (Osborne et al. 2000; Yang and Tan 2018) to solve the least-squares Lasso problem. For binary responses, regularized calibrated estimation is implemented using the Fisher scoring descent algorithm in Tan (2020), whereas regularized maximum likelihood estimation is implemented in a similar manner based on quadratic approximation as in the R package glmnet.
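
As a concrete illustration of the quadratic-approximation scheme for loss="ml", the following is a minimal, self-contained sketch, not the package's implementation: each outer iteration forms a weighted least-squares approximation to the logistic likelihood and solves the resulting Lasso problem, here by a single coordinate-descent sweep with soft-thresholding as a simple stand-in for the active-set algorithm used internally. The names soft and irls.lasso.ml are illustrative, and backtracking (cf. bt.lim) is omitted.

soft <- function(z, r) sign(z) * pmax(abs(z) - r, 0)

irls.lasso.ml <- function(y, x, rhos, n.iter=100, eps=1e-6) {
  n <- nrow(x); p <- ncol(x)
  b0 <- 0; bet <- rep(0, p); obj.old <- Inf
  for (it in seq_len(n.iter)) {
    eta <- b0 + drop(x %*% bet)
    mu <- 1/(1 + exp(-eta))
    w <- pmax(mu*(1 - mu), 1e-5)     # IRLS weights, bounded away from 0
    z <- eta + (y - mu)/w            # working response
    for (j in seq_len(p)) {          # one coordinate-descent sweep
      r.j <- z - b0 - drop(x[, -j, drop=FALSE] %*% bet[-j])
      bet[j] <- soft(mean(w * x[, j] * r.j), rhos[j]) / mean(w * x[, j]^2)
    }
    b0 <- sum(w * (z - drop(x %*% bet))) / sum(w)
    eta <- b0 + drop(x %*% bet)
    obj <- mean(log(1 + exp(eta)) - y*eta) + sum(rhos * abs(bet))
    if (abs(obj.old - obj) < eps) break   # convergence test, as with eps
    obj.old <- obj
  }
  list(inter=b0, bet=bet, iter=it, obj=obj)
}

Each outer pass here corresponds to one iteration in the sense of n.iter: a quadratic approximation followed by a least-squares Lasso solve.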

Value

iter

The number of iterations performed up to n.iter.

conv

1 if convergence is obtained, 0 if the maximum number of iterations is exceeded, or -1 if the maximum number of backtracking steps is exceeded.

nz

A value defined as (nz0 * pos + nz1) to record the numbers of nonzero coefficients without or with the restriction by nz.lab (denoted as nz0 and nz1, respectively). If nz.lab=NULL, then nz1 is 0, pos is 1, and hence nz is nz0; see the decoding sketch at the end of this section.

inter

The estimated intercept.

bet

The p x 1 vector of estimated coefficients, excluding the intercept.

fit

The vector of fitted values in the training set.

eta

The vector of linear predictors in the training set.

tau

The p x 1 vector of generalized signs, which should be -1 or 1 for a negative or positive estimate and between -1 and 1 for a zero estimate.

obj.train

The average loss in the training set.

pen

The Lasso penalty of the estimates.

obj

The average loss plus the Lasso penalty.

fit.test

The vector of fitted values in the test set.

eta.test

The vector of linear predictors in the test set.

obj.test

The average loss in the test set.

id

This can be supplied as the id argument in a subsequent call to speed up computation.

Wmat

This can be supplied as the Wmat argument in a subsequent call to speed up computation.

Rmat

This can be supplied as the Rmat argument in a subsequent call to speed up computation.

zzs

This can be supplied as the zzs argument in a subsequent call to speed up computation.

xxs

This can be supplied as the xxs argument in a subsequent call to speed up computation.
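
Given the encoding of nz above, the two counts can be recovered by integer arithmetic. A minimal sketch, assuming out is the returned list, nz.lab was supplied, and nz1 < pos (which holds under the default pos=10000 unless 10000 or more flagged coefficients are nonzero):

nz0 <- out$nz %/% 10000   # nonzero coefficients, without the nz.lab restriction
nz1 <- out$nz %% 10000    # nonzero coefficients among covariates flagged by nz.lab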

References

Osborne, M., Presnell, B., and Turlach, B. (2000) A new approach to variable selection in least squares problems, IMA Journal of Numerical Analysis, 20, 389-404.

Yang, T. and Tan, Z. (2018) Backfitting algorithms for total-variation and empirical-norm penalized additive modeling with high-dimensional data, Stat, 7, e198.

Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso, Journal of the Royal Statistical Society, Ser. B, 58, 267-288.

Tan, Z. (2020) Regularized calibrated estimation of propensity scores with model misspecification and high-dimensional data, Biometrika, 107, 137-158.

Examples

data(simu.data)
n <- dim(simu.data)[1]
p <- dim(simu.data)[2]-2

y <- simu.data[,1]
tr <- simu.data[,2]
x <- simu.data[,2+1:p]
x <- scale(x)

### Example 1: linear regression
# rhos should be a vector of length p, even though a constant vector
out.rgaus <- glm.regu(y[tr==1], x[tr==1,], rhos=rep(.05,p), loss="gaus")

# the intercept
out.rgaus$inter

# the estimated coefficients and generalized signs; the first 10 are shown
cbind(out.rgaus$bet, out.rgaus$tau)[1:10,]

# the number of nonzero coefficients 
out.rgaus$nz

### Example 2: logistic regression using likelihood loss
out.rml <- glm.regu(tr, x, rhos=rep(.01,p), loss="ml")
out.rml$inter
cbind(out.rml$bet, out.rml$tau)[1:10,]
out.rml$nz

### Example 3: logistic regression using calibration loss
out.rcal <- glm.regu(tr, x, rhos=rep(.05,p), loss="cal")
out.rcal$inter
cbind(out.rcal$bet, out.rcal$tau)[1:10,]
out.rcal$nz
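
### Example 4 (an illustrative sketch, not from the original examples):
### supplying a test set via the 'test' argument; the indices are arbitrary
test.id <- seq(1, n, by=4)
out.tst <- glm.regu(tr, x, rhos=rep(.05,p), loss="cal", test=test.id)
out.tst$obj.train   # average loss in the training set
out.tst$obj.test    # average loss in the test set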
