# Fit a zero-inflated count data linear model with lasso (or elastic net), snet, or mnet regularization

### Description

Fit zero-inflated regression models for count data via penalized maximum likelihood.

### Usage

```
zipath(formula, data, weights, subset, na.action, offset,
       standardize = TRUE, family = c("poisson", "negbin", "geometric"),
       link = c("logit", "probit", "cloglog", "cauchit", "log"),
       penalty = c("enet", "mnet", "snet"), start = NULL, model = TRUE,
       y = TRUE, x = FALSE, nlambda = 100, lambda.count = NULL, lambda.zero = NULL,
       penalty.factor.count = NULL, penalty.factor.zero = NULL,
       lambda.count.min.ratio = .0001, lambda.zero.min.ratio = .1,
       alpha.count = 1, alpha.zero = alpha.count, gamma.count = 3,
       gamma.zero = gamma.count, rescale = FALSE, init.theta, theta.fixed = FALSE,
       EM = TRUE, maxit.em = 200, convtype = c("count", "both"), maxit = 1000,
       maxit.theta = 1, reltol = 1e-5, eps.bino = 1e-5, shortlist = FALSE, trace = FALSE, ...)
```

### Arguments

`formula` |
symbolic description of the model, see details. |

`weights` |
optional numeric vector of weights. |

`data, subset, na.action` |
arguments controlling formula processing
via |

`offset` |
optional numeric vector with an a priori known component to be included in the linear predictor of the count model. See below for more information on offsets. |

`standardize` |
Logical flag for x variable standardization, prior to
fitting the model sequence. The coefficients are always returned on
the original scale. Default is |

`family` |
character specification of count model family (a log link is always used). |

`link` |
character specification of link function in the binary zero-inflation model (a binomial family is always used). |

`model, y, x` |
logicals. If |

`penalty` |
penalty considered as one of |

`start` |
starting values for the parameters in the linear predictor. |

`nlambda` |
number of |

`lambda.count` |
A user supplied |

`lambda.zero` |
A user supplied |

`penalty.factor.count, penalty.factor.zero` |
These are numeric vectors with the same length as the number of predictor variables that multiply |

`lambda.count.min.ratio, lambda.zero.min.ratio` |
Smallest value for |

`alpha.count` |
The elastic net mixing parameter for the count part of model. |

`alpha.zero` |
The elastic net mixing parameter for the zero part of model. |

`gamma.count` |
The tuning parameter of the |

`gamma.zero` |
The tuning parameter of the |

`rescale` |
logical value, if TRUE, adaptive rescaling |

`init.theta` |
The initial value of |

`theta.fixed` |
Logical value only used for |

`EM` |
Using |

`convtype` |
convergence type; by default convergence is checked for the count component only, for faster computation |

`maxit.em` |
Maximum number of EM algorithm iterations |

`maxit` |
Maximum number of coordinate descent algorithm iterations |

`maxit.theta` |
Maximum number of iterations for estimating |

`eps.bino` |
a lower bound of probabilities to be claimed as zero, for computing weights and related values when |

`reltol` |
Convergence criterion; the default value 1e-5 may be reduced for a more accurate but slower fit |

`shortlist` |
logical value; if TRUE, only a limited set of results is returned |

`trace` |
If |

`...` |
Other arguments which can be passed to from |
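The `penalty` choices and the `alpha.*` and `gamma.*` tuning parameters above can be pictured with the textbook forms of the underlying penalty functions. The sketch below is a base R illustration, not code from the package: the function names `enet_pen`, `mcp_pen`, and `scad_pen` are hypothetical, and it assumes that `"mnet"` and `"snet"` build on the standard MCP and SCAD penalties with tuning parameter `gamma` (shown here with `alpha = 1`, i.e., without a ridge component).

```r
# Textbook penalty functions, evaluated elementwise on a coefficient vector b.
# Assumption: these are the standard definitions, not the package's internals.

# Elastic net: alpha = 1 gives the lasso, alpha = 0 gives ridge.
enet_pen <- function(b, lambda, alpha = 1) {
  lambda * (alpha * abs(b) + (1 - alpha) / 2 * b^2)
}

# Minimax concave penalty (MCP), gamma > 1; flat beyond |b| = gamma * lambda.
mcp_pen <- function(b, lambda, gamma = 3) {
  a <- abs(b)
  ifelse(a <= gamma * lambda,
         lambda * a - a^2 / (2 * gamma),
         gamma * lambda^2 / 2)
}

# SCAD penalty, gamma > 2; lasso-like near zero, flat beyond |b| = gamma * lambda.
scad_pen <- function(b, lambda, gamma = 3) {
  a <- abs(b)
  ifelse(a <= lambda,
         lambda * a,
         ifelse(a <= gamma * lambda,
                (2 * gamma * lambda * a - a^2 - lambda^2) / (2 * (gamma - 1)),
                lambda^2 * (gamma + 1) / 2))
}

# Compare the three penalties on a few coefficient values.
b <- c(0, 0.5, 1, 5)
pen <- data.frame(b    = b,
                  enet = enet_pen(b, lambda = 1),
                  mcp  = mcp_pen(b, lambda = 1),
                  scad = scad_pen(b, lambda = 1))
print(pen)
```

The lasso penalty keeps growing with the coefficient, whereas MCP and SCAD level off once the coefficient exceeds `gamma * lambda`, which is why they shrink large coefficients less.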

### Details

The algorithm fits penalized zero-inflated count data regression models using the coordinate descent algorithm within the EM algorithm. The returned fitted model object is of class `"zipath"` and is similar to fitted `"glm"` and `"zeroinfl"` objects. For elements such as `"coefficients"`, a list is returned with elements for the zero and count component, respectively. For details see below.
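To make the EM structure concrete, here is a minimal base R sketch for a zero-inflated Poisson model on simulated data. It is an illustration under simplifying assumptions and not the package's implementation: the M-step uses an unpenalized weighted `glm` fit in place of penalized coordinate descent, the zero component has an intercept only, and all variable names are hypothetical.

```r
set.seed(1)

# Simulate zero-inflated Poisson data: a structural zero occurs with
# probability 0.3, otherwise the count is Poisson with a log-linear mean.
n  <- 2000
x  <- rnorm(n)
structural_zero <- rbinom(n, 1, 0.3)
y  <- ifelse(structural_zero == 1, 0L, rpois(n, exp(0.5 + 0.4 * x)))

# Starting values for the zero-inflation probability and count coefficients.
pi_hat <- 0.5
beta   <- c(log(mean(y) + 0.1), 0)

for (iter in 1:500) {
  mu <- exp(beta[1] + beta[2] * x)

  # E-step: posterior probability that an observed zero is a structural zero.
  z <- ifelse(y == 0, pi_hat / (pi_hat + (1 - pi_hat) * exp(-mu)), 0)

  # M-step, zero component: intercept-only update of the mixing probability.
  pi_new <- mean(z)

  # M-step, count component: Poisson regression weighted by 1 - z.
  # (Unpenalized here; a penalized fit would take its place in zipath.)
  fit      <- glm(y ~ x, family = poisson(), weights = 1 - z, start = beta)
  beta_new <- unname(coef(fit))

  # Stop when the parameter estimates no longer change.
  delta  <- max(abs(c(pi_new - pi_hat, beta_new - beta)))
  pi_hat <- pi_new
  beta   <- beta_new
  if (delta < 1e-6) break
}

print(c(pi = pi_hat, intercept = beta[1], slope = beta[2]))
```

The weights `1 - z` down-weight zeros that are likely structural, so the count model is fitted mainly to observations attributed to the Poisson component.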

A set of standard extractor functions for fitted model objects is available for objects of class `"zipath"`, including methods for the generic functions `print`, `coef`, `logLik`, `residuals`, and `predict`. See `predict.zipath` for more details on all methods.

The program may terminate with the following message:

`Error in: while (j <= maxit.em && !converged)`

`{ :`

` Missing value, where TRUE/FALSE is necessary`

`Calls: zipath`

`Additionally: Warning:`

`In glmreg_fit(Znew, probi, weights = weights, standardize = standardize, :`

` saturated model, exiting ...`

`Execution halted`

One possible reason is that the fitted model is too complex for the data. There are two ways to work around the error. The first is to reduce the number of variables. The second is to identify the lambda values that caused the problem, omit them, and refit with other lambda values.
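The second suggestion amounts to passing your own penalty sequences through `lambda.count` and `lambda.zero`. The sketch below builds log-spaced sequences in base R and drops the smallest values; it assumes a glmnet-style log-spaced grid, uses hypothetical starting values `lambda_max_count` and `lambda_max_zero`, and keeps both sequences the same length on the assumption that they are paired. The helper `log_grid` is illustrative and not part of the package.

```r
# Hypothetical largest penalty values; in practice these depend on the data.
lambda_max_count <- 0.5
lambda_max_zero  <- 0.2
nlambda <- 10

# Log-spaced grid from lambda_max down to lambda_max * min_ratio.
log_grid <- function(lambda_max, min_ratio, n) {
  exp(seq(log(lambda_max), log(lambda_max * min_ratio), length.out = n))
}

lam_count <- log_grid(lambda_max_count, min_ratio = 1e-4, n = nlambda)
lam_zero  <- log_grid(lambda_max_zero,  min_ratio = 0.1,  n = nlambda)

# Suppose the three smallest penalties (the most complex models) triggered
# the error: drop them from both sequences.
keep         <- seq_len(nlambda - 3)
lam_count_ok <- lam_count[keep]
lam_zero_ok  <- lam_zero[keep]

# The trimmed sequences could then be supplied to the fit (not executed here):
# fit <- zipath(art ~ . | ., data = bioChemists,
#               lambda.count = lam_count_ok, lambda.zero = lam_zero_ok)
print(round(lam_count_ok, 5))
```

Small penalties correspond to the least regularized fits, so trimming from that end of the sequence is usually the first thing to try.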

### Value

An object of class `"zipath"`, i.e., a list with components including

`coefficients` |
a list with elements |

`residuals` |
a vector of raw residuals (observed - fitted), |

`fitted.values` |
a vector of fitted means, |

`weights` |
the case weights used, |

`terms` |
a list with elements |

`theta` |
estimate of the additional |

`loglik` |
log-likelihood of the fitted model, |

`family` |
character string describing the count distribution used, |

`link` |
character string describing the link of the zero-inflation model, |

`linkinv` |
the inverse link function corresponding to |

`converged` |
logical value, TRUE indicating successful convergence of |

`call` |
the original function call |

`formula` |
the original formula |

`levels` |
levels of the categorical regressors |

`contrasts` |
a list with elements |

`model` |
the full model frame (if |

`y` |
the response count vector (if |

`x` |
a list with elements |

### Author(s)

Zhu Wang <zwang@connecticutchildrens.org>

### References

Zhu Wang, Shuangge Ma, Michael Zappitelli, Chirag Parikh, Ching-Yun Wang and Prasad Devarajan (2014) *Penalized Count Data Regression with Application to Hospital Stay after Pediatric Cardiac Surgery*, *Statistical Methods in Medical Research*. 2014 Apr 17. [Epub ahead of print]

Zhu Wang, Shuangge Ma, Ching-Yun Wang, Michael Zappitelli, Prasad Devarajan and Chirag R. Parikh (2014)
*EM for Regularized Zero Inflated Regression Models with Applications to Postoperative Morbidity after Cardiac Surgery in Children*, *Statistics in Medicine*. 33(29):5192-208.

Zhu Wang, Shuangge Ma and Ching-Yun Wang (2015) *Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany*, *Biometrical Journal*. 57(5):867-84.

### See Also

`glm`, `glmreg`, `glmregNB`

### Examples

```
## Not run:
## packages: mpath provides zipath, glmreg and glmregNB; pscl provides zeroinfl
library("mpath")
library("pscl")
## data
data("bioChemists", package = "pscl")
## without inflation
## ("art ~ ." is "art ~ fem + mar + kid5 + phd + ment")
fm_pois <- glmreg(art ~ ., data = bioChemists, family = "poisson")
coef(fm_pois)
fm_nb <- glmregNB(art ~ ., data = bioChemists)
coef(fm_nb)
## with simple inflation (no regressors for zero component)
fm_zip <- zipath(art ~ . | 1, data = bioChemists, nlambda = 10)
summary(fm_zip)
fm_zinb <- zipath(art ~ . | 1, data = bioChemists, family = "negbin", nlambda = 10)
summary(fm_zinb)
## inflation with regressors
## ("art ~ . | ." is "art ~ fem + mar + kid5 + phd + ment | fem + mar + kid5 + phd + ment")
fm_zip2 <- zipath(art ~ . | ., data = bioChemists, nlambda = 10)
summary(fm_zip2)
fm_zinb2 <- zipath(art ~ . | ., data = bioChemists, family = "negbin", nlambda = 10)
summary(fm_zinb2)
### non-penalized regression, compare with zeroinfl
fm_zinb3 <- zipath(art ~ . | ., data = bioChemists, family = "negbin",
                   lambda.count = 0, lambda.zero = 0, reltol = 1e-12)
summary(fm_zinb3)
fm_zinb4 <- zeroinfl(art ~ . | ., data = bioChemists, dist = "negbin")
summary(fm_zinb4)
## End(Not run)
```