gooogle: A group-regularized fit to zero-inflated count data
In himelmallick/Gooogle: Group Regularization for Zero-inflated Count Models


View source: R/Gooogle.R

Fit zero-inflated count data with a group regularization algorithm.

gooogle(data, xvars, zvars, yvar, group = 1:ncol(data), samegrp.overlap = T,
        penalty = c("grLasso", "grMCP", "grSCAD", "gBridge"),
        dist = c("poisson", "negbin"), nlambda = 100, lambda,
        lambda.min = ifelse((nrow(data[, unique(c(xvars, zvars))]) >
                             ncol(data[, unique(c(xvars, zvars))])), 1e-4, .05),
        lambda.max, crit = "BIC", alpha = 1, eps = .001, max.iter = 1000,
        gmax = length(unique(group)),
        gamma = ifelse(penalty == "gBridge", 0.5, ifelse(penalty == "grSCAD", 4, 3)),
        warn = TRUE)

`data`	The data frame or matrix consisting of outcome and predictors.
`xvars`	The vector of variable names to be included in count model.
`zvars`	The vector of variable names for excess zero model.
`yvar`	The outcome variable name.
`group`	The vector of integers describing the grouping of the coefficients. For greatest efficiency and least ambiguity, it is best if group is a vector of consecutive integers. If there are coefficients to be included in the model without being penalized, assign them to group 0 (or "0").
`samegrp.overlap`	A logical argument. If TRUE (default), the same grouping indices are assigned to predictors shared by the count and degenerate (excess zero) distributions.
`penalty`	The penalty to be applied in the model. For group-level selection, one of "grLasso", "grMCP" or "grSCAD". For bi-level selection, "gBridge" can be specified.
`dist`	The distribution for the count model: "poisson" for Poisson or "negbin" for negative binomial.
`nlambda`	The number of lambda values. Default is 100.
`lambda`	A user specified sequence of lambda values.
`lambda.min`	The smallest value for lambda, as a fraction of lambda.max. Default is .0001 if the number of observations is larger than the number of covariates and .05 otherwise.
`lambda.max`	The maximum value for lambda (only needed for gBridge penalty).
`crit`	The selection criterion for the best model. It can be either "AIC" or "BIC" (default).
`alpha`	The tuning parameter for the balance between the group penalty and the L2 penalty, as in grpreg. Default value is 1.
`eps`	The convergence threshold, as in grpreg.
`max.iter`	Maximum number of iterations allowed.
`gmax`	Maximum number of non-zero groups allowed.
`gamma`	Tuning parameter of the group MCP, group SCAD, or group bridge penalty. Default is 3 for MCP, 4 for SCAD, and 0.5 for gBridge.
`warn`	A logical argument indicating whether the function issues a warning when convergence problems occur.
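
The argument descriptions above can be made concrete with a small base R sketch. It only prepares inputs and does not call `gooogle()`. The toy data frame and its column names are hypothetical, and the layout of `group` (one label per entry of `xvars`) is an assumption inferred from the package example, not a documented guarantee.

```r
# Toy data frame: outcome y plus six predictors (all names are hypothetical).
set.seed(1)
n <- 20
toy <- data.frame(
  y   = rpois(n, lambda = 2),
  age = rnorm(n), bmi = rnorm(n),
  g1a = rnorm(n), g1b = rnorm(n),
  g2a = rnorm(n), g2b = rnorm(n)
)

yvar  <- "y"
xvars <- setdiff(names(toy), yvar)   # predictors for the count model
zvars <- c("age", "g1a", "g1b")      # predictors for the excess zero model

# Assumed layout: one group label per entry of xvars.
# Group 0 marks unpenalized coefficients; 1 and 2 are consecutive penalized groups.
group <- c(0, 0, 1, 1, 2, 2)

# The documented default for lambda.min, evaluated by hand:
# 1e-4 when observations outnumber covariates, 0.05 otherwise.
vars <- unique(c(xvars, zvars))
lambda.min <- ifelse(nrow(toy[, vars]) > ncol(toy[, vars]), 1e-4, 0.05)
lambda.min
```

Here 20 observations exceed 6 covariates, so the default resolves to 1e-4.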

The algorithm fits zero-inflated count data and performs variable selection when the predictors have an intrinsic grouping structure. Group-wise penalties are applied to both the count part and the zero-abundance part of the mixture model, and the penalized likelihood is optimized with group-level or bi-level coordinate descent algorithms.
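
To illustrate the kind of objective involved, here is a minimal base R sketch of a zero-inflated Poisson log-likelihood and a group lasso penalty. It is not the package's implementation: the function names `zip_loglik` and `group_lasso_penalty` are hypothetical, the penalty uses the standard group lasso form with a square-root group-size weight, and the MCP, SCAD, and bridge penalties differ from it.

```r
# Zero-inflated Poisson log-likelihood.
# y   : observed counts
# mu  : Poisson means of the count part
# pi0 : probabilities of an excess (structural) zero
zip_loglik <- function(y, mu, pi0) {
  ll_zero <- log(pi0 + (1 - pi0) * exp(-mu))            # zero from either source
  ll_pos  <- log(1 - pi0) + dpois(y, mu, log = TRUE)    # positive count
  sum(ifelse(y == 0, ll_zero, ll_pos))
}

# Standard group lasso penalty: lambda * sum_g sqrt(p_g) * ||beta_g||_2.
# Coefficients in group 0 are left unpenalized.
group_lasso_penalty <- function(beta, group, lambda) {
  penalized <- group != 0
  norms <- tapply(beta[penalized], group[penalized],
                  function(b) sqrt(length(b)) * sqrt(sum(b^2)))
  lambda * sum(norms)
}

# Small numerical check
y <- c(0, 0, 1, 3)
zip_loglik(y, mu = rep(1.5, 4), pi0 = rep(0.2, 4))
group_lasso_penalty(beta = c(3, 4, 0, 0), group = c(1, 1, 2, 2), lambda = 1)
```

A penalized fit would minimize the negative log-likelihood plus penalties of this kind on the coefficients of both model parts.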

A list containing the following components is returned:

`coefficients`	A list with two sets of coefficients corresponding to count and zero inflation parts of the mixture model.
`aic`	The AIC of the selected model.
`bic`	The BIC of the selected model.
`loglik`	The log-likelihood of the selected model.
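
For reference, the textbook definitions of the two criteria can be sketched in base R as below. The helper name `info_criteria` is hypothetical, and how `gooogle()` counts the degrees of freedom `df` of a penalized fit is not stated here, so `df` is simply passed in as an assumption.

```r
# Textbook information criteria from a log-likelihood.
# loglik : log-likelihood of the fitted model
# df     : number of effective parameters (assumed given)
# n      : number of observations
info_criteria <- function(loglik, df, n) {
  c(aic = -2 * loglik + 2 * df,
    bic = -2 * loglik + log(n) * df)
}

info_criteria(loglik = -100, df = 5, n = 100)
```

Because log(n) exceeds 2 once n is at least 8, BIC penalizes each additional parameter more heavily than AIC and so tends to select sparser models.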

## Not run: 
## Auto Insurance Claim Data
library(HDtweedie)
data("auto")
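y <- round(auto$y)   # round the response to integers so it can be modeled as a count
x <- auto$x
data <- cbind.data.frame(y, x)
# One group label per predictor column of x (56 labels in 21 groups)
group <- c(rep(1, 5), rep(2, 7), rep(3, 4), rep(4:14, each = 3), 15:21)
yvar <- names(data)[1]      # outcome
xvars <- names(data)[-1]    # predictors for the count model
zvars <- xvars              # same predictors for the excess zero model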

## ZIP regression
fit.poisson <- gooogle(data = data, yvar = yvar, xvars = xvars, zvars = zvars,
                       group = group, samegrp.overlap = TRUE,
                       dist = "poisson", penalty = "gBridge")
fit.poisson$aic

## ZINB regression
fit.negbin <- gooogle(data = data, yvar = yvar, xvars = xvars, zvars = zvars,
                      group = group, samegrp.overlap = TRUE,
                      dist = "negbin", penalty = "gBridge")
fit.negbin$aic

## End(Not run)