cv.zipath

R Documentation

Cross-validation for zipath

Description

Performs k-fold cross-validation for zipath, produces a plot, and returns the cross-validated log-likelihood values for each lambda.

Usage

## S3 method for class 'formula'
cv.zipath(formula, data, weights, offset=NULL, contrasts=NULL, ...)
## S3 method for class 'matrix'
cv.zipath(X, Z, Y, weights, offsetx=NULL, offsetz=NULL, ...)
## Default S3 method:
cv.zipath(X, ...)
## S3 method for class 'cv.zipath'
predict(object, newdata, ...)
## S3 method for class 'cv.zipath'
coef(object, which=object$lambda.which, model = c("full", "count", "zero"), ...)

Arguments

`formula`	symbolic description of the model with an optional numeric vector `offset` with an a priori known component to be included in the linear predictor of the count model or zero model. Offset must be a variable in `data` if used, while this is optional in `zipath`. See an example below.
`data`	arguments controlling formula processing via `model.frame`.
`weights`	Observation weights; defaults to 1 per observation
`offset`	optional numeric vector with an a priori known component to be included in the linear predictor of the count model or zero model. See below for an example.
`X`	predictor matrix of the count model
`Z`	predictor matrix of the zero model
`Y`	response variable
`offsetx`, `offsetz`	optional numeric vectors with an a priori known component to be included in the linear predictor of the count model (`offsetx`) or zero model (`offsetz`).
`contrasts`	a list with elements `"count"` and `"zero"` containing the contrasts corresponding to `levels` from the respective models
`object`	object of class `cv.zipath`.
`newdata`	optionally, a data frame in which to look for variables with which to predict. If omitted, the original observations are used.
`which`	Indices of the pair of penalty parameters `lambda.count` and `lambda.zero` at which estimates are extracted. By default, the one which generates the optimal cross-validation value.
`model`	character specifying for which component of the model the estimated coefficients should be extracted.
`...`	Other arguments that can be passed to `zipath`.

Details

The function runs zipath nfolds+1 times: once to compute the (lambda.count, lambda.zero) sequence, and then once per fold, each time with that fold omitted. For each fold, the model is fitted to the training data and the log-likelihood of the fitted model is evaluated on the left-out observations, i.e., the test data. The mean and standard deviation of the log-likelihood over the folds are then computed. cv.zipath can also be used to search for values of count.alpha or zero.alpha; in that case it must be called with the same fixed vector foldid for every value of count.alpha or zero.alpha, so that the results are comparable across values.
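The fixed-`foldid` requirement can be sketched as follows. This is a minimal illustration, not the package's own code: the sample size `n` and the helper `cv_over_alpha` are hypothetical, and the argument name `alpha.count` is an assumption about the `zipath` interface (the text above calls it count.alpha). The helper is only defined here, not run.

```r
set.seed(123)
n      <- 915   # hypothetical sample size; use nrow(data) in practice
nfolds <- 10

# Balanced random fold assignment, created once and reused for every alpha
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Hypothetical helper: one cross-validation run per candidate alpha,
# always with the same foldid so the runs share identical folds.
# `alpha.count` is an assumed argument name passed on to zipath.
cv_over_alpha <- function(alphas, data, foldid) {
  lapply(alphas, function(a) {
    mpath::cv.zipath(art ~ . | ., data = data, family = "poisson",
                     nlambda = 10, foldid = foldid, alpha.count = a)
  })
}
```

With the bioChemists data loaded, a call such as `fits <- cv_over_alpha(c(0.5, 0.75, 1), bioChemists, foldid)` would return one cv.zipath object per alpha, and the maximum of each object's `cv` component could then be compared.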

The coef and predict methods for cv.zipath objects have been deprecated since version 0.3-25. The fit object was removed from the output of cv.zipath, so prediction from a cv.zipath object is no longer possible and should be done through zipath instead; see the examples below. The reason for the change is that cv.zipath accepts both formula and matrix input, so calling predict on a cv.zipath object can easily lead to errors in user code.

When family="negbin", cross-validation can be slow because the theta value is searched for repeatedly by default. To speed it up, one may change the default values from init.theta=NULL, theta.fixed=FALSE to init.theta=MLE, theta.fixed=TRUE, where MLE is a number obtained from glm.nb in the R package MASS or any other desired value.
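One way to obtain such a value is sketched below, assuming simulated data and an unpenalized negative binomial fit without zero inflation. The data, the variable names, and the helper `cv_znb_fixed_theta` are hypothetical; the helper is only defined, not executed, and passes `init.theta` and `theta.fixed` through to zipath as described above.

```r
library(MASS)  # provides glm.nb()

set.seed(1)
n <- 200
x <- rnorm(n)
# Simulated overdispersed counts (negative binomial with size = 2)
y <- rnbinom(n, size = 2, mu = exp(0.5 + 0.3 * x))
dat <- data.frame(y = y, x = x)

# One-off maximum likelihood estimate of theta from an unpenalized fit
theta_mle <- glm.nb(y ~ x, data = dat)$theta

# Hypothetical helper: cross-validate with theta held fixed at the
# supplied value instead of re-estimating it in every fit
cv_znb_fixed_theta <- function(data, theta) {
  mpath::cv.zipath(y ~ . | ., data = data, family = "negbin",
                   nlambda = 10, init.theta = theta, theta.fixed = TRUE)
}
```

Fixing theta trades some accuracy in the dispersion estimate for speed, so it may be worth refitting the selected model with theta estimated freely.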

Value

An object of class "cv.zipath" is returned, which is a list with the components of the cross-validation fit.

`fit`	a fitted zipath object for the full data.
`residmat`	matrix for cross-validated log-likelihood at each `(count.lambda, zero.lambda)` sequence
`bic`	matrix of BIC values with row values for `lambda` and column values for `k`th cross-validation
`cv`	The mean cross-validated log-likelihood - a vector of length `length(count.lambda)`.
`cv.error`	estimate of standard error of `cv`.
`foldid`	an optional vector of values between 1 and `nfolds` identifying what fold each observation is in.
`lambda.which`	index of `(count.lambda, zero.lambda)` that gives maximum `cv`.
`lambda.optim`	value of `(count.lambda, zero.lambda)` that gives maximum `cv`.
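To make the meaning of `cv` and `cv.error` concrete, the following sketch computes a cross-validated log-likelihood and its standard error for a single model. It is an illustration only: an ordinary Poisson `glm` stands in for zipath, the data are simulated, and taking the standard error as the fold standard deviation divided by the square root of the number of folds is an assumption rather than a statement about the package internals.

```r
set.seed(42)
n <- 120
x <- rnorm(n)
y <- rpois(n, lambda = exp(0.2 + 0.5 * x))
dat <- data.frame(y = y, x = x)

K <- 5
foldid <- sample(rep(seq_len(K), length.out = n))

# Held-out Poisson log-likelihood for each fold
fold_loglik <- vapply(seq_len(K), function(k) {
  train <- dat[foldid != k, ]
  test  <- dat[foldid == k, ]
  fit <- glm(y ~ x, family = poisson, data = train)
  mu  <- predict(fit, newdata = test, type = "response")
  sum(dpois(test$y, lambda = mu, log = TRUE))
}, numeric(1))

cv       <- mean(fold_loglik)          # analogue of the `cv` component
cv_error <- sd(fold_loglik) / sqrt(K)  # analogue of `cv.error` (assumed form)
```

In cv.zipath the same quantity is computed for every pair in the penalty sequence, and `lambda.which` points to the pair with the largest mean.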

Author(s)

Zhu Wang <zwang145@uthsc.edu>

References

Zhu Wang, Shuangge Ma, Michael Zappitelli, Chirag Parikh, Ching-Yun Wang and Prasad Devarajan (2014) Penalized Count Data Regression with Application to Hospital Stay after Pediatric Cardiac Surgery, Statistical Methods in Medical Research. 2014 Apr 17. [Epub ahead of print]

Zhu Wang, Shuangge Ma, Ching-Yun Wang, Michael Zappitelli, Prasad Devarajan and Chirag R. Parikh (2014) EM for Regularized Zero Inflated Regression Models with Applications to Postoperative Morbidity after Cardiac Surgery in Children, Statistics in Medicine. 33(29):5192-208.

Zhu Wang, Shuangge Ma and Ching-Yun Wang (2015) Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany, Biometrical Journal. 57(5):867-84.

Examples

## Not run: 
library("mpath")
data("bioChemists", package = "pscl")
## zero-inflated Poisson: fit the solution path, then cross-validate it
fm_zip <- zipath(art ~ . | ., data = bioChemists, family = "poisson", nlambda = 10)
fm_cvzip <- cv.zipath(art ~ . | ., data = bioChemists, family = "poisson", nlambda = 10)
### prediction and coefficients from the best model, taken from the zipath fit
pred <- predict(fm_zip, newdata = bioChemists, which = fm_cvzip$lambda.which)
coef(fm_zip, which = fm_cvzip$lambda.which)
## zero-inflated negative binomial
fm_znb <- zipath(art ~ . | ., data = bioChemists, family = "negbin", nlambda = 10)
fm_cvznb <- cv.zipath(art ~ . | ., data = bioChemists, family = "negbin", nlambda = 10)
pred <- predict(fm_znb, which = fm_cvznb$lambda.which)
coef(fm_znb, which = fm_cvznb$lambda.which)
## zero-inflated Poisson with an offset in the count model;
## the offset variable (phd) must be in data
fm_zinb2 <- zipath(art ~ . + offset(log(phd)) | ., data = bioChemists,
                   family = "poisson", nlambda = 10)
fm_cvzinb2 <- cv.zipath(art ~ . + offset(log(phd)) | ., data = bioChemists,
                        family = "poisson", nlambda = 10)
pred <- predict(fm_zinb2, which = fm_cvzinb2$lambda.which)
coef(fm_zinb2, which = fm_cvzinb2$lambda.which)

## End(Not run)

mpath documentation built on June 28, 2024, 1:06 a.m.