cv.glmTLP: Cross-validation for glmTLP
In ChongWu-Biostat/glmtlp: Truncated Lasso Regularized Generalized Linear Models


View source: R/cvglmTLP.R

Performs k-fold cross-validation for glmTLP, produces a plot, and returns a value of lambda for a pre-specified tau.
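The role of `tau` is easiest to see from the penalty itself. In Shen, Pan and Zhu (2012) the truncated lasso penalty (TLP) is J_tau(|beta|) = min(|beta|/tau, 1). It grows linearly, like a lasso penalty, for small coefficients and is flat once |beta| >= tau, so large coefficients are not shrunk further. The base R sketch below evaluates lambda times the summed TLP for a coefficient vector. The helper `tlp_penalty` is hypothetical (it is not part of glmtlp), and whether glmTLP scales lambda in exactly this way is an assumption.

```r
# Hypothetical helper (not exported by glmtlp): truncated lasso penalty
# lambda * sum(min(|beta| / tau, 1)), following Shen, Pan and Zhu (2012).
tlp_penalty <- function(beta, lambda, tau = 0.3) {
  stopifnot(tau > 0, lambda >= 0)
  # Each term is linear in |beta| below tau and capped at 1 above it
  lambda * sum(pmin(abs(beta) / tau, 1))
}

beta <- c(0, 0.25, -0.5, 2)
# Per-coefficient terms with tau = 0.5 are 0, 0.5, 1, 1, so the total is 2.5;
# the lasso penalty sum(abs(beta)) would instead keep growing with |beta|.
tlp_penalty(beta, lambda = 1, tau = 0.5)
```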

cv.glmTLP(x, y, family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"),
          nfolds = 10, weights, offset = NULL, lambda, tau = 0.3,
          nlambda = 100, penalty.factor = rep(1, nvars),
          lambda.min.ratio = ifelse(nobs < nvars, 1e-3, 1e-4),
          standardize = TRUE, intercept = TRUE, dfmax = nvars + 1,
          pmax = min(dfmax * 2 + 20, nvars), lower.limits = -Inf, upper.limits = Inf,
          standardize.response = FALSE, maxIter = 100, Tol = 1e-4)
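Several defaults in the Usage above depend on the data dimensions. The sketch below simply evaluates those default expressions for a design with `nobs` rows and `nvars` columns; the helper `tlp_defaults` is hypothetical and only mirrors the formulas shown in the signature.

```r
# Hypothetical helper: evaluate the data-dependent defaults from the Usage
# signature for a design matrix with `nobs` rows and `nvars` columns.
tlp_defaults <- function(nobs, nvars) {
  dfmax <- nvars + 1
  list(
    lambda.min.ratio = ifelse(nobs < nvars, 1e-3, 1e-4),
    dfmax = dfmax,
    pmax = min(dfmax * 2 + 20, nvars),
    penalty.factor = rep(1, nvars)
  )
}

str(tlp_defaults(nobs = 100, nvars = 50))   # nobs >= nvars: ratio 1e-4, pmax 50
str(tlp_defaults(nobs = 100, nvars = 500))  # nobs < nvars: ratio 1e-3, pmax 500
```

With the default `dfmax = nvars + 1`, the expression `dfmax * 2 + 20` always exceeds `nvars`, so `pmax` defaults to `nvars`; these limits only restrict the path when `dfmax` or `pmax` is set explicitly.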

`x`	Input matrix of dimension nobs x nvars (one row per observation), as in `glmnet`.
`y`	response variable. Quantitative for `family="gaussian"`, or `family="poisson"` (non-negative counts). For `family="binomial"` should be either a factor with two levels, or a two-column matrix of counts or proportions (the second column is treated as the target class; for a factor, the last level in alphabetical order is the target class). For `family="multinomial"`, can be a `nc>=2` level factor, or a matrix with `nc` columns of counts or proportions. For either `"binomial"` or `"multinomial"`, if `y` is presented as a vector, it will be coerced into a factor. For `family="cox"`, `y` should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating death, and '0' indicating right censored. The function `Surv()` in package survival produces such a matrix. For `family="mgaussian"`, `y` is a matrix of quantitative responses.
`family`	Response type (see above)
`nfolds`	Number of folds; default is 10. Although `nfolds` can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets. The smallest allowable value is `nfolds=3`.
`weights`	Observation weights; defaults to 1 per observation
`offset`	Offset vector (matrix) as in `glmnet`
`lambda`	Optional user-supplied lambda sequence; default is `NULL`, and `glmTLP` chooses its own sequence
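`tau`	Tuning parameter of the truncated lasso penalty: the threshold above which a coefficient's absolute value is no longer penalized. It is held fixed during cross-validation; default is 0.3.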
`nlambda`	The number of `lambda` values - default is 100.
`penalty.factor`	Separate penalty factors can be applied to each coefficient. This is a number that multiplies `lambda` to allow differential shrinkage. It can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables. Note: the penalty factors are internally rescaled to sum to nvars, and the lambda sequence will reflect this change.
`lambda.min.ratio`	Smallest value for `lambda`, as a fraction of `lambda.max`, the (data-derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size `nobs` relative to the number of variables `nvars`. If `nobs >= nvars`, the default is `0.0001`, close to zero. If `nobs < nvars`, the default is `0.001`. A very small value of `lambda.min.ratio` will lead to a saturated fit in the `nobs < nvars` case. This is undefined for `"binomial"` and `"multinomial"` models, and `glmnet` will exit gracefully when the percentage deviance explained is almost 1.
`standardize`	Logical flag for x variable standardization, prior to fitting the model sequence. The coefficients are always returned on the original scale. Default is `standardize=TRUE`. If variables are already in the same units, you might not wish to standardize. For y standardization with `family="gaussian"`, see the `glmnet` documentation.
`intercept`	Should intercept(s) be fitted (default=TRUE) or set to zero (FALSE)
`dfmax`	Limit the maximum number of variables in the model. Useful for very large `nvars`, if a partial path is desired.
`pmax`	Limit the maximum number of variables ever to be nonzero
`lower.limits`	Vector of lower limits for each coefficient; default `-Inf`. Each of these must be non-positive. Can be presented as a single value (which will then be replicated), else a vector of length `nvars`
`upper.limits`	Vector of upper limits for each coefficient; default `Inf`. See `lower.limits`
`standardize.response`	This is for the `family="mgaussian"` family, and allows the user to standardize the response variables
`maxIter`	Maximum number of iterations for the TLP algorithm.
`Tol`	Convergence tolerance for the TLP iterations.
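The `penalty.factor`, `lambda`, `nlambda` and `lambda.min.ratio` arguments together determine the penalty path. The sketch below shows the rescaling of penalty factors to sum to nvars, and a log-spaced (geometric) grid from `lambda.max` down to `lambda.max * lambda.min.ratio`, which is how `glmnet` builds its default sequence. That glmTLP builds its grid in exactly this way is an assumption, `lambda.max` is taken as given rather than computed from the data, and both helper names are hypothetical.

```r
# Hypothetical helper: rescale penalty factors so they sum to nvars.
# Zero entries stay zero, i.e. those variables remain unpenalized.
rescale_penalty_factor <- function(pf) {
  stopifnot(all(pf >= 0), sum(pf) > 0)
  pf * length(pf) / sum(pf)
}

# Hypothetical helper: geometric lambda grid from lambda.max down to
# lambda.max * lambda.min.ratio (lambda.max is supplied, not estimated here).
lambda_sequence <- function(lambda.max, nlambda = 100, lambda.min.ratio = 1e-4) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

pf <- rescale_penalty_factor(c(0, 1, 1, 4))  # 0, 2/3, 2/3, 8/3: sums to 4 (= nvars)
lam <- lambda_sequence(lambda.max = 1, nlambda = 5, lambda.min.ratio = 1e-4)
lam                                          # approximately 1, 0.1, 0.01, 0.001, 1e-04
```

Rescaling leaves the relative penalties unchanged; only the overall scale, which is absorbed into `lambda`, moves.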

The function runs glmTLP nfolds+1 times: the first to get the lambda sequence, and then the remainder to compute the fit with each of the folds omitted. The error is accumulated, and the average error and its standard deviation over the folds are computed. Note that cv.glmTLP does NOT search over values of tau. A specific value should be supplied; otherwise tau = 0.3 is used by default.
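The bookkeeping described above can be sketched in base R as follows. Observations are randomly split into `nfolds` groups of near-equal size, and an nfolds x nlambda matrix of held-out errors (one row per omitted fold) is summarized into `cvm`, `cvsd`, `cvup` and `cvlo`. Both helper names are hypothetical, and weighting folds equally (so that `cvsd` is sd/sqrt(nfolds)) is a simplifying assumption; the actual implementation may weight folds by their observation weights.

```r
# Hypothetical helper: random fold assignment with near-equal fold sizes.
assign_folds <- function(nobs, nfolds = 10) {
  stopifnot(nfolds >= 3, nfolds <= nobs)
  sample(rep(seq_len(nfolds), length.out = nobs))
}

# Hypothetical helper: summarize an nfolds x nlambda matrix of held-out errors.
# Row k holds the error of the fit trained without fold k, at each lambda.
summarize_cv <- function(fold_err) {
  nfolds <- nrow(fold_err)
  cvm  <- colMeans(fold_err)                     # mean error over folds
  cvsd <- apply(fold_err, 2, sd) / sqrt(nfolds)  # standard error of cvm
  list(cvm = cvm, cvsd = cvsd, cvup = cvm + cvsd, cvlo = cvm - cvsd)
}

set.seed(1)
foldid <- assign_folds(nobs = 23, nfolds = 5)
table(foldid)                      # fold sizes differ by at most one

fold_err <- rbind(c(3, 2, 4), c(5, 2, 6), c(4, 2, 5))
summarize_cv(fold_err)$cvm         # 4 2 5
```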

An object of class "cv.glmnet" is returned, which is a list with the ingredients of the cross-validation fit. Although the implementation is different, the returned object mimics the "cv.glmnet" class of the popular glmnet package, so that users can work with the truncated lasso in the same way as with the elastic net.

`lambda`	the values of `lambda` used in the fits.
`cvm`	The mean cross-validated error - a vector of length `length(lambda)`.
`cvsd`	estimate of standard error of `cvm`.
`cvup`	upper curve = `cvm+cvsd`.
`cvlo`	lower curve = `cvm-cvsd`.
`nzero`	number of non-zero coefficients at each `lambda`.
`name`	a text string indicating type of measure (for plotting purposes).
`glmnet.fit`	a fitted glmnet object for the full data.
`lambda.min`	value of `lambda` that gives minimum `cvm`.
`lambda.1se`	largest value of `lambda` such that error is within 1 standard error of the minimum.
`fit.preval`	if `keep=TRUE`, this is the array of prevalidated fits. Some entries can be `NA`, if that and subsequent values of `lambda` are not reached for that fold
`foldid`	if `keep=TRUE`, the fold assignments used
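The `lambda.min` and `lambda.1se` rules above can be illustrated with a small, made-up cross-validation curve. The helper `select_lambda` is hypothetical; it assumes `lambda` is decreasing and that `cvm` and `cvsd` are aligned with it.

```r
# Hypothetical helper: choose lambda.min and lambda.1se from a CV curve.
select_lambda <- function(lambda, cvm, cvsd) {
  idmin <- which.min(cvm)                 # first minimum = largest such lambda
  threshold <- cvm[idmin] + cvsd[idmin]   # one-standard-error bound
  list(lambda.min = lambda[idmin],
       lambda.1se = max(lambda[cvm <= threshold]))
}

lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(10, 6, 4.5, 4, 4.2)
cvsd   <- c(1, 0.8, 0.6, 0.6, 0.7)
select_lambda(lambda, cvm, cvsd)
# lambda.min = 0.125 (smallest cvm); lambda.1se = 0.25, since 4.5 <= 4 + 0.6
```

Choosing the largest lambda within one standard error favours a sparser model whose estimated error is statistically indistinguishable from the minimum.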

Chong Wu
Maintainer: Chong Wu <wuxx0845@umn.edu>

Shen, X., Pan, W. and Zhu, Y. (2012). Likelihood-Based Selection and Sharp Parameter Estimation. Journal of the American Statistical Association, 107(497), 223-232.