cvic: Cross-validation information criterion


Description

A model selection criterion proposed by Reiss et al. (2012), which employs cross-validation to estimate the overoptimism associated with the best candidate model of each size.

Usage

cvic(y, X, nfold = length(y), pvec = 1:(ncol(X) + 1))

Arguments

y

outcome vector

X

model matrix. This should not include an intercept column; such a column is added by the function.

nfold

number of "folds" (validation sets). The sample size must be divisible by this number.

pvec

vector of candidate model dimensions to consider; by default this ranges from 1 (intercept only) to ncol(X) + 1 (full model).
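
For example, a call departing from the leave-one-out default might look as follows (a sketch with simulated data, since length(y) must be divisible by nfold; the data and variable names are illustrative only):

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 6), n, 6)          # 6 candidate predictors
y <- X[, 1] - 0.5 * X[, 3] + rnorm(n)    # true model uses two of them
cvic(y, X, nfold = 10, pvec = 1:5)       # 10-fold CV; dimensions 1 to 5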

Details

CVIC is similar to the corrected AIC (Sugiura, 1978; Hurvich and Tsai, 1989), but in place of the nominal model dimension it substitutes a measure of effective degrees of freedom (edf) that takes best-subset selection into account. The "raw" edf is obtained by cross-validation; alternatively, the edf can be refined by constrained monotone smoothing, as described by Reiss et al. (2012).
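
As a rough illustration (an assumption about the functional form, not the package's internal code), the criterion can be thought of as an AICc-style formula in which the edf replaces the nominal dimension:

cvic_from_edf <- function(nlogsig2hat, edf, n) {
  # nlogsig2hat: n * log of the variance MLE for the best model of each size
  # edf:         effective degrees of freedom for each size
  # n:           sample size
  # AICc-style penalty with edf substituted for the nominal dimension
  nlogsig2hat + n * (n + edf) / (n - edf - 2)
}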

Value

A list with components

nlogsig2hat

value of the first (non-penalty) term of the criterion, i.e., the sample size times the log of the maximum likelihood estimate of the error variance, for the best model of each dimension in pvec.

cv.pen

cross-validation penalty, as described by Reiss et al. (2012).

edf, edf.mon

effective degrees of freedom, before and after constrained monotone smoothing.

cvic

CVIC based on the raw edf.

cvic.mon

CVIC based on edf to which constrained monotone smoothing has been applied.

best, best.mon

vectors of logicals indicating which columns of the model matrix are included in the CVIC-minimizing model, without and with constrained monotone smoothing.
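
For instance, these components can be examined as follows (a brief sketch, using the swiss data from the Examples section below):

cvicobj <- cvic(swiss$Fertility, swiss[, -1])
# raw vs. monotone-smoothed edf across the dimensions in pvec
plot(cvicobj$edf, type = "b", xlab = "index within pvec", ylab = "edf")
lines(cvicobj$edf.mon, type = "b", lty = 2)
# position within pvec of the CVIC-minimizing model, with raw and smoothed edf
which.min(cvicobj$cvic)
which.min(cvicobj$cvic.mon)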

Author(s)

Lei Huang huangracer@gmail.com and Philip Reiss phil.reiss@nyumc.org

References

Hurvich, C. M., and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297–307.

Reiss, P. T., Huang, L., Cavanaugh, J. E., and Roy, A. K. (2012). Resampling-based information criteria for adaptive linear model selection. Annals of the Institute of Statistical Mathematics, to appear. Available at http://works.bepress.com/phil_reiss/17

Sugiura, N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Communications in Statistics: Theory & Methods, 7, 13–26.

See Also

leaps in package leaps for best-subset selection; pcls in package mgcv for the constrained monotone smoothing.
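
For illustration, the kind of best-subset search that cvic builds on can be run directly with leaps (a sketch, not the package's internal call):

library(leaps)
data(swiss)
# one best model of each size, ranked by R^2
bs <- leaps(x = as.matrix(swiss[, -1]), y = swiss$Fertility,
            method = "r2", nbest = 1)
bs$which   # logical matrix: predictors in the best model of each size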

Examples

# Predicting fertility from provincial socioeconomic indicators
data(swiss)
cvicobj <- cvic(swiss$Fertility, swiss[ , -1])
cvicobj$best
cvicobj$best.mon

Example output

cvicobj$best
    1     2     3     4     5 
 TRUE FALSE  TRUE  TRUE  TRUE 

cvicobj$best.mon
   1    2    3    4    5 
TRUE TRUE TRUE TRUE TRUE 
