cvic: Cross-validation information criterion


Description

A model selection criterion proposed by Reiss et al. (2012), which employs cross-validation to estimate the overoptimism associated with the best candidate model of each size.

Usage

cvic(y, X, nfold = length(y), pvec = 1:(ncol(X) + 1))

Arguments

y

outcome vector

X

model matrix. This should not include an intercept column; such a column is added by the function.

nfold

number of "folds" (validation sets). The sample size must be divisible by this number.

pvec

vector of candidate model dimensions to consider; by default this ranges from 1 (intercept only) to ncol(X) + 1 (full model).
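
For example, a call departing from the leave-one-out default might look as follows (a sketch with simulated data, since length(y) must be divisible by nfold; the data and variable names are illustrative only):

set.seed(1)
n <- 100
X <- matrix(rnorm(n * 6), n, 6)          # 6 candidate predictors
y <- X[, 1] - 0.5 * X[, 3] + rnorm(n)    # true model uses two of them
cvic(y, X, nfold = 10, pvec = 1:5)       # 10-fold CV; dimensions 1 to 5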

Details

CVIC is similar to the corrected AIC (Sugiura, 1978; Hurvich and Tsai, 1989), but in place of the nominal model dimension it substitutes a measure of effective degrees of freedom (edf) that takes best-subset selection into account. The "raw" edf is obtained by cross-validation; alternatively, the edf can be refined by constrained monotone smoothing, as described by Reiss et al. (2012).
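
As a rough illustration (an assumption about the functional form, not the package's internal code), the criterion can be thought of as an AICc-style formula in which the edf replaces the nominal dimension:

cvic_from_edf <- function(nlogsig2hat, edf, n) {
  # nlogsig2hat: n * log of the variance MLE for the best model of each size
  # edf:         effective degrees of freedom for each size
  # n:           sample size
  # AICc-style penalty with edf substituted for the nominal dimension
  nlogsig2hat + n * (n + edf) / (n - edf - 2)
}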

Value

A list with components

nlogsig2hat

value of the first (non-penalty) term of the criterion, i.e., the sample size times the log of the maximum likelihood estimate of the error variance, for the best model of each dimension in pvec.

cv.pen

cross-validation penalty, as described by Reiss et al. (2012).

edf, edf.mon

effective degrees of freedom, before and after constrained monotone smoothing.

cvic

CVIC based on the raw edf.

cvic.mon

CVIC based on edf to which constrained monotone smoothing has been applied.

best, best.mon

vectors of logicals indicating which columns of the model matrix are included in the CVIC-minimizing model, without and with constrained monotone smoothing.
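
For instance, these components can be examined as follows (a brief sketch, using the swiss data from the Examples section below):

cvicobj <- cvic(swiss$Fertility, swiss[, -1])
# raw vs. monotone-smoothed edf across the dimensions in pvec
plot(cvicobj$edf, type = "b", xlab = "index within pvec", ylab = "edf")
lines(cvicobj$edf.mon, type = "b", lty = 2)
# position within pvec of the CVIC-minimizing model, with raw and smoothed edf
which.min(cvicobj$cvic)
which.min(cvicobj$cvic.mon)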

Author(s)

Lei Huang huangracer@gmail.com and Philip Reiss phil.reiss@nyumc.org

References

Hurvich, C. M., and Tsai, C.-L. (1989). Regression and time series model selection in small samples. Biometrika, 76, 297–307.

Reiss, P. T., Huang, L., Cavanaugh, J. E., and Roy, A. K. (2012). Resampling-based information criteria for adaptive linear model selection. Annals of the Institute of Statistical Mathematics, to appear. Available at http://works.bepress.com/phil_reiss/17

Sugiura, N. (1978). Further analysis of the data by Akaike's information criterion and the finite corrections. Communications in Statistics: Theory & Methods, 7, 13–26.

See Also

leaps in package leaps for best-subset selection; pcls in package mgcv for the constrained monotone smoothing.
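
For illustration, the kind of best-subset search that cvic builds on can be run directly with leaps (a sketch, not the package's internal call):

library(leaps)
data(swiss)
# one best model of each size, ranked by R^2
bs <- leaps(x = as.matrix(swiss[, -1]), y = swiss$Fertility,
            method = "r2", nbest = 1)
bs$which   # logical matrix: predictors in the best model of each size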

Examples

# Predicting fertility from provincial socioeconomic indicators
data(swiss)
cvicobj <- cvic(swiss$Fertility, swiss[ , -1])
cvicobj$best
cvicobj$best.mon

Example output

cvicobj$best
    1     2     3     4     5 
 TRUE FALSE  TRUE  TRUE  TRUE 

cvicobj$best.mon
   1    2    3    4    5 
TRUE TRUE TRUE TRUE TRUE 
