cv.bh: Cross-Validation for Bayesian Models or Elastic Net
In nyiuab/BhGLM: Bayesian hierarchical GLMs and survival models, with applications to Genomics and Epidemiology

cv.bh

R Documentation

Description

The function cv.bh performs K-fold cross-validation and calculates cross-validated predictive measures for Bayesian hierarchical GLMs and Cox survival models, or for elastic net models fitted with the package glmnet.

Usage

cv.bh(object, nfolds = 10, foldid = NULL, ncv = 1, verbose = TRUE)

Arguments

`object`	a fitted object.
`nfolds`	number of folds (groups) into which the data are split to estimate the cross-validation prediction error. The default is 10. Although nfolds can be as large as the sample size (leave-one-out CV), this is not recommended for large datasets.
`foldid`	an optional vector (if ncv = 1) or matrix (if ncv > 1) of values between 1 and nfolds identifying which fold each observation belongs to. If supplied, nfolds can be missing. If `foldid = NULL`, `nfolds` subsets are generated randomly.
`ncv`	number of times the cross-validation is repeated.
`verbose`	logical. If `TRUE`, print out computational time and progress.
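
As an illustration of the `foldid` argument, fold assignments can be built by hand in base R. This is only a sketch: the sample size `n` and the object names are made up, and storing one repeat per column in the matrix case is an assumption about the expected layout rather than something stated above.

```r
# Sketch only: build fold assignments by hand for n observations.
set.seed(1)          # make the random split reproducible
n <- 100             # hypothetical sample size
nfolds <- 10
ncv <- 3

# One repeat (ncv = 1): a vector of length n with values in 1..nfolds
foldid_vec <- sample(rep(seq_len(nfolds), length.out = n))

# Several repeats (ncv > 1): one column per repeat (layout is an assumption)
foldid_mat <- replicate(ncv, sample(rep(seq_len(nfolds), length.out = n)))

table(foldid_vec)    # fold sizes are equal or differ by at most one
dim(foldid_mat)      # n rows, ncv columns
```

Passing the same `foldid` to several calls of cv.bh lets different models be compared on identical splits, as the examples at the end of this page do with `foldid = cv$foldid`.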

Details

The data are divided randomly into nfolds subsets with equal (or approximately equal) numbers of individuals. For each subset, the model is fit to the data omitting that subset; the omitted responses are then predicted and various prediction errors are calculated in that subset. Since the folds are selected at random, the cross-validation results are random. Users can reduce this randomness by repeating the cross-validation several times and averaging the predictive values; the argument ncv sets the number of repeats.
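
The procedure can be sketched in base R as follows. This is not the BhGLM implementation: it uses `stats::glm` as a stand-in for the fitted model, and the simulated data frame `dat` with columns `x1`, `x2` and `y` is hypothetical.

```r
# Minimal base-R sketch of K-fold cross-validation for a logistic GLM.
# stats::glm is a stand-in here; this is not what cv.bh runs internally.
set.seed(2)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))   # hypothetical predictors
dat$y <- rbinom(n, 1, plogis(0.8 * dat$x1 - 0.5 * dat$x2))

nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

lp <- numeric(n)                       # cross-validated linear predictors
for (k in seq_len(nfolds)) {
  test <- foldid == k
  # Fit the model to the data omitting fold k
  fit <- glm(y ~ x1 + x2, family = binomial, data = dat[!test, ])
  # Predict the omitted observations on the link scale
  lp[test] <- predict(fit, newdata = dat[test, ], type = "link")
}

y.fitted <- plogis(lp)                 # inverse of the logit link
mse <- mean((dat$y - y.fitted)^2)      # one cross-validated error measure
mse
```

Repeating the loop with a fresh random `foldid` and averaging the resulting measures corresponds to setting ncv greater than 1.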

Value

The returned values include:

`y.obs`	the observed responses.
`lp`	the linear predictors of all observations.
`foldid`	a vector (if ncv = 1) or matrix (if ncv > 1) indicating the fold of each observation.
`measures`	predictive measures, which depend on the model type as listed below.

For GLMs and polr models, the returned values also include:

`y.fitted`	the fitted mean values, obtained by transforming the linear predictors by the inverse of the link function.

For all GLMs and polr, measures includes:

`deviance`	estimate of deviance.
`mse`	estimate of mean squared error.

For binomial and polr models, measures also includes:

`auc`	area under ROC curve.
`misclassification`	estimate of misclassification error.
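
For a binary outcome, these measures can be computed from the observed responses and the cross-validated fitted probabilities. The sketch below uses standard textbook definitions and made-up toy vectors; the exact formulas used inside cv.bh, including the 0.5 classification cutoff, are assumptions here.

```r
# Sketch: predictive measures for a binary outcome. The toy vectors are
# made up, and the formulas are standard definitions, not necessarily the
# ones used by cv.bh.
y.obs    <- c(0, 0, 1, 1, 0, 1, 1, 0)
y.fitted <- c(0.1, 0.4, 0.8, 0.6, 0.3, 0.9, 0.35, 0.7)

# Binomial deviance: -2 times the log-likelihood
deviance <- -2 * sum(y.obs * log(y.fitted) + (1 - y.obs) * log(1 - y.fitted))

# Mean squared error between responses and fitted probabilities
mse <- mean((y.obs - y.fitted)^2)

# Misclassification rate at a 0.5 cutoff (the cutoff is an assumption)
misclassification <- mean((y.fitted > 0.5) != y.obs)

# AUC through the rank-sum (Mann-Whitney) identity
r  <- rank(y.fitted)
n1 <- sum(y.obs == 1)
n0 <- sum(y.obs == 0)
auc <- (sum(r[y.obs == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

c(deviance = deviance, mse = mse, auc = auc,
  misclassification = misclassification)
```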

For Cox models, measures includes:

`deviance`	deviance using cross-validated prognostic index.
`Cindex`	concordance index.
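
To make the concordance index concrete, the following base-R sketch computes Harrell's C for right-censored data from a cross-validated prognostic index. The vectors `time`, `status` and `lp` are made-up toy values, and the pairwise definition shown (which does not treat tied survival times specially) is an assumption; cv.bh may compute the index differently.

```r
# Sketch: Harrell's concordance index for right-censored data in base R.
# time, status (1 = event, 0 = censored) and lp are made-up toy values.
time   <- c(5, 8, 3, 10, 6)
status <- c(1, 0, 1, 1, 0)
lp     <- c(0.2, 0.1, 1.5, -0.4, 0.9)   # higher value = higher predicted risk

concordant <- 0
comparable <- 0
n <- length(time)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    # A pair is comparable when subject i has an event before time[j]
    if (status[i] == 1 && time[i] < time[j]) {
      comparable <- comparable + 1
      if (lp[i] > lp[j]) {
        concordant <- concordant + 1      # earlier event has the higher risk
      } else if (lp[i] == lp[j]) {
        concordant <- concordant + 0.5    # tied predictions count as half
      }
    }
  }
}
Cindex <- concordant / comparable
Cindex
```

A value of 0.5 corresponds to random ordering and 1 to perfect ordering of predicted risk against observed survival.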

Author(s)

Nengjun Yi, nyi@uab.edu

References

Steyerberg, E. W. (2009). Clinical Prediction Models: A Practical Approach to Development, Validation, and Updating. Springer, New York.

van Houwelingen, H.G. & Putter, H. (2012). Dynamic Prediction in Clinical Survival Analysis. CRC Press.

Examples

library(BhGLM)
library(survival)
library(glmnet)

N = 1000
K = 30
x = sim.x(n=N, m=K, corr=0.6) # simulate correlated continuous variables
h = rep(0.1, 4) # assign four non-zero main effects to have the assumed heritability
nz = as.integer(seq(5, K, by=K/length(h))); nz
yy = sim.y(x=x[, nz], mu = 0, herit=h, p.neg=0.5, sigma=1.6) # simulate responses
yy$coefs


#y = yy$y.normal; fam = "gaussian"
y = yy$y.ordinal; fam = "binomial"

f = glmNet(x, y, family = fam, alpha = 1, ncv = 2)
cv = cv.bh(f, ncv = 2)
cv$measures 

f1 = bglm(y ~ ., data = x, family = fam, prior = De(scale=f$prior.scale))
cv1 = cv.bh(f1, foldid = cv$foldid)
cv1$measures

par(mfrow = c(1, 2), cex.axis = 1, mar = c(3, 4, 4, 4))
plot.bh(coefs = f$coef[-1], threshold = 10, gap = 10) 
plot.bh(f1, vars.rm = 1, gap = 10, col.pts = c("red", "black"), threshold = 0.01) 


# censored survival data
y = yy$y.surv

f = glmNet(x, y, family = "cox", alpha = 1, ncv = 2)
cv = cv.bh(f, ncv = 2)
cv$measures 

f1 = bcoxph(y ~ ., data = x, prior = De(scale=f$prior.scale))
cv1 = cv.bh(f1, foldid = cv$foldid)
cv1$measures

par(mfrow = c(1, 2), cex.axis = 1, mar = c(3, 4, 4, 4))
plot.bh(coefs = f$coef, threshold = 10, gap = 10) 
plot.bh(f1, gap = 10, col.pts = c("red", "black"), threshold = 0.01)

nyiuab/BhGLM documentation built on June 12, 2024, 9:28 p.m.