cv.spikeslab: K-fold Cross-Validation for Spike and Slab Regression
In spikeslab: Prediction and Variable Selection Using Spike and Slab Regression

Description

Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net (gnet) from spike and slab regression. Also returns a stability index for each variable.
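The K-fold procedure itself can be sketched in base R. The following is only an illustration of the general idea, not the package's implementation: it uses ordinary least squares (`lm.fit`) as a stand-in for the spike and slab fit, and the simulated data and all variable names are assumptions made for the example.

```r
# Minimal K-fold cross-validation sketch (base R only).
# Assumption: least squares stands in for the spike and slab / gnet fit.
set.seed(1)
n <- 100
p <- 5
x <- matrix(rnorm(n * p), n, p)
y <- drop(x %*% c(2, -1, 0, 0, 0)) + rnorm(n)

K <- 10
# Randomly assign each observation to one of K folds of equal size.
fold <- sample(rep(seq_len(K), length.out = n))

cv.mse <- sapply(seq_len(K), function(k) {
  test <- fold == k
  # Fit on the K - 1 training folds (intercept column added by hand).
  fit <- lm.fit(cbind(1, x[!test, , drop = FALSE]), y[!test])
  # Predict on the held-out fold and record its mean squared error.
  pred <- drop(cbind(1, x[test, , drop = FALSE]) %*% fit$coefficients)
  mean((y[test] - pred)^2)
})

cv.mean <- mean(cv.mse)        # cross-validated mean squared prediction error
cv.se <- sd(cv.mse) / sqrt(K)  # its standard error across folds
```

Each observation is held out exactly once, so `cv.mse` has one entry per fold.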

Usage

cv.spikeslab(x = NULL, y = NULL, K = 10,
    plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE,
    bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln),
    r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE,
    fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE,
    ntree = 300, seed = NULL, ...)

Arguments

`x`	x-predictor matrix.
`y`	y-response values.
`K`	Number of folds.
`plot.it`	If TRUE, plots the mean prediction error and its standard error.
`n.iter1`	Number of burn-in Gibbs sampled values (i.e., discarded values).
`n.iter2`	Number of Gibbs sampled values, following burn-in.
`mse`	If TRUE, an external estimate of the overall variance is calculated (using random forests; see `ntree`).
`bigp.smalln`	Set to TRUE if `p` >> `n`.
`bigp.smalln.factor`	The top `n` times this value of variables are kept in the filtering step (used when `p` >> `n`).
`screen`	If TRUE, variables are pre-filtered before fitting.
`r.effects`	List used for grouping variables (see details below).
`max.var`	Maximum number of variables allowed in the final model.
`center`	If TRUE, variables are centered by their means. Default is TRUE and should be changed only in extreme cases.
`intercept`	If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE.
`fast`	If TRUE, use blocked Gibbs sampling to accelerate the algorithm.
`beta.blocks`	Update beta using this number of blocks (`fast` must be TRUE).
`verbose`	If TRUE, verbose output is sent to the terminal.
`save.all`	If TRUE, the spikeslab object for each fold is saved and returned.
`ntree`	Number of trees used by random forests (applies only when `mse` is TRUE).
`seed`	Seed for random number generator. Must be a negative integer.
`...`	Further arguments passed to or from other methods.

Value

Invisibly returns a list with components:

`spikeslab.obj`	Spike and slab object from the full data.
`cv.spikeslab.obj`	List containing spike and slab objects from each fold. Can be NULL.
`cv.fold`	List containing the cv splits.
`cv`	Mean-squared error for each fold for the gnet.
`cv.path`	A matrix of mean-squared errors for the gnet solution path. Rows correspond to model sizes, columns are the folds.
`stability`	Matrix containing the stability of each variable, defined as the percentage of times the variable is identified over the K folds. Also includes bma and gnet coefficient values and their cv-fold-averaged values.
`bma`	bma coefficients from the full data in terms of the standardized x.
`bma.scale`	bma coefficients from the full data, scaled in terms of the original x.
`gnet`	cv-optimized gnet in terms of the standardized x.
`gnet.scale`	cv-optimized gnet in terms of the original x.
`gnet.model`	List of models selected by gnet over the K folds.
`gnet.path`	gnet path from the full data, scaled in terms of the original x.
`gnet.obj`	gnet object from fitting the full data (a lars-type object).
`gnet.obj.vars`	Variables (in order) used to calculate the gnet object.
`verbose`	Verbose details (used for printing).
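
To make the `cv.path` and `stability` definitions concrete, here is a small base R sketch on made-up inputs. The matrix and the list of selected models below are hypothetical stand-ins shaped like the descriptions above (rows are model sizes, columns are folds; one selected model per fold); the exact internal format used by the package may differ.

```r
# Hypothetical inputs mimicking the documented shapes; not package output.
set.seed(2)
K <- 5
vars <- c("x1", "x2", "x3", "x4")

# Stand-in for cv.path: rows = model sizes 0..4, columns = folds.
cv.path <- matrix(c(4.0, 2.0, 1.0, 1.2, 1.5), nrow = 5, ncol = K) +
  matrix(runif(5 * K, 0, 0.1), nrow = 5)
rownames(cv.path) <- 0:4

# Mean error and standard error across folds for each model size.
path.mean <- rowMeans(cv.path)
path.se <- apply(cv.path, 1, sd) / sqrt(K)

# Model size with the smallest cross-validated error.
best.size <- as.integer(names(which.min(path.mean)))

# Stand-in for gnet.model: variables selected in each of the K folds.
gnet.model <- list(c("x1", "x2"), c("x1", "x2", "x3"), "x1",
                   c("x1", "x2"), c("x1", "x4"))

# Stability: percentage of folds in which each variable was selected.
stability <- sapply(vars, function(v) {
  100 * mean(sapply(gnet.model, function(m) v %in% m))
})
```

In this toy input `x1` is selected in every fold and so has stability 100, while `x3` and `x4` each appear in one fold out of five.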

Author(s)

Hemant Ishwaran (hemant.ishwaran@gmail.com)

J. Sunil Rao (rao.jsunil@gmail.com)

Udaya B. Kogalur (ubk@kogalur.com)

References

Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.

Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.

Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.

Examples

## Not run: 
#------------------------------------------------------------
# Example 1: 10-fold cross-validation using parallel processing
#------------------------------------------------------------

data(ozoneI, package = "spikeslab")
y <- ozoneI[,  1]
x <- ozoneI[, -1]
# note: `parallel` is not listed under Usage, so it is absorbed by `...`
cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4)
plot(cv.obj, plot.type = "cv")
plot(cv.obj, plot.type = "path")

#------------------------------------------------------------
# Example 2: 10-fold cross-validation using parallel processing
# (high dimensional diabetes data)
#------------------------------------------------------------

# add 2000 noise variables
data(diabetesI, package = "spikeslab")
diabetes.noise <- cbind(diabetesI,
      noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI)))
x <- diabetes.noise[, -1]
y <- diabetes.noise[, 1]

cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln=TRUE, parallel = 4)
plot(cv.obj)

## End(Not run)
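
As a further sketch, the call below uses only arguments listed under Usage (a smaller `K`, plotting and verbose output turned off, and a negative integer `seed`) and then looks at a few of the components listed under Value. The particular values (`K = 5`, `seed = -42L`) are arbitrary choices for illustration, and the code is guarded so that it does nothing when the spikeslab package is not installed.

```r
# Runs only if the spikeslab package is available.
if (requireNamespace("spikeslab", quietly = TRUE)) {
  data(ozoneI, package = "spikeslab")
  y <- ozoneI[, 1]
  x <- ozoneI[, -1]

  # 5-fold cross-validation; the seed must be a negative integer.
  cv.obj <- spikeslab::cv.spikeslab(x = x, y = y, K = 5,
                                    plot.it = FALSE, verbose = FALSE,
                                    seed = -42L)

  # The result is returned invisibly, so inspect it explicitly.
  print(names(cv.obj))
  print(cv.obj$cv)               # mean-squared error for each fold
  print(head(cv.obj$stability))  # stability of each variable
  print(cv.obj$gnet.scale)       # cv-optimized gnet, original x scale
}
```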

spikeslab documentation built on April 27, 2022, 1:05 a.m.