cv.spikeslab | R Documentation |
Computes the K-fold cross-validated mean squared prediction error for the generalized elastic net from spike and slab regression. Returns a stability index for each variable.
cv.spikeslab(x = NULL, y = NULL, K = 10, plot.it = TRUE, n.iter1 = 500, n.iter2 = 500, mse = TRUE, bigp.smalln = FALSE, bigp.smalln.factor = 1, screen = (bigp.smalln), r.effects = NULL, max.var = 500, center = TRUE, intercept = TRUE, fast = TRUE, beta.blocks = 5, verbose = TRUE, save.all = TRUE, ntree = 300, seed = NULL, ...)
x |
x-predictor matrix. |
y |
y-response values. |
K |
Number of folds. |
plot.it |
If TRUE, plots the mean prediction error and its standard error. |
n.iter1 |
Number of burn-in Gibbs sampled values (i.e., discarded values). |
n.iter2 |
Number of Gibbs sampled values, following burn-in. |
mse |
If TRUE, an external estimate for the overall variance is calculated. |
bigp.smalln |
Use if p >> n. |
bigp.smalln.factor |
Top n times this value of variables to be kept in the filtering step (used when p >> n). |
screen |
If TRUE, variables are first pre-filtered. |
r.effects |
List used for grouping variables (see details below). |
max.var |
Maximum number of variables allowed in the final model. |
center |
If TRUE, variables are centered by their means. Default is TRUE and should only be adjusted in extreme examples. |
intercept |
If TRUE, an intercept is included in the model, otherwise no intercept is included. Default is TRUE. |
fast |
If TRUE, use blocked Gibbs sampling to accelerate the algorithm. |
beta.blocks |
Update beta using this number of blocks (fast must be TRUE). |
verbose |
If TRUE, verbose output is sent to the terminal. |
save.all |
If TRUE, spikeslab object for each fold is saved and returned. |
ntree |
Number of trees used by random forests (applies only when mse is TRUE). |
seed |
Seed for random number generator. Must be a negative integer. |
... |
Further arguments passed to or from other methods. |
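As a rough illustration of what the K argument controls, the sketch below assigns observations to K folds in base R. The helper name make_folds and the sample size are hypothetical, and the package's own splitting code may differ; set.seed() here seeds base R only and is not the same thing as the seed argument described above.

```r
# Hypothetical helper (not part of spikeslab): randomly assign n observations
# to K folds whose sizes differ by at most one.
make_folds <- function(n, K = 10) {
  fold_id <- sample(rep(seq_len(K), length.out = n))
  split(seq_len(n), fold_id)
}

set.seed(1)  # base-R seeding for this sketch only
n <- 103     # made-up sample size
folds <- make_folds(n = n, K = 10)

# Each fold is held out once; the remaining rows form the training set.
test_idx  <- folds[[1]]
train_idx <- setdiff(seq_len(n), test_idx)

lengths(folds)
```

A larger K leaves more data for fitting in each round, at the cost of more model fits.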
Invisibly returns a list with components:
spikeslab.obj |
Spike and slab object from the full data. |
cv.spikeslab.obj |
List containing spike and slab objects from each fold. Can be NULL. |
cv.fold |
List containing the cv splits. |
cv |
Mean-squared error for each fold for the gnet. |
cv.path |
A matrix of mean-squared errors for the gnet solution path. Rows correspond to model sizes, columns are the folds. |
stability |
Matrix containing stability for each variable defined as the percentage of times a variable is identified over the K-folds. Also includes bma and gnet coefficient values and their cv-fold-averaged values. |
bma |
bma coefficients from the full data in terms of the standardized x. |
bma.scale |
bma coefficients from the full data, scaled in terms of the original x. |
gnet |
cv-optimized gnet in terms of the standardized x. |
gnet.scale |
cv-optimized gnet in terms of the original x. |
gnet.model |
List of models selected by gnet over the K-folds. |
gnet.path |
gnet path from the full data, scaled in terms of the original x. |
gnet.obj |
gnet object from fitting the full data (a lars-type object). |
gnet.obj.vars |
Variables (in order) used to calculate the gnet object. |
verbose |
Verbose details (used for printing). |
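The stability and cv.path components can be read as simple summaries over folds. The following base-R sketch uses made-up inputs (selected, vars, and cv_path are hypothetical stand-ins, not objects returned by the package) to show a stability percentage per variable and a per-model-size mean error with a standard error across folds. The standard error formula is an assumption; the package may compute it differently.

```r
# Hypothetical per-fold selections: one character vector of variable names
# per fold. All values are made up for illustration.
selected <- list(
  c("x1", "x3"),
  c("x1", "x2", "x3"),
  c("x1"),
  c("x1", "x3")
)
vars <- c("x1", "x2", "x3", "x4")

# Stability: percentage of folds in which each variable was selected.
stability <- sapply(vars, function(v) {
  100 * mean(vapply(selected, function(s) v %in% s, logical(1)))
})

# Hypothetical error matrix: rows are model sizes, columns are folds
# (matrix() fills column by column, so each group of three is one fold).
cv_path <- matrix(c(4.0, 2.0, 1.5,
                    4.4, 2.2, 1.3,
                    3.6, 1.8, 1.7,
                    4.0, 2.0, 1.5),
                  nrow = 3,
                  dimnames = list(paste0("size", 0:2), paste0("fold", 1:4)))

# Mean error per model size and its standard error across folds (assumed form).
cv_mean <- rowMeans(cv_path)
cv_se   <- apply(cv_path, 1, sd) / sqrt(ncol(cv_path))

# Model size with the smallest mean cross-validated error.
best <- names(which.min(cv_mean))

stability
cbind(mean = cv_mean, se = cv_se)
```

A variable selected in every fold scores 100, and one that is never selected scores 0.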
Hemant Ishwaran (hemant.ishwaran@gmail.com)
J. Sunil Rao (rao.jsunil@gmail.com)
Udaya B. Kogalur (ubk@kogalur.com)
Ishwaran H. and Rao J.S. (2005a). Spike and slab variable selection: frequentist and Bayesian strategies. Ann. Statist., 33:730-773.
Ishwaran H. and Rao J.S. (2010). Generalized ridge regression: geometry and computational solutions when p is larger than n.
Ishwaran H. and Rao J.S. (2011). Mixing generalized ridge regressions.
sparsePC.spikeslab, plot.spikeslab, predict.spikeslab, print.spikeslab.
## Not run:
#------------------------------------------------------------
# Example 1: 10-fold validation using parallel processing
#------------------------------------------------------------

data(ozoneI, package = "spikeslab")
y <- ozoneI[, 1]
x <- ozoneI[, -1]
cv.obj <- cv.spikeslab(x = x, y = y, parallel = 4)
plot(cv.obj, plot.type = "cv")
plot(cv.obj, plot.type = "path")

#------------------------------------------------------------
# Example 2: 10-fold validation using parallel processing
# (high dimensional diabetes data)
#------------------------------------------------------------

# add 2000 noise variables
data(diabetesI, package = "spikeslab")
diabetes.noise <- cbind(diabetesI,
    noise = matrix(rnorm(nrow(diabetesI) * 2000), nrow(diabetesI)))
x <- diabetes.noise[, -1]
y <- diabetes.noise[, 1]

cv.obj <- cv.spikeslab(x = x, y = y, bigp.smalln = TRUE, parallel = 4)
plot(cv.obj)

## End(Not run)