nestcv.glmnet (R Documentation)

This function enables nested cross-validation (CV) with glmnet, including tuning of the elastic net alpha parameter. It also allows optional embedded filtering of predictors for feature selection, nested within the outer loop of CV. Predictions on the outer test folds are pooled and overall error/accuracy is estimated. The default is 10x10 nested CV.

nestcv.glmnet(
  y,
  x,
  family = c("gaussian", "binomial", "poisson", "multinomial", "cox", "mgaussian"),
  filterFUN = NULL,
  filter_options = NULL,
  balance = NULL,
  balance_options = NULL,
  outer_method = c("cv", "LOOCV"),
  n_outer_folds = 10,
  n_inner_folds = 10,
  outer_folds = NULL,
  alphaSet = seq(0, 1, 0.1),
  min_1se = 0,
  keep = TRUE,
  outer_train_predict = FALSE,
  weights = NULL,
  penalty.factor = rep(1, ncol(x)),
  cv.cores = 1,
  finalCV = TRUE,
  na.option = "omit",
  ...
)

`y`
Response vector

`x`
Matrix of predictors. Dataframes will be coerced to a matrix, as is necessary for glmnet.

`family`
Either a character string representing one of the built-in families, or else a `glm()` family object. See glmnet.

`filterFUN`
Filter function, e.g. ttest_filter or relieff_filter. Any suitable function can be provided; it is passed `y` and `x`.

`filter_options`
List of additional arguments passed to the filter function specified by `filterFUN`.

`balance`
Specifies method for dealing with imbalanced class data. Current options are "randomsample" or "smote".

`balance_options`
List of additional arguments passed to the balancing function.

`outer_method`
String of either "cv" or "LOOCV" specifying whether to use k-fold CV or leave-one-out CV for the outer loop.

`n_outer_folds`
Number of outer CV folds

`n_inner_folds`
Number of inner CV folds

`outer_folds`
Optional list containing indices of test folds for outer CV. If supplied, `n_outer_folds` is ignored.

`alphaSet`
Vector of alpha values to be tuned

`min_1se`
Value from 0 to 1 specifying the choice of optimal lambda, from 0 = lambda.min to 1 = lambda.1se

`keep`
Logical indicating whether inner CV predictions are retained for calculating left-out inner CV fold accuracy etc. See argument `keep` in cv.glmnet.

`outer_train_predict`
Logical indicating whether to save predictions on the outer training folds so that performance on them can be calculated.

`weights`
Weights applied to each sample. Note that `weights` and `balance` cannot be used at the same time.

`penalty.factor`
Separate penalty factors can be applied to each coefficient. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables. See glmnet.

`cv.cores`
Number of cores for parallel processing of the outer loops. NOTE: this uses parallel::mclapply on unix/mac and parallel::parLapply on windows.

`finalCV`
Logical whether to perform one last round of CV on the whole dataset to determine the final model parameters. If set to FALSE, the median of the best hyperparameters from the outer folds is used for the final model. If set to NA, fitting of the final model is skipped.

`na.option`
Character value specifying how missing values (NA) are dealt with. The default is "omit".

`...` |
Optional arguments passed to cv.glmnet |
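Putting the arguments together, a minimal call might look like the following sketch. The simulated data, fold counts, filter size and alpha grid here are illustrative choices, not package defaults:

```r
library(nestedcv)

## Simulated data: 100 samples, 500 predictors, binary response
set.seed(1)
x <- matrix(rnorm(100 * 500), 100, 500)
y <- factor(rbinom(100, 1, 0.5))

## 5x5 nested CV with a t-test filter retaining the top 50 predictors,
## tuning alpha over a coarse grid
fit <- nestcv.glmnet(y, x,
                     family = "binomial",
                     filterFUN = ttest_filter,
                     filter_options = list(nfilter = 50),
                     n_outer_folds = 5,
                     n_inner_folds = 5,
                     alphaSet = seq(0, 1, 0.25))
summary(fit)
```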

glmnet does not tolerate missing values, so `na.option = "omit"` is the default.
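As a sketch of the assumed behaviour with the default setting, cases containing missing values would be dropped before nested CV begins:

```r
library(nestedcv)

set.seed(1)
x <- matrix(rnorm(60 * 100), 60, 100)
y <- factor(rbinom(60, 1, 0.5))
x[1:3, 1] <- NA  # introduce missing values in 3 samples

## With the default na.option = "omit", cases with NA are expected to be
## removed from x and y before model fitting, since glmnet cannot handle NA
fit <- nestcv.glmnet(y, x, family = "binomial", alphaSet = 1,
                     na.option = "omit")
```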

An object with S3 class "nestcv.glmnet"

`call`
the matched call

`output`
Predictions on the left-out outer folds

`outer_result`
List object of results from each outer fold containing predictions on left-out outer folds, best lambda, best alpha, fitted glmnet coefficients, list object of inner fitted cv.glmnet and number of filtered predictors at each fold.

`outer_method`
the `outer_method` argument used

`n_inner_folds`
number of inner folds

`outer_folds`
List of indices of outer test folds

`dimx`
dimensions of `x`

`y`
original response vector

`yfinal`
final response vector (post-balancing)

`final_param`
Final mean best lambda and alpha from each fold

`final_fit`
Final fitted glmnet model

`final_coef`
Final model coefficients and mean expression

`roc`
ROC AUC for binary classification where available.

`summary`
Overall performance summary. Accuracy and balanced accuracy for classification. ROC AUC for binary classification. RMSE for regression.
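The components listed above can be accessed directly from the fitted object. A sketch, assuming `fit` is an object returned by nestcv.glmnet (only component names documented above are used):

```r
fit$summary       # overall performance summary
fit$final_param   # final mean best lambda and alpha from each fold
fit$final_coef    # final model coefficients and mean expression
head(fit$output)  # predictions on the left-out outer folds
```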

Myles Lewis

## Example binary classification problem with P >> n
x <- matrix(rnorm(150 * 2e+04), 150, 2e+04)  # predictors
y <- factor(rbinom(150, 1, 0.5))  # binary response

## Partition data into 2/3 training set, 1/3 test set
trainSet <- caret::createDataPartition(y, p = 0.66, list = FALSE)

## t-test filter using whole dataset
filt <- ttest_filter(y, x, nfilter = 100)
filx <- x[, filt]

## Train glmnet on training set only using filtered predictor matrix
library(glmnet)
fit <- cv.glmnet(filx[trainSet, ], y[trainSet], family = "binomial")
plot(fit)

## Predict response on test partition
predy <- predict(fit, newx = filx[-trainSet, ], s = "lambda.min", type = "class")
predy <- as.vector(predy)
predyp <- predict(fit, newx = filx[-trainSet, ], s = "lambda.min", type = "response")
predyp <- as.vector(predyp)
output <- data.frame(testy = y[-trainSet], predy = predy, predyp = predyp)

## Results on test partition
## shows bias since univariate filtering was applied to whole dataset
predSummary(output)

## Nested CV
fit2 <- nestcv.glmnet(y, x, family = "binomial", alphaSet = 1,
                      filterFUN = ttest_filter,
                      filter_options = list(nfilter = 100))
summary(fit2)
plot_lambdas(fit2, showLegend = "bottomright")

## ROC plots
library(pROC)
testroc <- roc(output$testy, output$predyp, direction = "<")
inroc <- innercv_roc(fit2)
plot(fit2$roc)
lines(inroc, col = 'blue')
lines(testroc, col = 'red')
legend('bottomright', legend = c("Nested CV", "Left-out inner CV folds",
                                 "Test partition, non-nested filtering"),
       col = c("black", "blue", "red"), lty = 1, lwd = 2, bty = "n")
