nested.cv: Estimating predictive performance via nested cross-validation


View source: R/functions.R

Description

Performs nested cross-validation to assess predictive performance. The inner loop is used to determine the optimal lambda (as in cv.glmnet), and the outer loop is used to assess the predictive performance in an unbiased way, on samples that played no part in selecting lambda.
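The two loops described above can be sketched directly with glmnet. This is a minimal illustration under stated assumptions, not the package's implementation: the data are simulated, the variable names (foldid, y_hat, mse) are chosen for this sketch, and only the classic single-stage glmnet approach is shown.

```r
library(glmnet)

# Simulated data, purely for illustration (not the package's example_data)
set.seed(1)
n <- 100
p <- 20
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
y <- as.numeric(x[, 1] - 0.5 * x[, 2] + rnorm(n))

# Outer loop: assign every sample to one of nfolds held-out folds
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))
y_hat <- rep(NA_real_, n)

for (k in seq_len(nfolds)) {
  test <- foldid == k
  # Inner loop: cv.glmnet selects lambda using the training samples only
  fit <- cv.glmnet(x[!test, , drop = FALSE], y[!test], nfolds = 5, alpha = 0.5)
  # Predict the held-out samples at the lambda chosen by the inner loop
  y_hat[test] <- as.numeric(
    predict(fit, newx = x[test, , drop = FALSE], s = "lambda.1se")
  )
}

# Error estimate based only on out-of-fold predictions
mse <- mean((y - y_hat)^2)
```

Because lambda is chosen without ever seeing the held-out fold, the resulting MSE is not optimistically biased by the tuning step. Passing s = "lambda.min" to predict() would instead use the lambda with the lowest inner cross-validation error.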

Usage

nested.cv(
  x,
  y,
  upstream,
  method = "tandem",
  family = "gaussian",
  nfolds = 10,
  nfolds_inner = 10,
  foldid = NULL,
  lambda_upstream = "lambda.1se",
  lambda_downstream = "lambda.1se",
  lambda_glmnet = "lambda.1se",
  ...
)

Arguments

x

A feature matrix, where the rows correspond to samples and the columns to features.

y

A vector containing the response.

upstream

A logical index vector that indicates for each feature whether it's upstream (TRUE) or downstream (FALSE).

method

Indicates whether the nested cross-validation is performed on TANDEM or on the classic approach (glmnet). Should be either "tandem" or "glmnet".

family

The family parameter that's passed to cv.glmnet(). Currently, only family='gaussian' is supported.

nfolds

Number of cross-validation folds (default is 10) used in the outer cross-validation loop.

nfolds_inner

Number of cross-validation folds (default is 10) used to determine the optimal lambda in the inner cross-validation loop.

foldid

An optional vector that assigns each sample to a fold of the outer cross-validation loop. Overrides nfolds when supplied.

lambda_upstream

Only used when method='tandem'. For the first stage (using the upstream features), should glmnet use lambda.min or lambda.1se? Default is lambda.1se.

lambda_downstream

Only used when method='tandem'. For the second stage (using the downstream features), should glmnet use lambda.min or lambda.1se? Default is lambda.1se.

lambda_glmnet

Only used when method='glmnet'. Should glmnet use lambda.min or lambda.1se? Default is lambda.1se.

...

Other parameters that are passed to cv.glmnet().
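As a rough illustration of the upstream and foldid arguments described above, the following base R sketch builds both vectors for a small simulated matrix. The dimensions, the choice of which columns count as upstream, and the number of folds are all assumptions made for this example.

```r
set.seed(1)
n <- 12  # hypothetical number of samples
p <- 6   # hypothetical number of features
x <- matrix(rnorm(n * p), nrow = n, ncol = p)

# Assumption for illustration: the first two columns are upstream features
upstream <- rep(c(TRUE, FALSE), times = c(2, p - 2))

# The logical index splits the feature matrix into its two groups
x_upstream <- x[, upstream, drop = FALSE]
x_downstream <- x[, !upstream, drop = FALSE]

# Balanced random assignment of the samples to 4 outer folds
nfolds <- 4
foldid <- sample(rep(seq_len(nfolds), length.out = n))
table(foldid)
```

A foldid vector built this way can be passed as nested.cv(x, y, upstream, foldid = foldid), so that both methods are compared on identical outer folds without relying on the random seed.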

Value

The predicted response vector y_hat and the mean-squared error (MSE).

Examples

# unpack example data
x = example_data$x
y = example_data$y
upstream = example_data$upstream

# assess the prediction error in a nested cv-loop
# fix the seed so that both methods use the same outer foldids
set.seed(1)
cv_tandem = nested.cv(x, y, upstream, method="tandem", alpha=0.5)
set.seed(1)
cv_glmnet = nested.cv(x, y, upstream, method="glmnet", alpha=0.5)

# compare the MSE of the two methods
barplot(c(cv_tandem$mse, cv_glmnet$mse), ylab="MSE", names.arg=c("TANDEM", "Classic approach"))

NKI-CCB/TANDEM documentation built on Nov. 25, 2019, 11:18 p.m.