doubleCV: Double cross-validation for estimating performance of...
In multiridge: Fast Cross-Validation for Multi-Penalty Ridge Regression

doubleCV

R Documentation

Double cross-validation for estimating performance of `multiridge`

Description

Double cross-validation for estimating the performance of `multiridge`. The outer folds are used for testing; the inner folds are used for tuning the penalty parameters.
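The split into outer and inner folds can be pictured with a small base R sketch. This is only an illustration of the nested fold structure and not the code `multiridge` uses internally; the sample size `n` and the object names `outer_id`, `splits` and `all_test` are hypothetical.

```r
# Illustrative sketch of nested (double) CV folds; base R only.
# Not multiridge's implementation: n and all object names are made up.
set.seed(1)
n <- 20        # hypothetical sample size
outfold <- 5   # outer folds: held-out test samples
infold <- 4    # inner folds: tuning within each outer training set

# Assign each sample to exactly one outer fold
outer_id <- sample(rep(seq_len(outfold), length.out = n))

splits <- lapply(seq_len(outfold), function(k) {
  test <- which(outer_id == k)
  train <- which(outer_id != k)
  # Inner folds partition the outer training set only,
  # so test samples never take part in penalty tuning
  inner_id <- sample(rep(seq_len(infold), length.out = length(train)))
  list(test = test, train = train, inner = split(train, inner_id))
})

# Across the outer folds, every sample is tested exactly once
all_test <- sort(unlist(lapply(splits, `[[`, "test")))
```

Because the inner folds are drawn from the outer training set only, the penalties tuned in the inner loop never see the samples on which performance is later assessed.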

Usage

doubleCV(penaltiesinit, XXblocks, Y, X1 = NULL, pairing = NULL, outfold = 5,
  infold = 10, nrepeatout = 1, nrepeatin = 1, balance = TRUE,
  fixedfolds = TRUE, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE),
  frac1 = NULL, score = "loglik", model = NULL, eps = 1e-07, maxItr = 10,
  trace = FALSE, printCV = TRUE, reltol = 1e-04, optmethod1 = "SANN",
  optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"),
  maxItropt1 = 10, maxItropt2 = 25, save = FALSE, parallel = FALSE,
  pref = NULL, fixedpen = NULL)

Arguments

`penaltiesinit`	Numeric vector. Initial values for the penalty parameters. May be obtained from `fastCV2`.
`XXblocks`	List of `nxn` matrices. Usually output of `createXXblocks`.
`Y`	Response vector: numeric, binary, factor or `survival`.
`X1`	Matrix. Dimension `n x p_0, p_0 < n`, representing unpenalized covariates.
`pairing`	Numerical vector of length 3 or `NULL` when pairs are absent. Represents the indices (in `XXblocks`) of the two data blocks involved in pairing, plus the index of the paired block.
`outfold`	Integer. Outer fold for test samples.
`infold`	Integer. Inner fold for tuning penalty parameters.
`nrepeatout`	Integer. Number of repeated splits for outer fold.
`nrepeatin`	Integer. Number of repeated splits for inner fold.
`balance`	Boolean. Should the splits be balanced in terms of response labels?
`fixedfolds`	Boolean. Should fixed splits be used for reproducibility?
`intercept`	Boolean. Should an intercept be included?
`frac1`	Scalar. Prior fraction of cases. Only relevant for `model="logistic"`.
`score`	Character. See Details.
`model`	Character. Any of `c("linear", "logistic", "cox")`. Is inferred from `Y` when `NULL`.
`eps`	Scalar. Numerical bound for IWLS convergence.
`maxItr`	Integer. Maximum number of iterations used in IWLS.
`trace`	Boolean. Should the output of the IWLS algorithm be traced?
`printCV`	Boolean. Should the CV-score be printed on screen?
`reltol`	Scalar. Relative tolerance for optimization methods.
`optmethod1`	Character. First, global search method. Any of the methods `c("Brent", "Nelder-Mead", "SANN")` may be used, but simulated annealing by `"SANN"` is recommended to search a wide landscape. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`optmethod2`	Character. Second, local search method. Any of the methods `c("Brent", "Nelder-Mead", "SANN")` may be used, but `"Nelder-Mead"` is generally recommended. Other unconstrained methods offered by `optim` may also be used, but have not been tested.
`maxItropt1`	Integer. Maximum number of iterations for `optmethod1`.
`maxItropt2`	Integer. Maximum number of iterations for `optmethod2`.
`save`	Boolean. If `TRUE`, appends the penalties and the resulting CV score to the global variable `allscores`.
`parallel`	Boolean. Should computation be done in parallel? If `TRUE`, `setupParallel` must be run first.
`pref`	Integer vector or `NULL`. Contains indices of data types in `XXblocks` that are preferential.
`fixedpen`	Integer vector or `NULL`. Contains indices of data types of which penalty is fixed to the corresponding value in `penaltiesinit`.

Details

WARNING: this function may be very time-consuming. The number of evaluations may equal `nrepeatout*outfold*nrepeatin*infold*maxItr*(maxItropt1+maxItropt2)`. Computing time may be estimated by multiplying the computing time of `optLambdasWrap` by `nrepeatout*outfold`. See `Scoring` for details on `score`.
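As a quick sanity check before launching a run, the evaluation count can be computed directly. The sketch below plugs the default values from Usage into the formula, using the argument names `outfold` and `infold`; the variable names `n_eval` and `n_tuning_runs` are introduced here for illustration only.

```r
# Evaluation count from the formula in Details,
# using the default argument values listed under Usage.
nrepeatout <- 1; outfold <- 5
nrepeatin <- 1; infold <- 10
maxItr <- 10; maxItropt1 <- 10; maxItropt2 <- 25

n_eval <- nrepeatout * outfold * nrepeatin * infold *
  maxItr * (maxItropt1 + maxItropt2)
n_eval
# 17500

# Factor by which the computing time of one optLambdasWrap run is multiplied
n_tuning_runs <- nrepeatout * outfold
n_tuning_runs
# 5
```

With the settings from the Examples section (`outfold = 10`, `nrepeatin = 3`) the same formula gives a count six times larger, which is one reason that call is wrapped in a "Not run" block.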

Value

List with the following components:

`sampleindex`	Numerical vector: sample indices
`true`	True responses
`linpred`	Cross-validated linear predictors
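The three components can be post-processed with base R as well as with `Scoring`. The sketch below uses a mock list, `perf_mock`, with made-up values in place of real `doubleCV` output. It assumes a logistic model and that `linpred` is on the logit scale, which this page does not state explicitly.

```r
# Mock object with the three documented components; all values are made up.
perf_mock <- list(
  sampleindex = c(3, 1, 2, 4),
  true = c(1, 0, 0, 1),
  linpred = c(2.0, -1.5, 0.3, 0.8)
)

# Put responses and predictors back in the original sample order
ord <- order(perf_mock$sampleindex)
truth <- perf_mock$true[ord]
lp <- perf_mock$linpred[ord]

# Assumption: logistic model, so the inverse logit maps the
# linear predictors to predicted probabilities
prob <- plogis(lp)

# Brier score by hand: mean squared difference between
# predicted probability and observed outcome
brier <- mean((prob - truth)^2)
brier
```

For real analyses, `Scoring` as used in the Examples is the documented route; the hand computation only shows what the components contain.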

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks = XXmirmeth, Y = resp, kfold = 10, fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Double cross-validation
## Not run: 
perf <- doubleCV(penaltiesinit = lambdas, XXblocks = XXmirmeth, Y = resp,
                 score = "loglik", outfold = 10, infold = 10,
                 nrepeatout = 1, nrepeatin = 3, parallel = TRUE)

# Performance metrics
Scoring(perf$linpred, perf$true, score = "auc", print = TRUE)
Scoring(perf$linpred, perf$true, score = "brier", print = TRUE)
Scoring(perf$linpred, perf$true, score = "loglik", print = TRUE)

## End(Not run)

multiridge documentation built on June 13, 2022, 5:07 p.m.