doubleCV: Double cross-validation for estimating performance of...

View source: R/MultiLambdaCVfun.R

doubleCV R Documentation

Double cross-validation for estimating performance of multiridge

Description

Double cross-validation for estimating the performance of multiridge. The outer folds are used for testing; the inner folds are used for tuning the penalty parameters.
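
The outer/inner structure can be illustrated with a minimal base R sketch. It is not the multiridge implementation: it uses a plain single-penalty ridge regression, simulated data, and a grid search over lambda instead of the optim-based search that doubleCV performs. The helper names ridge_fit and cv_mse and the object res are hypothetical.

```r
# Illustrative sketch only: nested (double) CV for a plain ridge regression.
# Data, helpers and the lambda grid are assumptions, not multiridge internals.
set.seed(1)
n <- 60; p <- 10
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% rnorm(p)) + rnorm(n)

# Closed-form ridge fit (no intercept, for brevity)
ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# Mean squared prediction error of a k-fold CV for one value of lambda
cv_mse <- function(X, y, lambda, folds) {
  errs <- numeric(length(y))
  for (k in unique(folds)) {
    test <- folds == k
    beta <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambda)
    errs[test] <- (y[test] - X[test, , drop = FALSE] %*% beta)^2
  }
  mean(errs)
}

lambda_grid <- 10^seq(-2, 2, length.out = 9)
outfold <- 5; infold <- 10
outer <- sample(rep(seq_len(outfold), length.out = n))
linpred <- numeric(n)

for (k in seq_len(outfold)) {
  test <- outer == k
  Xtr <- X[!test, , drop = FALSE]; ytr <- y[!test]
  # Inner folds: tune lambda using the outer training samples only
  inner <- sample(rep(seq_len(infold), length.out = sum(!test)))
  scores <- sapply(lambda_grid, function(l) cv_mse(Xtr, ytr, l, inner))
  best <- lambda_grid[which.min(scores)]
  # Outer fold: predict the held-out samples with the tuned lambda
  linpred[test] <- X[test, , drop = FALSE] %*% ridge_fit(Xtr, ytr, best)
}

# Same component names as the list returned by doubleCV
res <- list(sampleindex = seq_len(n), true = y, linpred = linpred)
```

Because lambda is tuned without ever seeing the held-out outer samples, the predictions in res$linpred give a performance estimate that is not biased by the tuning step.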

Usage

doubleCV(penaltiesinit, XXblocks, Y, X1 = NULL, pairing = NULL, outfold = 5,
  infold = 10, nrepeatout = 1, nrepeatin = 1, balance = TRUE,
  fixedfolds = TRUE, intercept = ifelse(is(Y, "Surv"), FALSE, TRUE),
  frac1 = NULL, score = "loglik", model = NULL, eps = 1e-07, maxItr = 10,
  trace = FALSE, printCV = TRUE, reltol = 1e-04, optmethod1 = "SANN",
  optmethod2 = ifelse(length(penaltiesinit) == 1, "Brent", "Nelder-Mead"),
  maxItropt1 = 10, maxItropt2 = 25, save = FALSE, parallel = FALSE,
  pref = NULL, fixedpen = NULL)

Arguments

penaltiesinit

Numeric vector. Initial values for the penalty parameters. May be obtained from fastCV2.

XXblocks

List of n x n matrices. Usually the output of createXXblocks.

Y

Response vector: numeric, binary, factor or survival.

X1

Matrix of dimension n x p_0, with p_0 < n, representing unpenalized covariates.

pairing

Numerical vector of length 3 or NULL when pairs are absent. Represents the indices (in XXblocks) of the two data blocks involved in pairing, plus the index of the paired block.

outfold

Integer. Number of outer folds, used for the test samples.

infold

Integer. Number of inner folds, used for tuning the penalty parameters.

nrepeatout

Integer. Number of repeated splits for outer fold.

nrepeatin

Integer. Number of repeated splits for inner fold.

balance

Boolean. Should the splits be balanced in terms of response labels?

fixedfolds

Boolean. Should fixed splits be used for reproducibility?

intercept

Boolean. Should an intercept be included?

frac1

Scalar. Prior fraction of cases. Only relevant for model="logistic".

score

Character. See Details.

model

Character. Any of c("linear", "logistic", "cox"). Is inferred from Y when NULL.

eps

Scalar. Numerical bound for IWLS convergence.

maxItr

Integer. Maximum number of iterations used in IWLS.

trace

Boolean. Should the output of the IWLS algorithm be traced?

printCV

Boolean. Should the CV-score be printed on screen?

reltol

Scalar. Relative tolerance for optimization methods.

optmethod1

Character. First, global search method. Any of the methods c("Brent", "Nelder-Mead", "SANN") may be used, but simulated annealing by "SANN" is recommended for searching a wide landscape. Other unconstrained methods offered by optim may also be used, but have not been tested.

optmethod2

Character. Second, local search method. Any of the methods c("Brent", "Nelder-Mead", "SANN") may be used, but "Nelder-Mead" is generally recommended. Other unconstrained methods offered by optim may also be used, but have not been tested.

maxItropt1

Integer. Maximum number of iterations for optmethod1.

maxItropt2

Integer. Maximum number of iterations for optmethod2.

save

Boolean. If TRUE, appends the penalties and the resulting CV-score to the global variable allscores.

parallel

Boolean. Should computation be done in parallel? If TRUE, setupParallel must be run first.

pref

Integer vector or NULL. Contains indices of data types in XXblocks that are preferential.

fixedpen

Integer vector or NULL. Contains indices of data types whose penalty is fixed to the corresponding value in penaltiesinit.
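
The two-stage search described under optmethod1 and optmethod2 can be sketched with optim directly. This is a hedged illustration, not the package's code: toy_score is a hypothetical two-parameter objective standing in for a cross-validated score, and the control values simply mirror the defaults maxItropt1 = 10, maxItropt2 = 25 and reltol = 1e-04.

```r
# Toy objective with its minimum at c(1, 3); NOT a multiridge CV score.
toy_score <- function(par) sum((par - c(1, 3))^2)

set.seed(1)
init <- c(0, 0)

# Stage 1: coarse global search by simulated annealing (cf. optmethod1).
# For "SANN", maxit is the total number of function evaluations.
stage1 <- optim(init, toy_score, method = "SANN", control = list(maxit = 10))

# Stage 2: local refinement started from the stage-1 result (cf. optmethod2)
stage2 <- optim(stage1$par, toy_score, method = "Nelder-Mead",
                control = list(maxit = 25, reltol = 1e-04))

# Objective value at the start and after each stage
c(start = toy_score(init), stage1 = stage1$value, stage2 = stage2$value)
```

Note that optim matches method names exactly, so simulated annealing has to be requested as "SANN".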

Details

WARNING: this function may be very time-consuming. The number of evaluations may equal nrepeatout*outfold*nrepeatin*infold*maxItr*(maxItropt1+maxItropt2). Computing time may be estimated by multiplying the computing time of optLambdasWrap by nrepeatout*outfold. See Scoring for details on score.
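
As a rough worked example of the evaluation count above, the product can be computed for the default settings and for the settings used in the Examples section. The fold counts are the outfold and infold arguments; n_evals is a hypothetical helper, not part of multiridge.

```r
# Upper bound on the number of evaluations, following the formula in Details.
# n_evals is a hypothetical helper; its defaults copy those of doubleCV.
n_evals <- function(outfold = 5, infold = 10, nrepeatout = 1, nrepeatin = 1,
                    maxItr = 10, maxItropt1 = 10, maxItropt2 = 25) {
  nrepeatout * outfold * nrepeatin * infold * maxItr * (maxItropt1 + maxItropt2)
}

n_evals()                                          # defaults: 17500
n_evals(outfold = 10, infold = 10, nrepeatin = 3)  # Examples settings: 105000
```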

Value

List with the following components:

sampleindex

Numerical vector: sample indices

true

True responses

linpred

Cross-validated linear predictors
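
The returned list can be inspected with base R. The sketch below uses a mock object with the same three component names; its values are made up and it is not output of doubleCV. It computes a rank-based AUC by hand, which may serve as a cross-check of Scoring with score="auc" for a binary response.

```r
# Mock object mimicking the structure of the doubleCV output (values are made up)
perf_mock <- list(sampleindex = 1:6,
                  true        = c(0, 0, 1, 1, 0, 1),
                  linpred     = c(-1.2, -0.3, 0.4, 1.5, 0.1, -0.5))

# AUC via the Wilcoxon rank-sum identity; depends only on the ordering of linpred
auc_by_rank <- function(linpred, true) {
  r  <- rank(linpred)
  n1 <- sum(true == 1)
  n0 <- sum(true == 0)
  (sum(r[true == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

auc <- auc_by_rank(perf_mock$linpred, perf_mock$true)
auc  # 7 of the 9 case-control pairs are ordered correctly: 7/9
```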

See Also

optLambdas and optLambdasWrap, which optimize the penalties. Scoring, which may be applied to the output of this function to obtain an overall cross-validated performance score. A full demo and data are available from:
https://drive.google.com/open?id=1NUfeOtN8-KZ8A2HZzveG506nBwgW64e4

Examples

data(dataXXmirmeth)
resp <- dataXXmirmeth[[1]]
XXmirmeth <- dataXXmirmeth[[2]]

# Find initial lambdas: fast CV per data block separately.
cvperblock2 <- fastCV2(XXblocks = XXmirmeth, Y = resp, kfold = 10, fixedfolds = TRUE)
lambdas <- cvperblock2$lambdas

# Double cross-validation
## Not run: 
perf <- doubleCV(penaltiesinit = lambdas, XXblocks = XXmirmeth, Y = resp,
                 score = "loglik", outfold = 10, infold = 10,
                 nrepeatout = 1, nrepeatin = 3, parallel = TRUE)

# Performance metrics
Scoring(perf$linpred, perf$true, score = "auc", print = TRUE)
Scoring(perf$linpred, perf$true, score = "brier", print = TRUE)
Scoring(perf$linpred, perf$true, score = "loglik", print = TRUE)

## End(Not run)

multiridge documentation built on June 13, 2022, 5:07 p.m.