amalgamate_cv.glmnet: amalgamate_cv.glmnet

View source: R/dCVnet_utilities.R

Description

Gathers results from a list of cv.glmnet objects and returns a merged, averaged object.

Usage

amalgamate_cv.glmnet(
  cvglmlist,
  checks = list(alpha = TRUE, lambda = FALSE, type.measure = TRUE)
)

Arguments

cvglmlist

a list of cv.glmnet models

checks

which input checks to run

Details

The arithmetic mean k-fold cross-validated loss (i.e. the type.measure) is taken over the models (the sd is averaged via the variance). The cv SE upper and lower limits (used in the lambda.1se calculation) are then calculated from the averaged data, and finally the cv-optimal lambda.1se and lambda.min values are calculated for the averaged performance.
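The averaging described above can be sketched as follows. This is an illustrative reconstruction of the stated approach, not the package's internal code; the lambda sequence and the cvm/cvsd values are made-up inputs:

```r
# Sketch: average repeated cv.glmnet results over a shared lambda sequence.
# (Assumed approach based on the Details text, not dCVnet's actual code.)
lambda <- c(1.0, 0.5, 0.1)
cvm_list  <- list(c(1.1, 0.7, 0.6),   # cv loss, repetition 1
                  c(1.2, 0.7, 0.7))   # cv loss, repetition 2
cvsd_list <- list(c(0.1, 0.1, 0.1),   # cv SE, repetition 1
                  c(0.2, 0.1, 0.1))   # cv SE, repetition 2

cvm  <- rowMeans(do.call(cbind, cvm_list))            # arithmetic mean loss
cvsd <- sqrt(rowMeans(do.call(cbind, cvsd_list)^2))   # sd averaged via variance
cvup <- cvm + cvsd   # upper SE limit on the averaged data
cvlo <- cvm - cvsd   # lower SE limit on the averaged data

lambda.min <- lambda[which.min(cvm)]
# 1se rule: largest lambda with loss within one SE of the minimum:
lambda.1se <- max(lambda[cvm <= cvup[which.min(cvm)]])
```

With these toy inputs the minimum averaged loss is at lambda = 0.1, while the more regularised lambda = 0.5 is still within one (averaged) SE, so lambda.1se selects it.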

Consistent with cv.glmnet, the model coefficients within folds are not made available, averaged or otherwise investigable, but a whole data model is returned in the glmnet.fit slot.

The cvglmlist must contain cv.glmnet models suitable for averaging together. This typically means all models having the same:

  • family

  • x and y data

  • alpha value

  • lambda sequence

  • type.measure

  • number of k-fold CV folds

  • other cv.glmnet options

in order for the amalgamated results to "make sense". Essentially the models in the list should only differ on the random allocation of folds to cases (usually specified in foldid).

Some limited checks are implemented to ensure alpha, lambda and type.measure are identical across models (note that, per the Usage defaults, the lambda check is disabled). These checks can be turned off via the checks argument, but this is not recommended.
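A check of this kind can be sketched generically: confirm that every model in the list agrees on a given component. This is an illustrative helper, not the package's internal implementation; `all_identical` is a hypothetical name:

```r
# Sketch: verify that all cv.glmnet models in a list share a component
# (e.g. lambda sequence or loss name). Not dCVnet's actual check code.
all_identical <- function(lst, getter) {
  vals <- lapply(lst, getter)
  all(vapply(vals[-1], identical, logical(1), y = vals[[1]]))
}

# Example with stand-in list elements in place of real cv.glmnet objects:
same_lambda <- list(list(lambda = 1:3), list(lambda = 1:3))
diff_lambda <- list(list(lambda = 1:3), list(lambda = 1:4))
all_identical(same_lambda, function(m) m$lambda)  # TRUE
all_identical(diff_lambda, function(m) m$lambda)  # FALSE
```

For real models the getters would extract, e.g., `m$lambda` or `m$name` (the type.measure label) from each cv.glmnet object.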

This function presently does not honour the "keep" argument of cv.glmnet and all additional arrays/vectors are silently dropped.

Value

an object of class "cv.glmnet" is returned, which is a list with the ingredients of the cross-validation fit. If the object was created with relax=TRUE then this class has a prefix class of "cv.relaxed".

lambda

the values of lambda used in the fits.

cvm

The mean cross-validated error - a vector of length length(lambda).

cvsd

estimate of standard error of cvm.

cvup

upper curve = cvm+cvsd.

cvlo

lower curve = cvm-cvsd.

nzero

number of non-zero coefficients at each lambda.

name

a text string indicating type of measure (for plotting purposes).

glmnet.fit

a fitted glmnet object for the full data.

lambda.min

value of lambda that gives minimum cvm.

lambda.1se

largest value of lambda such that error is within 1 standard error of the minimum.

fit.preval

if keep=TRUE, this is the array of prevalidated fits. Some entries can be NA, if that and subsequent values of lambda are not reached for that fold

foldid

if keep=TRUE, the fold assignments used

index

a one column matrix with the indices of lambda.min and lambda.1se in the sequence of coefficients, fits etc.

relaxed

if relax=TRUE, this additional item has the CV info for each of the mixed fits. In particular it also selects lambda, gamma pairs corresponding to the 1se rule, as well as the minimum error. It also has a component index, a two-column matrix which contains the lambda and gamma indices corresponding to the "min" and "1se" solutions.

See Also

cv.glmnet

Examples

## Not run: 
data("CoxExample", package = "glmnet") # x and y
# folds for unstratified 10x-repeated 5-fold cv:
foldlist <- replicate(
    10,
    sample(1:5, size = NROW(CoxExample$x), replace = TRUE),
    simplify = FALSE
)
names(foldlist) <- paste0("Rep", 1:10) # label the replications.
lambdaseq <- glmnet::cv.glmnet(x = CoxExample$x,
                               y = CoxExample$y,
                               family = "cox")$lambda
# create a list of models:
modellist <- lapply(foldlist, function(ff) {
    glmnet::cv.glmnet(x = CoxExample$x, y = CoxExample$y,
                      family = "cox", foldid = ff,
                      lambda = lambdaseq)
})

# use amalgamate to average results:
mod <- amalgamate_cv.glmnet(modellist)

# compare rep-rep performance variability with the average performance:
# rep1:
plot(modellist[[1]], main = "rep1")
# rep2:
plot(modellist[[2]], main = "rep2")
# etc.
# mean:
plot(mod, main = "averaged")

## End(Not run)

AndrewLawrence/dCVnet documentation built on Sept. 24, 2024, 5:24 a.m.