computeError: Compute CV statistics from a prediction matrix

View source: R/computeError.R

Description

Compute CV statistics from a matrix of predictions.

Usage

computeError(
  predmat,
  y,
  lambda,
  foldid,
  type.measure,
  family,
  weights = rep(1, dim(predmat)[1]),
  grouped = TRUE
)

Arguments

predmat

Array of predictions. If 'y' is univariate, this has dimensions 'c(nobs, nlambda)'. If 'y' is multivariate with 'nc' levels/columns (e.g. for 'family = "multinomial"' or 'family = "mgaussian"'), this has dimensions 'c(nobs, nc, nlambda)'. Note that these should be on the same scale as 'y' (unlike in the glmnet package, where it is the linear predictor).

y

Response variable. Either a vector or a matrix, depending on the type of model.

lambda

Lambda values associated with the errors in 'predmat'.

foldid

Vector of values identifying which fold each observation is in.

type.measure

Loss function to use for cross-validation. See 'availableTypeMeasures()' for possible values for 'type.measure'. Note that the package does not check if the user-specified measure is appropriate for the family.

family

Model family; used to determine the correct loss function.

weights

Observation weights.

grouped

This is an experimental argument, with default 'TRUE', and can be ignored by most users. For all models except 'family = "cox"', 'grouped = TRUE' computes 'nfolds' separate statistics and then uses their mean and estimated standard error to describe the CV curve. If 'FALSE', an error matrix is built up at the observation level from the predictions from the 'nfolds' fits, and then summarized (does not apply to 'type.measure = "auc"'). For the "cox" family, 'grouped = TRUE' obtains the CV partial likelihood for the Kth fold by subtraction, that is, by subtracting the log partial likelihood evaluated on the full dataset from that evaluated on the (K-1)/K dataset. This makes more efficient use of risk sets. With 'grouped = FALSE' the log partial likelihood is computed only on the Kth fold.
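
The difference between the two 'grouped' settings for non-Cox families can be illustrated with a minimal base-R sketch. It assumes a squared-error loss and simulated data, and the variable names ('errmat', 'fold_err', 'fold_w' and so on) are made up for illustration. It follows the description above and is not necessarily how the package computes these quantities internally.

```r
# Illustrative sketch only: grouped vs. ungrouped CV summaries
# for a squared-error loss on simulated data.
set.seed(1)
nobs <- 60
nlambda <- 3
nfolds <- 5
y <- rnorm(nobs)
predmat <- matrix(rnorm(nobs * nlambda), nobs, nlambda)  # nobs x nlambda
foldid <- rep(seq_len(nfolds), length.out = nobs)
weights <- rep(1, nobs)

# Observation-level loss: one row per observation, one column per lambda
errmat <- (y - predmat)^2

# grouped = TRUE: one statistic per fold, then mean and SE across folds
fold_err <- t(sapply(seq_len(nfolds), function(k) {
  idx <- foldid == k
  apply(errmat[idx, , drop = FALSE], 2, weighted.mean, w = weights[idx])
}))
fold_w <- as.vector(tapply(weights, foldid, sum))  # total weight per fold
cvm_grouped <- apply(fold_err, 2, weighted.mean, w = fold_w)
cvsd_grouped <- sqrt(
  apply(scale(fold_err, cvm_grouped, FALSE)^2, 2, weighted.mean, w = fold_w) /
    (nfolds - 1)
)

# grouped = FALSE: summarize the observation-level error matrix directly
cvm_ungrouped <- apply(errmat, 2, weighted.mean, w = weights)
cvsd_ungrouped <- sqrt(
  apply(scale(errmat, cvm_ungrouped, FALSE)^2, 2, weighted.mean, w = weights) /
    (nobs - 1)
)
```

With equal weights and equal fold sizes, as here, the two mean curves coincide and only the standard error estimates differ.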

Details

Note that for the setting where 'family = "cox"' and 'type.measure = "deviance"' and 'grouped = TRUE', 'predmat' needs to have a 'cvraw' attribute as computed by 'buildPredMat()'. This is because the usual matrix of pre-validated fits does not contain all the information needed to compute the model deviance for this setting.

Value

An object of class "cvobj".

lambda

The values of lambda used in the fits.

cvm

The mean cross-validated error: a vector of length 'length(lambda)'.

cvsd

Estimate of standard error of 'cvm'.

cvup

Upper curve = 'cvm + cvsd'.

cvlo

Lower curve = 'cvm - cvsd'.

lambda.min

Value of 'lambda' that gives minimum 'cvm'.

lambda.1se

Largest value of 'lambda' such that the error is within 1 standard error of the minimum.

index

A one-column matrix with the indices of 'lambda.min' and 'lambda.1se' in the sequence of coefficients, fits etc.

name

A text string indicating the loss function used (for plotting purposes).
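
The components above are related as in the following sketch. The 'lambda', 'cvm' and 'cvsd' values are made up, and the sketch assumes a loss where smaller is better (so not a measure such as AUC). It shows one way to derive the remaining components from the definitions given here and is not the package's own code.

```r
# Illustrative sketch only: deriving cvup, cvlo, lambda.min and
# lambda.1se from hypothetical cvm and cvsd values.
lambda <- c(1.0, 0.5, 0.25, 0.125, 0.0625)
cvm    <- c(2.0, 1.4, 1.1, 1.0, 1.05)
cvsd   <- c(0.20, 0.15, 0.12, 0.15, 0.15)

cvup <- cvm + cvsd  # upper curve
cvlo <- cvm - cvsd  # lower curve

# lambda.min: the lambda with the smallest mean CV error
idmin <- which.min(cvm)
lambda.min <- lambda[idmin]

# lambda.1se: the largest lambda whose error is within one
# standard error of the minimum
threshold <- cvm[idmin] + cvsd[idmin]
lambda.1se <- max(lambda[cvm <= threshold])

# One-column matrix holding the positions of both values in 'lambda'
index <- matrix(c(idmin, match(lambda.1se, lambda)), ncol = 1,
                dimnames = list(c("min", "1se"), "Lambda"))
```

Because 'lambda.1se' is never smaller than 'lambda.min', it corresponds to a model that is at least as heavily regularized.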

Examples

set.seed(1)
x <- matrix(rnorm(500), nrow = 50)
y <- rnorm(50)
cv_fit <- kfoldcv(x, y, train_fun = glmnet::glmnet,
                  predict_fun = predict, keep = TRUE)
mae_err <- computeError(cv_fit$fit.preval, y, cv_fit$lambda,
                        cv_fit$foldid, type.measure = "mae",
                        family = "gaussian")

cvwrapr documentation built on June 11, 2021, 5:21 p.m.