cv_do: Perform Cross Validation

View source: R/cv.R

cv_do	R Documentation

Perform Cross Validation

Description

Perform k-fold cross-validation with an arbitrary modeling function.

Usage

cv_do(fit., x, y, folds, ...,
	mi = !is.null(bags), bags = NULL, pos = 1L,
	predict. = predict, transpose = FALSE, keep.models = TRUE,
	trainProcess = NULL, trainArgs = list(),
	testProcess = NULL, testArgs = list(),
	verbose = NA, nchunks = NA, BPPARAM = bpparam())

## S3 method for class 'cv'
fitted(object, type = c("response", "class"),
	simplify = TRUE, ...)

Arguments

fit.

The function used to fit the model.

x, y

The data matrix and the response variable, respectively.

folds

A vector coercible to a factor giving the fold for each observation: each row of x, or each column of x if transpose=TRUE.
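A minimal base-R sketch of how such a fold vector can define the train/test splits, assuming the usual k-fold scheme in which each level of the factor is held out once as the test set. The helper fold_splits is hypothetical and is not part of matter.

```r
# Hypothetical helper: split observation indices by fold label.
# Each level of the factor is held out once as the test set.
fold_splits <- function(folds) {
  folds <- as.factor(folds)  # folds only needs to be coercible to a factor
  lapply(setNames(levels(folds), levels(folds)), function(f) {
    list(train = which(folds != f),  # observations used for fitting
         test  = which(folds == f))  # observations held out for prediction
  })
}

splits <- fold_splits(rep(c("set1", "set2", "set3"), each = 4))
str(splits$set2)
```

With three fold labels this produces three splits, and every observation appears in exactly one test set.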

mi

Should mi_learn be called with fit. for multiple instance learning?

bags

If provided, bags is subsetted to match each training set and passed to fit., or to mi_learn if mi=TRUE.

pos

The positive class for multiple instance learning. Only used if mi=TRUE.

...

Additional arguments passed to fit. and predict..

predict.

The function used to predict on new data from the fitted model. The fitted model is passed as the 1st argument and the test data is passed as the 2nd argument.
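To illustrate the argument order expected of fit. and predict., here is a hypothetical nearest-centroid pair written in base R. The names fit_centroid and predict_centroid are illustrative only; whether cv_do expects class labels or a response matrix back from predict. is not stated here, so treat this as a sketch of the calling convention, not a drop-in model.

```r
# Hypothetical fit. function: a nearest-centroid classifier.
# Extra arguments (for example nchunks or BPPARAM) are absorbed by '...'.
fit_centroid <- function(x, y, ...) {
  y <- as.factor(y)
  centers <- do.call(rbind, lapply(levels(y), function(lv)
    colMeans(x[y == lv, , drop = FALSE])))
  rownames(centers) <- levels(y)
  list(centers = centers)
}

# Hypothetical predict. function: the fitted model is the 1st argument
# and the test data is the 2nd argument.
predict_centroid <- function(object, newdata, ...) {
  # squared Euclidean distance from each test row to each class centroid
  d <- apply(object$centers, 1L, function(ctr) colSums((t(newdata) - ctr)^2))
  d <- matrix(d, nrow = nrow(newdata),
              dimnames = list(NULL, rownames(object$centers)))
  # assign each test row to its nearest centroid
  factor(colnames(d)[max.col(-d, ties.method = "first")], levels = colnames(d))
}

set.seed(1)
x <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 5), ncol = 2))
y <- rep(c("a", "b"), each = 10)
model <- fit_centroid(x, y, nchunks = 4)  # nchunks is ignored via '...'
pred <- predict_centroid(model, x)
mean(pred == y)
```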

transpose

A logical value indicating whether x should be considered transposed or not. This can be useful if the input matrix is (P x N) instead of (N x P) and storing the transpose is expensive. This is not necessary for matter_mat and sparse_mat objects, but can be useful for large in-memory (P x N) matrices.
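A small sketch of what transpose changes, assuming it only affects whether observations are indexed as rows or as columns. The helper subset_obs is hypothetical; it shows that selecting columns of a (P x N) matrix picks the same observations as selecting rows of its (N x P) transpose, without ever materializing t(x).

```r
# Hypothetical helper: select observations i from x.
# transpose = FALSE: x is N x P, so observations are rows.
# transpose = TRUE:  x is P x N, so observations are columns.
subset_obs <- function(x, i, transpose = FALSE) {
  if (transpose) x[, i, drop = FALSE] else x[i, , drop = FALSE]
}

x <- matrix(1:12, nrow = 3, ncol = 4)          # N = 3 observations, P = 4 features
a <- subset_obs(x, 2:3)                        # observations 2 and 3 as rows
b <- subset_obs(t(x), 2:3, transpose = TRUE)   # the same observations as columns
identical(a, t(b))
```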

keep.models

Should the models be kept and returned?

trainProcess, trainArgs

A function and arguments used for processing the training sets. The training set is passed as the 1st argument to trainProcess.

testProcess, testArgs

A function and arguments used for processing the test sets. The test set is passed as the 1st argument to testProcess, and the processed training set is passed as the 2nd argument.
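Because testProcess receives the processed training set, statistics learned on the training fold can be reused on the test fold, which avoids leaking test information into preprocessing. The sketch below is a hypothetical centering-and-scaling pair; it assumes the processed training set keeps the "scaled:center" and "scaled:scale" attributes that base R's scale() attaches.

```r
# Hypothetical trainProcess: center and scale the training set.
# scale() stores the column centers and scales as attributes of its result.
train_scale <- function(x, ...) {
  scale(x)
}

# Hypothetical testProcess: the test set is the 1st argument and the
# processed training set is the 2nd, so the training statistics are reused.
test_scale <- function(x, train, ...) {
  scale(x, center = attr(train, "scaled:center"),
        scale = attr(train, "scaled:scale"))
}

set.seed(2)
xtrain <- matrix(rnorm(40, mean = 10, sd = 3), ncol = 2)
xtest <- matrix(rnorm(10, mean = 10, sd = 3), ncol = 2)
ptrain <- train_scale(xtrain)
ptest <- test_scale(xtest, ptrain)
```

The test fold is standardized with the training means and standard deviations, so its own column means are generally not exactly zero.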

verbose

Should progress be printed for each iteration?

nchunks

The number of chunks to use. Passed to fit., predict., trainProcess and testProcess.

BPPARAM

An optional instance of BiocParallelParam. See documentation for bplapply. Passed to fit., predict., trainProcess and testProcess.

object

An object inheriting from cv.

type

The type of prediction: "response" returns the matrix of fitted responses, and "class" returns the vector of predicted classes (classification only).

simplify

Should the predictions be simplified (from a list) to an array (type="response") or data frame (type="class")?

Details

The cross-validation folds are not processed in parallel, because the pre-processing functions, modeling function, and prediction function may themselves make use of parallelization. Therefore, these functions must be able to handle (or ignore) the arguments nchunks and BPPARAM, which will be passed to them.
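A function without a ... argument fails with an "unused argument" error when nchunks or BPPARAM is passed to it. One way to adapt such a function is a wrapper that accepts and drops those two arguments; ignore_parallel_args below is a hypothetical helper sketched under that assumption, not part of matter.

```r
# Hypothetical wrapper: accept nchunks and BPPARAM, then drop them
# before calling a function that does not know about them.
ignore_parallel_args <- function(f) {
  function(..., nchunks = NA, BPPARAM = NULL) {
    f(...)  # nchunks and BPPARAM are matched by name above and not forwarded
  }
}

strict_fit <- function(x, y) lm.fit(cbind(1, x), y)  # no '...': rejects extra args
safe_fit <- ignore_parallel_args(strict_fit)

x <- matrix(1:10, ncol = 1)
y <- 3 + 2 * x[, 1]
model <- safe_fit(x, y, nchunks = 4, BPPARAM = NULL)
model$coefficients
```

Functions that already take ... (as most model-fitting functions do) need no wrapper; they ignore these arguments automatically.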

If bags is specified, then multiple instance learning is assumed, where observations from the same bag are all assumed to have the same label. The labels for bags are automatically pooled (from y) so that if any observation in a bag has the label pos, then the entire bag is labeled pos. If mi=TRUE, then mi_learn will be called by cv_do; otherwise it is assumed that fit. will handle the multiple instance learning. The accuracy metrics are calculated with the original y labels.
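The pooling rule can be written in a few lines of base R. This is a sketch of the rule as stated, not matter's own code: pool_bag_labels and its neg argument are hypothetical, and neg is needed only because this page does not say how bags without any pos observation are labeled.

```r
# Hypothetical helper: pool observation labels into bag labels.
# A bag is labeled pos if any of its observations is labeled pos.
pool_bag_labels <- function(y, bags, pos, neg) {
  bag_has_pos <- tapply(y == pos, bags, any)  # one logical value per bag
  # map each bag's pooled label back onto its observations
  as.vector(ifelse(bag_has_pos[as.character(bags)], pos, neg))
}

y    <- c("no", "yes", "no", "no", "no", "no")
bags <- c("b1", "b1", "b1", "b2", "b2", "b3")
res <- pool_bag_labels(y, bags, pos = "yes", neg = "no")
res
```

Here bag b1 contains one "yes" observation, so all three of its observations receive the pooled label "yes".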

Value

An object of class cv, with the following components:

  • average: The average accuracy metrics.

  • scores: The fold-specific accuracy metrics.

  • folds: The fold memberships.

  • fitted.values: The fold-specific predictions.

  • models: (Optional) The fitted models.

Author(s)

Kylie A. Bemis

See Also

predscore

Examples

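library(matter)        # cv_do() and pls_nipals()
library(BiocParallel)  # register() and SerialParam()

register(SerialParam())  # evaluate any parallelizable steps serially

set.seed(1)
n <- 100      # observations per fold
p <- 5        # number of features
nfolds <- 3
# class labels: 60 "yes" and 40 "no" in each fold
y <- rep(c(rep.int("yes", 60), rep.int("no", 40)), nfolds)
x <- matrix(rnorm(nfolds * n * p), nrow=nfolds * n, ncol=p)
# shift the first two features so that they separate the classes
x[,1L] <- x[,1L] + 2 * ifelse(y == "yes", runif(n), -runif(n))
x[,2L] <- x[,2L] + 2 * ifelse(y == "no", runif(n), -runif(n))
# consecutive blocks of n observations form folds "set1", "set2", "set3"
folds <- rep(paste0("set", seq_len(nfolds)), each=n)

# k=1:5 is passed through ... to pls_nipals and its predict method
cv_do(pls_nipals, x, y, k=1:5, folds=folds)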

kuwisdelu/matter documentation built on May 1, 2024, 5:17 a.m.