cv_do: Perform Cross Validation
In kuwisdelu/matter: Out-of-core statistical computing and signal processing

Description

Perform k-fold cross-validation with an arbitrary modeling function.

Usage

cv_do(fit., x, y, folds, ...,
	mi = !is.null(bags), bags = NULL, pos = 1L,
	predict. = predict, transpose = FALSE, keep.models = TRUE,
	trainProcess = NULL, trainArgs = list(),
	testProcess = NULL, testArgs = list(),
	verbose = NA, chunkopts = list(),
	BPPARAM = bpparam())

## S3 method for class 'cv'
fitted(object, type = c("response", "class"),
	simplify = TRUE, ...)

Arguments

`fit.`	The function used to fit the model.
`x`, `y`	The data and response variable.
`folds`	A vector coercible to a factor giving the fold for each row or column of `x`.
`mi`	Should `mi_learn` be called with `fit.` for multiple instance learning?
`bags`	If provided, subsetted and passed to `fit.` or `mi_learn` if `mi=TRUE`.
`pos`	The positive class for multiple instance learning. Only used if `mi=TRUE`.
`...`	Additional arguments passed to `fit.` and `predict.`.
`predict.`	The function used to predict on new data from the fitted model. The fitted model is passed as the 1st argument and the test data is passed as the 2nd argument.
`transpose`	A logical value indicating whether `x` should be considered transposed or not. This can be useful if the input matrix is (P x N) instead of (N x P) and storing the transpose is expensive. This is not necessary for `matter_mat` and `sparse_mat` objects, but can be useful for large in-memory (P x N) matrices.
`keep.models`	Should the models be kept and returned?
`trainProcess`, `trainArgs`	A function and arguments used for processing the training sets. The training set is passed as the 1st argument to `trainProcess`.
`testProcess`, `testArgs`	A function and arguments used for processing the test sets. The test set is passed as the 1st argument to `testProcess`, and the processed training set is passed as the 2nd argument.
`verbose`	Should progress be printed for each iteration?
`chunkopts`	Passed to `fit.`, `predict.`, `trainProcess` and `testProcess`. See `chunkApply` for details.
`BPPARAM`	An optional instance of `BiocParallelParam`. See documentation for `bplapply`. Passed to `fit.`, `predict.`, `trainProcess` and `testProcess`.
`object`	An object inheriting from `cv`.
`type`	The type of prediction, where `"response"` means the fitted response matrix and `"class"` will be the vector of class predictions (only for classification).
`simplify`	Should the predictions be simplified (from a list) to an array (`type="response"`) or data frame (`type="class"`)?
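
As an illustration of the `folds`, `trainProcess`, and `testProcess` arguments described above, the following base R sketch shows how a fold vector could partition rows into training and test sets, and how a pair of processing functions might center a test set using the training means. The names `center_train` and `center_test` are hypothetical and not part of `matter`. The sketch only follows the argument order documented above and is not the package's internal implementation.

```r
# Hypothetical processing functions (illustrative names, not part of matter).
# Both accept ... so extra arguments such as chunkopts and BPPARAM are ignored.
center_train <- function(x, ...) {
  scale(x, center = TRUE, scale = FALSE)
}
center_test <- function(x, xtrain, ...) {
  # reuse the column means stored by scale() on the processed training set
  sweep(x, 2L, attr(xtrain, "scaled:center"))
}

set.seed(1)
x <- matrix(rnorm(12 * 3), nrow = 12, ncol = 3)
folds <- rep(c("set1", "set2", "set3"), each = 4)

# one element of row indices per fold level
test_idx <- split(seq_len(nrow(x)), as.factor(folds))

processed <- lapply(test_idx, function(i) {
  train <- center_train(x[-i, , drop = FALSE])
  test <- center_test(x[i, , drop = FALSE], train)
  list(train = train, test = test)
})

# training columns are centered at zero; test columns generally are not
colMeans(processed$set1$train)
colMeans(processed$set1$test)
```

Centering the test set with the training means, rather than its own, keeps information from the held-out fold out of the pre-processing step.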

Details

The folds are processed sequentially rather than in parallel, on the assumption that the pre-processing, modeling, and prediction functions may themselves use parallelization. These functions must therefore be able to handle (or ignore) the `chunkopts` and `BPPARAM` arguments, which will be passed to them.
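
To illustrate the requirement above, here is a minimal sketch of a fitting and prediction pair that tolerates the extra parallelization arguments by accepting them and leaving them unused. The names `fit_ols` and `predict_ols` are hypothetical, and the loop merely mimics a sequential pass over folds in base R. It does not call `cv_do` and should not be read as its implementation.

```r
# Hypothetical model functions; chunkopts and BPPARAM are accepted but unused.
fit_ols <- function(x, y, ..., chunkopts = list(), BPPARAM = NULL) {
  lm.fit(cbind(1, x), y)
}
predict_ols <- function(object, newdata, ..., chunkopts = list(), BPPARAM = NULL) {
  drop(cbind(1, newdata) %*% object$coefficients)
}

set.seed(1)
n <- 90
x <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
y <- 1 + 2 * x[, 1] - x[, 2] + rnorm(n, sd = 0.1)
folds <- rep(c("set1", "set2", "set3"), each = n / 3)

# folds are handled one after another (no parallelism at this level)
pred <- numeric(n)
for (f in unique(folds)) {
  test <- folds == f
  model <- fit_ols(x[!test, , drop = FALSE], y[!test],
    chunkopts = list(), BPPARAM = NULL)
  pred[test] <- predict_ols(model, x[test, , drop = FALSE],
    chunkopts = list(), BPPARAM = NULL)
}

# root mean squared error of the held-out predictions
sqrt(mean((pred - y)^2))
```

The fitted model is the 1st argument of the prediction function and the test data the 2nd, matching the description of `predict.` in the Arguments section.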

If `bags` is specified, then multiple instance learning is assumed, where observations from the same bag are all assumed to have the same label. The labels for bags are automatically pooled (from `y`) so that if any observation in a bag is `pos`, then the entire bag is labeled `pos`. If `mi=TRUE` then `mi_learn` will be called by `cv_do`; otherwise it is assumed that `fit.` will handle the multiple instance learning. The accuracy metrics are calculated with the original `y` labels.
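
The label pooling described above can be sketched in base R as follows. This is only an illustration of the rule (a bag is positive if any of its observations is positive). The variable names and the `"neg"` label are assumptions, and `matter` may implement the pooling differently.

```r
# toy labels and bag memberships (illustrative data)
y <- c("neg", "pos", "neg", "neg", "neg", "neg", "pos", "pos")
bags <- c("b1", "b1", "b1", "b2", "b2", "b3", "b3", "b3")
pos <- "pos"
neg <- "neg"  # assumed name for the non-positive class

# a bag is positive if any of its observations carries the positive label
bag_is_pos <- tapply(y == pos, bags, any)

# expand the bag-level result back to one label per observation
is_pos <- as.vector(bag_is_pos[bags])
pooled <- ifelse(is_pos, pos, neg)

data.frame(bags, y, pooled)
```

Here bag `b1` contains a single positive observation, so all three of its observations receive the pooled label `"pos"`, while the original `y` is kept unchanged for computing accuracy metrics.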

Value

An object of class cv, with the following components:

average: The average accuracy metrics.
scores: The fold-specific accuracy metrics.
folds: The fold memberships.
fitted.values: The fold-specific predictions.
models: (Optional) The fitted models.

Author(s)

Kylie A. Bemis

Examples

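library(matter)

# use a serial backend for this example
register(SerialParam())

set.seed(1)
n <- 100
p <- 5
nfolds <- 3
# 60 "yes" and 40 "no" labels, repeated once per fold
y <- rep(c(rep.int("yes", 60), rep.int("no", 40)), nfolds)
x <- matrix(rnorm(nfolds * n * p), nrow=nfolds * n, ncol=p)
# shift the first two features in opposite directions by class
x[,1L] <- x[,1L] + 2 * ifelse(y == "yes", runif(n), -runif(n))
x[,2L] <- x[,2L] + 2 * ifelse(y == "no", runif(n), -runif(n))
# each consecutive block of n rows forms one fold
folds <- rep(paste0("set", seq_len(nfolds)), each=n)

# k is forwarded through ... to the fitting and prediction functions
cv_do(pls_nipals, x, y, k=1:5, folds=folds)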

kuwisdelu/matter documentation built on April 12, 2025, 2:41 p.m.