subsampling: Subsampling permutations of clustering dataset

View source: R/utility_functions.R

subsampling    R Documentation

Subsampling permutations of clustering dataset

Description

Creates subsamples via cross-validation folds or bootstrapping.

Usage

subsampling(..., subsampling_strategy = "cv")

cv_fold(
  dat_list,
  nfolds = 5,
  nruns = 2,
  stratified_cv = FALSE,
  anti_stratified = FALSE,
  cv_stratification_var = NULL,
  extra_fold = TRUE,
  ...
)

bootstrap(dat_list, nruns = 100, ...)

Arguments

...

extra arguments are ignored

subsampling_strategy

either "cv" or "bootstrap" ("bootstrap" is not implemented)

dat_list

list of datasets, each either a data.table or data.frame (samples x features) with an "id" column, or an expression matrix (genes x samples) with named columns

nfolds

number of cross-validation folds

nruns

number of cross-validation replicates

stratified_cv

if TRUE, perform stratified sampling for folds

anti_stratified

if TRUE, maximize separation of batch labels within folds (the opposite of stratified sampling)

cv_stratification_var

labels used for stratification

extra_fold

if TRUE, generates an extra fold (nfolds+1) that corresponds to the full dataset
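
The difference between plain and stratified fold assignment (stratified_cv with cv_stratification_var) can be sketched in base R. This is only an illustration of the general technique, not the COPS implementation: assign_folds is a hypothetical helper, and the way COPS actually balances folds internally is not shown on this page.

```r
# Hypothetical helper, not part of COPS: assign each of n samples to a fold.
assign_folds <- function(n, nfolds, strata = NULL) {
  if (is.null(strata)) {
    # Plain CV: shuffle balanced fold labels across all samples.
    return(sample(rep_len(seq_len(nfolds), n)))
  }
  folds <- integer(n)
  # Stratified CV: shuffle balanced fold labels separately within each stratum,
  # so every fold receives a similar share of each label.
  for (s in unique(strata)) {
    idx <- which(strata == s)
    folds[idx] <- sample(rep_len(seq_len(nfolds), length(idx)))
  }
  folds
}

set.seed(1)
batch <- rep(c("A", "B"), times = c(10, 20))  # made-up stratification labels
folds <- assign_folds(length(batch), nfolds = 5, strata = batch)
table(batch, folds)  # each fold holds 2 samples of "A" and 4 of "B"
```

With strata = NULL the same helper gives ordinary random folds, in which the per-fold label counts are free to vary.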

Value

list of data.frames with added columns "fold", "run" and "cv_index" as well as duplicated rows of the original data corresponding to different folds.

list of data.frames
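
The shape of the cross-validation output described above (rows of the original data duplicated across folds and runs) might look like the following base R sketch. It rests on assumptions: make_cv_subsamples is a hypothetical helper, each fold's subsample is assumed to keep the samples outside that fold, and the "cv_index" column is left out because this page does not define how it is computed.

```r
# Hypothetical helper, not the COPS implementation.
make_cv_subsamples <- function(dat, nfolds = 5, nruns = 2, extra_fold = TRUE) {
  pieces <- list()
  for (run in seq_len(nruns)) {
    # Fresh random fold labels for every replicate.
    labels <- sample(rep_len(seq_len(nfolds), nrow(dat)))
    for (fold in seq_len(nfolds)) {
      # Assumption: subsample `fold` keeps the samples outside that fold.
      sub <- dat[labels != fold, , drop = FALSE]
      sub$fold <- fold
      sub$run <- run
      pieces[[length(pieces) + 1]] <- sub
    }
    if (extra_fold) {
      # Extra fold nfolds + 1 corresponds to the full dataset.
      full <- dat
      full$fold <- nfolds + 1
      full$run <- run
      pieces[[length(pieces) + 1]] <- full
    }
  }
  do.call(rbind, pieces)
}

set.seed(1)
dat <- data.frame(id = paste0("s", 1:10), x = rnorm(10))  # made-up data
res <- make_cv_subsamples(dat, nfolds = 5, nruns = 2)
nrow(res)                 # 2 runs * (5 folds * 8 rows + 10 rows) = 100
table(res$run, res$fold)  # 8 rows per regular fold, 10 in fold 6
```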

Functions

  • cv_fold(): Cross-validation based subsampling

  • bootstrap(): Subsampling via bootstrapping.
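
For comparison, bootstrapping in general draws rows with replacement instead of partitioning them into folds. A minimal base R sketch follows. bootstrap_rows is a hypothetical helper, and the added "run" column is an assumption, since this page only states that bootstrap() returns a list of data.frames.

```r
# Hypothetical helper, not the COPS implementation.
bootstrap_rows <- function(dat, nruns = 100) {
  pieces <- lapply(seq_len(nruns), function(run) {
    # Draw nrow(dat) row indices with replacement.
    idx <- sample.int(nrow(dat), replace = TRUE)
    boot <- dat[idx, , drop = FALSE]
    boot$run <- run  # assumed label for the bootstrap replicate
    boot
  })
  do.call(rbind, pieces)
}

set.seed(1)
dat <- data.frame(id = paste0("s", 1:10), x = rnorm(10))  # made-up data
boot <- bootstrap_rows(dat, nruns = 3)
nrow(boot)  # 3 runs * 10 rows = 30
```

Unlike the cross-validation subsamples, a single bootstrap replicate can contain the same sample more than once and omit others entirely.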


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.