crissCrossValidate: A function to perform pairwise cross-validation

Source: R/crissCrossValidate.R

Description

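This function performs cross-validation and model prediction on a list of datasets in a pairwise manner: each dataset in measurements is used in turn for training, and performance is assessed on every dataset, giving the matrix of pairwise performance metrics described under Value.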
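The pairwise scheme can be sketched in base R. This is a minimal illustration under stated assumptions, not ClassifyR's implementation: pairwiseGrid, evaluatePair and toyEvaluate are hypothetical names, rows of the grid are taken to be training datasets and columns test datasets (the orientation of the real matrix is not assumed), and evaluatePair receives a flag so that it can cross-validate when a dataset is paired with itself.

```r
# Hypothetical sketch of a pairwise ("criss-cross") evaluation grid.
# evaluatePair(train, test, sameDataset) must return a single performance value.
pairwiseGrid <- function(datasets, evaluatePair) {
  ids <- names(datasets)
  n <- length(datasets)
  # Rows: dataset used for training; columns: dataset used for testing.
  result <- matrix(NA_real_, nrow = n, ncol = n,
                   dimnames = list(train = ids, test = ids))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      # On the diagonal the same samples would be used for training and testing,
      # so a realistic evaluatePair should cross-validate when sameDataset is TRUE.
      result[i, j] <- evaluatePair(datasets[[i]], datasets[[j]],
                                   sameDataset = (i == j))
    }
  }
  result
}

# Toy usage: "performance" is just a difference of dataset means, to show the shape.
datasets <- list(A = 1:3, B = 4:6, C = 7:9)
toyEvaluate <- function(train, test, sameDataset) {
  if (sameDataset) 0 else mean(train) - mean(test)
}
grid <- pairwiseGrid(datasets, toyEvaluate)
grid["A", "B"]  # -3
```

With n datasets the sketch makes n^2 evaluations, so its cost grows quadratically with the number of datasets.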

Usage

crissCrossValidate(
  measurements,
  outcomes,
  nFeatures = 20,
  selectionMethod = "auto",
  selectionOptimisation = "Resubstitution",
  trainType = c("modelTrain", "modelTest"),
  performanceType = "auto",
  doRandomFeatures = FALSE,
  classifier = "auto",
  nFolds = 5,
  nRepeats = 20,
  nCores = 1,
  verbose = 0
)

Arguments

measurements

A list of datasets, each a DataFrame, data.frame or matrix of measurements.

outcomes

A list of vectors, each giving the outcomes of the samples in the corresponding element of measurements.

nFeatures

Default: 20. The number of top-ranked features to be used for modelling.

selectionMethod

Default: "auto". A character keyword of the feature selection algorithm to be used. If "auto", features are ranked by a t-test (two categories) or an F-test (three or more categories) for a classification task, or by per-feature Cox proportional hazards p-value for a survival task, and the number of top-ranked features, nFeatures, is optimised.

selectionOptimisation

Default: "Resubstitution". A character keyword, one of "Resubstitution", "Nested CV" or "none", specifying the approach used to optimise nFeatures.

trainType

Default: "modelTrain". A keyword specifying how each training dataset is used. With "modelTrain", a model fully trained on the training dataset makes predictions on the test dataset. With "modelTest", only the feature identifiers are chosen using the training dataset; a model using those features is then trained and evaluated by cross-validation within the test dataset.

performanceType

Default: "auto". If "auto", balanced accuracy is used for classification and the C-index for survival. Otherwise, any one of the performance metrics described in calcPerformance may be specified.

doRandomFeatures

Default: FALSE. A logical value (TRUE or FALSE) indicating whether to also perform random feature selection to establish a baseline performance.

classifier

Default: "auto". A character keyword of the modelling algorithm to be used. If "auto", a random forest is used for a classification task or a Cox proportional hazards model for a survival task.

nFolds

Default: 5. The number of folds to use for cross-validation.

nRepeats

Default: 20. The number of repeats, or permutations, to use for cross-validation.

nCores

Default: 1. The number of CPU cores to use for parallel processing.

verbose

Default: 0. A number from 0 to 3 controlling how many progress messages are printed. Higher values produce more messages because lower-level functions also print their own.
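The "auto" feature ranking for classification and the two trainType modes can be sketched together in base R. This is a simplified illustration under explicit assumptions, not ClassifyR's code: every function name is hypothetical; each dataset is assumed to be a list with a samples-by-features matrix x (with column names) and a class vector y; a nearest-centroid classifier and plain accuracy stand in for the real classifier and performanceType; the t-test is R's default Welch test, which ClassifyR may or may not use; and cross-validation uses fixed, unshuffled folds with no repeats. randomFeatures shows the kind of baseline that doRandomFeatures = TRUE is meant to provide.

```r
# Hypothetical sketch, not ClassifyR code. Each dataset is a list with a
# samples-by-features matrix `x` (with column names) and a class vector `y`.

# "auto"-style ranking: t-test for two classes, F-test for three or more;
# returns the names of the nFeatures features with the smallest p-values.
rankFeatures <- function(x, y, nFeatures) {
  y <- factor(y)
  groups <- levels(y)
  pValues <- apply(x, 2, function(feature) {
    if (length(groups) == 2) {
      t.test(feature[y == groups[1]], feature[y == groups[2]])$p.value
    } else {
      oneway.test(feature ~ y, data = data.frame(feature, y),
                  var.equal = TRUE)$p.value
    }
  })
  names(sort(pValues))[seq_len(min(nFeatures, length(pValues)))]
}

# Random baseline in the spirit of doRandomFeatures = TRUE.
randomFeatures <- function(x, y, nFeatures) {
  sample(colnames(x), min(nFeatures, ncol(x)))
}

# Nearest-centroid classifier, a stand-in for the real classifier.
fitCentroids <- function(x, y) {
  y <- factor(y)
  centroids <- lapply(levels(y), function(l) colMeans(x[y == l, , drop = FALSE]))
  do.call(rbind, setNames(centroids, levels(y)))  # one row per class
}
predictCentroids <- function(centroids, x) {
  # Squared distance from every centroid (row) to every sample (column).
  distances <- apply(x, 1, function(s) rowSums(sweep(centroids, 2, s)^2))
  rownames(centroids)[apply(distances, 2, which.min)]
}

# trainType = "modelTrain" style: features and model both come from `train`.
modelTrainStyle <- function(train, test, select, nFeatures) {
  features <- select(train$x, train$y, nFeatures)
  model <- fitCentroids(train$x[, features, drop = FALSE], train$y)
  mean(predictCentroids(model, test$x[, features, drop = FALSE]) == test$y)
}

# trainType = "modelTest" style: only the feature names come from `train`;
# the model is refitted and assessed by cross-validation inside `test`.
modelTestStyle <- function(train, test, select, nFeatures, nFolds = 3) {
  features <- select(train$x, train$y, nFeatures)
  x <- test$x[, features, drop = FALSE]
  folds <- rep_len(seq_len(nFolds), nrow(x))  # fixed, unshuffled folds
  predicted <- character(nrow(x))
  for (k in seq_len(nFolds)) {
    held <- folds == k
    model <- fitCentroids(x[!held, , drop = FALSE], test$y[!held])
    predicted[held] <- predictCentroids(model, x[held, , drop = FALSE])
  }
  mean(predicted == test$y)
}

# Example: the test dataset has the training pattern on f1, shifted by +10.
train <- list(x = cbind(f1 = c(0, 0.2, 0.1, 5, 5.2, 4.9), f2 = c(3, 1, 2, 2, 3, 1)),
              y = c("a", "a", "a", "b", "b", "b"))
test <- list(x = cbind(f1 = c(10.1, 10.3, 10.2, 15.1, 14.9, 15.3), f2 = c(2, 2, 3, 1, 3, 1)),
             y = c("a", "a", "a", "b", "b", "b"))
modelTrainStyle(train, test, rankFeatures, nFeatures = 1)  # 0.5
modelTestStyle(train, test, rankFeatures, nFeatures = 1)   # 1
```

In this example the model fitted on the training data assigns every test sample to class b, whereas refitting within the test data recovers the pattern. In this sketch, the modelTrain style therefore checks whether a whole model transfers between datasets, while the modelTest style checks only whether the chosen features do.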

Value

A list with elements "real", the matrix of pairwise performance metrics using real feature selection; "random", present only if doRandomFeatures is TRUE, the corresponding metrics using random feature selection; and "params", a list of the parameters used during the execution of this function.
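With the default performanceType = "auto", the values for a classification task are balanced accuracy. A minimal base-R sketch of the standard definition, the mean of the per-class recall, is shown below; balancedAccuracy is a hypothetical name, and calcPerformance remains the authoritative implementation.

```r
# Balanced accuracy = mean over classes of per-class recall
# (the fraction of samples of a class that were predicted as that class).
balancedAccuracy <- function(actual, predicted) {
  actual <- as.character(actual)
  predicted <- as.character(predicted)
  recalls <- vapply(unique(actual), function(cls) {
    mean(predicted[actual == cls] == cls)
  }, numeric(1))
  mean(recalls)
}

# A classifier that always predicts the majority class:
actual <- c("a", "a", "a", "a", "b")
predicted <- c("a", "a", "a", "a", "a")
mean(predicted == actual)            # plain accuracy: 0.8
balancedAccuracy(actual, predicted)  # 0.5
```

Because every class contributes equally, a majority-class predictor cannot score well on imbalanced data, which makes balanced accuracy a reasonable default for comparing datasets with different class proportions.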

Author(s)

Harry Robertson


DarioS/ClassifyR documentation built on Dec. 19, 2024, 8:22 p.m.