Description

Compare the performance of several algorithms on the same data, with or without hyperparameter tuning.

Usage

compareAlgorithms(algorithms, task, tuning = FALSE, control = list())
Arguments

algorithms: Vector with the names of the algorithms to be compared (the same algorithm names as in …).

task: Either a single classification task, for a comparison using cross-validation, or a list of tasks, for a comparison across tasks (see Details).

tuning: Whether or not to tune the learners (default: FALSE).

control: Optional list of settings (see Details).
Details

The comparison of algorithms differs depending on whether a single classification task or multiple classification tasks are used. In the first approach, a repeated cross-validation scheme partitions the task into subsets multiple times, yielding a comparison for each combination of subsets. If the algorithms are tuned (which uses five-fold cross-validation), the resampling used for this tuning is nested within the training folds of the outer cross-validation scheme.
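For example, a minimal sketch of a single-task comparison could look as follows (not run; the algorithm names "rf" and "svm" are illustrative placeholders, and the accepted names depend on the package):

library(mlr)

# Single classification task built from the iris data
task <- makeClassifTask(id = "iris", data = iris, target = "Species")

# Repeated cross-validation comparison with nested tuning of each learner
res <- compareAlgorithms(algorithms = c("rf", "svm"), task = task, tuning = TRUE)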
In the second approach, each learner is trained on each task (without resampling) and used to make predictions on all other tasks. That is, if there are n tasks, this results in (n - 1) * n predictions, performed with n trained models.
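A sketch of a cross-task comparison (not run; the tasks and algorithm names are illustrative, with the tasks built here from subsets of iris so that they share the same features and classes). With three tasks, each algorithm is trained three times and evaluated on the two tasks it was not trained on, giving (3 - 1) * 3 = 6 predictions per algorithm:

library(mlr)

# Three tasks with identical feature space and class labels
set.seed(1)
idx <- sample(rep(1:3, length.out = nrow(iris)))
tasks <- lapply(1:3, function(i) {
  makeClassifTask(id = paste0("iris_", i), data = iris[idx == i, ], target = "Species")
})

res <- compareAlgorithms(algorithms = c("rf", "svm"), task = tasks)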
Parallelization is always applied over the outermost loop for a given learner. That is, when comparing algorithms within one classification task, parallelization is applied over the resampling iterations of the outer cross-validation scheme. When comparing across tasks, it is applied over the tasks used for training the models.
The following settings can be passed to the control argument:

folds: Number of cross-validation folds used in the outer resampling scheme when comparing algorithms within one task. It has no effect when comparing algorithms across multiple tasks. Default: 5.

reps: Number of repetitions of the cross-validation in the outer resampling scheme when comparing algorithms within one task. It has no effect when comparing algorithms across multiple tasks. Default: 3.

parallel: Whether or not to use parallelization. Default: FALSE.

nthreads: Number of threads/workers to be used for parallelization. Default is the number of cores reported by parallel::detectCores().

maxiter: Maximum number of iterations of the CMA-ES optimization of the hyperparameters. Default: 10.

lambda: Number of offspring in each iteration of the CMA-ES optimization of the hyperparameters. Default: 10.

seed: Random seed used for the resampling schemes.
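A sketch of a call that adjusts the control settings (not run; the values and algorithm names are illustrative):

res <- compareAlgorithms(
  algorithms = c("rf", "svm"),
  task       = task,          # a single classification task, as above
  tuning     = TRUE,
  control    = list(
    folds    = 10,            # outer cross-validation folds
    reps     = 5,             # repetitions of the outer cross-validation
    parallel = TRUE,          # parallelize over the outermost loop
    nthreads = 4,
    maxiter  = 20,            # CMA-ES iterations for hyperparameter tuning
    lambda   = 10,            # CMA-ES offspring per iteration
    seed     = 42
  )
)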
Value

The result of the comparison, as an object of class mlr::BenchmarkResult.
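Because the returned object is a regular mlr::BenchmarkResult, the usual mlr accessors and plots should apply, for example (not run):

library(mlr)

getBMRAggrPerformances(res, as.df = TRUE)  # aggregated performance per learner and task
plotBMRBoxplots(res)                       # performance distributions across iterations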