Evaluate: Evaluate (run) a benchmark


View source: R/02_Evaluate_.R

Description

This function evaluates a benchmark pipeline, specified by an object of type Benchmark. This means that all the projection, clustering or projection->clustering subpipelines that were set up when creating the benchmark object are executed, and their performance is scored. Both the benchmark object and its auxiliary HDF5 file (created when the Benchmark constructor was called) are needed for this.

Usage

Evaluate(
  benchmark,
  score_projections,
  n_cores,
  which_python,
  seed.projection,
  seed.clustering,
  ask_overwrite,
  verbose
)

Arguments

benchmark

object of class Benchmark, as generated by the constructor Benchmark

score_projections

logical: whether results of projection steps should be scored. Default value is FALSE

n_cores

optional integer: number of CPU threads to use for parallelisation of repeated runs of clustering for stability analysis. Default value is NULL (no parallelisation)

which_python

optional string: path to Python if Python needs to be used via reticulate. Default value is NULL (reticulate uses its default Python configuration)

seed.projection

optional numeric value: random seed to be set prior to each deployment of a projection method. (Use NULL to avoid setting a seed.) Default value is 1

seed.clustering

optional numeric value: random seed to be set prior to each deployment of a clustering method. (Use NULL to avoid setting a seed.) Default value is 1

ask_overwrite

logical: if benchmark was evaluated before, should the user be asked prior to overwriting the previous evaluation results? Default value is TRUE

verbose

logical: should progress messages be printed during evaluation? Default value is TRUE

projection_collapse_n

integer: upper bound of dataset size for which full distance matrices should be computed in evaluation (if score_projections is set to TRUE). Default value is 500

projection_neighbourhood

integer: number of nearest neighbours to use in K-ary neighbourhood-based evaluation of projection quality (if score_projections is set to TRUE and the size of the input dataset does not exceed projection_collapse_n). Default value is 100
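The effect of seed.projection and seed.clustering described above can be illustrated with a minimal base-R sketch. The helper run_with_seed and the stand-in toy_method are hypothetical names introduced here for illustration only; they are not part of SingleBench, and the package's internal handling of seeds may differ.

```r
# Hypothetical helper (not part of SingleBench): set a seed before running
# a method, unless the seed is NULL
run_with_seed <- function(method, seed = 1) {
  if (!is.null(seed)) set.seed(seed)
  method()
}

# Stand-in for a stochastic projection or clustering method
toy_method <- function() runif(3)

a <- run_with_seed(toy_method, seed = 1)
b <- run_with_seed(toy_method, seed = 1)
identical(a, b)  # TRUE: the same seed gives the same result on every run

# With seed = NULL no seed is set, so results depend on the current RNG state
u <- run_with_seed(toy_method, seed = NULL)
```

Setting the seed before each deployment, rather than once at the start, makes the result of any single method independent of how many other methods were run before it.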

Optional scoring of projection steps

Optionally, results of projection steps (if included) can be scored using evaluation metrics designed to measure the quality of dimension reduction, that is, how well information from the original high-dimensional data is preserved. This makes sense for methods that reduce the dimensionality of the original data for visualisation or to make the data more amenable to clustering. To turn on scoring of projection steps, set the parameter score_projections to TRUE.

Which metrics are computed depends on a numeric bound, projection_collapse_n. If the row count of the input data exceeds this bound, only metrics based on k-nearest-neighbour graphs of the original data and of each projection are computed: the local continuity meta-criterion (LCMC) and B_NX ('local intrusiveness versus extrusiveness'). If the row count is lower than or equal to the bound, full distance matrices (quadratic complexity) are computed, and trustworthiness and continuity are computed in addition. By default, projection_collapse_n is set to 500, which prevents the computation of full distance matrices except for very small datasets. Additionally, the parameter projection_neighbourhood specifies the number of nearest neighbours used for partitioning the full co-ranking matrix (if the size of the data is less than or equal to projection_collapse_n).
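As an illustration of the size bound and of one of the metrics named above, the following base-R sketch computes LCMC from its standard definition, LCMC(K) = Q_NX(K) - K/(N - 1), where Q_NX(K) is the average fraction of K nearest neighbours shared between the original data and the projection. The function names scoring_mode, neighbour_ranks and lcmc, the toy data and the use of PCA as a stand-in projection are all assumptions made for this example. This is not SingleBench's implementation, and for simplicity it uses full distance matrices rather than k-nearest-neighbour graphs.

```r
# Hypothetical helper: which scoring branch applies for n_rows rows of input,
# following the 'less than or equal to' convention of the text above
scoring_mode <- function(n_rows, projection_collapse_n = 500) {
  if (n_rows > projection_collapse_n) "knn_graph" else "full_distance_matrix"
}

# Row i holds the rank of every point as a neighbour of point i
neighbour_ranks <- function(X) {
  D <- as.matrix(dist(X))
  diag(D) <- Inf  # a point is never among its own K nearest neighbours
  t(apply(D, 1, rank, ties.method = "first"))
}

# LCMC(K) = Q_NX(K) - K / (N - 1); requires K <= N - 1
lcmc <- function(X_high, X_low, K) {
  N <- nrow(X_high)
  shared <- (neighbour_ranks(X_high) <= K) & (neighbour_ranks(X_low) <= K)
  q_nx <- sum(shared) / (K * N)
  q_nx - K / (N - 1)
}

set.seed(1)
X_high <- matrix(rnorm(200 * 10), ncol = 10)  # toy data: 200 points, 10 dimensions
X_low  <- prcomp(X_high)$x[, 1:2]             # PCA as a stand-in projection

scoring_mode(nrow(X_high))    # "full_distance_matrix", since 200 <= 500
lcmc(X_high, X_low, K = 20)   # quality of the 2-dimensional PCA projection
lcmc(X_high, X_high, K = 20)  # identical neighbourhoods give the maximum, 1 - K/(N - 1)
```

The quadratic cost of the distance matrices in neighbour_ranks is the reason a bound such as projection_collapse_n is useful for larger datasets.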

Parallelisation

For stability analysis of clustering tools, repeated runs of the tool can be run in parallel (unless this is forbidden in the tool wrapper). To do this, specify the parameter n_cores. To use all available CPU cores, you can use parallel::detectCores() as the value of n_cores.
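A minimal sketch of choosing n_cores as described above. It assumes that the package is installed under the name SingleBench, that Evaluate is accessible as SingleBench::Evaluate, and that a Benchmark object named benchmark already exists in the session; none of this is confirmed here, so the call is guarded and skipped otherwise.

```r
# detectCores() can return NA on some platforms, so fall back to a single core
available <- parallel::detectCores()
n_cores <- if (is.na(available)) 1L else as.integer(available)

# Assumption: 'benchmark' was created earlier with the Benchmark constructor
if (requireNamespace("SingleBench", quietly = TRUE) && exists("benchmark")) {
  SingleBench::Evaluate(benchmark, n_cores = n_cores)
}
```

On a shared machine it may be preferable to pass a smaller value than the detected core count, leaving some cores free for other processes.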

See Also


davnovak/SingleBench documentation built on Dec. 19, 2021, 9:10 p.m.