This function evaluates a benchmark pipeline, specified by an object of type Benchmark.
This means that all the projection, clustering or projection->clustering subpipelines that were set up when creating the benchmark object are executed, and their performance is scored.
Both the benchmark object and its auxiliary HDF5 file (created when the Benchmark constructor was called) are needed for this.
Evaluate(
  benchmark,
  score_projections,
  n_cores,
  which_python,
  seed.projection,
  seed.clustering,
  ask_overwrite,
  verbose
)
benchmark | object of class …
score_projections | logical: whether results of projection steps should be scored. Default value is …
n_cores | optional integer: number of CPU threads to use for parallelisation of repeated runs of clustering for stability analysis. Default value is …
which_python | optional string: path to Python if Python needs to be used via …
seed.projection | optional numeric value: random seed to be used prior to each deployment of a projection method. (Use …
seed.clustering | optional numeric value: random seed to be used prior to each deployment of a clustering method. (Use …
ask_overwrite | logical: if …
verbose | logical: should progress messages be printed during evaluation? Default value is …
projection_collapse_n | integer: upper bound of dataset size for which full distance matrices should be computed in evaluation (if …
projection_neighbourhood | integer: number of nearest neighbours to use in K-ary neighbourhood-based evaluation of projection quality (if …
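As an illustration of how the arguments above fit together, the following is a minimal sketch of a call. The argument names are taken from the Usage section; the values are illustrative assumptions, and the call is guarded so that it only runs if a function named Evaluate and an object named benchmark (assumed here to be a previously constructed Benchmark object) exist in the session.

```r
# Argument names come from the Usage section; the values are assumptions
# chosen for illustration only.
eval_args <- list(
  score_projections = TRUE,  # also score projection steps
  n_cores           = 2L,    # threads for repeated clustering runs
  seed.projection   = 1,     # seed set before each projection method
  seed.clustering   = 1,     # seed set before each clustering method
  verbose           = TRUE   # print progress messages
)

# Only attempt the call if Evaluate() is available and an object called
# `benchmark` has already been created in this session.
if (exists("Evaluate", mode = "function") && exists("benchmark")) {
  do.call(Evaluate, c(list(benchmark = benchmark), eval_args))
}
```

Collecting the arguments in a list first makes it easy to reuse the same settings across several benchmark objects.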
Optionally, results of projection steps (if included) can be scored using evaluation metrics designed to measure the quality of dimension reduction (preservation of information versus original high-dimensional data).
This makes sense for methods that reduce dimensionality of the original data for the purposes of visualisation or to make the data more amenable to clustering.
To turn on scoring of projection steps, set parameter score_projections to TRUE.
The choice of metrics depends on a numeric bound (parameter projection_collapse_n). If the row count of the input data exceeds that value, only metrics based exclusively on k-nearest-neighbour graphs of the original data and each projection are computed.
If the row count is lower than or equal to the limit, full distance matrices (quadratic complexity) will be computed.
In the first case (row count above the bound), the local continuity meta-criterion (LCMC) and B_NX ('local intrusiveness versus extrusiveness') are computed.
In the second case (row count at or below the bound), trustworthiness and continuity are also computed.
By default, projection_collapse_n is set to 500, preventing the computation of full distance matrices except for very small datasets.
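The branching rule described above can be sketched as a small helper. The function name choose_projection_metrics is hypothetical and is not part of the package; it only mirrors the documented rule and the documented default of 500.

```r
# Hypothetical helper (not part of the package): returns the names of the
# projection metrics that the documented rule would select for a dataset
# with `n_rows` rows.
choose_projection_metrics <- function(n_rows, projection_collapse_n = 500L) {
  knn_metrics <- c("LCMC", "B_NX")
  if (n_rows > projection_collapse_n) {
    # Large dataset: k-nearest-neighbour-graph metrics only
    knn_metrics
  } else {
    # Small dataset: full distance matrices, so two more metrics are added
    c(knn_metrics, "trustworthiness", "continuity")
  }
}

choose_projection_metrics(10000)  # above the bound
choose_projection_metrics(500)    # at the bound, so full matrices are used
```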
Additionally, the parameter projection_neighbourhood specifies the number of nearest neighbours used for partitioning the full co-ranking matrix (if the size of data is less than or equal to projection_collapse_n).
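To give a sense of what a neighbourhood-based score measures, here is a minimal base-R sketch of the LCMC for a neighbourhood size k, using the standard definition (average overlap of k-nearest-neighbour sets, minus the overlap expected by chance). It is not the package's implementation: the function names are hypothetical, and neighbours are found from a full distance matrix for brevity, which is only practical for small inputs.

```r
# Hypothetical helper: indices of the k nearest neighbours of every row,
# found by brute force from a full Euclidean distance matrix.
knn_index <- function(X, k) {
  D <- as.matrix(dist(X))
  diag(D) <- Inf  # a point is not its own neighbour
  lapply(seq_len(nrow(D)), function(i) order(D[i, ])[seq_len(k)])
}

# LCMC(k) = mean neighbourhood overlap / k  -  k / (n - 1)
lcmc <- function(X_high, X_low, k) {
  n <- nrow(X_high)
  nn_high <- knn_index(X_high, k)
  nn_low  <- knn_index(X_low, k)
  overlap <- vapply(
    seq_len(n),
    function(i) length(intersect(nn_high[[i]], nn_low[[i]])),
    integer(1)
  )
  sum(overlap) / (k * n) - k / (n - 1)
}

set.seed(1)
X_high <- matrix(rnorm(50 * 10), nrow = 50)  # 50 points in 10 dimensions
X_low  <- X_high[, 1:2]                      # crude 2-d "projection"
lcmc(X_high, X_low, k = 5)
```

A projection that preserves every k-neighbourhood exactly attains the maximum value of 1 - k / (n - 1).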
For stability analysis of clustering tools, repeated runs of the tool can be executed in parallel (unless this is forbidden in the tool wrapper).
To do this, specify the parameter n_cores.
To use all available CPU cores, you can use parallel::detectCores() as the value of n_cores.
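A minimal sketch of choosing a value for n_cores follows. Note that parallel::detectCores() can return NA on some platforms, so the sketch falls back to a single core in that case. Leaving one core free is an assumption made here for convenience, not a recommendation from the package.

```r
# Number of logical cores reported by the system (may be NA on some platforms)
n_available <- parallel::detectCores()

# Fall back to 1 if detection fails; otherwise leave one core free for the
# rest of the system (an assumption; pass n_available itself to use all cores).
n_cores <- if (is.na(n_available)) 1L else max(1L, n_available - 1L)
n_cores
```

The resulting value can then be passed as the n_cores argument of Evaluate.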
See also AddLayout: allows you to add a separate 2-dimensional layout of the input dataset or to use an existing projection (produced in the evaluation) as a visualisation layout.