Jacc: Match clusters across two clusterings and plot results

View source: R/99_Jacc.R

Description

Compares two vectors of per-data-point cluster assignments, c1 and c2 (or a vector of ground-truth labels and a clustering vector), matching groups in each vector to each other while maximising the value of an evaluation metric obj. The evaluation metric obj is one of f1 (default), precision or recall.
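To make the three metrics concrete, the following is a minimal sketch of how a single candidate match between one group in c1 and one group in c2 could be scored. It is not taken from the SingleBench source: pair_scores is a hypothetical helper, and it assumes that c1 is treated as the reference when precision and recall are defined.

```r
# Hypothetical helper (not part of SingleBench): score one candidate match
# between group g1 of c1 and group g2 of c2.
# Assumption: c1 plays the role of the reference labelling.
pair_scores <- function(c1, c2, g1, g2) {
  in1 <- c1 == g1
  in2 <- c2 == g2
  tp <- sum(in1 & in2)    # points in both groups
  fp <- sum(!in1 & in2)   # points in g2 but not in g1
  fn <- sum(in1 & !in2)   # points in g1 but not in g2
  precision <- if (tp + fp > 0) tp / (tp + fp) else 0
  recall    <- if (tp + fn > 0) tp / (tp + fn) else 0
  f1 <- if (precision + recall > 0) {
    2 * precision * recall / (precision + recall)
  } else {
    0
  }
  c(precision = precision, recall = recall, f1 = f1)
}

# Toy data: six points, two labelled groups, two clusters
c1 <- c("A", "A", "A", "B", "B", "B")
c2 <- c(1, 1, 2, 2, 2, 2)
s <- pair_scores(c1, c2, "A", 1)
print(s)
```

Here two of the three points in group A fall into cluster 1 and cluster 1 contains nothing else, so precision is 1, recall is 2/3 and f1 is 0.8.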

Usage

Jacc(
  c1,
  c2,
  obj = "f1",
  title = "Jaccard heatmap",
  unassigned = NULL,
  generate_plot = TRUE,
  c1_name = "c1",
  c2_name = "c2",
  scoring_matrix = NULL,
  verbose = FALSE
)

Arguments

c1

factor, numeric or character vector: assignment of each data point to a cluster or otherwise defined population

c2

factor, numeric or character vector: assignment of each data point to a cluster

obj

string: evaluation metric used for matching groups in c1 and c2; one of f1 (default), precision and recall

title

string: title of Jaccard similarity heatmap plot (default value is 'Jaccard heatmap')

unassigned

optional string vector: names of levels of c1 denoting unlabelled data points

generate_plot

logical: whether a Jaccard heatmap-style plot should be generated (default value is TRUE)

c1_name

optional string: name of the c1 vector to be used in text of the plot (default value is 'c1')

c2_name

optional string: name of the c2 vector to be used in text of the plot (default value is 'c2')

scoring_matrix

optional numeric matrix: scoring matrix for hierarchical penalties (see function Benchmark). Default value is NULL

verbose

logical: indicates whether to display progress messages (default value is FALSE)

Details

Three approaches are used to solve the cluster-cluster (or label-cluster) matching problem. All of them seek to maximise the total value of obj. Approach (i) gives 1-to-1 matches, whereby each group in c1 is matched to a (different) group in c2. (In the special case where the number of groups in c1 is equal to the number of groups in c2, this guarantees no unmatched groups.) Approach (ii) uses a relaxed fixed-c1 matching, whereby each group in c1 is matched to the group in c2 that maximises the obj value of the match. This can result in 1-to-many matches. Approach (iii) uses a relaxed fixed-c2 matching, which mirrors approach (ii).
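The two relaxed approaches can be illustrated with a short base-R sketch. This is an assumption-laden illustration rather than the package's implementation: f1_matrix is a hypothetical helper, obj is fixed to f1, and c1 is assumed to be the reference when computing precision and recall. The 1-to-1 matching of approach (i) would additionally need an assignment-problem solver and is not shown.

```r
# Hypothetical helper (not part of SingleBench): F1 score for every
# pair of (c1 group, c2 group), computed from the contingency table.
f1_matrix <- function(c1, c2) {
  tp <- unclass(table(c1, c2))                 # co-occurrence counts
  precision <- sweep(tp, 2, colSums(tp), "/")  # tp / size of c2 group
  recall    <- sweep(tp, 1, rowSums(tp), "/")  # tp / size of c1 group
  f1 <- 2 * precision * recall / (precision + recall)
  f1[is.nan(f1)] <- 0                          # no overlap scores 0
  f1
}

# Toy data: three groups in c1, two groups in c2
c1 <- c("A", "A", "A", "B", "B", "B", "C", "C")
c2 <- c("x", "x", "y", "y", "y", "y", "y", "y")
f1 <- f1_matrix(c1, c2)

# Relaxed fixed-c1 matching: best c2 group for each c1 group
fixed_c1 <- colnames(f1)[apply(f1, 1, which.max)]
names(fixed_c1) <- rownames(f1)

# Relaxed fixed-c2 matching: best c1 group for each c2 group
fixed_c2 <- rownames(f1)[apply(f1, 2, which.max)]
names(fixed_c2) <- colnames(f1)

print(round(f1, 3))
print(fixed_c1)
print(fixed_c2)
```

In this toy example the fixed-c1 matching assigns both B and C to cluster y, which shows how the relaxed approaches can match one group to several groups in the other vector.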

If c1 is in fact a vector of ground-truth labels (or manual annotation of each data point), there may be de-facto unlabelled data points in the original data. unassigned is an optional vector of the labels given to data points which don't belong to an annotated population. If specified, the unassigned groups in c1 are left out of the evaluation: points that are unassigned are ignored in constructing the contingency tables for each match and groups in c2 may not be matched to these unassigned points.
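The effect of unassigned on the contingency table can be sketched as follows. This is an illustration only, with made-up labels, and it assumes the simplest reading of the paragraph above (unassigned points are dropped before counting); the package may handle them differently internally.

```r
# Toy data: two annotated populations plus unlabelled points
c1 <- c("T cell", "T cell", "B cell", "unlabelled", "unlabelled", "B cell")
c2 <- c(1, 1, 2, 3, 1, 2)
unassigned <- "unlabelled"

# Assumption: unassigned points are removed before the table is built
keep <- !(c1 %in% unassigned)
tab <- table(c1 = c1[keep], c2 = c2[keep])
print(tab)
```

Cluster 3 contains only an unlabelled point in this example, so it drops out of the table entirely and cannot be matched to anything.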

In addition to evaluation results, a heatmap showing agreement between c1 and c2 and agreement between the different matching approaches is produced by default.

Value

list of results for evaluation approach (i) Results.Bijective, approach (ii) Results.FixedC1 and approach (iii) Results.FixedC2, as well as a Jaccard similarity heatmap diagram Plot (if produced)


davnovak/SingleBench documentation built on Dec. 19, 2021, 9:10 p.m.