CreatePenaltyScoringMatrix: Create a match/mismatch scoring matrix with hierarchical...


View source: R/99_HierarchicalPenalties.R

Description

Creates a matrix of scores for evaluating the prediction of cell population labels. Each pair of labels gets assigned a score for match (if they are identical) or mismatch. Rows of the matrix correspond to true labels and columns correspond to predicted labels.

Usage

CreatePenaltyScoringMatrix(labels, g_c = 0, g_a = 0.2, m_c = 0.4, m_a = 0)

Arguments

labels

character vector: all possible manual labels

g_c

numeric: constant generalisation penalty. Default value is 0

g_a

numeric: additive generalisation penalty. Default value is 0.2

m_c

numeric: constant misidentification penalty. Default value is 0.4

m_a

numeric: additive misidentification penalty. Default value is 0

Details

Hierarchical penalties

If the manual labels of your input data are derived from a hierarchy of populations (e.g. a gating hierarchy for cytometry data), you can make use of the entire hierarchy for evaluation purposes. For instance, instead of using a 'CD4+ T cell' label, you can use 'Lymphocyte/T cell/CD4+ T cell' (a path-like label with '/' as the separator). Then, if you apply a clustering tool and match each cluster to a population present in the data, SingleBench can evaluate the quality of clustering more carefully. Specifically, instead of only distinguishing between match and mismatch, a scoring matrix is produced which penalises mismatches with different severity. For instance, misclassifying 'Lymphocyte/T cell/CD4+ T cell' as 'Lymphocyte/T cell' can be better than misclassifying it as 'Lymphocyte/T cell/CD8+ T cell', which is still better than misclassifying it as 'Lymphocyte/B cell/Alpha-Beta Mature B Cell'.
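To make the path-like label convention concrete, the following base R sketch splits such labels into their hierarchy levels. The variable names (`labels`, `paths`, `depth`, `leaf`) are illustrative only and are not part of SingleBench.

```r
# Path-like labels, using the examples from the text above
labels <- c(
  "Lymphocyte/T cell/CD4+ T cell",
  "Lymphocyte/T cell/CD8+ T cell",
  "Lymphocyte/T cell",
  "Lymphocyte/B cell/Alpha-Beta Mature B Cell"
)

# Split each label on '/' to obtain its path from the root of the hierarchy
paths <- strsplit(labels, "/", fixed = TRUE)
names(paths) <- labels

# Depth of each label in the hierarchy (number of nodes on its path)
depth <- lengths(paths)

# Last element of each path, i.e. the population name without its ancestors
leaf <- vapply(paths, function(p) p[length(p)], character(1))
```

Here 'Lymphocyte/T cell' sits one level above both T cell subsets, which is what lets a hierarchical scoring scheme treat it as a generalisation rather than an unrelated label.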

The scoring of each potential mismatch is based on the route from the true label to the predicted label through the label hierarchy tree. To parametrise the hierarchical penalty model, you can set four custom values. Firstly, the 'constant generalisation penalty' g_c penalises the first step taken in the direction of the tree root, and the 'additive generalisation penalty' g_a penalises each step in that direction. Secondly, the 'constant misidentification penalty' m_c penalises the first step taken in the direction of the tree leaves, and the 'additive misidentification penalty' m_a penalises each step in that direction. The penalties are non-negative values, and the sum of penalties for a misclassification is subtracted from 1, which is the score for a correct match. By default, g_c = 0, g_a = 0.2, m_c = 0.4, m_a = 0.
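The penalty model described above can be sketched in base R as follows. This is one possible reading of the description, not the package's actual implementation: `sketch_pair_score` is a hypothetical helper, and it assumes that the route climbs from the true label to the deepest shared ancestor and then descends to the predicted label, with each constant penalty applied once if any step is taken in its direction.

```r
# Hypothetical helper (NOT part of SingleBench): score for one
# (true label, predicted label) pair under the assumptions stated above
sketch_pair_score <- function(true_label, predicted_label,
                              g_c = 0, g_a = 0.2, m_c = 0.4, m_a = 0) {
  true_path <- strsplit(true_label, "/", fixed = TRUE)[[1]]
  pred_path <- strsplit(predicted_label, "/", fixed = TRUE)[[1]]

  # Number of leading hierarchy levels shared by both paths
  n <- min(length(true_path), length(pred_path))
  same <- true_path[seq_len(n)] == pred_path[seq_len(n)]
  shared <- if (all(same)) n else which(!same)[1] - 1

  up <- length(true_path) - shared    # steps towards the root
  down <- length(pred_path) - shared  # steps towards the leaves

  # Constant penalties apply once per direction, additive ones per step
  penalty <- (up > 0) * g_c + up * g_a + (down > 0) * m_c + down * m_a
  1 - penalty
}

labels <- c(
  "Lymphocyte/T cell/CD4+ T cell",
  "Lymphocyte/T cell/CD8+ T cell",
  "Lymphocyte/T cell",
  "Lymphocyte/B cell/Alpha-Beta Mature B Cell"
)

# Rows correspond to true labels, columns to predicted labels
score_matrix <- matrix(
  NA_real_, nrow = length(labels), ncol = length(labels),
  dimnames = list(true = labels, predicted = labels)
)
for (i in seq_along(labels)) {
  for (j in seq_along(labels)) {
    score_matrix[i, j] <- sketch_pair_score(labels[i], labels[j])
  }
}
```

Under these assumptions and the default parameter values, a true 'Lymphocyte/T cell/CD4+ T cell' scores 0.8 when predicted as 'Lymphocyte/T cell', 0.4 when predicted as 'Lymphocyte/T cell/CD8+ T cell', and 0.2 when predicted as 'Lymphocyte/B cell/Alpha-Beta Mature B Cell', which reproduces the ordering given in the example above. The sketched matrix is not symmetric, because generalisation and misidentification are penalised differently.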

See Also


davnovak/SingleBench documentation built on Dec. 19, 2021, 9:10 p.m.