Taxonomy

The similarity matrix represents a weighted graph: the vertices are the samples, and the edge weights are their pairwise similarities.

Each vertex belongs to 3 nested sets: its replicate set, the group that contains its replicate set, and the full set of all vertices.

We calculate metrics hierarchically:

- Level 1-0: metrics computed for each vertex, with respect to the other vertices (e.g., its replicate vertices).
- Level 2-1: metrics computed for each replicate set, with respect to related sets of vertices (e.g., its group replicate vertices).

We can aggregate each of these metrics to produce more metrics:

- Level 1: aggregations of Level 1-0 metrics across the vertices of a replicate set.
- Level 2: aggregations of Level 2-1 metrics across the replicate sets of a group.

Consider a compound perturbation experiment done in replicates in a multi-well plate. Each compound belongs to one or more mechanisms of action (MOAs).

Further, in this example, the replicate wells of a compound form its replicate set, and the compounds sharing an MOA form a group.

The metrics implemented in matric are defined below.

Level 1-0

Raw metrics

| Metric     | Description                                           |
|:-----------|:------------------------------------------------------|
| sim_mean_i | mean similarity of a vertex to its replicate vertices |

Related: sim_median_i, which uses the median instead of the mean.
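
As a minimal sketch (illustrative base R, not the matric API), sim_mean_i can be computed from a similarity matrix and a mapping of wells to compounds:

```r
# Toy similarity matrix for 6 wells (3 replicates each of compounds A and B)
set.seed(42)
sim <- matrix(runif(36), nrow = 6)
sim <- (sim + t(sim)) / 2   # symmetrize
diag(sim) <- 1
compound <- c("A", "A", "A", "B", "B", "B")

# sim_mean_i: mean similarity of each vertex to its replicate vertices
sim_mean_i <- sapply(seq_along(compound), function(i) {
  replicates <- setdiff(which(compound == compound[i]), i)
  mean(sim[i, replicates])
})
```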

Scaled metrics

| Metric                    | Description                                                               |
|:--------------------------|:--------------------------------------------------------------------------|
| sim_scaled_mean_non_rep_i | scale sim_mean_i using sim_mean_stat_non_rep_i and sim_sd_stat_non_rep_i |

where sim_mean_stat_non_rep_i and sim_sd_stat_non_rep_i are the mean and standard deviation of the vertex's similarity to its non-replicate vertices; that is, the scaled metric is a z-score:

sim_scaled_mean_non_rep_i = (sim_mean_i - sim_mean_stat_non_rep_i) / sim_sd_stat_non_rep_i
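
Continuing the toy example above, a sketch of this scaling, assuming the z-score construction just described:

```r
# z-score each vertex's sim_mean_i against its non-replicate similarity
# distribution
sim_scaled_mean_non_rep_i <- sapply(seq_along(compound), function(i) {
  non_rep <- which(compound != compound[i])
  mu    <- mean(sim[i, non_rep])  # plays the role of sim_mean_stat_non_rep_i
  sigma <- sd(sim[i, non_rep])    # plays the role of sim_sd_stat_non_rep_i
  (sim_mean_i[i] - mu) / sigma
})
```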

Related:

Rank-based and retrieval-based metrics

Consider a list of vertices comprising the vertex's replicate and non-replicate vertices, ranked by their similarity to the vertex.

| Metric                                     | Description                                                                               |
|:-------------------------------------------|:-------------------------------------------------------------------------------------------|
| sim_ranked_relrank_mean_non_rep_i          | the mean percentile of the vertex's replicates in this list                               |
| sim_retrieval_average_precision_non_rep_i  | the average precision reported on the list, with the replicates being the positive class  |
| sim_retrieval_r_precision_non_rep_i        | similarly, the R-precision reported on the list                                           |
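
Continuing the toy example, a sketch of these three metrics for a single vertex (illustrative definitions of the standard quantities; not the matric implementation):

```r
# Rank the other vertices by similarity to vertex i
i <- 1
others   <- setdiff(seq_along(compound), i)
ranked   <- others[order(sim[i, others], decreasing = TRUE)]
positive <- compound[ranked] == compound[i]   # replicates = positive class

# Mean percentile of the replicates in the ranked list
relrank_mean <- mean(which(positive)) / length(ranked)

# Average precision: mean precision at the rank of each replicate
hit_ranks <- which(positive)
average_precision <- mean(seq_along(hit_ranks) / hit_ranks)

# R-precision: precision among the top R, with R = number of replicates
R <- sum(positive)
r_precision <- mean(positive[seq_len(R)])
```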

Related:

Level 1 aggregations of Level 1-0 metrics

Note: these are Level 1 summaries of the scaling parameters; they are not themselves used for scaling.
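
As a sketch of the idea (the aggregation below is illustrative, not necessarily matric's naming), a Level 1 aggregation summarizes a Level 1-0 metric across the vertices of each replicate set:

```r
# Continuing the toy example: aggregate the per-vertex sim_mean_i
# across the vertices of each replicate set (one value per compound)
level_1_mean <- tapply(sim_mean_i, compound, mean)
```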

Level 2-1

Raw metrics

| Metric     | Description                                                                     |
|:-----------|:----------------------------------------------------------------------------------|
| sim_mean_g | mean similarity of vertices in a replicate set to its group replicate vertices |

Related: sim_median_g, which uses the median instead of the mean.
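
Continuing the toy example, a sketch of sim_mean_g for one replicate set, assuming toy MOA labels for the two compounds:

```r
# Toy labels: both compounds share an MOA
moa <- c(A = "moa1", B = "moa1")

# sim_mean_g for compound A's replicate set: mean similarity of its vertices
# to the group replicate vertices (other compounds with the same MOA)
members    <- which(compound == "A")
group_reps <- which(compound != "A" & moa[compound] == moa["A"])
sim_mean_g <- mean(sim[members, group_reps])
```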

Scaled metrics

| Metric                    | Description                                                               |
|:--------------------------|:--------------------------------------------------------------------------|
| sim_scaled_mean_non_rep_g | scale sim_mean_g using sim_mean_stat_non_rep_g and sim_sd_stat_non_rep_g |

where sim_mean_stat_non_rep_g and sim_sd_stat_non_rep_g are the mean and standard deviation of the replicate set's similarity to its non-replicate vertices; the z-score construction is the same as in Level 1-0.

Related:

Rank-based and retrieval-based metrics

Consider a list of vertices comprising the replicate set's group replicate and non-replicate vertices, ranked by their similarity to the replicate set.

We define metrics analogous to the corresponding Level 1-0 metrics, with the _g suffix in place of _i (e.g., sim_ranked_relrank_mean_non_rep_g, sim_retrieval_average_precision_non_rep_g, sim_retrieval_r_precision_non_rep_g).

Level 2 aggregations of Level 2-1 metrics

These are not implemented.

Addendum

This is a related discussion on metrics, from here.

We have a weighted graph whose vertices are perturbations with multiple labels (e.g., pathways, in the case of genetic perturbations) and whose edge weights are the similarities between vertices (e.g., the cosine similarity between the image-based profiles of two CRISPR knockouts).
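
For concreteness, a minimal sketch of how such edge weights might be computed (toy data; not matric code):

```r
# Cosine similarities between image-based profiles
set.seed(7)
profiles <- matrix(rnorm(5 * 10), nrow = 5)   # 5 perturbations x 10 features
norms <- sqrt(rowSums(profiles^2))
cosine_sim <- (profiles %*% t(profiles)) / (norms %o% norms)
```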

There are three levels of ranked lists of edges, each of which can produce global metrics (based on classification metrics like average precision or other so-called class probability metrics). These global metrics can be used to compare representations.

In all 3 cases, we pose it as a binary classification problem on the edges: Class 1 edges connect vertices that share a label, Class 0 edges do not, and the edge weight serves as the classification score.

The three levels of ranked lists of edges, along with the metrics they induce, are listed below.

(Not all the metrics are useful, and some may be very similar to others. I have highlighted the ones I think are useful.)

0. Global: a single list, comprising all edges.

   a. We can directly compute a single ***global metric*** from this list.

1. Label-specific: one list per label, comprising all edges that have at least one vertex with the label.

   a. We can compute a ***label-specific*** metric from each list, with an additional constraint on Class 1 edges: both vertices should share the label being evaluated.
   b. We can then (weighted) average the label-specific metrics to get a single *global metric*.
   c. We can also compute a *global metric* directly across all the label-specific lists.

2. Sample-specific: one list per sample, comprising all edges that have that sample as one of their vertices.

   a. We can compute a sample-specific metric from each list.
   b. We can then average the sample-specific metrics to get a label-specific metric, filtered as in 1.a, although this may not be quite as straightforward; 2.d might be better.
   c. We can further (weighted) average those label-specific metrics to get a single global metric.
   d. We can also compute a label-specific metric directly across the sample-specific lists, filtered as in 1.a.
   e. We can also directly average the sample-specific metrics to get a single global metric.
   f. We can also compute a single global metric directly across all the sample-specific lists.
   g. We can also (weighted) average the label-specific metrics from 2.d to get a single global metric.
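
To make the averaging distinctions concrete, here is a sketch contrasting 1.b (macro: average the per-label metrics) with 1.c (micro: pool the label-specific lists, then compute one metric), using average precision; the data and helper are illustrative:

```r
# Average precision on a ranked edge list (edges sorted by score)
average_precision <- function(score, positive) {
  ord <- order(score, decreasing = TRUE)
  hit_ranks <- which(positive[ord])
  mean(seq_along(hit_ranks) / hit_ranks)
}

# Toy label-specific edge lists: edge scores plus Class 1 indicators
lists <- list(
  label1 = data.frame(score = c(0.9, 0.7, 0.4, 0.2),
                      positive = c(TRUE, FALSE, TRUE, FALSE)),
  label2 = data.frame(score = c(0.8, 0.6, 0.3),
                      positive = c(TRUE, TRUE, FALSE))
)

# 1.b (macro): compute a per-label AP, then average across labels
macro_ap <- mean(sapply(lists, function(l) average_precision(l$score, l$positive)))

# 1.c (micro): pool the label-specific lists, then compute a single AP
pooled   <- do.call(rbind, lists)
micro_ap <- average_precision(pooled$score, pooled$positive)
```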

Notes:

The categorization is based on https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification (I did not double-check, so there could be errors).

| Index | Averaging                     | Metric type    |
|:------|:------------------------------|:---------------|
| 0.a   | micro                         | global         |
| 1.a   | micro                         | label-specific |
| 1.b   | macro                         | global         |
| 1.c   | micro                         | global         |
| 2.b   | macro                         | label-specific |
| 2.c   | macro of macro-label-specific | global         |
| 2.d   | micro                         | label-specific |
| 2.e   | macro                         | global         |
| 2.f   | micro                         | global         |
| 2.g   | macro of micro-label-specific | global         |


