The similarity matrix represents a graph with vertices and edges.
Each vertex belongs to 3 nested sets
We calculate metrics hierarchically:
We can aggregate each of these metrics to produce more metrics:
Consider a compound perturbation experiment done in replicates in a multi-well plate. Each compound belongs to one (or more) MOAs.
Further,
The metrics implemented in `matric` are defined below.
| Metric       | Description                                           |
|:-------------|:------------------------------------------------------|
| `sim_mean_i` | mean similarity of a vertex to its replicate vertices |
Related: `sim_median_i`, which uses the median instead of the mean.
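As a minimal sketch of these two metrics (not the `matric` implementation), the Python below summarizes each vertex's similarity to the other vertices in its replicate set. The function name `replicate_similarity`, the toy similarity matrix, and the replicate-set labels are all hypothetical, and the sketch assumes a vertex is excluded from its own list of replicates.

```python
from statistics import mean, median


def replicate_similarity(sim, replicate_set, stat=mean):
    """Summarize each vertex's similarity to the other vertices in its replicate set."""
    summaries = []
    for i, own_set in enumerate(replicate_set):
        # Similarities to replicate vertices, excluding the vertex itself (assumption).
        values = [
            sim[i][j]
            for j, other_set in enumerate(replicate_set)
            if j != i and other_set == own_set
        ]
        # A vertex without replicates has no defined summary.
        summaries.append(stat(values) if values else None)
    return summaries


# Toy symmetric similarity matrix for four vertices (made-up values).
sim = [
    [1.0, 0.8, 0.6, 0.1],
    [0.8, 1.0, 0.4, 0.2],
    [0.6, 0.4, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
replicate_set = ["A", "A", "A", "B"]

sim_mean_i = replicate_similarity(sim, replicate_set, stat=mean)
sim_median_i = replicate_similarity(sim, replicate_set, stat=median)
```

Passing the summary statistic as an argument keeps the mean and median variants in one function, mirroring how the two metrics differ only in that statistic.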
+-----------------------------+--------------------------------------------------------------------------------+
| Metric                      | Description                                                                    |
+:============================+:===============================================================================+
| `sim_scaled_mean_non_rep_i` | scale `sim_mean_i` using `sim_mean_stat_non_rep_i` and `sim_sd_stat_non_rep_i` |
+-----------------------------+--------------------------------------------------------------------------------+
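The scaling in the table above can be sketched as follows. This assumes that "scale" means a z-score style transformation (subtract the mean, divide by the s.d.) and that the s.d. is the sample s.d.; neither detail is stated in the text, and `scale_similarity` and the numbers are hypothetical.

```python
from statistics import mean, stdev


def scale_similarity(value, background):
    """Center and scale a summary similarity by the statistics of a background set."""
    center = mean(background)   # plays the role of sim_mean_stat_non_rep_i
    spread = stdev(background)  # plays the role of sim_sd_stat_non_rep_i (sample s.d., an assumption)
    return (value - center) / spread


# Made-up similarities of one vertex to its non-replicate vertices.
non_replicate_similarities = [0.1, 0.2, 0.3]
sim_mean_i = 0.7

sim_scaled_mean_non_rep_i = scale_similarity(sim_mean_i, non_replicate_similarities)
```

Under these assumptions, a large positive value indicates that a vertex is much more similar to its replicates than to the background set.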
Here, `sim_mean_stat_non_rep_i` and `sim_sd_stat_non_rep_i` are the mean and s.d. of the similarity of a vertex to its non-replicate vertices.

Related:

- `sim_scaled_median_non_rep_i`, which scales `sim_median_i` instead of `sim_mean_i`.
- `sim_scaled_mean_ref_i`, which scales `sim_mean_i` w.r.t. reference vertices (i.e. it uses `sim_mean_stat_ref_i` and `sim_sd_stat_ref_i`, the mean and s.d. of the similarity of a vertex to the reference vertices, to scale).
- `sim_scaled_median_ref_i`, which is the same as `sim_scaled_mean_ref_i` except that it scales `sim_median_i` instead of `sim_mean_i`.

Consider a list of vertices comprising the replicate vertices and the non-replicate vertices of a vertex.
+---------------------------------------------+------------------------------------------------------------------------------------------+
| Metric                                      | Description                                                                              |
+:============================================+:=========================================================================================+
| `sim_ranked_relrank_mean_non_rep_i`         | the mean percentile of the vertex's replicates in this list                              |
+---------------------------------------------+------------------------------------------------------------------------------------------+
| `sim_retrieval_average_precision_non_rep_i` | the average precision reported on the list, with the replicates being the positive class |
+---------------------------------------------+------------------------------------------------------------------------------------------+
| `sim_retrieval_r_precision_non_rep_i`       | similarly, the R-precision reported on the list                                          |
+---------------------------------------------+------------------------------------------------------------------------------------------+
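The two retrieval metrics in the table above can be illustrated with a short sketch. It assumes the list is ranked by decreasing similarity to the query vertex and uses the standard definitions of average precision and R-precision; the helper names and the example values are hypothetical and do not come from `matric`.

```python
def rank_by_similarity(similarities, is_replicate):
    """Order the positive/negative flags by decreasing similarity to the query vertex."""
    order = sorted(range(len(similarities)), key=lambda j: similarities[j], reverse=True)
    return [is_replicate[j] for j in order]


def average_precision(ranked_positive):
    """Mean of the precision values measured at the rank of each positive item."""
    hits = 0
    precisions = []
    for k, positive in enumerate(ranked_positive, start=1):
        if positive:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else None


def r_precision(ranked_positive):
    """Fraction of positives among the top R items, where R is the number of positives."""
    n_positive = sum(ranked_positive)
    if n_positive == 0:
        return None
    return sum(ranked_positive[:n_positive]) / n_positive


# Made-up similarities of a query vertex to five other vertices;
# True marks a replicate of the query (the positive class).
similarities = [0.9, 0.2, 0.7, 0.4, 0.1]
is_replicate = [True, False, False, True, False]

ranked = rank_by_similarity(similarities, is_replicate)
ap = average_precision(ranked)
rp = r_precision(ranked)
```

In this toy list the replicates land at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2 and the R-precision, computed over the top two items, is 1/2.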
Related:

- `sim_ranked_relrank_median_non_rep_i` reports the median percentile instead of the mean percentile.
- `sim_ranked_relrank_mean_ref_i`, `sim_ranked_relrank_median_ref_i`, `sim_retrieval_average_precision_ref_i`, and `sim_retrieval_r_precision_ref_i` use a list of vertices comprising the reference vertices instead of the non-replicate vertices.

The Level 1 aggregated metrics summarize the Level 1-0 metrics across a replicate set:

- `sim_mean_i_mean_i` is the mean `sim_mean_i` across all replicate vertices in a replicate set.
- `sim_mean_i_median_i`, `sim_median_i_mean_i`, and `sim_median_i_median_i` are the corresponding Level 1 aggregated metrics for other combinations of Level 1-0 raw metrics and summary statistics.
- `sim_scaled_mean_non_rep_i_mean_i`, `sim_scaled_median_non_rep_i_median_i`, `sim_scaled_mean_ref_i_mean_i`, and `sim_scaled_median_ref_i_median_i` are the corresponding Level 1 aggregated metrics for the scaled Level 1-0 metrics.
- `sim_ranked_relrank_mean_ref_i_mean_i`, `sim_ranked_relrank_mean_ref_i_median_i`, `sim_ranked_relrank_median_ref_i_mean_i`, and `sim_ranked_relrank_median_ref_i_median_i` are the corresponding Level 1 aggregated metrics for the rank-based Level 1-0 metrics.
- `sim_retrieval_average_precision_ref_i_mean_i`, `sim_retrieval_average_precision_ref_i_median_i`, `sim_retrieval_r_precision_ref_i_mean_i`, and `sim_retrieval_r_precision_ref_i_median_i` are the corresponding Level 1 aggregated metrics for the retrieval-based Level 1-0 metrics.

Note: The following are Level 1 summaries of scaling parameters; they are not themselves used for scaling:

- `sim_mean_stat_non_rep_i_mean_i`, `sim_sd_stat_non_rep_i_mean_i`, `sim_mean_stat_non_rep_i_median_i`, `sim_sd_stat_non_rep_i_median_i`
- `sim_mean_stat_ref_i_mean_i`, `sim_sd_stat_ref_i_mean_i`, `sim_mean_stat_ref_i_median_i`, `sim_sd_stat_ref_i_median_i`
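The Level 1 aggregation described above amounts to grouping a per-vertex metric by replicate set and applying a summary statistic. The sketch below illustrates this under that reading; `aggregate_by_replicate_set` and the input values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def aggregate_by_replicate_set(values, replicate_set, stat=mean):
    """Summarize a per-vertex metric across the vertices of each replicate set."""
    grouped = defaultdict(list)
    for value, set_id in zip(values, replicate_set):
        # Skip vertices for which the per-vertex metric is undefined.
        if value is not None:
            grouped[set_id].append(value)
    return {set_id: stat(members) for set_id, members in grouped.items()}


# Made-up per-vertex sim_mean_i values for two replicate sets.
sim_mean_i = [0.7, 0.6, 0.5, 0.2, 0.4]
replicate_set = ["A", "A", "A", "B", "B"]

sim_mean_i_mean_i = aggregate_by_replicate_set(sim_mean_i, replicate_set, stat=mean)
sim_mean_i_median_i = aggregate_by_replicate_set(sim_mean_i, replicate_set, stat=median)
```

The same helper applies unchanged to the scaled, rank-based, and retrieval-based per-vertex metrics, since only the input column and the summary statistic differ.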
+--------------+--------------------------------------------------------------------------------+
| Metric       | Description                                                                    |
+:=============+:===============================================================================+
| `sim_mean_g` | mean similarity of vertices in a replicate set to its group replicate vertices |
+--------------+--------------------------------------------------------------------------------+
Related: `sim_median_g`, which uses the median instead of the mean.
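A rough sketch of this group-level metric follows. It rests on two assumptions that the text does not state: that the group replicate vertices of a replicate set are the vertices sharing its group (e.g. MOA) but belonging to a different replicate set (e.g. a different compound), and that each replicate set belongs to exactly one group. The function name and data are hypothetical.

```python
from statistics import mean


def group_replicate_similarity(sim, replicate_set, group, stat=mean):
    """Summarize similarities between each replicate set and its group replicate vertices."""
    result = {}
    for set_id in sorted(set(replicate_set)):
        members = [i for i, r in enumerate(replicate_set) if r == set_id]
        # Assumption: every member of a replicate set carries the same single group label.
        set_group = group[members[0]]
        # Assumption: group replicates share the group but not the replicate set.
        partners = [
            j for j, r in enumerate(replicate_set)
            if group[j] == set_group and r != set_id
        ]
        values = [sim[i][j] for i in members for j in partners]
        result[set_id] = stat(values) if values else None
    return result


# Made-up example: compounds c1 and c2 share MOA m1; c3 is alone in m2.
sim = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.5, 0.3, 1.0, 0.0],
    [0.1, 0.2, 0.0, 1.0],
]
replicate_set = ["c1", "c1", "c2", "c3"]
group = ["m1", "m1", "m1", "m2"]

sim_mean_g = group_replicate_similarity(sim, replicate_set, group)
```

A compound that belongs to several MOAs would need one pass per group, which this sketch leaves out.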
+-----------------------------+--------------------------------------------------------------------------------+
| Metric                      | Description                                                                    |
+:============================+:===============================================================================+
| `sim_scaled_mean_non_rep_g` | scale `sim_mean_g` using `sim_mean_stat_non_rep_g` and `sim_sd_stat_non_rep_g` |
+-----------------------------+--------------------------------------------------------------------------------+
Here, `sim_mean_stat_non_rep_g` and `sim_sd_stat_non_rep_g` are the mean and s.d. of the similarity of vertices in a replicate set to their non-replicate (and non-group replicate) vertices.

Related:

- `sim_scaled_median_non_rep_g`, which scales `sim_median_g` instead of `sim_mean_g`.
- `sim_scaled_mean_ref_g`, which scales `sim_mean_g` w.r.t. reference vertices (i.e. it uses `sim_mean_stat_ref_g` and `sim_sd_stat_ref_g`, the mean and s.d. of the similarity of vertices in a replicate set to the reference vertices, to scale).
- `sim_scaled_median_ref_g`, which is the same as `sim_scaled_mean_ref_g` except that it scales `sim_median_g` instead of `sim_mean_g`.

Consider a list of vertices comprising
We define metrics similar to the corresponding Level 1-0 metrics:

- `sim_ranked_relrank_mean_non_rep_g`
- `sim_ranked_relrank_median_non_rep_g`
- `sim_retrieval_average_precision_non_rep_g`
- `sim_retrieval_r_precision_non_rep_g`
- `sim_ranked_relrank_mean_ref_g`
- `sim_ranked_relrank_median_ref_g`
- `sim_retrieval_average_precision_ref_g`
- `sim_retrieval_r_precision_ref_g`

These are not implemented.
This is a related discussion on metrics, from here.
We have a weighted graph where the vertices are perturbations with multiple labels (e.g. pathways in the case of genetic perturbations), and edges are the similarity between the vertices (e.g. the cosine similarity between image-based profiles of two CRISPR knockouts).
There are three levels of ranked lists of edges, each of which can produce global metrics (based on classification metrics like average precision or other so-called class probability metrics). These global metrics can be used to compare representations.
In all 3 cases, we pose it as a binary classification problem on the edges:
The three levels of ranked lists of edges, along with the metrics they induce, are listed below.
(Not all the metrics are useful, and some may be very similar to others. I have highlighted the ones I think are useful.)
0. Global: a single list of edges.
    a. We can directly compute a single ***global metric*** from this list.
1. Label-specific: one list of edges per label.
    a. We can compute a ***label-specific*** metric from each list, with an additional constraint on Class 1 edges: both vertices should share the label being evaluated.
    b. We can then (weighted) average the label-specific metrics to get a single *global metric*.
    c. We can also compute a *global metric* directly across all the label-specific lists.
2. Sample-specific: one list of edges per sample.
    a. We can compute a sample-specific metric from each list.
    b. We can then average the sample-specific metrics to get a label-specific metric, filtered as in 1.a, although this may not be straightforward; 2.d might be better.
    c. We can further (weighted) average the label-specific metrics to get a single global metric.
    d. We can also compute a label-specific metric directly across the sample-specific lists, filtered as in 1.a.
    e. We can also directly average the sample-specific metrics to get a single global metric.
    f. We can also compute a single global metric directly across all the sample-specific lists.
    g. We can also (weighted) average the label-specific metrics in 2.d to get a single global metric.
Notes:

- `sim_retrieval_average_precision_non_rep_i` is an example of 2.a
- `sim_retrieval_average_precision_non_rep_i_mean_i` is an example of 2.b

Categorization based on https://scikit-learn.org/stable/modules/model_evaluation.html#multiclass-and-multilabel-classification (I did not double-check; there could be errors)
| Index | Averaging                     | Metric type    |
|:------|:------------------------------|:---------------|
| 0.a   | micro                         | global         |
| 1.a   | micro                         | label-specific |
| 1.b   | macro                         | global         |
| 1.c   | micro                         | global         |
| 2.b   | macro                         | label-specific |
| 2.c   | macro of macro-label-specific | global         |
| 2.d   | micro                         | label-specific |
| 2.e   | macro                         | global         |
| 2.f   | micro                         | global         |
| 2.g   | macro of micro-label-specific | global         |
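To make the micro versus macro distinction in the table concrete, the sketch below contrasts 2.e (average the sample-specific metrics) with 2.f (compute one metric across all sample-specific lists). It assumes that "directly across all lists" means pooling the edges into one ranked list, and it uses average precision as the metric; the edge lists and names are hypothetical.

```python
from statistics import mean


def average_precision(scored_edges):
    """Average precision for a list of (similarity, is_class_1) edge pairs."""
    ranked = sorted(scored_edges, key=lambda edge: edge[0], reverse=True)
    hits = 0
    precisions = []
    for k, (_, is_class_1) in enumerate(ranked, start=1):
        if is_class_1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions) if precisions else None


# Made-up sample-specific lists of edges: (similarity, is Class 1 edge).
sample_lists = {
    "s1": [(0.9, True), (0.8, False), (0.3, True)],
    "s2": [(0.7, False), (0.6, True)],
}

# 2.e (macro): one metric per sample-specific list, then average.
macro_ap = mean(average_precision(edges) for edges in sample_lists.values())

# 2.f (micro, under the pooling assumption): one metric over all edges together.
pooled_edges = [edge for edges in sample_lists.values() for edge in edges]
micro_ap = average_precision(pooled_edges)
```

The two values generally differ: macro averaging weights each sample equally, while pooling weights each edge equally and compares similarities across samples.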