estimate_corrected_score: Calculates geneset scores (corrected)
In scfurl/m3addon: This package adds to the popular "monocle3"

This function was implemented by Scott Furlan in the spirit of the text below.

The following text is taken from: Puram, S. V. et al. Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell 171, 1611.e1–1611.e24 (2017).

Cell scores can be calculated to evaluate the degree to which individual cells express a certain pre-defined expression program. These are initially based on the average expression of the genes from the pre-defined program in the respective cell: given an input set of genes (Gj), we define a score, SCj(i), for each cell i, as the average relative expression (Er) of the genes in Gj. However, such initial scores may be confounded by cell complexity, as cells with higher complexity have more genes detected (i.e., fewer zeros) and consequently would be expected to have higher cell scores for any gene-set. To control for this effect we also add a control gene-set (Gjcont); we calculate a similar cell score with the control gene-set and subtract it from the initial cell scores:

SCj(i) = average[Er(Gj,i)] – average[Er(Gjcont,i)].

The control gene-set is selected in a way that ensures similar properties (distribution of expression levels) to that of the input gene-set to properly control for the effect of complexity. First, all analyzed genes are binned into 25 bins of equal size based on their aggregate expression levels (Ea). Next, for each gene in the given gene-set, we randomly select 100 genes from the same expression bin. In this way, the control gene-set has a comparable distribution of expression levels to that of the considered gene-set, and is 100-fold larger, such that its average expression is analogous to averaging over 100 randomly-selected gene-sets of the same size as the considered gene-set.
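The procedure described above can be sketched in base R. This is a minimal illustration, not the m3addon implementation: the helper name `corrected_score_sketch`, its `n_bins` and `n_ctrl` parameters, the use of row means as the aggregate expression (Ea), and the simulated matrix are all assumptions made for the example.

```r
# Hypothetical helper illustrating the scoring scheme; not part of m3addon.
# expr:     numeric matrix of relative expression (Er), genes x cells, with dimnames
# gene_set: character vector of gene names (Gj) present in rownames(expr)
corrected_score_sketch <- function(expr, gene_set, n_bins = 25, n_ctrl = 100) {
  stopifnot(is.matrix(expr), all(gene_set %in% rownames(expr)))

  # Aggregate expression per gene (Ea); using the row mean is an assumption.
  ea <- rowMeans(expr)

  # Rank genes by Ea and cut the ranks into n_bins bins of near-equal size.
  bins <- cut(rank(ea, ties.method = "first"), breaks = n_bins, labels = FALSE)
  names(bins) <- rownames(expr)

  # For each gene in the set, draw n_ctrl control genes from the same bin
  # (falling back to sampling with replacement if the bin is too small).
  control <- unlist(lapply(gene_set, function(g) {
    pool <- names(bins)[bins == bins[[g]]]
    sample(pool, n_ctrl, replace = length(pool) < n_ctrl)
  }))

  # SCj(i) = average[Er(Gj,i)] - average[Er(Gjcont,i)]
  colMeans(expr[gene_set, , drop = FALSE]) -
    colMeans(expr[control, , drop = FALSE])
}

# Simulated data: 5000 genes x 20 cells, plus a per-cell offset that mimics
# a complexity effect shared by all genes in a cell.
set.seed(1)
n_genes <- 5000
n_cells <- 20
expr <- matrix(rnorm(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("gene", seq_len(n_genes)),
                               paste0("cell", seq_len(n_cells))))
expr <- sweep(expr, 2, seq(-1, 1, length.out = n_cells), "+")

gene_set <- paste0("gene", 1:10)
scores <- corrected_score_sketch(expr, gene_set)
head(scores)
```

Because the per-cell offset is added to the gene set and the control set alike, it cancels in the subtraction, which is the confounder the control gene-set is meant to remove.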

estimate_corrected_score(cds, marker_set1, fData_col = "gene_short_name")

`cds`	Input cell_data_set object.
`marker_set1`	Vector of genes in the gene_metadata DataFrame (fData) that can be found in the column 'fData_col'
`fData_col`	Character string denoting the gene_metadata DataFrame (fData) column that contains the genes in marker_set1. Default = 'gene_short_name'