estimate_corrected_score: Calculates geneset scores (corrected)

View source: R/scores.R

Description

This function was implemented by Scott Furlan in the spirit of the text below.

The following text is taken from: Puram, S. V. et al. Single-Cell Transcriptomic Analysis of Primary and Metastatic Tumor Ecosystems in Head and Neck Cancer. Cell 171, 1611.e1–1611.e24 (2017).

Cell scores can be calculated to evaluate the degree to which individual cells express a certain pre-defined expression program. These are initially based on the average expression of the genes from the pre-defined program in the respective cell: given an input set of genes (Gj), we define a score, SCj(i), for each cell i, as the average relative expression (Er) of the genes in Gj. However, such initial scores may be confounded by cell complexity, as cells with higher complexity have more genes detected (i.e., fewer zeros) and consequently would be expected to have higher cell scores for any gene-set. To control for this effect, we also add a control gene-set (Gjcont); we calculate a similar cell score with the control gene-set and subtract it from the initial cell scores:

SCj(i) = average[Er(Gj,i)] – average[Er(Gjcont,i)].

The control gene-set is selected in a way that ensures similar properties (distribution of expression levels) to that of the input gene-set to properly control for the effect of complexity. First, all analyzed genes are binned into 25 bins of equal size based on their aggregate expression levels (Ea). Next, for each gene in the given gene-set, we randomly select 100 genes from the same expression bin. In this way, the control gene-set has a comparable distribution of expression levels to that of the considered gene-set, and is 100-fold larger, such that its average expression is analogous to averaging over 100 randomly-selected gene-sets of the same size as the considered gene-set.
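
The procedure quoted above can be sketched in base R. This is a minimal illustration, not the m3addon implementation: the function name corrected_score and its arguments (expr, gene_set, n_bins, n_ctrl) are hypothetical, the aggregate expression Ea is assumed here to be the per-gene mean, genes from the input set are not excluded from the control pool, and the number of control genes drawn per input gene is capped at the bin size.

```r
# Toy sketch of a complexity-corrected gene-set score (base R only).
# corrected_score, expr, n_bins and n_ctrl are hypothetical names.
corrected_score <- function(expr, gene_set, n_bins = 25, n_ctrl = 100) {
  # expr: genes x cells numeric matrix of relative expression (Er),
  # with gene names as rownames and cell names as colnames.
  gene_set <- intersect(gene_set, rownames(expr))
  stopifnot(length(gene_set) > 0)

  # Aggregate expression (Ea) per gene; assumed here to be the row mean.
  ea <- rowMeans(expr)

  # Assign genes to n_bins bins of (nearly) equal size by rank of Ea.
  bins <- ceiling(rank(ea, ties.method = "first") * n_bins / length(ea))
  names(bins) <- rownames(expr)

  # For each gene in the set, draw up to n_ctrl genes from the same bin.
  control <- unlist(lapply(gene_set, function(g) {
    pool <- names(bins)[bins == bins[[g]]]
    pool[sample.int(length(pool), min(n_ctrl, length(pool)))]
  }))

  # SCj(i) = average[Er(Gj, i)] - average[Er(Gjcont, i)]
  colMeans(expr[gene_set, , drop = FALSE]) -
    colMeans(expr[control, , drop = FALSE])
}

# Usage on simulated data: 5000 genes (200 per bin) and 20 cells.
set.seed(1)
n_genes <- 5000
n_cells <- 20
expr <- matrix(rexp(n_genes * n_cells), nrow = n_genes,
               dimnames = list(paste0("g", seq_len(n_genes)),
                               paste0("c", seq_len(n_cells))))
scores <- corrected_score(expr, c("g1", "g2", "g3"))
length(scores)  # one score per cell
```

Because the control genes are drawn at random, repeated calls give slightly different scores unless the seed is fixed with set.seed().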

Usage

estimate_corrected_score(cds, marker_set1, fData_col = "gene_short_name")

Arguments

cds

Input cell_data_set object.

marker_set1

Vector of genes to score, as they appear in the 'fData_col' column of the gene_metadata DataFrame (fData).

fData_col

Character string denoting the gene_metadata DataFrame (fData) column that contains the genes in marker_set1. Default = 'gene_short_name'

Value

Single-cell scores for a given gene set that have been "corrected" using a control set of 100X as many genes with similar expression levels.


scfurl/m3addon documentation built on Aug. 9, 2021, 5:30 p.m.