de_param: Set differential expression (DE) parameters for genes and...

View source: R/de.genes.R

de_param    R Documentation

Set differential expression (DE) parameters for genes and clusters.

Description

This function provides a convenient way to manage settings for differential expression tests in scrattch.hicat.

Usage

de_param(
  low.th = 1,
  padj.th = 0.01,
  lfc.th = 1,
  q1.th = 0.5,
  q2.th = NULL,
  q.diff.th = 0.7,
  de.score.th = 150,
  min.cells = 4,
  min.genes = 5
)

Arguments

low.th

Lower boundary for normalized gene expression. Default = 1. See details.

padj.th

Upper boundary threshold for adjusted p-values for differential expression tests. Default = 0.01. See details.

lfc.th

Lower boundary threshold for log2(fold change) for differential expression tests. Default = 1 (i.e. 2-fold). See details.

q1.th

Lower boundary threshold for foreground detection in proportional comparisons. Default = 0.5. See details.

q2.th

Upper boundary threshold for background detection in proportional comparisons. Default = NULL. See details.

q.diff.th

Threshold for scaled difference in proportions. Default = 0.7. See details.

de.score.th

Lower boundary of total differential expression scores for cluster comparisons. Default = 150. See details.

min.cells

The minimum number of cells allowed in each cluster. Default = 4. See details.

min.genes

The minimum number of differentially expressed genes required to separate clusters. Default = 5. See details.

Details

Calling de_param() without additional parameters provides reasonable defaults for high-depth (e.g. SMART-seq) datasets.

Gene detection threshold:

low.th sets a lower bound for normalized gene expression to determine whether or not a gene is considered to be detected. This is used to filter genes that are too low in expression to be reliably detected.
This parameter can be set globally by providing a single value, or per-gene by providing a named vector.
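As an illustration of the two ways to supply low.th, the following base R sketch applies a single global threshold and then a per-gene named vector to a toy matrix. The matrix, gene names, and threshold values are invented for this example, and the comparison is a plain reading of the description above, not the package's internal code.

```r
# Toy normalized expression matrix: genes in rows, cells in columns.
# All values are made up for illustration.
norm.dat <- matrix(c(0.0, 2.5, 1.2,
                     3.1, 0.4, 0.9,
                     1.0, 1.0, 5.0),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("GeneA", "GeneB", "GeneC"),
                                   c("cell1", "cell2", "cell3")))

# Global threshold: one value applied to every gene.
low.th.global <- 1
detected.global <- norm.dat > low.th.global

# Per-gene thresholds: a named vector, reordered to match the matrix rows.
# A vector with one entry per row is recycled down each column, so every
# cell is compared against the threshold of its own gene.
low.th.gene <- c(GeneA = 1, GeneB = 0.5, GeneC = 2)
detected.gene <- norm.dat > low.th.gene[rownames(norm.dat)]

# Number of cells in which each gene counts as detected.
rowSums(detected.global)
rowSums(detected.gene)
```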

Differential expression test thresholds:

scrattch.hicat utilizes limma's eBayes or Chi-Square tests for differential gene expression. These parameters are used to determine which genes are considered differentially expressed:
padj.th is the threshold for adjusted p-values. Adjusted p-values must be below this threshold to be considered significant.
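lfc.th is the threshold for abs(log2(Fold Change)). The absolute log2 fold change must be above this threshold for a gene to be considered differentially expressed.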
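A minimal sketch of how the two thresholds combine, using a hypothetical results table. The data frame and its column names (gene, padj, lfc) are assumptions for this example and are not the output format of any scrattch.hicat function. The handling of values exactly equal to a threshold is also an assumption.

```r
# Hypothetical per-gene test results for one pair of clusters.
de.df <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD"),
  padj = c(1e-5, 0.2, 0.003, 1e-8),
  lfc  = c(2.1, 3.0, 0.4, -1.7),
  stringsAsFactors = FALSE
)

padj.th <- 0.01
lfc.th  <- 1

# A gene must pass both tests: small adjusted p-value and large
# absolute log2 fold change (up- or down-regulated).
is.de <- de.df$padj < padj.th & abs(de.df$lfc) > lfc.th
de.genes <- de.df$gene[is.de]
de.genes
```

GeneB fails on the adjusted p-value and GeneC fails on the fold change, so only GeneA and GeneD remain.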

Cluster proportion thresholds:

We use q1.th, q2.th and q.diff.th for additional tests based on the proportion of cells in each cluster that express each gene. For every pair of clusters, we define q1 and q2 as the proportion of cells with expression greater than low.th (above) in the foreground and background cluster, respectively. We use q1.th to select genes detected in a high proportion of cells in the foreground cluster, and q2.th to select genes detected in a low proportion of cells in the background cluster. Finally, we use q.diff.th to test for the difference between the foreground and background proportions.

q1.th: The minimum proportion of cells in the foreground cluster with expression greater than low.th.
q2.th: The maximum proportion of cells in the background cluster with expression greater than low.th.
q.diff.th: The scaled proportional difference between q1 and q2, defined as abs(q1 - q2) / max(q1, q2).
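The proportion tests can be worked through for a single gene as follows. The expression vectors are invented, and two points are assumptions rather than documented behavior: that a NULL q2.th means the background test is skipped, and that comparisons are strict.

```r
low.th <- 1

# Normalized expression of one gene in the cells of each cluster (toy values).
fg <- c(2.3, 0.0, 4.1, 1.8, 3.0)   # foreground cluster
bg <- c(0.0, 0.2, 1.5, 0.0, 0.0)   # background cluster

# Proportion of cells with expression greater than low.th.
q1 <- mean(fg > low.th)   # 4 of 5 cells
q2 <- mean(bg > low.th)   # 1 of 5 cells

# Scaled difference in proportions, as defined above.
q.diff <- abs(q1 - q2) / max(q1, q2)

q1.th     <- 0.5
q2.th     <- NULL   # assumed here to mean "no background filter"
q.diff.th <- 0.7

pass <- q1 > q1.th &&
  (is.null(q2.th) || q2 < q2.th) &&
  q.diff > q.diff.th
```

Here q1 is 0.8 and q2 is 0.2, so q.diff is 0.6 / 0.8 = 0.75 and the gene passes all three tests under the default thresholds.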

Cluster-wise p-value threshold:

After performing differential expression tests between a pair of clusters, we use de.score as a way to determine if enough overall differential expression is observed to consider the two clusters distinct from each other.

We define de.score for each gene as min(-log10(p.adj), 20). This sets a cap on the contribution of each gene to the cluster-wise de.score value at 20.
The de.score for a pair of clusters is the sum of the gene-wise de.score values.

Only genes passing the padj.th and lfc.th thresholds (above) contribute to the de.score.

de.score.th is used as a minimum value for the cluster-wise de.score in a pairwise comparison between clusters.
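The de.score calculation described above can be sketched in a few lines of base R. The adjusted p-values and fold changes are made up, and the code follows the written definition, not the package source.

```r
# Hypothetical results for one pair of clusters.
padj <- c(GeneA = 1e-30, GeneB = 1e-5, GeneC = 1e-12, GeneD = 0.5)
lfc  <- c(GeneA = 2.5,   GeneB = 1.8,  GeneC = 0.3,   GeneD = 3.0)

padj.th     <- 0.01
lfc.th      <- 1
de.score.th <- 150

# Only genes passing both thresholds contribute to the score.
keep <- padj < padj.th & abs(lfc) > lfc.th

# Gene-wise score, capped at 20 so no single gene dominates.
gene.score <- pmin(-log10(padj[keep]), 20)

# Cluster-wise score: the sum of the gene-wise scores.
pair.score <- sum(gene.score)

# Compare against the minimum required for this pair of clusters.
enough.de <- pair.score >= de.score.th
```

GeneA contributes the capped value of 20 and GeneB contributes 5, giving a cluster-wise de.score of 25, well below the default de.score.th of 150. In practice a pair needs many strongly significant genes to reach the threshold: at least 8 genes at the cap of 20 are required to reach 150.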

Cell and gene count thresholds:

min.cells is the minimum size allowed for a cluster. If a cluster size is below min.cells, it will be merged with the nearest cluster.

min.genes is the minimum number of differentially expressed genes (passing the padj.th and lfc.th thresholds, above) required to consider two clusters separate.
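The count thresholds can be illustrated with a toy cluster assignment. The cluster labels and pair summary values are hypothetical. Requiring both min.genes and de.score.th to be met is one reading of the text above, not a confirmed description of how the package combines them.

```r
min.cells   <- 4
min.genes   <- 5
de.score.th <- 150

# Toy cluster assignment for 22 cells.
cl <- c(rep("cl1", 12), rep("cl2", 3), rep("cl3", 7))

# Clusters smaller than min.cells are candidates for merging.
cl.size  <- table(cl)
small.cl <- names(cl.size)[cl.size < min.cells]

# Hypothetical summary of one pairwise comparison.
num.de.genes <- 8     # genes passing padj.th and lfc.th
pair.score   <- 160   # cluster-wise de.score

# Assumed rule: treat the clusters as separate only if both minimums are met.
separate <- num.de.genes >= min.genes && pair.score >= de.score.th
```

With these values, cl2 (3 cells) falls below min.cells, and the compared pair would be kept separate.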

Value

Returns a list of parameters for reuse.

Examples


# Recommended initial parameters for SMART-Seq (> 8,000 genes per sample):

sm_param <- de_param(low.th = 1,
                     padj.th = 0.01,
                     lfc.th = 1,
                     q1.th = 0.5,
                     q2.th = NULL,
                     q.diff.th = 0.7,
                     de.score.th = 150,
                     min.cells = 4,
                     min.genes = 5)

# Recommended initial parameters for 10x Cells (> 3,000 genes per sample):

tx_param <- de_param(low.th = 1,
                     padj.th = 0.01,
                     lfc.th = 1,
                     q1.th = 0.4, # Reduced due to dropout
                     q2.th = NULL,
                     q.diff.th = 0.7,
                     de.score.th = 150,
                     min.cells = 10, # Increased due to higher number of cells
                     min.genes = 5)

# Recommended initial parameters for 10x Nuclei (> 1,000 genes per sample):

# Stored under a separate name so the 10x Cells settings above are not overwritten.
nuc_param <- de_param(low.th = 1,
                      padj.th = 0.01,
                      lfc.th = 1,
                      q1.th = 0.3, # Reduced due to dropout
                      q2.th = NULL,
                      q.diff.th = 0.7,
                      de.score.th = 100, # Reduced due to decreased detection
                      min.cells = 10, # Increased due to higher number of cells
                      min.genes = 5)


AllenInstitute/scrattch.hicat documentation built on Oct. 20, 2023, 6:55 a.m.