de_param | R Documentation
This function provides a convenient way to manage settings for differential expression tests in scrattch.hicat.
de_param(
  low.th = 1,
  padj.th = 0.01,
  lfc.th = 1,
  q1.th = 0.5,
  q2.th = NULL,
  q.diff.th = 0.7,
  de.score.th = 150,
  min.cells = 4,
  min.genes = 5
)
low.th: Lower boundary for normalized gene expression. Default = 1. See details.
padj.th: Upper boundary threshold for adjusted p-values in differential expression tests. Default = 0.01. See details.
lfc.th: Lower boundary threshold for log2(fold change) in differential expression tests. Default = 1 (i.e. 2-fold). See details.
q1.th: Lower boundary threshold for foreground detection in proportional comparisons. Default = 0.5. See details.
q2.th: Upper boundary threshold for background detection in proportional comparisons. Default = NULL. See details.
q.diff.th: Threshold for the scaled difference in proportions. Default = 0.7. See details.
de.score.th: Lower boundary of the total differential expression score for cluster comparisons. Default = 150. See details.
min.cells: The minimum number of cells allowed in each cluster. Default = 4. See details.
min.genes: The minimum number of differentially expressed genes required to separate clusters. Default = 5. See details.
Calling de_param() without arguments provides reasonable defaults for high-depth datasets (e.g. SMART-seq).
Gene detection threshold:
low.th sets a lower bound on normalized gene expression for deciding whether a gene is considered detected. It is used to filter out genes whose expression is too low to be detected reliably. This parameter can be set globally by providing a single value, or per gene by providing a named vector.
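As a rough illustration of the detection rule, the base-R sketch below applies low.th to a small toy matrix, once as a single global value and once as a named per-gene vector. The matrix, the gene names, and the strict `>` comparison are assumptions made for the example; this is not scrattch.hicat's internal code.

```r
# Toy normalized expression matrix (hypothetical): genes in rows, cells in columns.
expr <- matrix(c(0.0, 1.5, 2.0, 0.5,
                 3.0, 0.2, 0.0, 1.2,
                 0.8, 0.9, 4.0, 2.5),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("GeneA", "GeneB", "GeneC"),
                               paste0("cell", 1:4)))

# Global threshold: one value applied to every gene.
low.th <- 1
detected_global <- expr > low.th

# Per-gene thresholds: a named vector, reordered to match rownames(expr).
# A vector of length nrow(expr) recycles down each column, so row i is
# compared with the i-th threshold.
low.th.per.gene <- c(GeneC = 3, GeneA = 1, GeneB = 0.1)
detected_per_gene <- expr > low.th.per.gene[rownames(expr)]

# Number of cells in which each gene counts as detected.
rowSums(detected_global)
rowSums(detected_per_gene)
```

Indexing the named vector with `rownames(expr)` matters: without it, thresholds would be matched to genes by position rather than by name.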
Differential expression test thresholds:
scrattch.hicat uses limma's eBayes or chi-squared tests for differential gene expression. These parameters determine which genes are considered differentially expressed:
padj.th is the threshold for adjusted p-values. A gene's adjusted p-value must be below this threshold for the gene to be considered significant.
lfc.th is the threshold for abs(log2(fold change)). A gene's absolute log2 fold change must exceed this threshold, so the default of 1 requires at least a 2-fold difference in either direction.
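A minimal sketch of how the two thresholds combine for one pairwise comparison, assuming the test results are already in a data frame. The `de_results` table and its `padj` and `lfc` columns are hypothetical, and whether the package uses strict or non-strict inequalities is an assumption here.

```r
# Hypothetical per-gene results from one pairwise cluster comparison.
de_results <- data.frame(
  gene = c("GeneA", "GeneB", "GeneC", "GeneD"),
  padj = c(1e-5, 0.02, 1e-3, 1e-8),
  lfc  = c(2.3, 3.1, -0.5, -1.8),  # log2(fold change)
  stringsAsFactors = FALSE
)

padj.th <- 0.01
lfc.th  <- 1

# A gene is called DE only if it passes both thresholds; abs() keeps
# down-regulated genes (negative log2 fold change) as well.
is_de <- de_results$padj < padj.th & abs(de_results$lfc) > lfc.th
de_results$gene[is_de]
```

GeneB fails on its adjusted p-value and GeneC on its fold change, leaving GeneA and GeneD.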
Cluster proportion thresholds:
We use q1.th, q2.th, and q.diff.th for additional tests based on the proportion of cells in each cluster that express each gene.
For every pair of clusters, we define q1 and q2 as the proportions of cells with expression greater than low.th (above) in the foreground and background clusters, respectively.
We use q1.th to select genes detected in a high proportion of foreground cells, and q2.th to select genes detected in a low proportion of background cells.
Finally, we use q.diff.th to test the difference between the foreground and background proportions.
q1.th: The minimum proportion of cells in the foreground cluster with expression greater than low.th.
q2.th: The maximum proportion of cells in the background cluster with expression greater than low.th.
q.diff.th: The threshold for the scaled proportional difference between q1 and q2, where the difference is defined as abs(q1 - q2) / max(q1, q2).
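The sketch below computes q1, q2, and the scaled difference for a single gene from toy foreground and background expression vectors, then applies the three thresholds. The vectors are made up, the guard for a gene detected in neither cluster is an addition, and treating `q2.th = NULL` as "no background cap" is an assumption about the default, not documented behavior.

```r
# Hypothetical normalized expression of one gene in a pair of clusters.
fg <- c(2.1, 0.0, 3.4, 1.8, 2.7)  # foreground cluster (5 cells)
bg <- c(0.0, 1.5, 0.2, 0.0, 0.4)  # background cluster (5 cells)

low.th    <- 1
q1.th     <- 0.5
q2.th     <- NULL  # assumption: NULL means no cap on background detection
q.diff.th <- 0.7

# Proportions of cells with expression greater than low.th.
q1 <- mean(fg > low.th)  # 4 of 5 cells
q2 <- mean(bg > low.th)  # 1 of 5 cells

# Scaled proportional difference, guarded for a gene detected in neither cluster.
q.diff <- if (max(q1, q2) > 0) abs(q1 - q2) / max(q1, q2) else 0

passes <- q1 >= q1.th &&
  (is.null(q2.th) || q2 <= q2.th) &&
  q.diff >= q.diff.th
```

Dividing by max(q1, q2) makes the difference relative: a drop from 0.8 to 0.2 and a drop from 0.4 to 0.1 both score 0.75.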
Cluster-wise p-value threshold:
After performing differential expression tests between a pair of clusters, we use de.score to determine whether enough overall differential expression is observed to consider the two clusters distinct from each other.
We define de.score for each gene as min(-log10(p.adj), 20), which caps each gene's contribution to the cluster-wise de.score at 20.
The de.score for a pair of clusters is the sum of the gene-wise de.score values. Only genes passing the padj.th and lfc.th thresholds (above) contribute to the de.score.
de.score.th is the minimum cluster-wise de.score required in a pairwise comparison between clusters.
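To make the capped sum concrete, here is a small base-R sketch for one cluster pair, assuming the adjusted p-values below belong to genes that have already passed padj.th and lfc.th. The values are invented for illustration, and using `>=` for the comparison with de.score.th is an assumption.

```r
# Hypothetical adjusted p-values for genes that already passed
# padj.th and lfc.th in one pairwise comparison.
padj <- c(rep(1e-30, 7), 1e-12, 1e-4)

# Gene-wise score capped at 20 (an adjusted p-value of 0 gives Inf,
# which pmin() also caps at 20).
gene_score <- pmin(-log10(padj), 20)
de.score <- sum(gene_score)           # 7 * 20 + 12 + 4 = 156

de.score.th <- 150
distinct <- de.score >= de.score.th   # TRUE

# Without the cap the same genes would sum to about 226.
uncapped <- sum(-log10(padj))
```

The cap means the seven strongest genes contribute 140 rather than 210. With the default de.score.th of 150, at least 8 contributing genes are needed, since each adds at most 20.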
Cell and gene count thresholds:
min.cells is the minimum size allowed for a cluster. If a cluster has fewer than min.cells cells, it will be merged with the nearest cluster.
min.genes is the minimum number of differentially expressed genes (passing the padj.th and lfc.th thresholds, above) required to consider two clusters separate.
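The two count thresholds reduce to simple comparisons. In the sketch below, the cluster labels and the list of DE genes are hypothetical, and the merging step itself is not shown; only the checks are.

```r
# Hypothetical cluster labels for 20 cells.
cl <- c(rep("c1", 12), rep("c2", 6), rep("c3", 2))

min.cells <- 4
cl_size <- table(cl)
too_small <- names(cl_size)[cl_size < min.cells]  # "c3": a candidate for merging

# Hypothetical DE genes (already passing padj.th and lfc.th) for one cluster pair.
de_genes <- c("GeneA", "GeneD", "GeneF")
min.genes <- 5
separable <- length(de_genes) >= min.genes  # FALSE: too few DE genes
```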
Returns a list of the parameter values for reuse.
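Assuming the returned value is a plain R list (the page says only that a list is returned), a stand-in like the hypothetical `de_param_sketch()` below shows how such a list can be inspected and adjusted with base R's `modifyList()`, for example to derive the 10x Nuclei settings from the defaults. It mirrors the documented arguments and defaults but is not the package's implementation; use the real `de_param()` in practice.

```r
# Hypothetical stand-in for de_param(): it only bundles its arguments into a list.
de_param_sketch <- function(low.th = 1, padj.th = 0.01, lfc.th = 1,
                            q1.th = 0.5, q2.th = NULL, q.diff.th = 0.7,
                            de.score.th = 150, min.cells = 4, min.genes = 5) {
  list(low.th = low.th, padj.th = padj.th, lfc.th = lfc.th,
       q1.th = q1.th, q2.th = q2.th, q.diff.th = q.diff.th,
       de.score.th = de.score.th, min.cells = min.cells,
       min.genes = min.genes)
}

defaults <- de_param_sketch()
defaults$padj.th  # 0.01

# Override only the values that differ for 10x Nuclei; other entries are kept.
nuclei <- modifyList(defaults, list(q1.th = 0.3, de.score.th = 100,
                                    min.cells = 10))
```

Note that `list(q2.th = NULL)` keeps a `q2.th` entry whose value is NULL, so the parameter is still present by name.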
# Recommended initial parameters for SMART-Seq (> 8,000 genes per sample):
sm_param <- de_param(low.th = 1,
                     padj.th = 0.01,
                     lfc.th = 1,
                     q1.th = 0.5,
                     q2.th = NULL,
                     q.diff.th = 0.7,
                     de.score.th = 150,
                     min.cells = 4,
                     min.genes = 5)
# Recommended initial parameters for 10x Cells (> 3,000 genes per sample):
tx_param <- de_param(low.th = 1,
                     padj.th = 0.01,
                     lfc.th = 1,
                     q1.th = 0.4,        # Reduced due to dropout
                     q2.th = NULL,
                     q.diff.th = 0.7,
                     de.score.th = 150,
                     min.cells = 10,     # Increased due to higher number of cells
                     min.genes = 5)
# Recommended initial parameters for 10x Nuclei (> 1,000 genes per sample):
tn_param <- de_param(low.th = 1,
                     padj.th = 0.01,
                     lfc.th = 1,
                     q1.th = 0.3,        # Reduced due to dropout
                     q2.th = NULL,
                     q.diff.th = 0.7,
                     de.score.th = 100,  # Reduced due to decreased detection
                     min.cells = 10,     # Increased due to higher number of cells
                     min.genes = 5)