View source: R/differential_expression.R

diff_mean_test    R Documentation

Non-parametric differential expression test for sparse non-negative data

diff_mean_test(
  y,
  group_labels,
  compare = "each_vs_rest",
  R = 99,
  log2FC_th = log2(1.2),
  mean_th = 0.05,
  cells_th = 5,
  only_pos = FALSE,
  only_top_n = NULL,
  mean_type = "geometric",
  verbosity = 1
)

| Argument | Description |
|---|---|
| `y` | A matrix of counts; must be (or inherit from) class dgCMatrix; genes are rows, cells are columns |
| `group_labels` | The group labels (e.g. cluster identities); will be converted to a factor |
| `compare` | Specifies which groups to compare, see details; default is `'each_vs_rest'` |
| `R` | The number of random permutations used to derive the p-values; default is 99 |
| `log2FC_th` | Threshold to remove genes from testing; absolute log2FC must be at least this large for a gene to be tested; default is `log2(1.2)` |
| `mean_th` | Threshold to remove genes from testing; gene mean must be at least this large for a gene to be tested; default is 0.05 |
| `cells_th` | Threshold to remove genes from testing; gene must be detected (non-zero count) in at least this many cells in the group with the higher mean; default is 5 |
| `only_pos` | Test only genes with positive fold change (mean in group 1 > mean in group 2); default is FALSE |
| `only_top_n` | Test only this number of genes from both ends of the log2FC spectrum after all of the above filters have been applied; useful to get only the top markers; only used if set to a numeric value; default is NULL |
| `mean_type` | Which type of mean to use; default is `"geometric"` |
| `verbosity` | Integer controlling how many messages the function prints; 0 is silent, 1 (default) prints messages |
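The thresholds `log2FC_th`, `mean_th` and `cells_th`, together with `only_pos`, act as a pre-filter before any permutations are run. The base R sketch below illustrates one reading of them on a small dense matrix. The helpers `geo_mean()` and `filter_genes()` are hypothetical, not part of sctransform. The log1p-based geometric mean and the choice to compare `mean_th` with the larger of the two group means are assumptions. `only_top_n` is left out.

```r
# Hypothetical sketch of the gene pre-filters (not the sctransform code).
# Assumption: geometric mean defined as expm1(mean(log1p(x))) to avoid log(0).
geo_mean <- function(x) expm1(mean(log1p(x)))

# y: dense genes x cells count matrix; in_group1: logical vector over cells.
filter_genes <- function(y, in_group1, log2FC_th = log2(1.2), mean_th = 0.05,
                         cells_th = 5, only_pos = FALSE) {
  m1 <- apply(y[, in_group1, drop = FALSE], 1, geo_mean)
  m2 <- apply(y[, !in_group1, drop = FALSE], 1, geo_mean)
  log2FC <- log2(m1 / m2)
  # number of cells with a non-zero count, per group
  det1 <- rowSums(y[, in_group1, drop = FALSE] > 0)
  det2 <- rowSums(y[, !in_group1, drop = FALSE] > 0)
  # detection count taken from the group with the higher mean
  det_hi <- ifelse(m1 >= m2, det1, det2)
  # assumption: mean_th is applied to the larger of the two group means
  keep <- abs(log2FC) >= log2FC_th & pmax(m1, m2) >= mean_th & det_hi >= cells_th
  if (only_pos) keep <- keep & log2FC > 0
  keep[is.na(keep)] <- FALSE  # genes with zero mean in both groups give NaN log2FC
  keep
}
```

For example, with `in_group1 <- rep(c(TRUE, FALSE), each = 6)` and a 12-cell matrix, a gene that is high in the first six cells and mostly zero elsewhere passes, while a constant gene fails on `log2FC_th`.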

Data frame of results

This model-free test is applied to each gene (row) individually but is optimized to make use of the efficient sparse representation of the input. A permutation null distribution is used to assess the significance of the observed difference in mean between two groups.
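Sparsity helps because zero counts add nothing to a group sum, and with a log1p-based geometric mean `log1p(0)` is zero as well. The sketch below computes both group means of a single gene from its non-zero entries alone, so each permutation only touches those entries. The helper `sparse_group_means()` is hypothetical, the log1p/expm1 form of the geometric mean is an assumption, and so is the `"arithmetic"` alternative; it illustrates the idea rather than how sctransform actually implements it.

```r
# Hypothetical sketch: group means for one gene from its non-zero entries only.
# nz_val: the gene's non-zero counts; nz_cell: their cell (column) indices;
# labels: logical vector over all cells, TRUE = group 1.
sparse_group_means <- function(nz_val, nz_cell, labels,
                               mean_type = c("geometric", "arithmetic")) {
  mean_type <- match.arg(mean_type)
  n1 <- sum(labels)
  n2 <- length(labels) - n1
  in1 <- labels[nz_cell]  # group membership of the non-zero cells
  if (mean_type == "geometric") {
    # assumed definition: expm1(mean(log1p(x))); zeros contribute log1p(0) = 0
    s <- log1p(nz_val)
    back <- expm1
  } else {
    s <- nz_val
    back <- identity
  }
  # divide by the full group sizes, so the implicit zeros are still counted
  c(mean1 = back(sum(s[in1]) / n1), mean2 = back(sum(s[!in1]) / n2))
}
```

Under a permutation only `labels` changes, so `sparse_group_means(nz_val, nz_cell, sample(labels))` reuses the same non-zero values each time.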

The observed difference in mean is compared against a distribution
obtained by randomly shuffling the group labels. For each gene, every
random permutation yields a difference in mean, and from the population of
these background differences we estimate the mean and standard
deviation of the null distribution.
This mean and standard deviation are used to turn the observed
difference in mean into a z-score and then into a p-value. Finally,
the p-values of all tested genes are adjusted using the Benjamini & Hochberg
method (fdr). The log2FC values in the output are `log2(mean1 / mean2)`.
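The steps above (shuffle labels, collect null differences, standardize the observed difference, convert to a p-value, then adjust across genes) can be sketched for a handful of dense gene vectors in base R. The helper `perm_z_test()` is hypothetical; it uses arithmetic means for simplicity and a two-sided normal p-value, both of which are assumptions rather than details confirmed above.

```r
# Hypothetical sketch of the permutation z-score test for one gene.
# Assumptions: arithmetic group means, two-sided normal p-value.
perm_z_test <- function(x, labels, R = 99) {
  obs <- mean(x[labels]) - mean(x[!labels])
  # null distribution: differences in mean under shuffled group labels
  null <- replicate(R, {
    perm <- sample(labels)
    mean(x[perm]) - mean(x[!perm])
  })
  # standardize the observed difference with the null mean and sd
  z <- (obs - mean(null)) / sd(null)
  c(z = z, pval = 2 * pnorm(-abs(z)))
}

set.seed(1)
labels <- rep(c(TRUE, FALSE), each = 50)
genes <- list(de = c(rpois(50, 5), rpois(50, 1)), null = rpois(100, 2))
res <- sapply(genes, perm_z_test, labels = labels)  # 2 x n_genes matrix
# Benjamini & Hochberg adjustment over all tested genes ("fdr" is an alias)
p_adj <- p.adjust(res["pval", ], method = "BH")
```

Only the genes that survive the pre-filters would enter the adjustment, which is why the adjusted values depend on the filter settings.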
Empirical p-values are also calculated: `emp_pval = (b + 1) / (R + 1)`,
where b is the number of random permutations for which the absolute difference
in mean is at least as large as the absolute value of the observed difference
in mean, and R is the number of random permutations. This is an upper bound on
the empirical p-value that would be obtained by enumerating all possible
group label permutations.
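The empirical p-value formula translates directly into a small function. The name `emp_pval()` mirrors the formula above but is a hypothetical helper, not an exported sctransform function; it takes the observed difference and the vector of permutation differences.

```r
# emp_pval = (b + 1) / (R + 1), where b counts permutations whose absolute
# difference in mean is at least the absolute observed difference.
emp_pval <- function(obs_diff, null_diffs) {
  b <- sum(abs(null_diffs) >= abs(obs_diff))
  (b + 1) / (length(null_diffs) + 1)
}
```

With the default R = 99, the smallest attainable empirical p-value is 1 / 100 = 0.01, reached when no permutation is as extreme as the observed difference; more permutations are needed to resolve smaller values.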

There are multiple ways to specify the group comparisons via the `compare`
parameter. The default, `'each_vs_rest'`, does multiple comparisons, one per
group vs all remaining cells. `'all_vs_all'` also does multiple comparisons,
covering all group pairs. If `compare` is set to a character vector of length two, e.g.
`c('T-cells', 'B-cells')`, one comparison between those two groups is done.
To put multiple groups on either side of a single comparison, use a list of length two,
e.g. `compare = list(c('cluster1', 'cluster5'), c('cluster3'))`.
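The four forms of `compare` can be illustrated by expanding each into an explicit list of (group 1, group 2) label sets. The helper `expand_comparisons()` is hypothetical; in particular, treating `'all_vs_all'` as each unordered pair of groups tested once is an assumption.

```r
# Hypothetical helper (not part of sctransform): expand `compare` into a list
# of comparisons, each a list(group1_labels, group2_labels).
expand_comparisons <- function(compare, groups) {
  if (identical(compare, "each_vs_rest")) {
    # one comparison per group against all remaining groups
    return(lapply(groups, function(g) list(g, setdiff(groups, g))))
  }
  if (identical(compare, "all_vs_all")) {
    # assumption: every unordered pair of groups, tested once
    pairs <- combn(groups, 2, simplify = FALSE)
    return(lapply(pairs, function(p) list(p[1], p[2])))
  }
  if (is.character(compare) && length(compare) == 2) {
    return(list(list(compare[1], compare[2])))
  }
  if (is.list(compare) && length(compare) == 2) {
    # several groups may sit on either side of a single comparison
    return(list(compare))
  }
  stop("unsupported 'compare' specification")
}
```

For three groups, `'each_vs_rest'` and `'all_vs_all'` both yield three comparisons; with more groups the pairwise count grows as choose(k, 2).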

library(sctransform)
# split cells into two arbitrary groups by alternating column index
clustering <- 1:ncol(pbmc) %% 2
vst_out <- vst(pbmc, return_corrected_umi = TRUE)
de_res <- diff_mean_test(y = vst_out$umi_corrected, group_labels = clustering)
