liger-DEG: Find DEG between groups
In rliger: Linked Inference of Genomic Experimental Relationships

runPairwiseDEG

R Documentation

Find DEG between groups

Description

Two methods are supported: "pseudoBulk" and "wilcoxon". Pseudo-bulk method aggregates cells basing on biological replicates and calls bulk RNAseq DE methods, DESeq2 wald test, while Wilcoxon rank sum test is performed on single-cell level. runPairwiseDEG() is generally used for flexibly comparing two specific groups of cells, while runMarkerDEG() is used for a one-vs-rest marker test strategy.

While using pseudo-bulk method, it is generally recommended that you have these variables available in your object:

The cell type or cluster labeling. This can be obtained from prior study or computed with runCluster
The biological replicate labeling, most of the time the "dataset" variable automatically generated when the liger object is created. Users may use other variables if a "dataset" is merged from multiple replicates.
The condition labeling that reflects the study design, such as the treatment or disease status for each sample/dataset.

Please see below for detailed scenarios.

Usage

runPairwiseDEG(
  object,
  groupTest,
  groupCtrl,
  variable1 = NULL,
  variable2 = NULL,
  splitBy = NULL,
  method = c("pseudoBulk", "wilcoxon"),
  usePeak = FALSE,
  useReplicate = "dataset",
  nPsdRep = NULL,
  minCellPerRep = 3,
  printDiagnostic = FALSE,
  chunk = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

runMarkerDEG(
  object,
  conditionBy = NULL,
  splitBy = NULL,
  method = c("pseudoBulk", "wilcoxon"),
  useDatasets = NULL,
  usePeak = FALSE,
  useReplicate = "dataset",
  nPsdRep = NULL,
  minCellPerRep = 3,
  printDiagnostic = FALSE,
  chunk = NULL,
  seed = 1,
  verbose = getOption("ligerVerbose", TRUE)
)

runWilcoxon(
  object,
  data.use = NULL,
  compare.method = c("clusters", "datasets")
)

Arguments

`object`	A liger object, with normalized data available
`groupTest`, `groupCtrl`, `variable1`, `variable2`	Condition specification. See `?runPairwiseDEG` section Pairwise DEG Scenarios for detail.
`splitBy`	Name(s) of the variable(s) in `cellMeta` to split the comparison. See Details. Default `NULL`.
`method`	DEG test method to use. Choose from `"pseudoBulk"` or `"wilcoxon"`. Default `"pseudoBulk"`
`usePeak`	Logical. Whether to use peak count instead of gene count. Only supported when ATAC datasets are involved. Default `FALSE`.
`useReplicate`	`cellMeta` variable of biological replicate annotation. Only used with `method = "pseudoBulk"`. Default `"dataset"`.
`nPsdRep`	Number of pseudo-replicates to create. Only used when `method = "pseudoBulk"`. Default `NULL`. See Details.
`minCellPerRep`	Numeric, will not make pseudo-bulk for replicate with less than this number of cells. Default `3`.
`printDiagnostic`	Logical. Whether to show more detail when `verbose = TRUE`. Default `FALSE`.
`chunk`	Number of features to process at a time during Wilcoxon test. Useful when memory is limited. Default `NULL` will process all features at once.
`seed`	Random seed to use for pseudo-replicate generation. Default `1`.
`verbose`	Logical. Whether to show information of the progress. Default `getOption("ligerVerbose")` or `TRUE` if users have not set.
`conditionBy`	`cellMeta` variable(s). Marker detection will be performed for each level of this variable. Multiple variables will be combined. Default `NULL` uses default cluster.
`useDatasets`	Datasets to perform marker detection within. Default `NULL` will use all datasets.
`data.use`	Same as `useDatasets`.
`compare.method`	Choose from `"clusters"` (default) or `"datasets"`. `"clusters"` compares each cluster against all other cells, while `"datasets"` run within each cluster and compare each dataset against all other datasets.

Value

A data.frame with DEG information with the all or some of the following fields:

`feature`	Gene names
`group`	Test group name. Multiple tests might be present for each function call. This is the main variable to distinguish the tests. For a pairwise test, a row with a certain group name represents the test result between the this group against the other control group; When split by a variable, it would be presented in "split.group" format, meaning the stats is by comparing the group in the split level against the control group in the same split level. When running marker detection without splitting, a row with group "a" represents the stats of the gene in group "a" against all other cells. When running split marker detection, the group name would be in "split.group" format, meaning the stats is by comparing the group in the split level against all other cells in the same split level.
`logFC`	Log fold change
`pval`	P-value
`padj`	Adjusted p-value
`avgExpr`	Mean expression in the test group indicated by the "group" field. Only available for wilcoxon tests.
`statistic`	Wilcoxon rank-sum test statistic. Only available for wilcoxon tests.
`auc`	Area under the ROC curve. Only available for wilcoxon tests.
`pct_in`	Percentage of cells in the test group, indicated by the "group" field, that express the feature. Only available for wilcoxon tests.
`pct_out`	Percentage of cells in the control group or other cells, as explained for the "group" field, that express the feature. Only available for wilcoxon tests.

Using Wilcoxon rank-sum test

Wilcoxon rank-sum test works for each gene and is based on the rank of the expression in each cell. LIGER provides dataset integration but does not "correct" the expression values. Projects with strong batch effects or integrate drastically different modalities should be cautious when using this method.

Comparing difference between/across cell types

Most of times, people would want to know what cell types are for each cluster after clustering. This can be done with a marker detection method that test each cluster against all the other cells. This can be done with a command like runMarkerDEG(object, conditionBy = "cluster_var"). When using default pseudo-bulk method, users should additionaly determine the pseudo-bulk setup parameters. If the real biological replicate variable is available, it should be supplied to argument useReplicate, otherwise, pseudo-replicates should be created. See "Pseudo-Replicate" section for more.

Compare between conditions

It is frequently needed to identify the difference between conditions. Users can simply set conditionBy = "condition_var". However, most of time, such comparisons should be ideally done in a per-cluster manner. This can be done by setting splitBy = "cluster_var". This will run a loop for each cluster, and within the group of cells, compare each condition against all other cells in the cluster.

In the scenario when users only need to compare two conditions for each cluster, running runPairwiseDEG(object, groupTest = "condition1", groupCtrl = "condition2", variable1 = "condition_var", splitBy = "cluster_var") would address the need.

For both use case, if pseudo-bulk (default) method is used, users should determine the pseudo-bulk setup parameters as mentioned in the previous section.

Detailed `runMarkerDEG` usage

Marker detection is performed in a one vs. rest manner. The grouping of such condition is specified by conditionBy, which should be a column name in cellMeta. When splitBy is specified as another variable name in cellMeta, the marker detection will be iteratively done for within each level of splitBy variable.

For example, when conditionBy = "celltype" and splitBy = NULL, marker detection will be performed by comparing all cells of "celltype_i" against all other cells, and etc. This is analogous to the old version when running runWilcoxon(method = "cluster").

When conditionBy = "gender" and splitBy = "leiden_cluster", marker detection will be performed by comparing "gender_i" cells from "cluster_j" against other cells from "cluster_j", and etc. This is analogous to the old version when running runWilcoxon(method = "dataset").

Detailed `runPairwiseDEG` usage

Users can select classes of cells from a variable in cellMeta. variable1 and variable2 are used to specify a column in cellMeta, and groupTest and groupCtrl are used to specify existing classes from variable1 and variable2, respectively. When variable2 is missing, groupCtrl will be considered from variable1.

For example, when variable1 = "celltype" and variable2 = NULL, groupTest and groupCtrl should be valid cell types in object$celltype.

When variable1 is "celltype" and variable2 is "gender", groupTest should be a valid cell type from object$celltype and groupCtrl should be a valid class from object$gender.

When both variable1 and variable2 are missing, groupTest and groupCtrl should be valid index of cells in object.

Pseudo-Replicate

Pseudo-replicate assignment is a technique to complement the lack of real biological replicates when using pseudo-bulk DE methods. LIGER's pseudo-bulk method generally requires that each comparison group has at least 3 replicates each composed of at least 3 cells, in order to ensure the statistic power. When less than 3 real replicates are found for a comparison, the default setting (nPsdRep = NULL) splits each into 3 pseudo-replicates, otherwise no pseudo-replicates are automatically generated. When nPsdRep is given a number, LIGER will always go through each comparison group and split each real replicate into the given number of pseudo-replicates.

Examples


pbmc$leiden_cluster <- pbmcPlot$leiden_cluster

# Identify cluster markers
degStats1 <- runMarkerDEG(pbmc, conditionBy = "leiden_cluster")

# Compare "stim" data against "ctrl" data within each cluster
degStats3 <- runPairwiseDEG(pbmc, groupTest = "stim", groupCtrl = "ctrl",
                            variable1 = "dataset",
                            splitBy = "leiden_cluster",
                            minCellPerRep = 4)

rliger documentation built on June 8, 2025, 1:56 p.m.

rliger index

Package overview README.md Data integration with LIGER

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

rliger
Linked Inference of Genomic Experimental Relationships

liger-DEG: Find DEG between groups
In rliger: Linked Inference of Genomic Experimental Relationships

Find DEG between groups

Description

Usage

Arguments

Value

Using Wilcoxon rank-sum test

Comparing difference between/across cell types

Compare between conditions

Detailed `runMarkerDEG` usage

Detailed `runPairwiseDEG` usage

Pseudo-Replicate

Examples

Related to liger-DEG in rliger...

R Package Documentation

Browse R Packages

We want your feedback!

rliger Linked Inference of Genomic Experimental Relationships

liger-DEG: Find DEG between groups In rliger: Linked Inference of Genomic Experimental Relationships

Find DEG between groups

Description

Usage

Arguments

Value

Using Wilcoxon rank-sum test

Comparing difference between/across cell types

Compare between conditions

Detailed runMarkerDEG usage

Detailed runPairwiseDEG usage

Pseudo-Replicate

Examples

Related to liger-DEG in rliger...

R Package Documentation

Browse R Packages

We want your feedback!

rliger
Linked Inference of Genomic Experimental Relationships

liger-DEG: Find DEG between groups
In rliger: Linked Inference of Genomic Experimental Relationships

Detailed `runMarkerDEG` usage

Detailed `runPairwiseDEG` usage