findNhoodGroupMarkers: Identify post-hoc neighbourhood marker genes
In MikeDMorgan/miloR: Differential neighbourhood abundance testing on a graph

Description

This function performs differential gene expression (DGE) analysis between groups of neighbourhoods. Groups of adjacent, concordantly DA neighbourhoods can be defined using groupNhoods or supplied by the user. The cells in each aggregated group are then compared with the cells in the other groups. For differential gene expression based on an input design within DA neighbourhoods, see testDiffExp.

Usage

findNhoodGroupMarkers(
  x,
  da.res,
  assay = "logcounts",
  aggregate.samples = FALSE,
  sample_col = NULL,
  subset.row = NULL,
  gene.offset = TRUE,
  subset.nhoods = NULL,
  subset.groups = NULL,
  na.function = "na.pass"
)

Arguments

`x`	A `Milo` object containing single-cell gene expression and neighbourhoods.
`da.res`	A `data.frame` containing DA results, as expected from running `testNhoods`, with a `NhoodGroup` column specifying the grouping of neighbourhoods, as produced by `groupNhoods` or defined by the user.
`assay`	A character scalar determining which `assays` slot to extract from the `Milo` object to use for DGE testing.
`aggregate.samples`	A logical scalar indicating whether the expression values for cells in the same sample and neighbourhood group should be merged for DGE testing. This allows testing to exploit the replication structure of the experimental design, rather than treating single cells as independent replicates. The aggregation function depends on the selected gene expression assay: if `assay="counts"`, the expression values are summed; otherwise, their mean is taken.
`sample_col`	A character scalar naming the column in `colData` that stores sample information (only relevant if `aggregate.samples=TRUE`).
`subset.row`	A logical, integer or character vector indicating the rows of `x` to use for summarizing over cells in neighbourhoods.
`gene.offset`	A logical scalar that determines whether a per-cell offset is included in the DGE GLM to adjust for the number of detected genes (expression > 0) in each cell.
`subset.nhoods`	A logical, integer or character vector indicating which neighbourhoods to subset before aggregation and DGE testing (default: NULL).
`subset.groups`	A character vector indicating which groups to test for markers (default: NULL)
`na.function`	A valid NA action function to apply; should be one of `na.fail`, `na.omit`, `na.exclude` or `na.pass`.
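The aggregation rule described for `aggregate.samples` (sum for counts, mean for any other assay) can be sketched in base R as below. This is a minimal illustration assuming a dense genes x cells matrix; `aggregate_by_sample()` and the toy data are hypothetical and not part of miloR, whose internal implementation may differ.

```r
# Hypothetical helper (not part of miloR): build one pseudo-bulk column per
# (neighbourhood group, sample) pair from a dense genes x cells matrix.
aggregate_by_sample <- function(mat, group, sample, assay = "counts") {
  key <- paste(group, sample, sep = "_")
  # rowsum() sums the rows of t(mat) sharing a key -> keys x genes,
  # with keys sorted as in sort(unique(key))
  summed <- rowsum(t(mat), key)
  if (assay == "counts") {
    agg <- summed  # raw counts are summed
  } else {
    # any other assay (e.g. logcounts) is averaged: divide by cells per key
    n_cells <- as.vector(table(key)[rownames(summed)])
    agg <- summed / n_cells
  }
  t(agg)  # back to genes x pseudo-bulk samples
}

# Hypothetical toy data: 2 genes, 4 cells, 2 neighbourhood groups, 2 samples
counts <- matrix(c(1, 2, 3, 4,
                   0, 1, 0, 5), nrow = 2, byrow = TRUE,
                 dimnames = list(c("geneA", "geneB"), paste0("cell", 1:4)))
group  <- c("1", "1", "2", "2")
sample <- c("S1", "S1", "S1", "S2")
aggregate_by_sample(counts, group, sample, assay = "counts")
```

Each resulting column, rather than each cell, then acts as a replicate in the downstream model, which is the point of aggregating by sample.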

Details

Using a one-vs-all approach, each aggregated group of cells is compared with the cells of all other groups, either on the single-cell log-normalized gene expression with a linear model (for details see limma-package) or on the single-cell counts with a negative binomial GLM (for details see edgeR-package). When using counts, it is recommended to set gene.offset=TRUE, which adjusts the model offsets by the number of detected genes in each cell.
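The one-vs-all scheme with a detected-genes offset can be sketched as follows. This is a simplified stand-in, not miloR's implementation: it fits a Poisson GLM per gene with base R's `glm()` instead of edgeR's negative binomial GLM, and it assumes the offset enters as the log of each cell's number of detected genes on the log-link scale. `one_vs_all_counts()` and the toy data are hypothetical.

```r
# Hypothetical helper (not the miloR API): one-vs-all testing on raw counts.
# Poisson GLM used as a simplified stand-in for edgeR's negative binomial GLM.
one_vs_all_counts <- function(counts, groups, gene.offset = TRUE) {
  # Assumed offset: log(number of genes with expression > 0 in each cell)
  off <- if (gene.offset) log(colSums(counts > 0)) else rep(0, ncol(counts))
  res <- lapply(sort(unique(groups)), function(g) {
    # Binary design: cells in group g versus cells in all other groups
    in_g <- factor(ifelse(groups == g, "group", "rest"),
                   levels = c("rest", "group"))
    stats <- t(apply(counts, 1, function(y) {
      fit <- glm(y ~ in_g, family = poisson(), offset = off)
      summary(fit)$coefficients["in_ggroup", c("Estimate", "Pr(>|z|)")]
    }))
    data.frame(group = g, gene = rownames(counts),
               logFC = stats[, "Estimate"] / log(2),  # natural log -> log2
               adj.P.Val = p.adjust(stats[, "Pr(>|z|)"], method = "BH"),
               row.names = NULL)
  })
  do.call(rbind, res)  # long format: one row per (group, gene)
}

# Hypothetical toy data: 4 genes x 6 cells in 3 neighbourhood groups
counts <- matrix(c(8, 6, 1, 2, 1, 1,
                   1, 2, 7, 9, 2, 1,
                   2, 1, 1, 2, 6, 8,
                   0, 1, 0, 1, 0, 1), nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("gene", 1:4), paste0("cell", 1:6)))
groups <- c("1", "1", "2", "2", "3", "3")
res <- one_vs_all_counts(counts, groups)
head(res)
```

The long-format table here is only one possible layout; the columns returned by findNhoodGroupMarkers may be arranged differently. A negative binomial model would additionally account for overdispersion, which the Poisson model ignores.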

Value

A data.frame of DGE results containing a log fold change and an adjusted p-value for each aggregated group of neighbourhoods. If return.groups is TRUE, the return value is instead a list with the slots groups and dge, containing the aggregated neighbourhood groups per single cell and the marker gene results, respectively.

Warning: If all neighbourhoods are grouped together, there are no other groups to compare against, so marker testing is impossible. In this (hopefully rare) case, the function emits a warning and returns NULL.