testNhoods: Perform differential neighbourhood abundance testing

View source: R/testNhoods.R

testNhoodsR Documentation

Perform differential neighbourhood abundance testing

Description

This will perform differential neighbourhood abundance testing after cell counting.

Arguments

x

A Milo object with a non-empty nhoodCounts slot.

design

A formula or model.matrix object describing the experimental design for differential abundance testing. The last component of the formula or last column of the model matrix are by default the test variable. This behaviour can be overridden by setting the model.contrasts argument

design.df

A data.frame containing meta-data to which design refers to

kinship

(optional) An n X n matrix containing pair-wise relationships between observations, such as expected relationships or computed from SNPs/SNVs/other genetic variants. Row names and column names should correspond to the column names of nhoods(x) and rownames of design.df.

min.mean

A scalar used to threshold neighbourhoods on the minimum average cell counts across samples.

model.contrasts

A string vector that defines the contrasts used to perform DA testing. For a specific comparison we recommend a single contrast be passed to testNhoods. More details can be found in the vignette milo_contrasts.

fdr.weighting

The spatial FDR weighting scheme to use. Choice from max, neighbour-distance, graph-overlap or k-distance (default). If none is passed no spatial FDR correction is performed and returns a vector of NAs.

robust

If robust=TRUE then this is passed to edgeR and limma which use a robust estimation for the global quasilikelihood dispersion distribution. See edgeR and Phipson et al, 2013 for details.

norm.method

A character scalar, either "logMS", "TMM" or "RLE". The "logMS" method normalises the counts across samples using the log columns sums of the count matrix as a model offset. "TMM" uses the trimmed mean of M-values normalisation as described in Robinson & Oshlack, 2010, whilst "RLE" uses the relative log expression method by Anders & Huber, 2010, to compute normalisation factors relative to a reference computed from the geometric mean across samples. The latter methods provides a degree of robustness against false positives when there are very large compositional differences between samples.

cell.sizes

A named numeric vector of cell numbers per experimental samples. Names should correspond to the columns of nhoodCounts. This can be used to define the model normalisation factors based on a set of numbers instead of the colSums(nhoodCounts(x)). The example use-case is when performing an analysis of a subset of nhoods while retaining the need to normalisation based on the numbers of cells collected for each experimental sample to avoid compositional biases. Infinite or NA values will give an error.

reduced.dim

A character scalar referring to the reduced dimensional slot used to compute distances for the spatial FDR. This should be the same as used for graph building.

REML

A logical scalar that controls the variance component behaviour to use either restricted maximum likelihood (REML) or maximum likelihood (ML). The former is recommened to account for the bias in the ML variance estimates.

glmm.solver

A character scalar that determines which GLMM solver is applied. Must be one of: Fisher, HE or HE-NNLS. HE or HE-NNLS are recommended when supplying a user-defined covariance matrix.

max.iters

A scalar that determines the maximum number of iterations to run the GLMM solver if it does not reach the convergence tolerance threshold.

max.tol

A scalar that deterimines the GLMM solver convergence tolerance. It is recommended to keep this number small to provide some confidence that the parameter estimates are at least in a feasible region and close to a local optimum

subset.nhoods

A character, numeric or logical vector that will subset the analysis to the specific nhoods. If a character vector these should correspond to row names of nhoodCounts. If a logical vector then these should have the same length as nrow of nhoodCounts. If numeric, then these are assumed to correspond to indices of nhoodCounts - if the maximal index is greater than nrow(nhoodCounts(x)) an error will be produced.

fail.on.error

A logical scalar the determines the behaviour of the error reporting. Used for debugging only.

BPPARAM

A BiocParallelParam object specifying the arguments for parallelisation. By default this will evaluate using SerialParam(). See detailson how to use parallelisation in testNhoods.

Details

This function wraps up several steps of differential abundance testing using the edgeR functions. These could be performed separately for users who want to exercise more contol over their DA testing. By default this function sets the lib.sizes to the colSums(x), and uses the Quasi-Likelihood F-test in glmQLFTest for DA testing. FDR correction is performed separately as the default multiple-testing correction is inappropriate for neighbourhoods with overlapping cells. The GLMM testing cannot be performed using edgeR, however, a separate function fitGLMM can be used to fit a mixed effect model to each nhood (see fitGLMM docs for details).

Parallelisation is currently only enabled for the NB-LMM and uses the BiocParallel paradigm. In general the GLM implementation in glmQLFit is sufficiently fast that it does not require parallelisation. Parallelisation requires the user to pass a BiocParallelParam object with the parallelisation arguments contained therein. This relies on the user to specify how parallelisation - for details see the BiocParallel package.

Value

A data.frame of model results, which contain:

logFC:

Numeric, the log fold change between conditions, or for an ordered/continous variable the per-unit change in (normalized) cell counts per unit-change in experimental variable.

logCPM:

Numeric, the log counts per million (CPM), which equates to the average log normalized cell counts across all samples.

F:

Numeric, the F-test statistic from the quali-likelihood F-test implemented in edgeR.

PValue:

Numeric, the unadjusted p-value from the quasi-likelihood F-test.

FDR:

Numeric, the Benjamini & Hochberg false discovery weight computed from p.adjust.

Nhood:

Numeric, a unique identifier corresponding to the specific graph neighbourhood.

SpatialFDR:

Numeric, the weighted FDR, computed to adjust for spatial graph overlaps between neighbourhoods. For details see graphSpatialFDR.

Author(s)

Mike Morgan

Examples

library(SingleCellExperiment)
ux.1 <- matrix(rpois(12000, 5), ncol=400)
ux.2 <- matrix(rpois(12000, 4), ncol=400)
ux <- rbind(ux.1, ux.2)
vx <- log2(ux + 1)
pca <- prcomp(t(vx))

sce <- SingleCellExperiment(assays=list(counts=ux, logcounts=vx),
                            reducedDims=SimpleList(PCA=pca$x))

milo <- Milo(sce)
milo <- buildGraph(milo, k=20, d=10, transposed=TRUE)
milo <- makeNhoods(milo, k=20, d=10, prop=0.3)
milo <- calcNhoodDistance(milo, d=10)

cond <- rep("A", ncol(milo))
cond.a <- sample(1:ncol(milo), size=floor(ncol(milo)*0.25))
cond.b <- setdiff(1:ncol(milo), cond.a)
cond[cond.b] <- "B"
meta.df <- data.frame(Condition=cond, Replicate=c(rep("R1", 132), rep("R2", 132), rep("R3", 136)))
meta.df$SampID <- paste(meta.df$Condition, meta.df$Replicate, sep="_")
milo <- countCells(milo, meta.data=meta.df, samples="SampID")

test.meta <- data.frame("Condition"=c(rep("A", 3), rep("B", 3)), "Replicate"=rep(c("R1", "R2", "R3"), 2))
test.meta$Sample <- paste(test.meta$Condition, test.meta$Replicate, sep="_")
rownames(test.meta) <- test.meta$Sample
da.res <- testNhoods(milo, design=~Condition, design.df=test.meta[colnames(nhoodCounts(milo)), ], norm.method="TMM")
da.res


MikeDMorgan/miloR documentation built on Feb. 29, 2024, 6:20 p.m.