Introduction

LineagePulse is a differential expression algorithm for single-cell RNA-seq (scRNA-seq) data. LineagePulse is based on zero-inflated negative binomial noise model and can capture both discrete and continuous population structures: Discrete population structures are groups of cells (e.g. condition of an experiment or tSNE clusters). Continous population structures can for example be pseudotemporal orderings of cells or temporal orderings of cells. The main use and novelty of LineagePulse lies in its ability to fit gene expression trajectories on pseudotemporal orderings of cells well. Note that LineagePulse does not infer a pseudotemporal ordering but is a downstream analytic tool to analyse gene expression trajectories on a given pseudotemporal ordering (such as from diffusion pseudotime or monocle2).

To run LineagPulse on scRNA-seq data, the user needs to use a minimal input parameter set for the wrapper function runLineagePulse, which then performs all normalisation, model fitting and differential expression analysis steps without any more user interaction required:

Additionally, one can provide:

Lastly, the experienced user who has a solid grasp of the mathematical and algorithmic basis of LineagePulse may change the defaults of these advanced input options:

Differential expression analysis

Here, we present a differential expression analysis scenario on a longitudinal ordering. The differential expression results are in a data frame which can be accessed from the output object via list like properties ($). The core differential expression analysis result are p-value and false-discovery-rate corrected p-value of differential expression which are the result of a gene-wise hypothesis test of a non-constant expression model (impulse, splines or groups) versus a constant expression model.

library(LineagePulse)
lsSimulatedData <- simulateContinuousDataSet(
  scaNCells = 100,
  scaNConst = 10,
  scaNLin = 10,
  scaNImp = 10,
  scaMumax = 100,
  scaSDMuAmplitude = 3,
  vecNormConstExternal=NULL,
  vecDispExternal=rep(20, 30),
  vecGeneWiseDropoutRates = rep(0.1, 30))
objLP <- runLineagePulse(
  counts = lsSimulatedData$counts,
  dfAnnotation = lsSimulatedData$annot)
head(objLP$dfResults)

In addition to the raw p-values, one may be interested in further details of the expression models such as shape of the expression mean as a function of pseudotime, log fold changes (LFC) and global expression trends as function of pseudotime. We address each of these follow-up questions with separate sections in the following. Note that all of these follow-up questions are answered based on the model that were fit to compute the p-value of differential expression. Therefore, once runLineagePulse() was called once, no further model fitting is required.

Further inspection of results

Plot gene-wise trajectories

Multiple options are available for gene-wise expression trajectory plotting: Observations can be coloured by the posterior probability of drop-out (boolColourByDropout). Observations can be normalized based on the alternative expression model or taken as raw observerations for the scatter plot (boolH1NormCounts). Lineage contours can be added to aid visual interpretation of non-uniform population density in pseudotime related effects (boolLineageContour). Log counts can be displayed instead of counts if the fold changes are large (boolLogPlot). In any case, the output object of the gene-wise expression trajectors plotting function plotGene is a ggplot2 object which can then be printed or modified.

# plot the gene with the lowest p-value of differential expression
gplotExprProfile <- plotGene(
objLP = objLP, boolLogPlot = FALSE,
strGeneID = objLP$dfResults[which.min(objLP$dfResults$p),]$gene,
boolLineageContour = FALSE)
gplotExprProfile

The function plotGene also shows the H1 model fit under a negative binomial noise model ("H1(NB)") as a reference to show what the model fit looks like if drop-out is not accounted for.

Manual analysis of expression trajectories

LineagePulse provides the user with parameter extraction functions that allow the user to interact directly with the raw model fits for analytic tasks or questions not addressed above.

# extract the mean parameter fits per cell of the gene with the lowest p-value.
matMeanParamFit <- getFitsMean(
    lsMuModel = lsMuModelH1(objLP),
    vecGeneIDs = objLP$dfResults[which.min(objLP$dfResults$p),]$gene)
cat("Minimum fitted mean parameter: ", round(min(matMeanParamFit),1) )
cat("Mean fitted mean parameter: ", round(mean(matMeanParamFit),1) )

Fold changes

Given a discrete population structure, such as tSNE cluster or experimental conditions, a fold change is the ratio of the mean expression value of both groups. The definition of a fold change is less clear if a continous expression trajector is considered: Of interest may be for example the fold change from the first to the last cell on the expression trajectory or from the minimum to the maximum expression value. Note that in both cases, we compute fold changes on the model fit of the expression mean parameter which is corrected for noise and therefore more stable than the estimate based on the raw expression count observation.

# first, extract the model fits for a given gene again
vecMeanParamFit <- getFitsMean(
    lsMuModel = lsMuModelH1(objLP),
    vecGeneIDs = objLP$dfResults[which.min(objLP$dfResults$p),]$gene)
# compute log2-fold change from first to last cell on trajectory
idxFirstCell <- which.min(dfAnnotationProc(objLP)$pseudotime)
idxLastCell <- which.max(dfAnnotationProc(objLP)$pseudotime)
cat("LFC first to last cell on trajectory: ",
    round( (log(vecMeanParamFit[idxLastCell]) - 
                log(vecMeanParamFit[idxFirstCell])) / log(2) ,1) )
# compute log2-fold change from minimum to maximum value of expression trajectory
cat("LFC minimum to maximum expression value of model fit: ", 
    round( (log(max(vecMeanParamFit)) - 
                log(min(vecMeanParamFit))) / log(2),1) )

Global expression profiles

Global expression profiles or expression profiles across large groups of genes can be visualised via heatmaps of expression z-scores. One could extract the expression mean parameter fits as described above and create such heatmaps from scratch. LineaegePulse also offers a wrapper for creating such a heatmap:

# create heatmap with all differentially expressed genes
lsHeatmaps <- sortGeneTrajectories(
    vecIDs = objLP$dfResults[which(objLP$dfResults$padj < 0.01),]$gene,
    lsMuModel = lsMuModelH1(objLP),
    dirHeatmap=NULL)
print(lsHeatmaps$hmGeneSorted)

Session information

sessionInfo()


YosefLab/LineagePulse documentation built on May 6, 2019, 2:19 p.m.