Detecting Differential Expression in Single-Cell RNAseq data

knitr::opts_chunk$set(
    collapse = TRUE,
    comment = "#>"
)

The waddR package provides an adaptation of the semi-parametric testing procedure based on the 2-Wasserstein distance which is specifically tailored to identify differential distributions in single-cell RNA-seqencing (scRNA-seq) data.

In particular, a two-stage (TS) approach has been implemented that takes account of the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.

Application to an example data set

This an example on how to analyse the expression of single cell data in two conditions. As example data we will look at single cell data from decidua and blood samples analysed by Vento-Tormo et al. 2018 (https://doi.org/10.1038/s41586-018-0698-6). For this demonstration we use only one replicate of the two conditions that has been normalized, scaled and is available for download on github.

First, we will first load all required packages:

suppressPackageStartupMessages({
    library("waddR")
    library("SingleCellExperiment")
    library("BiocFileCache")
})

Download the example data set

url.base <- "https://github.com/goncalves-lab/waddR-data/blob/master/data/"
sce.blood.url <- paste0(url.base, "sce_blood.RDa?raw=true")
sce.decidua.url <- paste0(url.base, "sce_decidua.RDa?raw=true")

getCachedPath <- function(url, rname){
    bfc <- BiocFileCache() # fire up cache
    res <- bfcquery(bfc, url, field="fpath", exact=TRUE)
    if (bfccount(res) == 0L)
        cachedFilePath <- bfcadd(bfc, rname=rname, fpath=url)
    else
        cachedFilePath <- bfcpath(bfc, res[["rid"]])
    cachedFilePath
}

load(getCachedPath(sce.blood.url, "sce.blood"))
load(getCachedPath(sce.decidua.url, "sce.decidua"))

We have downloaded the two SingleCellExperiment objects sce.blood and sce.decidua. Next we will randomly select a subset of genes with which we will proceed, a whole analysis would be out of scope for this vignette.

set.seed(28)
randgenes <- sample(rownames(sce.blood), 2500, replace=FALSE)
sce.blood <- sce.blood[randgenes, ]
sce.decidua <- sce.decidua[randgenes, ]

The input data to wasserstein.sc always discriminates two conditions. It can be given either as a data matrix and a vector explaining the condition of each column, or a SingleCellExperiment for each condition.

We will proceed to use the two SingleCellExperiment objects we loaded. The SingleCellExperiment objects passed to wasserstein.sc must contain a counts assay each.

assays(sce.blood)
assays(sce.decidua)

Before testing for differential expression in the two conditions, the counts should be normalized for cell or gene-specific biases. We can skip this here since the example data has already been normalized and an entire analysis would be out of scope for this vignette. The detailed processing steps performed on this data can be seen here.

wsres <- wasserstein.sc(sce.blood, sce.decidua, "OS")
head(wsres, n=10L)

The 2-Wasserstein single-cell testing procedure is performed on all genes in the data set -- one line of the output describes the result of testing one gene.

Note: The two-stage (TS) procedure includes a test for differential proportions of zero gene expression, considering the proportion of zeroes over all conditions and genes in a model. It may yield incorrect results, if it is applied on only a subset of genes is given.

Note: For reproducible p-values, see the "seed" argument.

: 2-Wasserstein single cell test output

| Output Column | Description | |:--------------|:------------------------------------------------------------| | d.wass | 2-Wasserstein distance between the two samples | | d.wass^2 | squared 2-Wasserstein distance between the two samples | | d.comp^2 | squared 2-Wasserstein distance (decomposition approx.) | | d.comp | 2-Wasserstein distance (decomposition approx.) | | location | location term in the squared 2-Wasserstein decomposition | | size | size term in the squared 2-Wasserstein decomposition | | shape | shape term in the squared 2-Wasserstein decomposition | | rho | correlation coefficient in the quantile-quantile plot | | pval | p-value of the semi-parametric test | | p.ad.gpd | p-value of the Anderson-Darling test to check whether the GPD actually fits the data well * | | N.exc | number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit (i.e. p-value of Anderson-Darling test greater or eqaul to 0.05) * | | perc.loc | fraction (in %) of the location part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation | | perc.size | fraction (in %) of the size part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation | | perc.shape | fraction (in %) of the shape part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation | | decomp.error | relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation | | p.zero | p-value of the test for differential proportions of zero expression (logistic regression model) ** | | p.combined | combined p-value of p.nonzero and p.zero obtained by Fisher’s method ** | | pval.adj | adjusted p-value of the semi-parametric 2-Wasserstein distance-based test according to the method of Benjamini-Hochberg | | p.adj.zero | adjusted p-value of the test for differential proportions of zero expression (logistic regression model) according to the method of Benjamini-Hochberg ** | | p.adj.combined | adjusted combined p-value of p.nonzero and p.zero obtained by Fisher’s method according to the method of Benjamini-Hochberg ** |

* only if GDP fitting is performed (NA otherwise); ** only if the two-stage test is run

See Also

Session Info

sessionInfo()


Try the waddR package in your browser

Any scripts or data that you put into this service are public.

waddR documentation built on Nov. 8, 2020, 8:32 p.m.