knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The waddR
package provides an adaptation of the semi-parametric
testing procedure based on the 2-Wasserstein distance which is specifically
tailored to identify differential distributions in single-cell RNA-seqencing
(scRNA-seq) data.
In particular, a two-stage (TS) approach has been implemented that takes account of the specific nature of scRNA-seq data by separately testing for differential proportions of zero gene expression (using a logistic regression model) and differences in non-zero gene expression (using the semi-parametric 2-Wasserstein distance-based test) between two conditions.
This an example on how to analyse the expression of single cell data in two conditions. As example data we will look at single cell data from decidua and blood samples analysed by Vento-Tormo et al. 2018 (https://doi.org/10.1038/s41586-018-0698-6). For this demonstration we use only one replicate of the two conditions that has been normalized, scaled and is available for download on github.
First, we will first load all required packages:
suppressPackageStartupMessages({ library("waddR") library("SingleCellExperiment") library("BiocFileCache") })
Download the example data set
url.base <- "https://github.com/goncalves-lab/waddR-data/blob/master/data/" sce.blood.url <- paste0(url.base, "sce_blood.RDa?raw=true") sce.decidua.url <- paste0(url.base, "sce_decidua.RDa?raw=true") getCachedPath <- function(url, rname){ bfc <- BiocFileCache() # fire up cache res <- bfcquery(bfc, url, field="fpath", exact=TRUE) if (bfccount(res) == 0L) cachedFilePath <- bfcadd(bfc, rname=rname, fpath=url) else cachedFilePath <- bfcpath(bfc, res[["rid"]]) cachedFilePath } load(getCachedPath(sce.blood.url, "sce.blood")) load(getCachedPath(sce.decidua.url, "sce.decidua"))
We have downloaded the two SingleCellExperiment
objects sce.blood
and
sce.decidua
.
Next we will randomly select a subset of genes with which we will proceed, a
whole analysis would be out of scope for this vignette.
set.seed(28) randgenes <- sample(rownames(sce.blood), 2500, replace=FALSE) sce.blood <- sce.blood[randgenes, ] sce.decidua <- sce.decidua[randgenes, ]
The input data to wasserstein.sc
always discriminates two conditions.
It can be given either as a data matrix and a vector explaining the condition
of each column, or a SingleCellExperiment
for each condition.
We will proceed to use the two SingleCellExperiment
objects we loaded.
The SingleCellExperiment
objects passed to wasserstein.sc
must contain a
counts
assay each.
assays(sce.blood) assays(sce.decidua)
Before testing for differential expression in the two conditions, the counts should be normalized for cell or gene-specific biases. We can skip this here since the example data has already been normalized and an entire analysis would be out of scope for this vignette. The detailed processing steps performed on this data can be seen here.
wsres <- wasserstein.sc(sce.blood, sce.decidua, "OS") head(wsres, n=10L)
The 2-Wasserstein single-cell testing procedure is performed on all genes in the data set -- one line of the output describes the result of testing one gene.
Note: The two-stage (TS) procedure includes a test for differential proportions of zero gene expression, considering the proportion of zeroes over all conditions and genes in a model. It may yield incorrect results, if it is applied on only a subset of genes is given.
Note: For reproducible p-values, see the "seed" argument.
: 2-Wasserstein single cell test output
| Output Column | Description | |:--------------|:------------------------------------------------------------| | d.wass | 2-Wasserstein distance between the two samples | | d.wass^2 | squared 2-Wasserstein distance between the two samples | | d.comp^2 | squared 2-Wasserstein distance (decomposition approx.) | | d.comp | 2-Wasserstein distance (decomposition approx.) | | location | location term in the squared 2-Wasserstein decomposition | | size | size term in the squared 2-Wasserstein decomposition | | shape | shape term in the squared 2-Wasserstein decomposition | | rho | correlation coefficient in the quantile-quantile plot | | pval | p-value of the semi-parametric test | | p.ad.gpd | p-value of the Anderson-Darling test to check whether the GPD actually fits the data well * | | N.exc | number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit (i.e. p-value of Anderson-Darling test greater or eqaul to 0.05) * | | perc.loc | fraction (in %) of the location part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation | | perc.size | fraction (in %) of the size part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation | | perc.shape | fraction (in %) of the shape part with respect to the overall squared 2-Wasserstein distance obtained by the decomposition approximation | | decomp.error | relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation | | p.zero | p-value of the test for differential proportions of zero expression (logistic regression model) ** | | p.combined | combined p-value of p.nonzero and p.zero obtained by Fisher’s method ** | | pval.adj | adjusted p-value of the semi-parametric 2-Wasserstein distance-based test according to the method of Benjamini-Hochberg | | p.adj.zero | adjusted p-value of the test for differential proportions of zero expression (logistic regression model) according to the method of Benjamini-Hochberg ** | | p.adj.combined | adjusted combined p-value of p.nonzero and p.zero obtained by Fisher’s method according to the method of Benjamini-Hochberg ** |
* only if GDP fitting is performed (NA otherwise); ** only if the two-stage test is run
The waddR
package
Fast and accurate calculation of the Wasserstein distance
Two-sample test to check for differences between two distributions
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.