
Two-sample test for single-cell RNA-sequencing data to check for differences between two distributions (conditions) using the 2-Wasserstein distance. The implementation is semi-parametric: it uses a permutation test with a generalized Pareto distribution (GPD) approximation to estimate small p-values accurately.

```
wasserstein.sc(x, y, method = c("TS", "OS"), permnum = 10000, seed = NULL)

## S4 method for signature 'matrix,vector'
wasserstein.sc(x, y, method = c("TS", "OS"), permnum = 10000, seed = NULL)

## S4 method for signature 'SingleCellExperiment,SingleCellExperiment'
wasserstein.sc(x, y, method = c("TS", "OS"), permnum = 10000, seed = NULL)
```

| Argument | Description |
|---|---|
| `x` | matrix of single-cell RNA-sequencing expression data with genes in rows and samples (cells) in columns |
| `y` | vector of condition labels |
| `method` | method employed in the testing procedure: "OS" for the one-stage method (i.e. semi-parametric testing applied to all (zero and non-zero) expression values); "TS" for the two-stage method (i.e. semi-parametric testing applied to non-zero expression values only, combined with a separate test for differential proportions of zero expression using logistic regression). If this argument is not given, the two-stage method ("TS") is used by default. |
| `permnum` | number of permutations used in the permutation testing procedure. If this argument is not given, 10000 is used as the default. |
| `seed` | number used to generate a L'Ecuyer-CMRG seed, which in turn seeds a nextRNGStream() for each gene to achieve reproducibility. The default is NULL, in which case no seed is set. |
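The `seed` description can be illustrated with base R and the `parallel` package. The sketch below is an assumption about how one independent L'Ecuyer-CMRG stream per gene could be derived from a single seed. The helper names `make_gene_streams` and `permute_with_stream` are hypothetical and are not taken from the package.

```r
# Hypothetical helper (not part of the documented API): derive one
# L'Ecuyer-CMRG stream per gene from a single integer seed.
make_gene_streams <- function(seed, n_genes) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")           # switch generator
  on.exit(do.call(RNGkind, as.list(old_kind)))   # restore the previous kinds
  set.seed(seed)
  stream <- get(".Random.seed", envir = globalenv())
  streams <- vector("list", n_genes)
  for (i in seq_len(n_genes)) {
    streams[[i]] <- stream
    stream <- parallel::nextRNGStream(stream)    # jump to the next stream
  }
  streams
}

# Hypothetical helper: permute condition labels using the stream
# reserved for one gene.
permute_with_stream <- function(labels, stream) {
  old_kind <- RNGkind("L'Ecuyer-CMRG")
  on.exit(do.call(RNGkind, as.list(old_kind)))
  assign(".Random.seed", stream, envir = globalenv())
  sample(labels)
}

streams <- make_gene_streams(seed = 123, n_genes = 3)
perm_gene2 <- permute_with_stream(rep(1:2, each = 10), streams[[2]])
```

Because each gene has its own stream, the permutations drawn for one gene do not depend on how many random numbers were consumed for the other genes.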

Details concerning the permutation testing procedures for single-cell RNA-sequencing data can be found in Schefzik and Goncalves (2019). This function corresponds to the function .testWass: method="OS" corresponds to inclZero=TRUE in .testWass, and method="TS" corresponds to inclZero=FALSE.
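As a rough illustration of the idea behind the test, the following base R sketch computes a 2-Wasserstein distance by quantile approximation and a plain permutation p-value. It is a simplified sketch under stated assumptions, not the package's internal code: the helper names, the quantile grid, and the add-one p-value correction are all assumptions, and the GPD approximation for small p-values is omitted entirely.

```r
# Hypothetical helper: 2-Wasserstein distance between two samples,
# approximated by comparing quantiles on an equidistant grid.
wass2_quantile <- function(a, b, n_q = 1000) {
  probs <- (seq_len(n_q) - 0.5) / n_q
  qa <- quantile(a, probs, names = FALSE)
  qb <- quantile(b, probs, names = FALSE)
  sqrt(mean((qa - qb)^2))
}

# Hypothetical helper: permutation test on the squared distance.
perm_test_wass <- function(values, labels, permnum = 1000) {
  groups <- unique(labels)
  stopifnot(length(groups) == 2)
  stat <- function(lab) {
    wass2_quantile(values[lab == groups[1]], values[lab == groups[2]])^2
  }
  observed <- stat(labels)
  # Recompute the statistic under random relabelling of the cells
  permuted <- replicate(permnum, stat(sample(labels)))
  # Add-one correction (an assumption of this sketch) keeps the
  # estimate strictly positive; no GPD tail fit is attempted here
  pval <- (sum(permuted >= observed) + 1) / (permnum + 1)
  list(d.wass = sqrt(observed), pval = pval)
}

set.seed(1)
values <- c(rnorm(100, 42, 1), rnorm(100, 45, 3))
labels <- rep(c(1, 2), each = 100)
res <- perm_test_wass(values, labels, permnum = 200)
```

With only `permnum` permutations, the smallest p-value this sketch can report is 1 / (permnum + 1), which is the limitation the GPD approximation is meant to address.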

A vector of testing results. The entries correspond to the values described for the function .testWass, where inclZero=TRUE in .testWass corresponds to method="OS" and inclZero=FALSE corresponds to method="TS" (see Schefzik and Goncalves (2019) for details). In case of inclZero=TRUE:

d.wass: 2-Wasserstein distance between the two samples computed by quantile approximation

d.wass^2: squared 2-Wasserstein distance between the two samples computed by quantile approximation

d.comp^2: squared 2-Wasserstein distance between the two samples computed by decomposition approximation

d.comp: 2-Wasserstein distance between the two samples computed by decomposition approximation

location: location term in the decomposition of the squared 2-Wasserstein distance between the two samples

size: size term in the decomposition of the squared 2-Wasserstein distance between the two samples

shape: shape term in the decomposition of the squared 2-Wasserstein distance between the two samples

rho: correlation coefficient in the quantile-quantile plot

pval: p-value of the semi-parametric 2-Wasserstein distance-based test

p.ad.gpd: in case the GPD fitting is performed: p-value of the Anderson-Darling test to check whether the GPD actually fits the data well (otherwise NA)

N.exc: in case the GPD fitting is performed: number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit, i.e. a p-value of the Anderson-Darling test greater than or equal to 0.05 (otherwise NA)

perc.loc: fraction (in %) of the location term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation

perc.size: fraction (in %) of the size term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation

perc.shape: fraction (in %) of the shape term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation

decomp.error: relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation

pval.adj: adjusted p-value of the semi-parametric 2-Wasserstein distance-based test according to the method of Benjamini-Hochberg
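The location, size and shape entries listed above can be illustrated with the standard decomposition of the squared 2-Wasserstein distance: squared difference of means, squared difference of standard deviations, and 2 * sd_a * sd_b * (1 - rho). The sketch below is an assumption about how such values could be computed, with a hypothetical helper name. It takes all moments from the quantile vectors, so the two approximations agree exactly here, whereas the package may compute them differently (hence the decomp.error entry).

```r
# Hypothetical helper: decompose the squared 2-Wasserstein distance into
# location, size and shape terms, using quantiles on an equidistant grid.
wass2_decomposition <- function(a, b, n_q = 1000) {
  probs <- (seq_len(n_q) - 0.5) / n_q
  qa <- quantile(a, probs, names = FALSE)
  qb <- quantile(b, probs, names = FALSE)
  pop_sd <- function(v) sqrt(mean((v - mean(v))^2))  # population sd

  rho <- cor(qa, qb)                       # correlation in the Q-Q plot
  location <- (mean(qa) - mean(qb))^2
  size <- (pop_sd(qa) - pop_sd(qb))^2
  shape <- 2 * pop_sd(qa) * pop_sd(qb) * (1 - rho)

  d_wass_sq <- mean((qa - qb)^2)           # quantile approximation
  d_comp_sq <- location + size + shape     # decomposition approximation

  c(d.wass.sq = d_wass_sq, d.comp.sq = d_comp_sq,
    location = location, size = size, shape = shape, rho = rho,
    perc.loc = 100 * location / d_comp_sq,
    perc.size = 100 * size / d_comp_sq,
    perc.shape = 100 * shape / d_comp_sq,
    decomp.error = abs(d_wass_sq - d_comp_sq) / d_wass_sq)
}

set.seed(1)
out <- wass2_decomposition(rnorm(100, 42, 1), rnorm(100, 45, 3))
```

The three percentages sum to 100 and indicate whether a difference between conditions is driven mainly by a shift in mean, a change in spread, or a change in distribution shape.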

In case of inclZero=FALSE:

d.wass: 2-Wasserstein distance between the two samples computed by quantile approximation

d.wass^2: squared 2-Wasserstein distance between the two samples computed by quantile approximation

d.comp^2: squared 2-Wasserstein distance between the two samples computed by decomposition approximation

d.comp: 2-Wasserstein distance between the two samples computed by decomposition approximation

location: location term in the decomposition of the squared 2-Wasserstein distance between the two samples

size: size term in the decomposition of the squared 2-Wasserstein distance between the two samples

shape: shape term in the decomposition of the squared 2-Wasserstein distance between the two samples

rho: correlation coefficient in the quantile-quantile plot

p.nonzero: p-value of the semi-parametric 2-Wasserstein distance-based test (based on non-zero expression only)

p.ad.gpd: in case the GPD fitting is performed: p-value of the Anderson-Darling test to check whether the GPD actually fits the data well (otherwise NA)

N.exc: in case the GPD fitting is performed: number of exceedances (starting with 250 and iteratively decreased by 10 if necessary) that are required to obtain a good GPD fit, i.e. a p-value of the Anderson-Darling test greater than or equal to 0.05 (otherwise NA)

perc.loc: fraction (in %) of the location term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation

perc.size: fraction (in %) of the size term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation

perc.shape: fraction (in %) of the shape term in the overall squared 2-Wasserstein distance obtained by the decomposition approximation

decomp.error: relative error between the squared 2-Wasserstein distance computed by the quantile approximation and the squared 2-Wasserstein distance computed by the decomposition approximation

p.zero: p-value of the test for differential proportions of zero expression (logistic regression model)

p.combined: combined p-value of p.nonzero and p.zero obtained by Fisher's method

p.adj.nonzero: adjusted p-value of the semi-parametric 2-Wasserstein distance-based test (based on non-zero expression only) according to the method of Benjamini-Hochberg

p.adj.zero: adjusted p-value of the test for differential proportions of zero expression (logistic regression model) according to the method of Benjamini-Hochberg

p.adj.combined: adjusted combined p-value of p.nonzero and p.zero obtained by Fisher's method according to the method of Benjamini-Hochberg
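The combination and adjustment steps named in the list above can be sketched in base R. Fisher's method for two independent p-values uses the statistic -2 * (log p1 + log p2), which follows a chi-squared distribution with 4 degrees of freedom under the null hypothesis. The helper name `fisher_combine` is hypothetical and the p-values are made-up illustration values, not package output.

```r
# Hypothetical helper: Fisher's method for combining two p-values.
fisher_combine <- function(p_nonzero, p_zero) {
  stat <- -2 * (log(p_nonzero) + log(p_zero))
  pchisq(stat, df = 4, lower.tail = FALSE)   # 2 degrees of freedom per p-value
}

# Made-up per-gene p-values for four genes (illustration only)
p.nonzero <- c(0.001, 0.20, 0.04, 0.70)
p.zero    <- c(0.03, 0.50, 0.60, 0.90)

p.combined <- fisher_combine(p.nonzero, p.zero)

# Benjamini-Hochberg adjustment across genes
p.adj.nonzero  <- p.adjust(p.nonzero, method = "BH")
p.adj.zero     <- p.adjust(p.zero, method = "BH")
p.adj.combined <- p.adjust(p.combined, method = "BH")
```

The Benjamini-Hochberg adjustment is applied across genes, so the adjusted values are only meaningful when the function is run on a matrix with more than one gene.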

Schefzik and Goncalves (2019).

```
library(waddR)  # package providing wasserstein.sc

# some data in two conditions
cond1 <- matrix(rnorm(100, 42, 1), nrow = 1)
cond2 <- matrix(rnorm(100, 45, 3), nrow = 1)

# call wasserstein.sc with a matrix
# and a vector denoting conditions
dat <- cbind(cond1, cond2)
condition <- c(rep(1, 100), rep(2, 100))
wasserstein.sc(dat, condition, "TS", 100)

# call wasserstein.sc with two SingleCellExperiment objects
sce1 <- SingleCellExperiment::SingleCellExperiment(
  assays = list(counts = cond1, logcounts = log10(cond1)))
sce2 <- SingleCellExperiment::SingleCellExperiment(
  assays = list(counts = cond2, logcounts = log10(cond2)))
wasserstein.sc(sce1, sce2, "TS", 100)

# for reproducible p-values
wasserstein.sc(sce1, sce2, seed = 123)
```
