In Oshlack/speckle: Statistical methods for analysing single cell RNA-seq data

propeller

R Documentation

Finding statistically significant differences in cell type proportions

Description

Calculates cell type proportions, performs a variance stabilising transformation on the proportions and uses linear modelling to determine whether the cell type proportions differ significantly between groups.

Usage

propeller(
  x = NULL,
  clusters = NULL,
  sample = NULL,
  group = NULL,
  trend = FALSE,
  robust = TRUE,
  transform = "logit"
)

Arguments

`x`	object of class `SingleCellExperiment` or `Seurat`
`clusters`	a factor specifying the cluster or cell type for every cell. For `SingleCellExperiment` objects this should correspond to a column called `clusters` in the `colData`. For `Seurat` objects this will be extracted by a call to `Idents(x)`.
`sample`	a factor specifying the biological replicate for each cell. For `SingleCellExperiment` objects this should correspond to a column called `sample` in the `colData`. For `Seurat` objects this should correspond to `x$sample`.
`group`	a factor specifying the groups of interest for performing the differential proportions analysis. For `SingleCellExperiment` objects this should correspond to a column called `group` in the `colData`. For `Seurat` objects this should correspond to `x$group`.
`trend`	logical, if `TRUE` fits a mean-variance trend on the transformed proportions. Defaults to `FALSE`.
`robust`	logical, if true performs robust empirical Bayes shrinkage of the variances
`transform`	a character scalar specifying which transformation of the proportions to perform. Possible values are "asin" (arcsin square root) or "logit". Defaults to "logit".

Details

This function takes a SingleCellExperiment or Seurat object and extracts the group, sample and cluster information for each cell. The user can either supply these factor vectors explicitly in the call to the propeller function, or let internal functions extract them from the relevant object. In the latter case, the user must ensure that group and sample are columns in the metadata of the object (any combination of upper/lower case is acceptable). For Seurat objects the clusters are extracted using the Idents function. For SingleCellExperiment objects, clusters needs to be a column in the colData.

The propeller function calculates cell type proportions for each biological replicate, performs a variance stabilising transformation on the matrix of proportions and fits a linear model for each cell type or cluster using the limma framework. There are two options for the transformation: arcsin square root or logit. Propeller tests whether the cell type proportions differ between two or more groups. If there are only 2 groups, a t-test is used to calculate p-values, and if there are more than 2 groups, an F-test (ANOVA) is used. Cell type proportions of 1 or 0 are accommodated. Benjamini and Hochberg false discovery rates are calculated to account for multiple testing of cell types/clusters.
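The steps described above can be illustrated in base R. This is only a rough sketch, not the package's implementation: the counts mirror the simulated data in the Examples section, the 0.5 pseudocount used for the logit is an assumption made here to keep proportions of 0 or 1 finite, and an ordinary equal-variance t-test stands in for the moderated test from the limma framework.

```r
# Cluster-by-sample cell counts (rows: clusters, columns: biological replicates)
counts <- matrix(c(500, 300, 200,
                   900, 450, 150,
                   270, 360, 270,
                   480, 360, 360),
                 nrow = 3,
                 dimnames = list(c("c0", "c1", "c2"),
                                 c("s1", "s2", "s3", "s4")))
group <- factor(c("grp1", "grp1", "grp2", "grp2"))

# Proportion of each cluster within each sample (each column sums to 1)
props <- prop.table(counts, margin = 2)

# Option 1: arcsin square root transformation
props_asin <- asin(sqrt(props))

# Option 2: logit transformation; the 0.5 pseudocount is an assumption
# of this sketch so that proportions of exactly 0 or 1 stay finite
pseudo <- counts + 0.5
props_pseudo <- prop.table(pseudo, margin = 2)
props_logit <- log(props_pseudo / (1 - props_pseudo))

# Ordinary two-sample t-test per cluster on the transformed proportions
# (a stand-in for the moderated t-test, with no empirical Bayes shrinkage)
pvals <- apply(props_asin, 1, function(y) {
  t.test(y ~ group, var.equal = TRUE)$p.value
})

# Benjamini and Hochberg false discovery rates across clusters
fdr <- p.adjust(pvals, method = "BH")

data.frame(P.Value = pvals, FDR = fdr)
```

With only two replicates per group the ordinary t-test has very few degrees of freedom, which is the situation where borrowing information across cell types through empirical Bayes shrinkage is most helpful.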

Value

A data frame of results for each cell type or cluster, including p-values and Benjamini and Hochberg false discovery rates.

Author(s)

Belinda Phipson

References

Smyth, G.K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Volume 3, Article 3.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society, Series B, 57, 289-300.

Examples


  library(speckle)
  library(ggplot2)
  library(limma)

  # Make up some data
  # True cell type proportions for 4 samples
  p_s1 <- c(0.5,0.3,0.2)
  p_s2 <- c(0.6,0.3,0.1)
  p_s3 <- c(0.3,0.4,0.3)
  p_s4 <- c(0.4,0.3,0.3)

  # Total numbers of cells per sample
  numcells <- c(1000,1500,900,1200)

  # Generate cell-level vector for sample info
  biorep <- rep(c("s1","s2","s3","s4"),numcells)
  length(biorep)

  # Numbers of cells for each of the 3 clusters per sample
  # (rounded so that rep() receives whole-number counts)
  n_s1 <- round(p_s1*numcells[1])
  n_s2 <- round(p_s2*numcells[2])
  n_s3 <- round(p_s3*numcells[3])
  n_s4 <- round(p_s4*numcells[4])

  # Assign cluster labels for 4 samples
  cl_s1 <- rep(c("c0","c1","c2"),n_s1)
  cl_s2 <- rep(c("c0","c1","c2"),n_s2)
  cl_s3 <- rep(c("c0","c1","c2"),n_s3)
  cl_s4 <- rep(c("c0","c1","c2"),n_s4)

  # Generate cell-level vector for cluster info
  clust <- c(cl_s1,cl_s2,cl_s3,cl_s4)
  length(clust)

  # Assume s1 and s2 belong to group 1 and s3 and s4 belong to group 2
  grp <- rep(c("grp1","grp2"),c(sum(numcells[1:2]),sum(numcells[3:4])))

  # Test for differences in cluster proportions between the two groups
  propeller(clusters = clust, sample = biorep, group = grp,
            robust = FALSE, trend = FALSE, transform = "asin")
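Before calling propeller on explicit vectors, it can help to confirm that the cluster, sample and group vectors line up cell by cell. The following base R sketch rebuilds the simulated vectors from the example in a more compact form (the `props_true` list is a name introduced here for illustration) and tabulates them.

```r
# Rebuild the simulated cell-level vectors from the example
numcells <- c(1000, 1500, 900, 1200)
props_true <- list(s1 = c(0.5, 0.3, 0.2), s2 = c(0.6, 0.3, 0.1),
                   s3 = c(0.3, 0.4, 0.3), s4 = c(0.4, 0.3, 0.3))
biorep <- rep(names(props_true), numcells)
clust <- unlist(Map(function(p, n) rep(c("c0", "c1", "c2"), round(p * n)),
                    props_true, numcells), use.names = FALSE)
grp <- rep(c("grp1", "grp2"), c(sum(numcells[1:2]), sum(numcells[3:4])))

# All three vectors must have exactly one entry per cell
stopifnot(length(clust) == length(biorep), length(grp) == length(biorep))

# Cluster-by-sample counts and the per-sample proportions
counts <- table(clust, biorep)
round(prop.table(counts, margin = 2), 2)

# Each sample should fall into exactly one group
table(biorep, grp)
```

The per-sample proportions recovered from the table should match the true proportions used to simulate the data, and every row of the sample-by-group table should have a single non-zero entry.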

Oshlack/speckle documentation built on Oct. 16, 2022, 9:39 a.m.