propeller: Finding statistically significant differences in cell type...

View source: R/propeller.R

propellerR Documentation

Finding statistically significant differences in cell type proportions

Description

Calculates cell type proportions, performs a variance stabilising transformation on the proportions and determines whether the cell type proportions are statistically significant between different groups using linear modelling.

Usage

propeller(
  x = NULL,
  clusters = NULL,
  sample = NULL,
  group = NULL,
  trend = FALSE,
  robust = TRUE,
  transform = "logit"
)

Arguments

x

object of class SingleCellExperiment or Seurat

clusters

a factor specifying the cluster or cell type for every cell. For SingleCellExperiment objects this should correspond to a column called clusters in the colData assay. For Seurat objects this will be extracted by a call to Idents(x).

sample

a factor specifying the biological replicate for each cell. For SingleCellExperiment objects this should correspond to a column called sample in the colData assay and for Seurat objects this should correspond to x$sample.

group

a factor specifying the groups of interest for performing the differential proportions analysis. For SingleCellExperiment objects this should correspond to a column called group in the colData assay. For Seurat objects this should correspond to x$group.

trend

logical, if true fits a mean variance trend on the transformed proportions

robust

logical, if true performs robust empirical Bayes shrinkage of the variances

transform

a character scalar specifying which transformation of the proportions to perform. Possible values include "asin" or "logit". Defaults to "logit".

Details

This function will take a SingleCellExperiment or Seurat object and extract the group, sample and clusters cell information. The user can either state these factor vectors explicitly in the call to the propeller function, or internal functions will extract them from the relevants objects. The user must ensure that group and sample are columns in the metadata assays of the relevant objects (any combination of upper/lower case is acceptable). For Seurat objects the clusters are extracted using the Idents function. For SingleCellExperiment objects, clusters needs to be a column in the colData assay.

The propeller function calculates cell type proportions for each biological replicate, performs a variance stabilising transformation on the matrix of proportions and fits a linear model for each cell type or cluster using the limma framework. There are two options for the transformation: arcsin square root or logit. Propeller tests whether there is a difference in the cell type proportions between multiple groups. If there are only 2 groups, a t-test is used to calculate p-values, and if there are more than 2 groups, an F-test (ANOVA) is used. Cell type proportions of 1 or 0 are accommodated. Benjamini and Hochberg false discovery rates are calculated to account to multiple testing of cell types/clusters.

Value

produces a dataframe of results

Author(s)

Belinda Phipson

References

Smyth, G.K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, Volume 3, Article 3.

Benjamini, Y., and Hochberg, Y. (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series, B, 57, 289-300.

See Also

propeller.ttest propeller.anova lmFit, eBayes, getTransformedProps

Examples

  library(speckle)
  library(ggplot2)
  library(limma)

  # Make up some data
  # True cell type proportions for 4 samples
  p_s1 <- c(0.5,0.3,0.2)
  p_s2 <- c(0.6,0.3,0.1)
  p_s3 <- c(0.3,0.4,0.3)
  p_s4 <- c(0.4,0.3,0.3)

  # Total numbers of cells per sample
  numcells <- c(1000,1500,900,1200)

  # Generate cell-level vector for sample info
  biorep <- rep(c("s1","s2","s3","s4"),numcells)
  length(biorep)

  # Numbers of cells for each of the 3 clusters per sample
  n_s1 <- p_s1*numcells[1]
  n_s2 <- p_s2*numcells[2]
  n_s3 <- p_s3*numcells[3]
  n_s4 <- p_s4*numcells[4]

  # Assign cluster labels for 4 samples
  cl_s1 <- rep(c("c0","c1","c2"),n_s1)
  cl_s2 <- rep(c("c0","c1","c2"),n_s2)
  cl_s3 <- rep(c("c0","c1","c2"),n_s3)
  cl_s4 <- rep(c("c0","c1","c2"),n_s4)

  # Generate cell-level vector for cluster info
  clust <- c(cl_s1,cl_s2,cl_s3,cl_s4)
  length(clust)

  # Assume s1 and s2 belong to group 1 and s3 and s4 belong to group 2
  grp <- rep(c("grp1","grp2"),c(sum(numcells[1:2]),sum(numcells[3:4])))

  propeller(clusters = clust, sample = biorep, group = grp,
  robust = FALSE, trend = FALSE, transform="asin")


Oshlack/speckle documentation built on Oct. 16, 2022, 9:39 a.m.