celda: CEllular Latent Dirichlet Allocation

"celda" stands for "CEllular Latent Dirichlet Allocation". It is a suite of Bayesian hierarchical models and supporting functions for clustering genes and cells in count data generated by single-cell RNA-seq platforms. The approach extends the Latent Dirichlet Allocation (LDA) topic modeling framework that has been popular in text mining applications. The package also includes a method called DecontX, which can be used to estimate and remove contamination in single-cell genomic data.

Installation Instructions

To install the latest stable release of celda from Bioconductor (requires R version >= 3.6):

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("celda")

To install the development version of celda from GitHub using devtools (requires R >= 3.6):

devtools::install_github("campbio/celda")

NOTE: For macOS users, devtools::install_github() requires libgit2, which can be installed via Homebrew:

brew install libgit2

Also, if you receive installation errors when Rcpp is being installed and compiled, try following the steps outlined here to solve the issue:

NOTE: If you install celda from RStudio and get the error "could not find tools necessary to compile a package", you can try setting this option before installing:

options(buildtools.check = function(action) TRUE)

Vignettes and examples

To build the vignettes for Celda and DecontX during installation from GitHub, use the following command:

install_github("campbio/celda", build_vignettes = TRUE)

Note that building the vignettes may add 5-10 minutes to the installation. The Celda and DecontX vignettes can then be accessed via the following commands:
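
As a minimal sketch, assuming the vignettes are registered under the names "celda" and "decontX" (an assumption; browseVignettes() reports the names that were actually installed), they could be opened like this:

```r
# List every vignette that was installed with the celda package.
browseVignettes("celda")

# Open individual vignettes by name. The names "celda" and "decontX" are
# assumptions; use the names reported by browseVignettes() if they differ.
vignette("celda", package = "celda")
vignette("decontX", package = "celda")
```

Both browseVignettes() and vignette() come from the utils package that ships with R, so nothing beyond celda itself needs to be installed.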


For developers

Check out our Wiki for the developer's guide if you want to contribute!

- Celda Development Coding Style Guide
- Celda Development Robust and Efficient Code
- Celda Development Rstudio configuration
- FAQ on how to use celda
- FAQ on package development

