celda_CG: Cell and feature clustering with Celda
In compbiomed/celda: CEllular Latent Dirichlet Allocation


Clusters the rows and columns of a count matrix containing single-cell data into L modules and K subpopulations, respectively.

celda_CG(counts, sample.label = NULL, K, L, alpha = 1, beta = 1,
  delta = 1, gamma = 1, algorithm = c("EM", "Gibbs"), stop.iter = 10,
  max.iter = 200, split.on.iter = 10, split.on.last = TRUE,
  seed = 12345, nchains = 3, initialize = c("random", "split"),
  count.checksum = NULL, z.init = NULL, y.init = NULL, logfile = NULL,
  verbose = TRUE)

`counts`	Integer matrix. Rows represent features and columns represent cells.
`sample.label`	Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
`K`	Integer. Number of cell populations.
`L`	Integer. Number of feature modules.
`alpha`	Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.
`beta`	Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature module in each cell population. Default 1.
`delta`	Numeric. Concentration parameter for Psi. Adds a pseudocount to each feature in each module. Default 1.
`gamma`	Numeric. Concentration parameter for Eta. Adds a pseudocount to the number of features in each module. Default 1.
`algorithm`	String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm for cell clustering is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. Default 'EM'.
`stop.iter`	Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.
`max.iter`	Integer. Maximum number of iterations of Gibbs sampling to perform. Default 200.
`split.on.iter`	Integer. On every 'split.on.iter' iteration, a heuristic will be applied to determine if a cell population or feature module should be reassigned and another cell population or feature module should be split into two clusters. To disable splitting, set to -1. Default 10.
`split.on.last`	Logical. After 'stop.iter' iterations have been performed without improvement, a heuristic will be applied to determine if a cell population or feature module should be reassigned and another cell population or feature module should be split into two clusters. If a split occurs, then 'stop.iter' will be reset. Default TRUE.
`seed`	Integer. Passed to 'set.seed()'. Default 12345.
`nchains`	Integer. Number of random cluster initializations. Default 3.
`initialize`	Character. One of 'random' or 'split'. With 'random', cells and features are randomly assigned to clusters. With 'split', cell and feature clusters will be recursively split into two clusters using 'celda_C' and 'celda_G', respectively, until the specified K and L are reached. Default 'random'.
`count.checksum`	Character. An MD5 checksum for the 'counts' matrix. Default NULL.
`z.init`	Integer vector. Sets initial starting values of z. If NULL, starting values for each cell will be randomly sampled from 1:K. 'z.init' can only be used when initialize = 'random'. Default NULL.
`y.init`	Integer vector. Sets initial starting values of y. If NULL, starting values for each feature will be randomly sampled from 1:L. 'y.init' can only be used when initialize = 'random'. Default NULL.
`logfile`	Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL.
`verbose`	Logical. Whether to print log messages. Default TRUE.
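
The shapes the arguments above expect can be sketched in base R. This is a minimal illustration, not taken from the package's own examples: the toy dimensions, the Poisson-simulated counts, and the gene, cell, and sample names are all assumptions made for the sketch.

```r
# Toy dimensions (illustrative assumptions, not package defaults).
n.features <- 50
n.cells <- 30
K <- 3  # number of cell populations
L <- 5  # number of feature modules

set.seed(12345)

# Integer count matrix: rows are features, columns are cells.
counts <- matrix(rpois(n.features * n.cells, lambda = 5),
                 nrow = n.features, ncol = n.cells)
storage.mode(counts) <- "integer"
rownames(counts) <- paste0("Gene_", seq_len(n.features))
colnames(counts) <- paste0("Cell_", seq_len(n.cells))

# One sample label per cell (column) of the count matrix.
sample.label <- factor(rep(c("Sample_1", "Sample_2"), length.out = n.cells))

# Optional starting assignments: one value in 1:K per cell and
# one value in 1:L per feature.
z.init <- sample(seq_len(K), n.cells, replace = TRUE)
y.init <- sample(seq_len(L), n.features, replace = TRUE)
```

Because 'z.init' and 'y.init' are only honored with initialize = 'random', vectors like these would be passed together with that setting.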

An object of class 'celda_CG' with the cell population clusters stored in 'z' and the feature module clusters stored in 'y'.
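
A hedged sketch of fitting the model and reading the cluster labels follows. It assumes the installed celda version matches the usage shown on this page and that the returned object supports list-style access to 'z' and 'y'; other releases of the package may return a different object type. The simulated counts are an illustrative assumption, and the call is skipped when celda is not installed.

```r
# Simulated toy data: 50 features (rows) by 30 cells (columns).
set.seed(12345)
counts <- matrix(rpois(50 * 30, lambda = 5), nrow = 50, ncol = 30)
rownames(counts) <- paste0("Gene_", seq_len(nrow(counts)))
colnames(counts) <- paste0("Cell_", seq_len(ncol(counts)))

if (requireNamespace("celda", quietly = TRUE)) {
  # A single chain keeps the sketch fast; the documented default is 3.
  model <- celda::celda_CG(counts, K = 3, L = 5,
                           nchains = 1, verbose = FALSE)

  # Assumed access pattern: z holds one population label per cell,
  # y holds one module label per feature.
  str(model$z)
  str(model$y)
}
```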

'celda_G()' for feature clustering and 'celda_C()' for clustering cells. 'celdaGridSearch()' can be used to run multiple values of K/L and multiple chains in parallel.