celda_C: Cell clustering with Celda

View source: R/celda_C.R

Description

Clusters the columns of a count matrix containing single-cell data into K subpopulations.

Usage

celda_C(counts, sample.label = NULL, K, alpha = 1, beta = 1,
  algorithm = c("EM", "Gibbs"), stop.iter = 10, max.iter = 200,
  split.on.iter = 10, split.on.last = TRUE, seed = 12345, nchains = 3,
  initialize = c("random", "split"), count.checksum = NULL, z.init = NULL,
  logfile = NULL, verbose = TRUE)

Arguments

counts

Integer matrix. Rows represent features and columns represent cells.

sample.label

Vector or factor. Denotes the sample label for each cell (column) in the count matrix.
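As a minimal sketch of the expected input shapes, the base R snippet below builds a toy integer count matrix with features in rows and cells in columns, plus one sample label per cell. The dimensions, gene and cell names, and sample names are made up for illustration and do not come from the package.

```r
# Toy data: 6 features (rows) x 8 cells (columns); all names are illustrative.
set.seed(1)
counts <- matrix(rpois(6 * 8, lambda = 5), nrow = 6, ncol = 8,
                 dimnames = list(paste0("Gene_", 1:6), paste0("Cell_", 1:8)))
storage.mode(counts) <- "integer"  # rpois() already returns integers; this makes the type explicit

# One label per cell (column): two hypothetical samples of four cells each.
sample.label <- factor(rep(c("Sample_A", "Sample_B"), each = 4))

# Sanity checks before passing both objects to celda_C().
stopifnot(is.integer(counts), length(sample.label) == ncol(counts))
```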

K

Integer. Number of cell populations.

alpha

Numeric. Concentration parameter for Theta. Adds a pseudocount to each cell population in each sample. Default 1.

beta

Numeric. Concentration parameter for Phi. Adds a pseudocount to each feature in each cell population. Default 1.
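To illustrate what a pseudocount does, the sketch below adds 'alpha' to a hypothetical table of cells per population in each sample and normalizes each row. This is plain arithmetic for intuition only, not celda's internal code; the matrix 'm' and its values are invented. 'beta' plays the analogous role for features within each cell population.

```r
# Hypothetical number of cells per population (columns) in two samples (rows).
m <- matrix(c(10, 0, 5,
               2, 8, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("Sample_A", "Sample_B"), paste0("K", 1:3)))
alpha <- 1

# Add the pseudocount to every population in every sample, then normalize rows.
theta <- (m + alpha) / rowSums(m + alpha)

# Populations with zero observed cells still receive a small nonzero proportion.
print(round(theta, 3))
```

Larger values of 'alpha' pull the proportions in each sample toward uniform; smaller values let the observed counts dominate.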

algorithm

String. Algorithm to use for clustering cell subpopulations. One of 'EM' or 'Gibbs'. The EM algorithm is faster, especially for larger numbers of cells. However, more chains may be required to ensure a good solution is found. Default 'EM'.

stop.iter

Integer. Number of iterations without improvement in the log likelihood to stop inference. Default 10.

max.iter

Integer. Maximum number of iterations of Gibbs sampling or EM to perform. Default 200.
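The interaction between 'stop.iter' and 'max.iter' can be sketched with a small base R helper. The function 'stop.iteration' is hypothetical, and it assumes that "improvement" means exceeding the best log likelihood seen so far; the package may define this differently.

```r
# Hypothetical helper: returns the iteration at which a run would stop,
# given a vector of per-iteration log likelihoods.
stop.iteration <- function(log.lik, stop.iter = 10, max.iter = 200) {
  best <- -Inf
  since.best <- 0
  n <- min(length(log.lik), max.iter)
  for (i in seq_len(n)) {
    if (log.lik[i] > best) {
      best <- log.lik[i]   # new best value: reset the counter
      since.best <- 0
    } else {
      since.best <- since.best + 1
    }
    if (since.best >= stop.iter) return(i)
  }
  n  # no early stop: ran out of iterations
}

# Made-up log likelihood trace for illustration.
ll <- c(-500, -450, -430, -431, -432, -433, -429, -430, -430, -430)
stop.iteration(ll, stop.iter = 3)
```

With this trace, the best value at iteration 3 is followed by three iterations without improvement, so the helper stops before the later improvement at iteration 7 is reached. A larger 'stop.iter' trades run time for a lower chance of stopping early.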

split.on.iter

Integer. On every 'split.on.iter' iteration, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. To disable splitting, set to -1. Default 10.

split.on.last

Logical. After 'stop.iter' iterations have been performed without improvement, a heuristic will be applied to determine if a cell population should be reassigned and another cell population should be split into two clusters. If a split occurs, then the 'stop.iter' counter will be reset. Default TRUE.

seed

Integer. Passed to 'set.seed()'. Default 12345.

nchains

Integer. Number of random cluster initializations. Default 3.

initialize

Character. One of 'random' or 'split'. With 'random', cells are randomly assigned to clusters. With 'split', cell clusters will be recursively split into two clusters using 'celda_C' until the specified K is reached. Default 'random'.

count.checksum

Character. An MD5 checksum for the 'counts' matrix. Default NULL.

z.init

Integer vector. Sets initial starting values of z. If NULL, starting values for each cell will be randomly sampled from '1:K'. 'z.init' can only be used when 'initialize = "random"'. Default NULL.
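A minimal sketch of building a 'z.init' vector by hand, mirroring the random sampling from '1:K' described above. The values of 'K' and 'n.cells' are illustrative; with real data, 'n.cells' would be 'ncol(counts)'. The call to 'celda_C()' is left commented out and simply follows the Usage signature.

```r
K <- 3
n.cells <- 8  # illustrative; use ncol(counts) for a real count matrix

# One starting cluster label per cell, drawn uniformly from 1:K.
set.seed(12345)
z.init <- sample(seq_len(K), size = n.cells, replace = TRUE)

# Each cell needs exactly one label in the range 1..K.
stopifnot(length(z.init) == n.cells, all(z.init %in% seq_len(K)))

# Following the Usage signature (not run here):
# celda.mod <- celda_C(counts, K = K, z.init = z.init, initialize = "random")
```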

logfile

Character. Messages will be redirected to a file named 'logfile'. If NULL, messages will be printed to stdout. Default NULL.

verbose

Logical. Whether to print log messages. Default TRUE.

Value

An object of class 'celda_C' with the cell population clusters stored in 'z'.

See Also

'celda_G()' for feature clustering and 'celda_CG()' for simultaneous clustering of features and cells. 'celdaGridSearch()' can be used to run multiple values of K and multiple chains in parallel.

Examples

library(celda)
# Cluster the simulated cells into the known number of cell populations.
celda.mod = celda_C(celda.C.sim$counts, K = celda.C.sim$K,
                    sample.label = celda.C.sim$sample.label)

compbiomed/celda documentation built on May 25, 2019, 3:58 a.m.