initialize_clusters: Initialize clustering for EM
In diem: Debris-Containing Droplet Identification using EM


Given an SCE object, identify the cell types present.

initialize_clusters(x, use_var = TRUE, n_var = 2000, lss = 0.3,
  sf = "median", nn = 30, min_size = 20, verbose = FALSE)

`x`	An SCE object.
`use_var`	A logical indicating whether to subset the data to include only variable genes. This overrides `genes.use`. The default is TRUE as it may better identify cell types.
`n_var`	Number of variable genes to use.
`lss`	Numeric value of the span parameter of the loess regression.
`sf`	Either a numeric scaling factor to multiply counts after division by column sums, or "median" indicating to multiply by the median number of total read/UMI counts in droplets (default).
`nn`	Number of nearest neighbors to calculate in constructing the graph.
`min_size`	Numeric value giving the minimum number of droplets a cluster must contain to be used as a cell type when initializing EM.
`verbose`	Logical indicating verbosity.
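
The effect of `sf` can be illustrated with a minimal base R sketch of the normalization as it is described on this page: divide by column sums, multiply by the scaling factor, then take log(x + 1). This is not diem's implementation. `normalize_counts` is a hypothetical helper, and the genes-by-droplets matrix layout is an assumption.

```r
# Hypothetical helper (not part of diem): normalize a count matrix whose
# rows are genes and whose columns are droplets (assumed layout).
normalize_counts <- function(counts, sf = "median") {
  totals <- colSums(counts)              # total counts per droplet
  if (identical(sf, "median")) {
    sf <- median(totals)                 # default: median of total counts
  }
  stopifnot(is.numeric(sf), length(sf) == 1)
  scaled <- sweep(counts, 2, totals, "/") * sf
  log1p(scaled)                          # log transform after adding 1
}

# Toy data: two droplets with total counts of 10 and 8.
counts <- matrix(c(1, 3, 0, 6,
                   2, 2, 4, 0),
                 nrow = 4,
                 dimnames = list(paste0("gene", 1:4), c("d1", "d2")))
norm <- normalize_counts(counts)
```

Droplets with zero total counts would produce `NaN` in this sketch, so it assumes empty columns have already been removed.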

Instead of randomly initializing the EM, cell types are estimated from droplets that are expected to contain cells/nuclei. Initialization uses the droplets in the cluster set.

Variable genes are selected first. A loess regression line is fit between the log counts and log variance, and only the top genes ranked by residual are used to initialize the clusters. The number of genes is specified with n_var. Optionally, one can use all genes by setting use_var to FALSE. The span of the loess regression line is given by lss (default is 0.3).

The data are then normalized by dividing counts by the total counts per droplet and multiplying by a scaling factor, given by sf (the median of total counts by default). Finally, the data are log transformed after adding a constant value of 1.

After normalization, the k-nearest neighbors are identified among the droplets in the cluster set. The number of nearest neighbors is specified by nn. Clusters are identified from the KNN graph using the Louvain algorithm. Only clusters with at least min_size (20 by default) droplets are considered cell types.
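
The variable gene selection and the final cluster size filter can be sketched in base R as follows. This is an illustration under stated assumptions, not diem's code: `top_variable_genes` and `keep_large_clusters` are hypothetical helpers, the counts are simulated, and "log counts" is interpreted here as the log of each gene's mean count.

```r
# Hypothetical helper (not part of diem): rank genes by loess residual.
# Assumes a genes x droplets count matrix with row names.
top_variable_genes <- function(counts, n_var = 2000, lss = 0.3) {
  gene_mean <- rowMeans(counts)
  gene_var <- apply(counts, 1, var)
  keep <- gene_mean > 0 & gene_var > 0   # logs must be finite
  genes <- rownames(counts)[keep]
  df <- data.frame(log_mean = log10(gene_mean[keep]),
                   log_var = log10(gene_var[keep]))
  # Fit the log variance against the log mean; lss is the loess span.
  fit <- loess(log_var ~ log_mean, data = df, span = lss)
  res <- residuals(fit)
  names(res) <- genes
  # Genes with the largest positive residuals vary more than expected.
  head(names(sort(res, decreasing = TRUE)), n_var)
}

# Hypothetical helper: keep only clusters with at least min_size members.
keep_large_clusters <- function(labels, min_size = 20) {
  sizes <- table(labels)
  names(sizes)[sizes >= min_size]
}

# Simulated Poisson counts: 200 genes, 50 droplets.
set.seed(1)
lambda <- runif(200, 0.5, 10)
counts <- matrix(rpois(200 * 50, lambda = rep(lambda, 50)), nrow = 200,
                 dimnames = list(paste0("gene", 1:200), NULL))
top <- top_variable_genes(counts, n_var = 20)

# Cluster labels as a graph clustering step might return them.
labels <- c(rep("a", 25), rep("b", 5), rep("c", 20))
kept <- keep_large_clusters(labels, min_size = 20)
```

With these toy labels, only clusters "a" and "c" reach the default `min_size` of 20, so cluster "b" would not be used as a cell type.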

An SCE object

diem documentation built on Nov. 16, 2019, 1:08 a.m.