Sampling: Sampling Primary Clusters
In debsin/dropClust: Single Cell Transcriptome Analysis


Performs sampling from the primary clusters in an inverse exponential order of cluster size.

Sampling(object, nsamples = 500, method = "sps", optm_parameters = FALSE, pinit = 0.195, pfin = 0.9, K = 500)

`object`	A SingleCellExperiment object containing normalized expression values in `"normcounts"`.
`nsamples`	integer, total number of samples to return post sampling; ignored when `optm_parameters = FALSE`.
`method`	character, one of `c("sps", "random")`. Structure Preserving Sampling (sps) selects a proportional number of members from each cluster obtained by partitioning an approximate nearest neighbour graph.
`optm_parameters`	logical, when TRUE the parameters (`pinit`, `pfin`, `K`) are optimized such that exactly `nsamples` are returned. Optimization is performed using simulated annealing.
`pinit`	numeric [0,0.5], minimum probability that sampling occurs from a cluster, ignored when `optm_parameters = TRUE`.
`pfin`	numeric [0.5,1], maximum probability that sampling occurs from a cluster, ignored when `optm_parameters = TRUE`.
`K`	numeric, scaling factor analogous to the Boltzmann constant, ignored when `optm_parameters = TRUE`.
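
To make the role of `optm_parameters` concrete, here is a minimal sketch of tuning a single parameter, `K`, so that the expected sample count matches a target. It is not the package's implementation: it swaps simulated annealing for base R's `uniroot()`, holds `pinit` and `pfin` at their defaults, uses made-up cluster sizes, and assumes the bounded decay form `p = p_l - exp(-S / K) * (p_l - p_u)`, which the text on this page does not spell out. The helper `expected_total()` is hypothetical.

```r
# Expected number of sampled cells for a given K.
# ASSUMPTION: bounded exponential decay form, not confirmed by this page.
expected_total <- function(K, sizes, p_l = 0.195, p_u = 0.9) {
  p <- p_l - exp(-sizes / K) * (p_l - p_u)
  sum(p * sizes)
}

# Hypothetical primary cluster sizes (illustration only)
sizes  <- c(1200, 800, 450, 150, 60, 25)
target <- 600  # plays the role of nsamples

# Root-finding on K (a stand-in for simulated annealing):
# the expected total increases monotonically with K.
fit <- uniroot(function(K) expected_total(K, sizes) - target,
               interval = c(1, 1e5), tol = 1e-8)

K_hat     <- fit$root
total_hat <- expected_total(K_hat, sizes)
```

With only one free parameter a root finder suffices; optimizing `pinit`, `pfin` and `K` jointly, as the argument description states, is a harder search problem and is where a stochastic method such as simulated annealing becomes useful.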

Sampling is performed in inverse proportion to cluster size, following an exponential decay equation. To ensure selection of sufficient representative transcriptomes from small clusters, an exponential decay function is used to determine the proportion of transcriptomes to be sampled from each cluster. For the $i^{th}$ cluster, the proportion of expression profiles $p_i$ is obtained as follows.
$p_i = p_l - e^{-S_i/K}$ where $S_i$ is the size of cluster $i$, $K$ is a scaling factor, and $p_i$ is the proportion of cells to be sampled from the $i^{th}$ Louvain cluster. $p_l$ and $p_u$ are the lower and upper bounds of the proportion value, respectively.
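
The following sketch illustrates how such a decay rule behaves across cluster sizes. The equation as printed above does not use $p_u$, so the sketch assumes a bounded variant, $p_i = p_l - e^{-S_i/K}(p_l - p_u)$, which approaches $p_u$ for very small clusters and $p_l$ for very large ones. That form, the helper name `sampling_proportion()`, and the cluster sizes are assumptions made for illustration, not the package's confirmed implementation.

```r
# Proportion of cells to sample from a cluster of size S.
# ASSUMPTION: bounded decay form; p_l and p_u default to pinit and pfin.
sampling_proportion <- function(S, p_l = 0.195, p_u = 0.9, K = 500) {
  p_l - exp(-S / K) * (p_l - p_u)
}

# Hypothetical cluster sizes, from rare to abundant
sizes <- c(10, 100, 500, 5000)

props  <- sampling_proportion(sizes)  # proportion sampled per cluster
n_take <- round(props * sizes)        # approximate number of cells drawn

data.frame(size = sizes, proportion = round(props, 3), sampled = n_take)
```

Under this assumption small clusters are sampled almost completely while large clusters contribute a fraction close to the lower bound, which is the behaviour the description attributes to the method.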

A SingleCellExperiment object with an additional column named `Sampling` in its `colData`. The column stores a logical value for each cell, indicating whether it has been sampled.
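
A short sketch of how the returned flag might be used downstream. To stay self-contained it does not call `Sampling()`; instead a toy object is built and the `Sampling` column is filled in by hand as a stand-in for the function's output, so the values shown are illustrative only.

```r
library(SingleCellExperiment)

# Toy object: 20 genes x 6 cells
set.seed(42)
counts <- matrix(rpois(20 * 6, lambda = 5), nrow = 20, ncol = 6,
                 dimnames = list(paste0("Gene", 1:20), paste0("Cell", 1:6)))
sce <- SingleCellExperiment(list(counts = counts))

# Stand-in for the result of Sampling(): a hand-written logical flag per cell
colData(sce)$Sampling <- c(TRUE, FALSE, TRUE, TRUE, FALSE, FALSE)

# Count the sampled cells and keep only those columns
n_sampled     <- sum(colData(sce)$Sampling)
sce_sub       <- sce[, colData(sce)$Sampling]
sampled_cells <- colnames(sce_sub)
```

Because the flag lives in `colData`, the full object is preserved and the sampled subset can be recreated at any time by column subsetting.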

\insertRef{sengupta2013reformulated}{dropClust}

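library(SingleCellExperiment)
library(dropClust)  # provides CountNormalize(), RankGenes() and Sampling()

# Simulate a small count matrix (genes in rows, cells in columns)
ncells <- 100
ngenes <- 2000
x <- matrix(rpois(ncells*ngenes, lambda = 10), ncol=ncells, nrow=ngenes, byrow=TRUE)
rownames(x) <- paste0("Gene", seq_len(ngenes))
colnames(x) <- paste0("Cell", seq_len(ncells))
sce <- SingleCellExperiment(list(counts=x))

# Normalize, rank genes, then sample cells from the primary clusters
sce <- CountNormalize(sce)
sce <- RankGenes(sce)
sce <- Sampling(sce)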