simulate_data: Generate simulated data for factorization


View source: R/utils.R

Description

Use one of two schemes to generate simulated data suitable for testing factorization.

Usage

simulate_data(nfeatures, nsamples, generate.factors = FALSE,
  nfactor = 10, alpha0 = 0.5, shuffle = TRUE)

Arguments

nfeatures

Number of features m (e.g., genes).

nsamples

Vector of sample sizes in each cluster. Rank r is equal to the length of this vector. Sum of elements is the total sample size n.

generate.factors

Generate factor matrices W and H, with dimensions m x r and r x n, respectively. If FALSE, factor matrices are not used and count data are generated directly from r multinomials for m genes.

nfactor

Total RNA count of the multinomials for each cluster; used when generate.factors = FALSE. A small nfactor yields a sparse count matrix.

alpha0

Variance parameter of the Dirichlet distribution from which multinomial probabilities are sampled; used when generate.factors = FALSE.

shuffle

Randomly permute rows and columns of count matrix.

Details

In one scheme (generate.factors = TRUE), simulated factor matrices W and H are used to build the count data: X is a matrix of Poisson deviates with mean WH. In the second scheme (generate.factors = FALSE), factor matrices are not used and X is sampled directly from r multinomial distributions, one per cluster, where r is the requested rank.
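The second scheme can be illustrated with a short base-R sketch. This is not the ccfindR source; it only mirrors the description above. It assumes that alpha0 acts as a symmetric Dirichlet parameter, that each cell receives exactly nfactor counts, and that every cell in a cluster shares one probability vector. The helper rdirichlet1 is hypothetical and introduced here for illustration only.

```r
# Illustrative sketch only; not the package implementation.
set.seed(1)
m <- 10                            # number of features (nfeatures)
nsamples <- c(20, 20, 60, 40, 30)  # cluster sizes
r <- length(nsamples)              # rank = number of clusters
nfactor <- 10                      # assumed: total count drawn per cell
alpha0 <- 0.5                      # assumed: symmetric Dirichlet parameter

# Hypothetical helper: one Dirichlet draw via normalized gamma variates.
rdirichlet1 <- function(m, alpha) {
  g <- rgamma(m, shape = alpha)
  g / sum(g)
}

# For each cluster, draw gene probabilities, then one multinomial column per cell.
blocks <- lapply(seq_len(r), function(k) {
  p <- rdirichlet1(m, alpha0)
  rmultinom(n = nsamples[k], size = nfactor, prob = p)
})
x <- do.call(cbind, blocks)        # m x n count matrix, n = sum(nsamples)

# Permute rows and columns, mirroring shuffle = TRUE.
x <- x[sample(nrow(x)), sample(ncol(x))]
dim(x)
```

Under these assumptions every column of x sums to nfactor, so lowering nfactor leaves more zero entries in the matrix.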

Value

If generate.factors = TRUE, a list with components w (basis matrix, nfeatures x rank), h (coefficient matrix, rank x ncells, where ncells equals n, the sum of nsamples), and x (a matrix of Poisson deviates with mean WH). If generate.factors = FALSE, only the count matrix x is in the list.
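To make the shapes of the returned components concrete, the following base-R sketch builds a list with the same component names and dimensions. How the package simulates W and H is not stated here, so the gamma-distributed entries and the column normalization of w are assumptions chosen purely for illustration; only the Poisson sampling with mean WH follows the description above.

```r
# Illustrative sketch only; the distributions used for w and h are assumptions.
set.seed(1)
m <- 10                            # nfeatures
nsamples <- c(20, 20, 60, 40, 30)
r <- length(nsamples)              # rank
n <- sum(nsamples)                 # total number of cells

# Assumed: gamma-distributed entries, with each column of w scaled to sum to 1.
w <- matrix(rgamma(m * r, shape = 1), nrow = m, ncol = r)
w <- sweep(w, 2, colSums(w), "/")
h <- matrix(rgamma(r * n, shape = 1, scale = 20), nrow = r, ncol = n)

# Counts are Poisson deviates with mean WH (both filled in column-major order).
mu <- w %*% h
x <- matrix(rpois(m * n, lambda = mu), nrow = m, ncol = n)

sim <- list(w = w, h = h, x = x)
sapply(sim, dim)                   # rows and columns of each component
```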

Examples

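library(ccfindR)

# Fix the random seed so the simulated counts are reproducible.
set.seed(1)
# 10 features and 5 clusters (rank 5) with 170 cells in total.
x <- simulate_data(nfeatures = 10, nsamples = c(20, 20, 60, 40, 30))
# Build an scNMFSet object from the simulated data and print it.
s <- scNMFSet(x)
s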

hjunwoo/ccfindR documentation built on Oct. 4, 2019, 10:31 a.m.