Simulate data from a Poisson mixture model

Description

This function simulates data from a Poisson mixture model, as described by Rau et al. (2011). Data are simulated with varying expression level (w_i) for 4 clusters. Clusters may be simulated with “high” or “low” separation, and three different options are available for the library size setting: “equal”, “A”, and “B”, as described by Rau et al. (2011).
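The general idea of drawing counts from a Poisson mixture can be sketched in base R. This is only an illustrative sketch, not the code used by PoisMixSim: it assumes that the mean of each count is the product of an expression level w_i, a library size, and a cluster-specific λ value, and it ignores the parameter constraints on λ. All numeric values (pi_true, lambda, s, w) are made up for illustration and are not those of Rau et al. (2011).

```r
# Hypothetical settings: NOT the values used by PoisMixSim or Rau et al. (2011)
set.seed(1)
n <- 50                                      # number of observations
K <- 4                                       # number of clusters
conds <- c(1, 1, 2, 2, 3, 3)                 # condition of each of the q = 6 variables
q <- length(conds)
s <- rep(1 / q, q)                           # equal library sizes that sum to 1
pi_true <- rep(1 / K, K)                     # made-up mixing proportions

# (d x K) matrix of made-up lambda values for d = 3 conditions (one column per cluster)
lambda <- matrix(c(1, 2, 3,
                   3, 2, 1,
                   2, 2, 2,
                   1, 4, 1), nrow = 3, ncol = K)

w <- rexp(n, rate = 1 / 1000)                # made-up expression levels w_i
labels <- sample(seq_len(K), n, replace = TRUE, prob = pi_true)

# Assumed mean of y[i, c]: w_i * s_c * lambda[condition of c, cluster of i]
mu <- w * t(lambda[conds, labels]) * matrix(s, n, q, byrow = TRUE)
y <- matrix(rpois(length(mu), mu), nrow = n)

dim(y)                                       # n rows, q columns
table(labels)                                # cluster sizes
```

The objects y, labels, and conds in this sketch play the same roles as the y, labels, and conditions components described under Value.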

Usage

PoisMixSim(n = 2000, libsize, separation)

Arguments

n

Number of observations

libsize

The type of library size difference to be simulated (“equal”, “A”, or “B”, as described by Rau et al. (2011))

separation

Cluster separation (“high” or “low”, as described by Rau et al. (2011))

Value

y

(n x q) matrix of simulated counts for n observations and q variables

labels

Vector of length n defining the true cluster labels of the simulated data

pi

Vector of length 4 (the number of clusters) containing the true value of π

lambda

(d x 4) matrix of λ values for d conditions (3 in the case of libsize = “equal” or “A”, and 2 otherwise) in 4 clusters (see note below)

w

Row sums of y (the estimate \hat{w} of w)

conditions

Vector of length q defining the condition (treatment group) for each variable (column) in y

Note

If one or more observations are simulated such that all variables have a value of 0, those rows are removed from the data matrix; as a result, the simulated data y may in some cases have fewer than n rows.
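The effect of this filtering can be illustrated on a toy matrix. This is a minimal sketch using made-up data rather than PoisMixSim output, and the names y_kept and labels_kept are hypothetical. Whether PoisMixSim subsets the labels in the same way is not stated here, so comparing nrow(y) with length(labels) on actual output is a sensible check.

```r
# Toy count matrix (hypothetical data, not PoisMixSim output)
y <- matrix(c(3, 0, 5,
              0, 0, 0,
              1, 2, 0,
              0, 0, 0), ncol = 3, byrow = TRUE)
labels <- c(1, 2, 3, 4)

keep <- rowSums(y) > 0              # FALSE for rows where every variable is 0
y_kept <- y[keep, , drop = FALSE]   # drop = FALSE keeps a matrix even if one row remains
labels_kept <- labels[keep]         # keep the labels aligned with the remaining rows

nrow(y_kept)                        # can be smaller than the requested n
```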

The PMM-I model includes the parameter constraint ∑_j λ_{jk} r_j = 1 for each cluster k, where r_j is the number of replicates in condition (treatment group) j. Similarly, the parameter constraint in the PMM-II model is ∑_j ∑_l λ_{jk} s_{jl} = 1, where s_{jl} is the library size for replicate l of condition j. The value of lambda corresponds to that used to generate the simulated data, where the library sizes were set as described in Table 2 of Rau et al. (2011). However, due to variability in the simulation process, the actual library sizes of the data y are not exactly equal to these values; this means that the value of lambda may not be directly compared to an estimate \hat{λ} obtained from the PoisMixClus function.
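The gap between nominal and realized library sizes can be seen in a small base R sketch. It assumes that the realized library size of a column is estimated as that column's share of the total count, which is one common choice and not necessarily the one used in the package. The values of s_nominal and w are made up and are not those of Table 2 of Rau et al. (2011).

```r
set.seed(42)
s_nominal <- c(0.10, 0.20, 0.30, 0.40)   # made-up library sizes that sum to 1
w <- rep(500, 100)                       # made-up expression levels for 100 observations

# Counts with mean w_i * s_l (a single cluster with lambda = 1, for simplicity)
y <- matrix(rpois(100 * 4, outer(w, s_nominal)), nrow = 100)

# Realized library sizes: each column's proportion of the total count
s_realized <- colSums(y) / sum(y)
round(rbind(s_nominal, s_realized), 4)
```

The two rows are close but not identical, which is why parameters defined relative to the nominal library sizes do not line up exactly with estimates computed from the simulated counts.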

Author(s)

Andrea Rau <andrea.rau@jouy.inra.fr>

References

Rau, A., Celeux, G., Martin-Magniette, M.-L., Maugis-Rabusseau, C. (2011). Clustering high-throughput sequencing data with Poisson mixture models. Inria Research Report 7786. Available at http://hal.inria.fr/inria-00638082.

Examples

library(HTSCluster)
set.seed(12345)

## Simulate data as shown in Rau et al. (2011)
## Library size setting "A", high cluster separation
## n = 200 observations

simulate <- PoisMixSim(n = 200, libsize = "A", separation = "high")
y <- simulate$y
conds <- simulate$conditions