rGMM: Generate Data from Gaussian Mixture Models

View source: R/02_DataGen.R

rGMM R Documentation

Generate Data from Gaussian Mixture Models

Description

Generates an n × d matrix of multivariate normal random vectors, with observations (examples) as rows. If k = 1, all observations belong to the same cluster. If k > 1, the observations are generated via a two-step procedure. First, the cluster membership is drawn from a multinomial distribution, with mixture proportions specified by pi. Then, conditional on cluster membership, the observation is drawn from a multivariate normal distribution with the cluster-specific mean and covariance. The cluster means are supplied via means, and the cluster covariance matrices via covs. If miss > 0, missingness is introduced completely at random by setting that proportion of the elements of the data matrix to NA.
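The two-step procedure described above can be sketched in base R. This is an illustrative sketch only, not the package's implementation: the function name sketch_rgmm is hypothetical, the Cholesky-based normal sampler is one of several possible choices, and whether the package blanks an exact count of elements or masks each element independently is an assumption here (the sketch blanks an exact count). The cluster labels are returned as a separate vector for clarity.

```r
# Hypothetical sketch of the two-step generative procedure (not the package code).
sketch_rgmm <- function(n, pi, means, covs, miss = 0) {
  k <- length(pi)
  d <- length(means[[1]])

  # Step 1: draw cluster memberships with probabilities `pi`.
  z <- sample.int(k, size = n, replace = TRUE, prob = pi)

  # Step 2: draw each observation from its cluster's multivariate normal.
  x <- matrix(NA_real_, nrow = n, ncol = d)
  for (j in seq_len(k)) {
    idx <- which(z == j)
    if (length(idx) == 0) next
    # If R = chol(S), then t(R) %*% R = S, so rows of e %*% R have covariance S.
    r <- chol(covs[[j]])
    e <- matrix(rnorm(length(idx) * d), ncol = d)
    x[idx, ] <- sweep(e %*% r, 2, means[[j]], "+")
  }

  # Missingness completely at random: set a fixed proportion of elements to NA.
  if (miss > 0) {
    n_miss <- round(miss * n * d)
    x[sample.int(n * d, size = n_miss)] <- NA
  }

  list(data = x, cluster = z)
}

set.seed(1)
out <- sketch_rgmm(
  n = 500,
  pi = c(0.3, 0.7),
  means = list(c(-2, -2), c(2, 2)),
  covs = list(diag(2), diag(2)),
  miss = 0.1
)
```

Sampling the labels first and then filling the rows cluster by cluster keeps the sketch vectorised within each component, which is usually cheaper than drawing one observation at a time.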

Usage

rGMM(n, d = 2, k = 1, pi = NULL, miss = 0, means = NULL, covs = NULL)

Arguments

n

Observations (rows).

d

Observation dimension (columns).

k

Number of mixture components. Defaults to 1.

pi

Mixture proportions. If omitted, components are assumed equiprobable.

miss

Proportion of elements missing, with miss in [0, 1).

means

Either a prototype mean vector, or a list of mean vectors. Defaults to the zero vector.

covs

Either a prototype covariance matrix, or a list of covariance matrices. Defaults to the identity matrix.
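As a rough illustration of how the defaults and "prototype" arguments above might be resolved, the sketch below expands a single mean vector or covariance matrix into a list with one entry per component, and fills in equal mixture proportions when pi is omitted. The helper name expand_args is hypothetical, and reading "prototype" as "shared by all k components" is an assumption, not something confirmed by the package source.

```r
# Hypothetical helper (not part of the package): resolve defaults and prototypes.
expand_args <- function(d, k, pi = NULL, means = NULL, covs = NULL) {
  # Equiprobable components when `pi` is omitted.
  if (is.null(pi)) pi <- rep(1 / k, k)
  # Documented defaults: zero mean vector and identity covariance.
  if (is.null(means)) means <- rep(0, d)
  if (is.null(covs)) covs <- diag(d)
  # Assumption: a single prototype is shared across all k components.
  if (!is.list(means)) means <- rep(list(means), k)
  if (!is.list(covs)) covs <- rep(list(covs), k)
  stopifnot(length(pi) == k, length(means) == k, length(covs) == k)
  list(pi = pi, means = means, covs = covs)
}

args <- expand_args(d = 2, k = 3, means = c(1, 1))
```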

Value

Numeric matrix with observations as rows. Row names specify the true cluster assignments.
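A returned matrix of this kind could be inspected along the following lines. The sketch uses a small hand-built stand-in matrix rather than actual rGMM output, and it assumes the true cluster labels are stored as the row names of the matrix.

```r
# Stand-in for a matrix returned by rGMM; labels in the row names are an assumption.
x <- matrix(
  c(1.2, NA, -0.4, 2.1, 0.3, -1.7),
  nrow = 3,
  dimnames = list(c("1", "2", "1"), NULL)
)

# Recover the cluster labels from the row names.
labels <- as.integer(rownames(x))

# Realised proportion of missing elements.
prop_missing <- mean(is.na(x))

# Keep only the rows without any missing element.
complete_rows <- x[stats::complete.cases(x), , drop = FALSE]
```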

See Also

For estimation, see FitGMM.

Examples

set.seed(100)
# Single component without missingness.
# Bivariate normal observations.
cov <- matrix(c(1, 0.5, 0.5, 1), nrow = 2)
data <- rGMM(n = 1e3, d = 2, k = 1, means = c(2, 2), covs = cov)

# Single component with missingness.
# Trivariate normal observations.
cov <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 1, miss = 0.1, means = c(2, 2, 2), covs = cov)

# Two components without missingness.
# Trivariate normal observations.
mean_list <- list(c(-2, -2, -2), c(2, 2, 2))
cov <- matrix(c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
data <- rGMM(n = 1e3, d = 3, k = 2, means = mean_list, covs = cov)

# Four components with missingness.
# Bivariate normal observations.
mean_list <- list(c(2, 2), c(2, -2), c(-2, 2), c(-2, -2))
cov <- 0.5 * diag(2)
data <- rGMM(
  n = 1000,
  d = 2,
  k = 4,
  pi = c(0.35, 0.15, 0.15, 0.35),
  miss = 0.1,
  means = mean_list,
  covs = cov
)

zrmacc/MNMix documentation built on April 30, 2023, 6:37 a.m.