fconstr_pGMM: Fit constrained penalized Gaussian mixture model

View source: R/gmm_functions.R

fconstr_pGMM	R Documentation

Fit constrained penalized Gaussian mixture model

Description

For a given maximum number of clusters in the data, find the optimal penalization parameter lambda and the optimal clustering. Lambda penalizes the mixing proportions. The optimal model is the one with the best BIC.

The model is constrained so that, for any number of replicates, each replicate takes one of 3 component labels h = -1, 0, 1. h = 0 implies no association, while h = 1 and h = -1 imply positive and negative association, respectively. The mean of the no-association component is restricted to 0, while the positive and negative association components have means mu and -mu. The standard deviation of the no-association component is restricted to 1, while the positive and negative association components have standard deviation sigma.

Two replicates that both have positive association, or both have negative association, have correlation rho. Two replicates where one has positive association and the other has negative association have correlation -rho. Otherwise, the correlation between replicates is restricted to 0.
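The constraints above can be made concrete with a small sketch that builds the mean vector and covariance matrix implied by one label combination. The helper name constrained_params is hypothetical and is not part of the package; the sketch only follows the description above and is not the package's internal code.

```r
# Hypothetical helper (not part of the package): build the constrained mean
# vector and covariance matrix for one cluster.
# h is a vector of labels in {-1, 0, 1}, one entry per replicate.
constrained_params <- function(h, mu, sigma, rho) {
  stopifnot(all(h %in% c(-1, 0, 1)))
  d <- length(h)
  # Means: 0 for no association, +mu or -mu otherwise.
  m <- h * mu
  # Standard deviations: 1 for no association, sigma otherwise.
  s <- ifelse(h == 0, 1, sigma)
  # Correlations: rho for equal signs, -rho for opposite signs,
  # and 0 whenever either replicate has no association.
  R <- rho * outer(h, h)
  diag(R) <- 1
  # Covariance = D R D, where D is the diagonal matrix of standard deviations.
  D <- diag(s, nrow = d)
  list(mean = m, cov = D %*% R %*% D)
}

# All 3^d label combinations for d = 2 replicates.
combos <- as.matrix(expand.grid(rep(list(c(-1, 0, 1)), 2)))

# One positive and one negative replicate: correlation is -rho.
p <- constrained_params(c(1, -1), mu = 4, sigma = 1.3, rho = 0.8)
```

With d = 2 replicates this gives 9 label combinations, which matches the 9 mixing proportions used in the Examples section.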

Usage

fconstr_pGMM(x, lambda = NULL, tol = 1e-06,
             itermax = 300, penaltyType = c("SCAD", "LASSO"))

Arguments

x

An n by d numeric matrix of n observations with dimension d.

lambda

Penalty parameter. If unspecified, a grid is automatically generated.

tol

Tolerance for the stopping rule for the EM algorithm. A lower tolerance will require more iterations. Defaults to 1e-06.

itermax

Maximum number of iterations of the EM algorithm to perform. Defaults to 300.

penaltyType

Character string specifying which penalty to apply to the cluster mixing proportions, either "SCAD" or "LASSO". Defaults to "SCAD".
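To illustrate the difference between the two penalty types, here is a minimal sketch of the standard LASSO and SCAD penalty functions. The function names and the choice a = 3.7 (the conventional value from Fan and Li, 2001) are assumptions for illustration. How the package applies the penalty to the mixing proportions internally is not shown here.

```r
# LASSO (L1) penalty: grows linearly without bound.
lasso_penalty <- function(theta, lambda) lambda * abs(theta)

# SCAD penalty: matches LASSO near zero, then levels off to a constant,
# so large values are not penalized more heavily than moderate ones.
scad_penalty <- function(theta, lambda, a = 3.7) {
  t <- abs(theta)
  ifelse(t <= lambda,
         lambda * t,
         ifelse(t <= a * lambda,
                (2 * a * lambda * t - t^2 - lambda^2) / (2 * (a - 1)),
                lambda^2 * (a + 1) / 2))
}

# Compare the two penalties on a grid of values.
theta <- seq(0, 5, by = 0.5)
cbind(theta,
      lasso = lasso_penalty(theta, lambda = 1),
      scad  = scad_penalty(theta, lambda = 1))
```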

Details

The model is fit using the expectation-maximization algorithm. Clusters are initialized using kmeans with 3^d clusters. Initial values of mixing proportions, means, and variance (covariance) matrices for EM are computed from these clusters.
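The initialization step can be sketched roughly as follows, using toy data and base R only. The variable names (prop0, mu0, sigma0) are hypothetical, and the package's own initialization may differ in details such as the kmeans settings or how the constraints are imposed on the starting values.

```r
set.seed(1)
d <- 2
x <- matrix(rnorm(900 * d), ncol = d)  # toy data, for illustration only
k <- 3^d                               # one cluster per label combination

# Initial hard clustering with kmeans.
km <- kmeans(x, centers = k, nstart = 5, iter.max = 50)

# Initial mixing proportions: fraction of observations in each cluster.
prop0 <- as.numeric(table(factor(km$cluster, levels = seq_len(k)))) / nrow(x)

# Initial means: the kmeans cluster centers (a k by d matrix).
mu0 <- km$centers

# Initial covariance matrices: sample covariance within each cluster.
sigma0 <- lapply(seq_len(k), function(j) {
  cov(x[km$cluster == j, , drop = FALSE])
})
```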

Value

k

Optimal number of clusters.

prop

Mixing proportions in each cluster.

mu

Cluster means.

sigma

Cluster variances.

rho

Correlation between replicates with association.

df

Degrees of freedom for each cluster.

cluster

Cluster labels for each of the n observations.

BIC

BIC of optimal fit.

lambda

Lambda of optimal fit.

ll

Log likelihood of the data given the optimal fit.

post_prob

Posterior probability of each observation arising from each cluster.

combos

Used internally with fconstr_pGMCM. Can be ignored by the user.
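As a rough illustration of how posterior probabilities relate to hard cluster labels, the sketch below assigns each observation to the cluster with the highest posterior probability. The matrix is toy data, and it is an assumption that the returned cluster labels follow this maximum-posterior rule and that the columns of post_prob are ordered by cluster label.

```r
# Toy posterior probability matrix: 3 observations (rows), 3 clusters (columns).
post_prob <- rbind(c(0.7, 0.2, 0.1),
                   c(0.1, 0.3, 0.6),
                   c(0.2, 0.5, 0.3))

# Assign each observation to the column with the largest posterior probability.
labels <- max.col(post_prob, ties.method = "first")
labels
```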

Author(s)

hbk5086@psu.edu

See Also

fconstr_pGMCM

Examples

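# Assumes the CLIMB package is installed; it provides get_pals(),
# rconstr_GMCM(), and fconstr_pGMM().
library(CLIMB)
library(mvtnorm)
set.seed(234)
pal <- sample(get_pals(4), 9, replace = FALSE)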

n <- 3600
prop <- c(0.4,0.05,0.1,0.05,0.1,0.05,0.1,0.05,0.1)
mu <- 4
sigma <- 1.3
rho <- 0.8
sim <- rconstr_GMCM(n, prop, mu, sigma, rho, 2)

################################################################################
# Not run:
# fit <- fconstr_pGMM(sim$data, itermax = 100)
#
# par(mfrow = c(1,2))
# plot(sim$data, col = pal[sim$cluster], main = "observed")
# plot(sim$data, col = pal[fit$cluster], main = "GMM classification")
################################################################################

hillarykoch/CLIMB documentation built on Oct. 24, 2022, 4:27 a.m.