fpGMM: Fit penalized Gaussian mixture model

fpGMMR Documentation

Fit penalized Gaussian mixture model

Description

For a given maximum number of clusters in the data, find the optimal penalization parameter lambda and the optimal clustering. Lambda penalized the mixing proportions. Model optimality is determined by best BIC.

Usage

fpGMM(x, kmax, lambda = NULL, tol=1e-06, itermax = 200)

Arguments

x

An n by d numeric matrix of n observations with dimension d.

kmax

Maximum number of clusters to consider, and the number of clusters the algorithm is initialized with.

lambda

Penalty parameter. If unspecified, a grid is automatically generated.

tol

Tolerance for the stopping rule for the EM algorithm. A lower tolerance will require more iterations. Defaults to 1e-06.

itermax

Maximum number of iterations of the EM algorithm to perform. Defaults to 200.

Details

The model is fit using the expectation-maximization algorithm. Clusters are initialized using kmeans with kmax clusters. Initial values of mixing proportions, means, and variance (covariance) matrices for EM are computed from these clusters.

Value

k

Optimal number of clusters.

prop

Mixing proportions in each cluster.

mu

Cluster means.

sigma

Cluster variance (covariance) matrices

cluster

Cluster labels for each of the n observations.

BIC

BIC of optimal fit.

lambda

Lambda of optimal fit.

ll

Log likelihood of the data given the optimal fit.

Author(s)

hbk5086@psu.edu

Examples

library(mvtnorm)
set.seed(123)
pal <- get_pals(1)
sim <- rbind(rmvnorm(500, mean = c(-1,0),
                sigma = matrix(c(.65,0,0,2), nrow = 2, byrow = T)),
         rmvnorm(500, mean = c(1,-3),
                sigma = matrix(c(2.25,.1,.1,.65), nrow = 2, byrow = T)),
         rmvnorm(100, mean = c(4,2),
                sigma = matrix(c(1,.5,.5,1), nrow = 2, byrow = T)))

################################################################################
# Not run:
# fit <- fpGMM(sim, lambda = 1, kmax = 10)
#
# par(mfrow = c(1,2))
# plot(sim, col = pal[rep(1:3,times = c(500, 500, 100))], pch = 16, cex = 0.75,
#     xlab = expression(x[1]), ylab = expression(x[2]), main = "true clusters")
# plot(sim, col = pal[fit$cluster], pch = 16, cex = 0.75,
#     xlab = expression(x[1]), ylab = expression(x[2]), main = "fitted clusters")
################################################################################

hillarykoch/CLIMB documentation built on Oct. 24, 2022, 4:27 a.m.