fpGMM: Fit penalized Gaussian mixture model
In hillarykoch/CLIMB: Composite LIkelihood eMpirical Bayes

fpGMM

R Documentation

Fit penalized Gaussian mixture model

Description

For a given maximum number of clusters in the data, find the optimal penalization parameter lambda and the optimal clustering. Lambda penalized the mixing proportions. Model optimality is determined by best BIC.

Usage

fpGMM(x, kmax, lambda = NULL, tol=1e-06, itermax = 200)

Arguments

`x`	An n by d numeric matrix of n observations with dimension d.
`kmax`	Maximum number of clusters to consider, and the number of clusters the algorithm is initialized with.
`lambda`	Penalty parameter. If unspecified, a grid is automatically generated.
`tol`	Tolerance for the stopping rule for the EM algorithm. A lower tolerance will require more iterations. Defaults to 1e-06.
`itermax`	Maximum number of iterations of the EM algorithm to perform. Defaults to 200.

Details

The model is fit using the expectation-maximization algorithm. Clusters are initialized using kmeans with kmax clusters. Initial values of mixing proportions, means, and variance (covariance) matrices for EM are computed from these clusters.

Value

`k`	Optimal number of clusters.
`prop`	Mixing proportions in each cluster.
`mu`	Cluster means.
`sigma`	Cluster variance (covariance) matrices
`cluster`	Cluster labels for each of the n observations.
`BIC`	BIC of optimal fit.
`lambda`	Lambda of optimal fit.
`ll`	Log likelihood of the data given the optimal fit.

Author(s)

hbk5086@psu.edu

Examples

library(mvtnorm)
set.seed(123)
pal <- get_pals(1)
sim <- rbind(rmvnorm(500, mean = c(-1,0),
                sigma = matrix(c(.65,0,0,2), nrow = 2, byrow = T)),
         rmvnorm(500, mean = c(1,-3),
                sigma = matrix(c(2.25,.1,.1,.65), nrow = 2, byrow = T)),
         rmvnorm(100, mean = c(4,2),
                sigma = matrix(c(1,.5,.5,1), nrow = 2, byrow = T)))

################################################################################
# Not run:
# fit <- fpGMM(sim, lambda = 1, kmax = 10)
#
# par(mfrow = c(1,2))
# plot(sim, col = pal[rep(1:3,times = c(500, 500, 100))], pch = 16, cex = 0.75,
#     xlab = expression(x[1]), ylab = expression(x[2]), main = "true clusters")
# plot(sim, col = pal[fit$cluster], pch = 16, cex = 0.75,
#     xlab = expression(x[1]), ylab = expression(x[2]), main = "fitted clusters")
################################################################################

hillarykoch/CLIMB documentation built on Oct. 24, 2022, 4:27 a.m.