estepRR: Function to perform the E-step for a Gaussian mixture...
In tclust: Robust Trimmed Clustering

estepRR

R Documentation

Function to perform the E-step for a Gaussian mixture distribution

Description

Compute the log PDF for each observation, the posterior probabilities, and the objective function (total log-likelihood) for a Gaussian mixture distribution.

Arguments

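`ll`	Rcpp::NumericMatrix, n-by-k, where `n` is the number of observations and `k` is the number of clusters. In the Examples section, column j is filled with log(\pi_j) plus the value returned by tclust:::dmvnrm for component j.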

Details

Formally, a mixture model corresponds to the mixture distribution that represents the probability distribution of observations in the overall population. Mixture models are used to make statistical inferences about the properties of the sub-populations given only observations on the pooled population, without sub-population identity information. Mixture modeling approaches assume that the data at hand, y_1, \ldots, y_n \in R^p, come from a probability distribution whose density is a sum of k components

\sum_{j=1}^k \pi_j \phi( \cdot, \theta_j)

where \phi( \cdot, \theta_j) are p-variate (generally multivariate normal) densities with parameters \theta_j, j=1, \ldots, k. Generally \theta_j = (\mu_j, \Sigma_j), where \mu_j is the population mean and \Sigma_j is the covariance matrix of component j, and \pi_j is the (prior) probability of component j. The objective function obj is the total log-likelihood

obj = \log \left( \prod_{i=1}^n \sum_{j=1}^k \pi_j \phi (y_i; \; \theta_j) \right)

or, equivalently,

obj = \sum_{i=1}^n \log \left( \sum_{j=1}^k \pi_j \phi (y_i; \; \theta_j) \right)

where k is the number of components of the mixture, \pi_j are the component probabilities, and \theta_j are the parameters of the j-th mixture component.
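To connect these formulas with the `ll` argument, assume (as the Examples section suggests) that entry (i, j) of `ll` holds \log \pi_j + \log \phi(y_i; \theta_j). The symbol \ell_{ij} is introduced here only for illustration. Under that assumption, the three returned quantities of a standard E-step can be written as follows. This is a sketch of the usual definitions, not a statement about the package internals.

```latex
\ell_{ij} = \log \pi_j + \log \phi (y_i; \; \theta_j)

logpdf_i = \log \left( \sum_{j=1}^k \exp(\ell_{ij}) \right)

postprob_{ij} = \exp(\ell_{ij} - logpdf_i)
              = \frac{\pi_j \phi (y_i; \; \theta_j)}{\sum_{l=1}^k \pi_l \phi (y_i; \; \theta_l)}

obj = \sum_{i=1}^n logpdf_i
```

With these definitions each row of postprob sums to 1, and obj agrees with the total log-likelihood given above.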

Value

The function returns a list with the following elements:

obj the value of the objective function (total log-likelihood)
postprob an n-by-k matrix with the posterior probabilities
logpdf a vector of length n containing the log PDF for each observation
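The three elements can be illustrated with a small base-R sketch. It assumes that `ll[i, j]` holds log(\pi_j) plus the log density of observation i under component j, and it uses the log-sum-exp trick for numerical stability. The helper `estep_sketch` is hypothetical and is not part of tclust; the package's own routine may differ in details.

```r
# Hypothetical base-R helper (not part of tclust): E-step quantities from a
# matrix ll with ll[i, j] = log(pi_j) + log phi(y_i; theta_j) (assumed layout).
estep_sketch <- function(ll) {
  # Per-row maximum, subtracted before exponentiating to avoid underflow
  row_max <- apply(ll, 1, max)
  # Log mixture density of each observation (log-sum-exp over components)
  logpdf <- row_max + log(rowSums(exp(ll - row_max)))
  # Posterior probability of each component for each observation
  postprob <- exp(ll - logpdf)
  list(obj = sum(logpdf), postprob = postprob, logpdf = logpdf)
}

# Toy input: 3 observations, 2 components; entries are pi_j * phi(y_i; theta_j)
w <- matrix(c(0.20, 0.10,
              0.05, 0.30,
              0.40, 0.40), nrow = 3, byrow = TRUE)
res <- estep_sketch(log(w))

res$logpdf             # equals log(rowSums(w))
rowSums(res$postprob)  # each row sums to 1
res$obj                # equals sum(log(rowSums(w)))
```

Subtracting the row maximum leaves the result unchanged mathematically but keeps `exp()` from underflowing when all entries of a row are very negative.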

References

McLachlan, G.J.; Peel, D. (2000). Finite Mixture Models. Wiley. ISBN 0-471-00626-2

Examples

##      Generate samples from two bivariate Gaussian distributions
##      (no plots are produced)

       mu1 = c(1,2)
       sigma1 = matrix(c(2, 0, 0, .5), nrow=2, byrow=TRUE)    #[2 0; 0 .5];
       mu2 = c(-3, -5)
       sigma2 = matrix(c(1, 0, 0, 1), nrow=2, byrow=TRUE)
       n1 = 100
       n2 = 200
       Y = rbind(MASS::mvrnorm(n1, mu1, sigma1), 
                 MASS::mvrnorm(n2, mu2, sigma2))
       k = 2
       pi = c(1/3, 2/3)
       mu = rbind(mu1, mu2)
       sigma = array(0, dim=c(2,2,2))
       sigma[,,1] = sigma1
       sigma[,,2] = sigma2
       
       ll = matrix(0, nrow=n1+n2, ncol=2)
       for(j in 1:k)
           ll[,j] = log(pi[j]) +  tclust:::dmvnrm(Y, mu[j,], sigma[,,j])

       dd = tclust:::estepRR(ll)
       dd$obj
       dd$logpdf
       dd$postprob

tclust documentation built on June 29, 2025, 5:07 p.m.