em_clust_mvn_miss: Clustering for Multivariate Normal with missing data via EM

View source: R/clust_mvn_miss.R

em_clust_mvn_miss    R Documentation

Clustering for Multivariate Normal with missing data via EM

Description

This function uses the EM algorithm to cluster data in p dimensions. It assumes every cluster is spherical, N(\mu_m, \Sigma_m I), and allows for missing data. Individual elements of an observation may be missing, but no row may be missing entirely.
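With a spherical covariance the coordinates are independent, so an observation with missing elements can be scored on its observed coordinates alone. The sketch below illustrates that idea. It assumes \Sigma_m is a single scalar variance per cluster; log_dens_spherical_obs is a hypothetical helper, not a function exported by the package, and the package's internal computation may differ.

```r
# Log-density of one observation under a spherical normal, using only
# the observed (non-NA) coordinates.
# Hypothetical helper for illustration; not part of the package.
log_dens_spherical_obs <- function(x, mu, sigma2) {
  obs <- !is.na(x)
  if (!any(obs)) stop("observation has no observed coordinates")
  # independent coordinates: sum the univariate normal log-densities
  sum(dnorm(x[obs], mean = mu[obs], sd = sqrt(sigma2), log = TRUE))
}

x  <- c(1.2, NA, -0.5)  # second coordinate is missing
mu <- c(1, 0, 0)
log_dens_spherical_obs(x, mu, sigma2 = 1)
```

A row with no observed coordinates carries no information about cluster membership, which is consistent with the requirement that no row be missing entirely.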

Usage

em_clust_mvn_miss(data, nclust, itmax = 10000, tol = 10^-8)

Arguments

data

An 'n x p' data matrix or data frame.

nclust

The number of clusters.

itmax

The maximum number of iterations allowed. Defaults to 10000.

tol

Convergence tolerance for the EM iterations. Defaults to 10^-8.

Value

A list containing: it, the number of iterations; clust_prop, the estimated mixture proportions; clust_params, the estimated mixture parameters; mix_est, a vector of the estimated mixture for each data point; pseudo_log_lik, the pseudo log-likelihood of the data; and bic, the BIC of the fitted model.
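For orientation, the sketch below shows how a BIC is commonly computed from a log-likelihood for a spherical normal mixture. It is a generic formula, not the package's code: the parameter count, the sign convention, and the helper name bic_spherical_mixture are all assumptions, and the log_lik value is an arbitrary illustrative number rather than a real result.

```r
# Generic BIC for a spherical normal mixture.
# Hypothetical helper for illustration; the package's bic may be
# computed differently.
bic_spherical_mixture <- function(log_lik, n, p, nclust) {
  # free parameters (assumed): nclust * p means, nclust variances,
  # and nclust - 1 mixture proportions
  k <- nclust * p + nclust + (nclust - 1)
  -2 * log_lik + k * log(n)
}

# illustrative inputs only
bic_spherical_mixture(log_lik = -5000, n = 300, p = 10, nclust = 3)
```

Under this convention a lower BIC indicates a better trade-off between fit and model size, which can help when comparing fits with different values of nclust.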

See Also

em_clust_mvn, em_clust_norm, gen_clust

Examples

# generate test data
c1 <- gen_clust(100, 10, mean = seq(-8, 10, 2), sd = rep(1, 10))
c2 <- gen_clust(100, 10, mean = rep(0, 10), sd = rep(2, 10))
c3 <- gen_clust(100, 10, mean = rep(10, 10), sd = rep(1, 10))
c_tot <- rbind(c1, c2, c3); rm(c1, c2, c3)
# set 20% of the entries in each column to NA
c_tot <- apply(c_tot, 2, function(x) {
  samp <- sample(seq_along(x), floor(length(x) * 0.2), replace = FALSE)
  x[samp] <- NA
  x
})
# run example
mvn_miss <- em_clust_mvn_miss(c_tot, nclust = 3)
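Because the example injects missing values column by column, a row could in principle end up with every element missing, which the function does not allow. The following is a minimal sketch of a check that can be run before fitting. all_missing_rows is a hypothetical helper, not part of the package, and the toy matrix is made up for illustration.

```r
# TRUE for each row in which every entry is NA.
# Hypothetical helper for illustration; not part of the package.
all_missing_rows <- function(data) {
  rowSums(is.na(as.matrix(data))) == ncol(data)
}

toy <- rbind(c(1, NA, 3),
             c(NA, NA, NA),
             c(4, 5, NA))
which(all_missing_rows(toy))  # flags row 2
```

Applied to the example data, c_tot[!all_missing_rows(c_tot), , drop = FALSE] would drop any such rows before the call to em_clust_mvn_miss.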

alexWhitworth/emclustr documentation built on June 12, 2024, 10:13 p.m.