Unsupervised clustering with general GMCMs"
In GMCM: Fast Estimation of Gaussian Mixture Copula Models

set.seed(7869670)
opts_knit$set(self.contained = TRUE)

Initialization

The GMCM^1^ package is loaded.

#install.packages("GMCM")  # Uncomment to install the GMCM package
library("GMCM")

If GMCM is not installed, please uncomment the above line and rerun the script.

Load data

The data is loaded and the first rows are printed

ds <- read.table(file   = "{{gsub("\\", "/", data_file, fixed = TRUE)}}",
                 header = {{header}},
                 sep    = "{{sep}}",
                 quote  = "\{{quote}}")
head(ds, n = 4)

*Please correct the path stated for file to compile the report locally.

Next, the data is subset to the columns of interest and copula values estimated (ranked).

x <- ds[, {{cput(model_cols)}}]
u <- Uhat(x)
head(cbind(x, u), n = 2)

Initial parameters

The initial parameters, as chosen in the application, are given by

init_theta <- as.theta({{cput(theta)}})
print(init_theta)

Model fitting

With the data loaded and defined initial parameters, the model is now fitted.

theta <- fit.full.GMCM(u = u,  # Ranking function is applied automatically
                       theta = init_theta,
                       method = "{{fit_method}}",
                       max.ite = {{max_ite}},
                       verbose = FALSE)
print(theta)

The fitting method is set to "{{fit_method}}" with a maximum number of iterations of {{max_ite}}.

Unsupervised clustering

The estimated parameters are used to calculated posterior component probabilities on which the classification is based:

kappa <- get.prob(u, theta) # Compute component probabilities
colnames(kappa) <- paste0("comp", seq_len(ncol(kappa))) # Add names
comps <- apply(kappa, 1, which.max) # Find index of maximum entry for each row

{{ifelse(full_class_type == "thres_prob", c("# Set to NA if probability is not sufficiently large\n", sprintf("ok_max <- apply(kappa, 1, max) > %.3f\n", full_thres_prob), "comps[!ok_max] <- NA\n"), "")}}
cols <- topo.colors(ncol(kappa))[comps]
cols[is.na(comps)] <- "gray"
res <- data.frame(kappa, comp = comps, col = cols,
                  stringsAsFactors = FALSE)
head(res)
summary(res)

Results

The classes are counted by

table(res$comp)

The results are also displayed by plotting

plot(x, col = res$col, asp = 1) # Plot of raw values
plot(u, col = res$col, asp = 1) # Plot of copula values
z <- GMCM:::qgmm.marginal(Uhat(x), theta = theta) # Estimate latent process
plot(z, col = res$col, asp = 1) # Plot of estimated latent process

The fitted theta object can also be plotted directly:

plot(theta)

Session information

This report was generated using rmarkdown^2^ and knitr^3^ under the session given below. The report utilizes parameterized reports and knitr::spin.

sessionInfo()

References

Please cite the GMCM paper^1^ if you use the package or shiny app.

cites <- lapply(c("GMCM", "knitr", "rmarkdown"), citation)
fmt_cites <- unlist(lapply(cites, format, style = "text"))
cat(paste0("  ", seq_along(fmt_cites), ". ", fmt_cites, "\n"))