knitr::opts_chunk$set(echo = TRUE, fig.align = "center", fig.width = 7, fig.height = 5)

Sys.setenv(MC_DETERMINISTIC = 2)

ClusVis is an R package that performs a gaussian-based visualization of gaussian and non-gaussian Model-Based Clustering. This visualization is based on the probabilities of classification. See this preprint for more details about the method. It allows to visualize clusters as bivariate spherical gaussian.

First, we load the required packages.

library(RMixtComp) library(ClusVis)

To illustrate the use of ClusVis with RMixtComp output, we use the iris dataset and the congress dataset.

The iris dataset gives the measurements in centimeters of the variables sepal length and width and petal length and width, respectively, for 50 flowers from each of 3 species of iris.

data("iris") head(iris)

First, we learn a mixture model with 3 classes for the 4 measurements varaibles.

res <- mixtCompLearn(iris[, -5], nClass = 3, criterion = "BIC", nRun = 3, nCore = 1, verbose = FALSE)

Then, we apply the `clusvis`

function. This function requires 2 parameters: the logarithm of the probabilities of classification of every individuals and the proportion of the mixture.

logTik <- getTik(res, log = TRUE) prop <- getProportion(res) resVisu <- clusvis(logTik, prop)

The results can be displayed using the `plotDensityClusVisu`

function. The first graph is generated with the parameter `add.obs = TRUE`

. It overlays on the most discriminative map the curve of iso-probabilities of classification and the cloud of observations.

plotDensityClusVisu(resVisu, add.obs = TRUE)

With `add.obs = FALSE`

, the goal of the plot is to represents the overlap between the clusters. Each clusters is represented by its centers and a 95% confidence level border. The differene between entropies displayed in the title defines the accuracy of the representation. A difference closed to 0 means that the representation is accurate.

plotDensityClusVisu(resVisu, add.obs = FALSE)

Here, we note that two clusters are closed and so they contains flowers with similar measures whereas the other cluster contains flowers with very different measures from the two others.

This data set includes votes for each of the U.S. House of Representatives Congressmen on the 16 key votes identified by the CQA in 1984.

data("congress") head(congress)

First, we change the format of the data. The vote "n" is refactored as 1 and "y" as 2. "democrat" is refactored as 1 and "republican" as 2.

## MixtComp Format congress$V1 = refactorCategorical(congress$V1, c("democrat", "republican", "?"), c(1, 2, "?")) for(i in 2:ncol(congress)) congress[, i] = refactorCategorical(congress[, i], c("n", "y", "?"), c(1, 2, "?")) head(congress)

We run MixtComp with a Multinomial model for each variable.

model <- rep("Multinomial", ncol(congress)) names(model) = colnames(congress) res <- mixtCompLearn(congress, model = model, nClass = 4, criterion = "BIC", nRun = 3, nCore = 1)

As before, we extract the required parameters.

logTik <- getTik(res, log = TRUE) prop <- getProportion(res) head(logTik)

It is important to notice that there are a lot of `-Inf`

values in the variable `logTik`

because some probabilities to be in a cluster are exactly 0.
If there are too many infinite values, it is a problem for the `cluvis`

function. One way to avoid this problem is to replace infinite values with the logarithm of a epsilon.

logTik[is.infinite(logTik)] = log(1e-20) head(logTik)

Now, the `clusvis`

function can be run.

resVisu <- clusvis(logTik, prop)

And the two associated plots generated.

plotDensityClusVisu(resVisu, add.obs = TRUE)

plotDensityClusVisu(resVisu, add.obs = FALSE)

Sys.unsetenv("MC_DETERMINISTIC")

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.