# integration: Computes integration and acontamination of the clustering (varclust: Variables Clustering)

## Description

Integration and acontamination measure the quality of a clustering with reference to a true partition. Let $X = (x_1, \ldots, x_p)$ be the data set, let $A$ be a partition into clusters $A_1, \ldots, A_n$ (the true partition), and let $B$ be a partition into clusters $B_1, \ldots, B_m$. Then, for cluster $A_j$, integration is equal to:

$$\mathrm{Int}(A_j) = \frac{\max_{k = 1, \ldots, m} \# \{ i \in \{ 1, \ldots, p \} : x_i \in A_j \wedge x_i \in B_k \}}{\# A_j}$$

The $B_k$ for which the maximum is attained is called the integrating cluster of $A_j$. The integration of the whole clustering is then $\mathrm{Int}(A, B) = \frac{1}{n} \sum_{j=1}^{n} \mathrm{Int}(A_j)$. The acontamination is defined by:

$$\mathrm{Acont}(A_j) = \frac{\# \{ i \in \{ 1, \ldots, p \} : x_i \in A_j \wedge x_i \in B_k \}}{\# B_k}$$

where $B_k$ is the integrating cluster of $A_j$. The acontamination of the whole clustering is then $\mathrm{Acont}(A, B) = \frac{1}{n} \sum_{j=1}^{n} \mathrm{Acont}(A_j)$.
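The two definitions above can be computed directly from a contingency table of the two partitions. The following is a minimal sketch, not the varclust implementation: `integration_sketch` is a hypothetical helper name, and resolving ties for the integrating cluster in favour of the first maximal cluster is an assumption, since the definition does not say how ties are broken.

```r
# Hypothetical helper illustrating the definitions; not part of varclust.
integration_sketch <- function(group, true_group) {
  stopifnot(length(group) == length(true_group))
  # Rows are the true clusters A_j, columns are the clusters B_k.
  counts <- table(true_group, group)
  # Integrating cluster of each A_j: the B_k with the largest overlap.
  # Assumption: which.max picks the first column when there is a tie.
  best <- apply(counts, 1, which.max)
  overlap <- counts[cbind(seq_len(nrow(counts)), best)]
  int_j <- overlap / rowSums(counts)          # overlap divided by #A_j
  acont_j <- overlap / colSums(counts)[best]  # overlap divided by #B_k
  c(integration = mean(int_j), acontamination = mean(acont_j))
}

# Toy example: one variable of true cluster 1 is assigned to cluster 2.
true_group <- c(1, 1, 1, 1, 2, 2, 2, 2)
group      <- c(1, 1, 1, 2, 2, 2, 2, 2)
res <- integration_sketch(group, true_group)
print(res)
# integration    = (3/4 + 4/4) / 2 = 0.875
# acontamination = (3/3 + 4/5) / 2 = 0.9
```

In the toy example, true cluster 1 keeps three of its four variables together, so its integration is 3/4, while its integrating cluster contains only variables from true cluster 1, so its acontamination is 1. True cluster 2 stays whole (integration 1), but its integrating cluster picked up one foreign variable (acontamination 4/5).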

## Usage

    integration(group, true_group)

## Arguments

- group: A vector, first partition.
- true_group: A vector, second (reference) partition.

## Value

An array containing values of integration and acontamination.

## References

M. Sołtys. Metody analizy skupień. Master’s thesis, Wrocław University of Technology, 2010

## Examples

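    # Simulate data with K = 2 clusters of 50 variables each
    sim.data <- data.simulation(n = 20, SNR = 1, K = 2, numb.vars = 50, max.dim = 2)
    # True labels: variables 1-50 in cluster 1, variables 51-100 in cluster 2
    true_segmentation <- rep(1:2, each = 50)
    mlcc.fit <- mlcc.reps(sim.data$X, numb.clusters = 2, max.dim = 2, numb.cores = 1)
    # Compare the estimated segmentation with the true one
    integration(mlcc.fit$segmentation, true_segmentation)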

varclust documentation built on June 27, 2019, 5:08 p.m.