cluscata: Perform a cluster analysis of subjects from a CATA experiment
In ClustBlock: Clustering of Datasets

cluscata

R Documentation

Description

Clustering of subjects (blocks) from a CATA experiment. Each cluster of blocks is associated with a compromise computed by the CATATIS method. The hierarchical clustering is followed by a partitioning algorithm (consolidation). Non-binary data are accepted.
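
The two-stage pattern described above (a hierarchical clustering whose partition is then refined by a partitioning algorithm) can be illustrated with base R on toy numeric data. This is only a generic sketch using `hclust` and `kmeans`; it does not implement the CLUSCATA criterion or the CATATIS compromise, and the data and the value `k = 2` are made up for illustration.

```r
# Generic two-stage clustering on toy numeric data (NOT the CLUSCATA criterion):
# 1) Ward hierarchical clustering, 2) k-means started from the cluster means.
set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 4), ncol = 2))

k <- 2
tree <- hclust(dist(x), method = "ward.D2")
init <- cutree(tree, k = k)          # partition before consolidation

# Means of the hierarchical clusters serve as starting centers for k-means
centers <- apply(x, 2, function(col) tapply(col, init, mean))
consolidated <- kmeans(x, centers = centers, iter.max = 30)$cluster

# Cross-tabulate the partition before and after the consolidation step
print(table(before = init, after = consolidated))
```

Starting the partitioning step from the hierarchical solution, rather than from random centers, makes the consolidation a refinement of the cut dendrogram instead of an independent clustering.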

Usage

cluscata(Data, nblo, NameBlocks=NULL, NameVar=NULL, Noise_cluster=FALSE,
         Unique_threshold=TRUE, Itermax=30, Graph_dend=TRUE, Graph_bar=TRUE,
         printlevel=FALSE, gpmax=min(6, nblo-2), rhoparam=NULL,
         Testonlyoneclust=FALSE, alpha=0.05, nperm=50, Warnings=FALSE)

Arguments

`Data`	data frame or matrix where the blocks of binary variables are merged horizontally. If your data are in a different format, see `change_cata_format`.
`nblo`	numerical. Number of blocks (subjects).
`NameBlocks`	string vector. Name of each block (subject). Length must be equal to the number of blocks. If NULL, the names are S1, ..., Sm. Default: NULL
`NameVar`	string vector. Name of each variable (attribute, the same names for each subject). Length must be equal to the number of attributes. If NULL, the colnames of the first block are taken. Default: NULL
`Noise_cluster`	logical. Should a noise cluster be computed? Default: FALSE
`Unique_threshold`	logical. Use the same rho for every cluster? Default: TRUE
`Itermax`	numerical. Maximum number of iterations for the partitioning algorithm. Default: 30
`Graph_dend`	logical. Should the dendrogram be plotted? Default: TRUE
`Graph_bar`	logical. Should the barplot of the difference of the criterion and the barplot of the overall homogeneity at each merging step of the hierarchical algorithm be plotted? Default: TRUE
`printlevel`	logical. Print the number of remaining levels during the hierarchical clustering algorithm? Default: FALSE
`gpmax`	numerical. Maximum number of clusters to consider. Default: min(6, nblo-2)
`rhoparam`	numerical or vector. Threshold for the noise cluster, between 0 and 1. A high value can set aside many blocks. If NULL, a threshold is computed automatically. To use a different threshold for each group, provide a vector. Default: NULL
`Testonlyoneclust`	logical. Test if there is more than one cluster? Default: FALSE
`alpha`	numerical between 0 and 1. What is the threshold to test if there is more than one cluster? Default: 0.05
`nperm`	numerical. How many permutations are required to test if there is more than one cluster? Default: 50
`Warnings`	logical. Display warnings when no subject in a cluster checked a given attribute or product? Default: FALSE
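
As a rough illustration of the layout expected by `Data`, the sketch below builds a toy data set in which one products-by-attributes table per subject is merged horizontally. The dimensions, attribute names and random values are hypothetical; only base R is used, and `cluscata` itself is not called.

```r
# Toy CATA data: 4 products (rows), 3 attributes, 5 subjects (blocks).
set.seed(42)
nprod <- 4; nattr <- 3; nblo <- 5
attr_names <- c("sweet", "acid", "firm")   # hypothetical attribute names

# One binary products x attributes table per subject
blocks <- lapply(seq_len(nblo), function(i) {
  matrix(rbinom(nprod * nattr, size = 1, prob = 0.5),
         nrow = nprod,
         dimnames = list(paste0("P", seq_len(nprod)), attr_names))
})

# Merge the blocks horizontally: subject 1's attributes, then subject 2's, ...
Data <- do.call(cbind, blocks)

# Sanity checks one might run before calling cluscata(Data, nblo = nblo)
stopifnot(ncol(Data) %% nblo == 0)         # same number of attributes per block
nattr_found <- ncol(Data) / nblo
first_block <- Data[, seq_len(nattr_found)]
print(dim(Data))
```

With this layout, columns 1 to `nattr` belong to the first subject, the next `nattr` columns to the second subject, and so on, which is why `nblo` must divide the number of columns.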

Value

For each number of clusters K (K = 1 to gpmax), the element partitionK is a list containing:

group: the clustering partition after consolidation. If Noise_cluster=TRUE, some subjects could be in the noise cluster ("K+1")
rho: the threshold for the noise cluster
homogeneity: homogeneity index (
s_with_compromise: similarity coefficient of each subject with its cluster compromise
weights: weight associated with each subject in its cluster
compromise: the compromise of each cluster
CA: list. The correspondence analysis results for each cluster compromise (coordinates, contributions...)
inertia: percentage of total variance explained by each axis of the CA for each cluster
s_all_cluster: the similarity coefficient between each subject and each cluster compromise
criterion: the CLUSCATA criterion error
param: parameters called
type: parameter passed to other functions
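
To make the roles of `rho`, `s_all_cluster` and the noise cluster ("K+1") more concrete, here is a minimal sketch on a made-up similarity matrix. It assumes that a subject is set aside when its similarity with every cluster compromise falls below `rho`; this rule is an assumption made for illustration and is not taken from the package source.

```r
# Hypothetical similarities of 5 subjects with K = 2 cluster compromises
s_all <- matrix(c(0.70, 0.20,
                  0.15, 0.65,
                  0.25, 0.10,
                  0.55, 0.50,
                  0.05, 0.30),
                ncol = 2, byrow = TRUE,
                dimnames = list(paste0("S", 1:5), c("cluster1", "cluster2")))

rho <- 0.4                                   # assumed noise threshold
K <- ncol(s_all)

best     <- max.col(s_all, ties.method = "first")  # closest compromise
best_sim <- apply(s_all, 1, max)                   # similarity with it

# Assumed rule: below rho for every compromise -> noise cluster K + 1
group <- ifelse(best_sim < rho, K + 1, best)
print(group)
```

Under this assumed rule, raising `rho` can only move subjects into the noise cluster, which is consistent with the note that a high threshold can set aside many blocks.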

The following elements are also stored at the end of the list:

dend: The CLUSCATA dendrogram
cutree_k: the partition obtained by cutting the dendrogram in K clusters (before consolidation).
overall_homogeneity_ng: percentage of overall homogeneity by number of clusters before consolidation (and after if there is no noise cluster)
diff_crit_ng: variation of criterion when a merging is done before consolidation (and after if there is no noise cluster)
test_one_cluster: decision and p-value of the test for more than one cluster
param: parameters called
type: parameter passed to other functions
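
The components listed above might be inspected as in the following sketch. It assumes that the ClustBlock package is installed and that the per-partition elements are named `partition1`, `partition2`, and so on; checking `names(res)` first is a safe way to confirm the actual names.

```r
# Runs only if ClustBlock is available; otherwise the block is skipped.
if (requireNamespace("ClustBlock", quietly = TRUE)) {
  library(ClustBlock)
  data(straw)
  res <- cluscata(Data = straw[, 1:(16 * 40)], nblo = 40,
                  Graph_dend = FALSE, Graph_bar = FALSE)

  print(names(res))                  # confirm the element names first

  part3 <- res[["partition3"]]       # assumed name of the 3-cluster solution
  if (!is.null(part3)) {
    print(table(part3$group))        # cluster sizes after consolidation
    print(part3$homogeneity)         # homogeneity index
  }

  print(res$overall_homogeneity_ng)  # overall homogeneity by number of clusters
}
```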

References

Llobell, F., Cariou, V., Vigneau, E., Labenne, A., & Qannari, E. M. (2019). A new approach for the analysis of data and the clustering of subjects in a CATA experiment. Food Quality and Preference, 72, 31-39.
Llobell, F., Giacalone, D., Labenne, A., Qannari, E.M. (2019). Assessment of the agreement and cluster analysis of the respondents in a CATA experiment. Food Quality and Preference, 77, 184-190.

Examples


data(straw)
# With 40 subjects
res=cluscata(Data=straw[,1:(16*40)], nblo=40)
#plot(res, ngroups=3, Graph_dend=FALSE)
summary(res, ngroups=3)
# With a noise cluster
res2=cluscata(Data=straw[,1:(16*40)], nblo=40, Noise_cluster=TRUE,
              Graph_dend=FALSE, Graph_bar=FALSE)
# With a noise cluster and a user-defined rho threshold
# (the threshold is high for this example; use a lower one,
# e.g. 0.2 or 0.3, to avoid setting aside many respondents)
res3=cluscata(Data=straw[,1:(16*40)], nblo=40, Noise_cluster=TRUE,
              Graph_dend=FALSE, Graph_bar=FALSE, rhoparam=0.6)
# With a different noise cluster threshold for each cluster
res4=cluscata(Data=straw[,1:(16*40)], nblo=40, Noise_cluster=TRUE,
              Graph_dend=FALSE, Graph_bar=FALSE, Unique_threshold=FALSE,
              rhoparam=c(0.6, 0.5, 0.4))
# With all subjects
res=cluscata(Data=straw, nblo=114, printlevel=TRUE)


# Vertical format
data("fish")
Data=fish[1:66,2:30]
chang2=change_cata_format2(Data, nprod=6, nattr=27, nsub=11, nsess=1)
res_fish=cluscata(Data=chang2$Datafinal, nblo=11, NameBlocks=chang2$NameSub)

ClustBlock documentation built on June 8, 2025, 10:32 a.m.