k-means clustering of variables
Description
Iterative relocation algorithm of k-means type which partitions a set of variables. Variables can be quantitative, qualitative or a mixture of both. The center of a cluster of variables is a synthetic variable, not a 'mean' as in classical k-means. This synthetic variable is the first principal component calculated by PCAmix. PCAmix is defined for a mixture of qualitative and quantitative variables and includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. The homogeneity of a cluster of variables is defined as the sum of the correlation ratios (for qualitative variables) and the squared correlations (for quantitative variables) between the variables and the center of the cluster, which is in all cases a numerical variable. Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
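The relocation idea described above can be illustrated with a minimal base R sketch, restricted to quantitative variables. The helper names `cluster_center` and `kmeans_vars_sketch` are hypothetical and are not part of the package; the sketch assumes standardized variables and uses `prcomp` in place of PCAmix, which is only equivalent in the purely quantitative case.

```r
# Hypothetical helper: first principal component of the standardized
# columns of X, used as the synthetic variable (center) of a cluster.
cluster_center <- function(X) {
  Z <- scale(X)
  if (ncol(Z) == 1) return(Z[, 1])
  prcomp(Z, center = FALSE)$x[, 1]
}

# Hypothetical sketch of the relocation loop (quantitative variables only).
# init is an integer vector giving the starting cluster of each variable.
kmeans_vars_sketch <- function(X, init, iter.max = 50) {
  X <- as.matrix(X)
  part <- as.integer(init)
  k <- max(part)
  for (iter in seq_len(iter.max)) {
    # Representation step: one synthetic variable per cluster.
    centers <- sapply(seq_len(k), function(g) {
      cluster_center(X[, part == g, drop = FALSE])
    })
    # Allocation step: each variable joins the cluster whose center it is
    # most strongly (squared-)correlated with.
    r2 <- cor(X, centers)^2
    new_part <- max.col(r2, ties.method = "first")
    # Stop when nothing moves, or when a cluster would become empty.
    if (all(new_part == part) || length(unique(new_part)) < k) break
    part <- new_part
  }
  part
}

# Toy data: two latent factors, two noisy copies of each.
set.seed(1)
n <- 100
f1 <- rnorm(n)
f2 <- rnorm(n)
X <- cbind(a1 = f1 + 0.1 * rnorm(n), a2 = f1 + 0.1 * rnorm(n),
           b1 = f2 + 0.1 * rnorm(n), b2 = f2 + 0.1 * rnorm(n))

# Start from a deliberately wrong partition: b1 is placed with a1 and a2.
part <- kmeans_vars_sketch(X, init = c(1, 1, 1, 2))
```

On this toy data the misplaced variable `b1` should be relocated to the cluster of `b2`. The empty-cluster guard is a simplification chosen for this sketch.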
Usage
Arguments
X.quanti 
a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns). 
X.quali 
a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns). 
init 
either the number of clusters or an initial partition (a vector of integers indicating the cluster to which each variable is allocated).
If 
iter.max 
the maximum number of iterations allowed. 
nstart 
if 
matsim 
logical; if TRUE, the matrices of similarities between variables in the same cluster are calculated. 
Value
var 
a list of matrices of squared loadings, i.e. for each cluster of variables, the squared loadings on the first principal component of PCAmix. For quantitative variables (resp. qualitative variables), the squared loadings are the squared correlations (resp. the correlation ratios) with the first PC (the cluster center). 
sim 
a list of matrices of similarities, i.e. for each cluster, the similarities between its variables.
The similarity between two variables is defined as a squared cosine:
the square of the Pearson correlation when the two variables are quantitative;
the correlation ratio when one variable is quantitative and the other one is qualitative;
the square of the canonical correlation between two sets of dummy variables when the two variables are qualitative.
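As an illustration of the three cases above, here is a minimal base R sketch. The function names `sim_quanti`, `sim_mixed` and `sim_quali` are hypothetical and are not taken from the package; the sketch assumes complete data and factors whose levels all occur at least once.

```r
# Quantitative vs quantitative: squared Pearson correlation.
sim_quanti <- function(x, y) cor(x, y)^2

# Quantitative vs qualitative: correlation ratio,
# i.e. between-group sum of squares over total sum of squares.
sim_mixed <- function(x, f) {
  f <- as.factor(f)
  group_means <- ave(x, f)  # each value replaced by the mean of its group
  sum((group_means - mean(x))^2) / sum((x - mean(x))^2)
}

# Qualitative vs qualitative: squared first canonical correlation between
# the two sets of dummy variables (one reference level dropped per factor).
sim_quali <- function(f, g) {
  f <- as.factor(f)
  g <- as.factor(g)
  A <- model.matrix(~ f)[, -1, drop = FALSE]
  B <- model.matrix(~ g)[, -1, drop = FALSE]
  cancor(A, B)$cor[1]^2
}

x <- c(1, 2, 3, 4, 5, 6)
f <- c("a", "a", "a", "b", "b", "b")

s_qq <- sim_quanti(x, 2 * x + 1)  # exact linear relation
s_mix <- sim_mixed(x, f)          # 13.5 / 17.5
s_ff <- sim_quali(f, f)           # a factor compared with itself
```

All three quantities lie between 0 and 1, which is what lets them be summed across variable types.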

cluster 
a vector of integers indicating the cluster to which each variable is allocated. 
wss 
the within-cluster sum of squares for each cluster: the sum of the correlation ratios (for qualitative variables) and the squared correlations (for quantitative variables) between the variables and the center of the cluster. 
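For a cluster made only of quantitative variables, this quantity can be checked by hand. The following base R sketch uses simulated data and hypothetical variable names; it assumes standardized variables and uses `prcomp` rather than PCAmix. Under those assumptions, the sum of squared correlations with the first principal component equals the largest eigenvalue of the cluster's correlation matrix.

```r
set.seed(42)
n <- 50
f <- rnorm(n)
# Three noisy measurements of the same latent factor form one cluster.
X <- cbind(x1 = f + rnorm(n), x2 = f + rnorm(n), x3 = f + rnorm(n))

# Cluster center: first principal component of the standardized variables.
pc1 <- prcomp(scale(X), center = FALSE)$x[, 1]

# Homogeneity of the cluster: sum of squared correlations with the center.
wss_cluster <- sum(cor(X, pc1)^2)

# Largest eigenvalue of the correlation matrix, for comparison.
lambda1 <- eigen(cor(X))$values[1]
```

The two values `wss_cluster` and `lambda1` agree up to floating-point error.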
E 
the percentage of homogeneity which is accounted for by the partition in k clusters. 
size 
the number of variables in each cluster. 
scores 
an n by k numerical matrix which contains the k cluster centers. The center of a cluster is a synthetic variable: the first principal component calculated by PCAmix.
The k columns of 
Author(s)
Marie Chavent <marie.chavent@u-bordeaux2.fr>, Vanessa Kuentz, Benoit Liquet, Jerome Saracco
See Also
summary.clustvar, print.clustvar, stability, cutreevar, predict.clustvar
Examples
library(ClustOfVar)
data(decathlon)
# choice of the number of clusters
tree <- hclustvar(X.quanti = decathlon[, 1:10])
stab <- stability(tree, B = 60)
# a random set of variables is chosen as the initial cluster centers, nstart = 10 times
part1 <- kmeansvar(X.quanti = decathlon[, 1:10], init = 5, nstart = 10)
summary(part1)
# the partition from the hierarchical clustering is chosen as initial partition
part_init <- cutreevar(tree, 5)$cluster
part2 <- kmeansvar(X.quanti = decathlon[, 1:10], init = part_init, matsim = TRUE)
summary(part2)
# similarity matrices, one per cluster (available because matsim = TRUE)
part2$sim
