Iterative relocation algorithm of k-means type which performs a partitioning of a set of variables. Variables can be quantitative, qualitative, or a mixture of both. Unlike in classical k-means, the center of a cluster of variables is not a 'mean' but a synthetic variable: the first principal component calculated by PCAmix. PCAmix is defined for a mixture of qualitative and quantitative variables and includes ordinary principal component analysis (PCA) and multiple correspondence analysis (MCA) as special cases. The homogeneity of a cluster of variables is defined as the sum of the correlation ratios (for qualitative variables) and the squared correlations (for quantitative variables) between the variables and the center of the cluster, which is always a numerical variable. Missing values are replaced by means for quantitative variables and by zeros in the indicator matrix for qualitative variables.
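The homogeneity criterion can be made concrete with a minimal base-R sketch that computes the center of one mixed cluster and its homogeneity. It does not use ClustOfVar: the helpers std_pop, corr_ratio, first_pc_mix and homogeneity are hypothetical, and the coding (quantitative columns standardized, indicator columns centered and divided by the square root of the category proportions) is an assumption about how PCAmix builds its first component. The built-in iris data stand in for a cluster of three quantitative variables and one qualitative variable.

```r
# Sketch only: hypothetical helpers, not the ClustOfVar/PCAmix code.
# ASSUMPTION: PCAmix-style coding = standardized quantitative columns plus
# indicator columns centered and divided by sqrt(category proportion).

# Standardize with the population (1/n) variance.
std_pop <- function(x) {
  x <- x - mean(x)
  x / sqrt(mean(x^2))
}

# Correlation ratio eta^2 of a numeric variable y with respect to a factor f.
corr_ratio <- function(y, f) {
  group_means <- ave(y, f)  # each observation's group mean
  sum((group_means - mean(y))^2) / sum((y - mean(y))^2)
}

# First PCAmix-style component (the cluster center) of a mixed cluster.
first_pc_mix <- function(X.quanti, X.quali) {
  Zq <- apply(as.matrix(X.quanti), 2, std_pop)
  Zf <- do.call(cbind, lapply(X.quali, function(f) {
    G <- model.matrix(~ f - 1)  # one indicator column per category
    p <- colMeans(G)            # category proportions
    sweep(sweep(G, 2, p), 2, sqrt(p), "/")
  }))
  Z <- cbind(Zq, Zf)
  e <- eigen(crossprod(Z) / nrow(Z), symmetric = TRUE)
  lambda1 <- e$values[1]
  # Scores of the first component, rescaled to unit (1/n) variance.
  center <- drop(Z %*% e$vectors[, 1]) / sqrt(lambda1)
  list(center = center, lambda1 = lambda1)
}

# Homogeneity: squared correlations plus correlation ratios with the center.
homogeneity <- function(X.quanti, X.quali, center) {
  r2 <- cor(as.matrix(X.quanti), center)^2
  eta2 <- vapply(X.quali, function(f) corr_ratio(center, f), numeric(1))
  sum(r2) + sum(eta2)
}

X.quanti <- iris[, c("Sepal.Length", "Petal.Length", "Petal.Width")]
X.quali <- iris["Species"]
pc <- first_pc_mix(X.quanti, X.quali)
H <- homogeneity(X.quanti, X.quali, pc$center)
print(c(H = H, lambda1 = pc$lambda1))
```

Under this coding the homogeneity H coincides with the first eigenvalue lambda1. This is why the first principal component is a natural center: no other standardized numerical variable gives a larger sum of squared correlations and correlation ratios with the variables of the cluster.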
X.quanti: a numeric matrix of data, or an object that can be coerced to such a matrix (such as a numeric vector or a data frame with all numeric columns).
X.quali: a categorical matrix of data, or an object that can be coerced to such a matrix (such as a character vector, a factor or a data frame with all factor columns).
init: either the number of clusters or an initial partition (a vector of integers indicating the cluster to which each variable is allocated). If init is a number, a random set of variables is chosen as the initial cluster centers.
iter.max: the maximum number of iterations allowed.
nstart: if init is a number, the number of random sets of initial cluster centers to try.
matsim: boolean; if TRUE, the matrices of similarities between variables in the same cluster are calculated.
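The relocation loop driven by init and iter.max can be illustrated, for purely quantitative variables, by a base-R sketch. It is not the ClustOfVar implementation: cluster_center, all_centers and kmeans_vars_sketch are hypothetical names, each center is taken as the first principal component of the standardized variables of the cluster (the PCA special case of PCAmix), ties go to the lowest cluster number, an empty cluster simply stops the run, and only one start is made, so nstart is not reproduced.

```r
# Sketch only: hypothetical helpers for QUANTITATIVE variables, not the
# ClustOfVar code. With numeric data PCAmix reduces to a PCA of standardized
# variables, so each center is a first principal component.

# Center of one cluster: first PC scores of its standardized variables.
cluster_center <- function(X) {
  prcomp(X, center = TRUE, scale. = TRUE)$x[, 1]
}

# n x k matrix of centers for a partition of the columns of X into 1..k.
all_centers <- function(X, cluster, k) {
  if (length(unique(cluster)) < k) stop("a cluster became empty")
  sapply(seq_len(k), function(j) cluster_center(X[, cluster == j, drop = FALSE]))
}

kmeans_vars_sketch <- function(X, init, iter.max = 150) {
  X <- as.matrix(X)
  if (length(init) == 1) {
    # init is a number: k distinct variables drawn at random act as centers.
    k <- init
    centers <- X[, sample(ncol(X), k), drop = FALSE]
    cluster <- max.col(cor(X, centers)^2, ties.method = "first")
  } else {
    # init is an initial partition of the variables.
    k <- max(init)
    cluster <- as.integer(init)
  }
  for (iter in seq_len(iter.max)) {
    centers <- all_centers(X, cluster, k)
    # Relocate each variable to the center with the largest squared correlation.
    new_cluster <- max.col(cor(X, centers)^2, ties.method = "first")
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
  }
  # Homogeneity of each cluster of the final partition.
  centers <- all_centers(X, cluster, k)
  wss <- vapply(seq_len(k), function(j) {
    sum(cor(X[, cluster == j, drop = FALSE], centers[, j])^2)
  }, numeric(1))
  list(cluster = cluster, wss = wss, iter = iter)
}

# Simulated data: a1..a3 share one latent factor, b1..b3 another.
set.seed(42)
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
noisy <- function(f) f + 0.3 * rnorm(length(f))
X <- cbind(a1 = noisy(f1), a2 = noisy(f1), a3 = noisy(f1),
           b1 = noisy(f2), b2 = noisy(f2), b3 = noisy(f2))
res <- kmeans_vars_sketch(X, init = c(1, 1, 2, 2, 2, 1))
print(res$cluster)
print(round(res$wss, 3))
```

Starting from the deliberately mixed partition c(1, 1, 2, 2, 2, 1), the relocation is expected to recover the two groups a1 to a3 and b1 to b3. When init is a number, the result depends on the random draw, so repeating the random start (as nstart does) lowers the risk of a poor local solution.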
If the quantitative and qualitative data are in the same data frame, the function splitmix can be used to automatically split them into two separate data frames, one quantitative and one qualitative.
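The idea behind this split can be illustrated without the package by a hypothetical base-R helper that separates the numeric columns from the others. It is not splitmix, and the component names X.quanti and X.quali are chosen only to mirror the arguments of kmeansvar.

```r
# Hypothetical helper, not ClustOfVar's splitmix(): numeric columns go to
# X.quanti, every other column (factors, characters) goes to X.quali.
split_mixed_sketch <- function(df) {
  is_num <- vapply(df, is.numeric, logical(1))
  list(X.quanti = df[, is_num, drop = FALSE],
       X.quali = df[, !is_num, drop = FALSE])
}

parts <- split_mixed_sketch(iris)
str(parts)
```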
var: a list of matrices of squared loadings, i.e. for each cluster of variables, the squared loadings on the first principal component of PCAmix. For quantitative variables (resp. qualitative), squared loadings are the squared correlations (resp. the correlation ratios) with the first PC (the cluster center).
sim: a list of matrices of similarities, i.e. for each cluster, the similarities between its variables. The similarity between two variables is defined as a square cosine: the square of the Pearson correlation when the two variables are quantitative; the correlation ratio when one variable is quantitative and the other one is qualitative; the square of the canonical correlation between two sets of dummy variables when the two variables are qualitative.
cluster: a vector of integers indicating the cluster to which each variable is allocated.
wss: the within-cluster sum of squares for each cluster, i.e. the sum of the correlation ratios (for qualitative variables) and the squared correlations (for quantitative variables) between the variables and the center of the cluster.
E: the percentage of homogeneity accounted for by the partition in k clusters.
size: the number of variables in each cluster.
scores: an n by k numerical matrix containing the k cluster centers. The center of a cluster is a synthetic variable: the first principal component calculated by PCAmix. The k columns of …
coef: a list of the coefficients of the linear combinations defining the synthetic variable of each cluster.
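The three cases of the square cosine used for the sim component can be written down directly in base R. This sketch is not the package's code: corr_ratio, sq_cosine and sim_matrix are hypothetical names, each factor is dummy-coded with its first level dropped (which leaves the space spanned by the centered indicators unchanged), and "the square of the canonical correlation" is taken to mean the square of the first, largest, canonical correlation returned by stats::cancor.

```r
# Sketch only: hypothetical helpers, not ClustOfVar code.

# Correlation ratio eta^2 of a numeric variable y with respect to a factor f.
corr_ratio <- function(y, f) {
  group_means <- ave(y, f)
  sum((group_means - mean(y))^2) / sum((y - mean(y))^2)
}

# Square cosine between two variables (numeric vectors or factors).
sq_cosine <- function(a, b) {
  if (is.numeric(a) && is.numeric(b)) {
    cor(a, b)^2                       # squared Pearson correlation
  } else if (is.numeric(a) && is.factor(b)) {
    corr_ratio(a, b)                  # correlation ratio
  } else if (is.factor(a) && is.numeric(b)) {
    corr_ratio(b, a)
  } else {
    # Two factors: dummy coding without the intercept column (first level
    # dropped), then the squared first canonical correlation.
    A <- model.matrix(~ a)[, -1, drop = FALSE]
    B <- model.matrix(~ b)[, -1, drop = FALSE]
    cancor(A, B)$cor[1]^2
  }
}

# Similarity matrix between the variables of one cluster (a data frame).
sim_matrix <- function(vars) {
  p <- length(vars)
  S <- matrix(NA_real_, p, p, dimnames = list(names(vars), names(vars)))
  for (i in seq_len(p)) {
    for (j in seq_len(p)) S[i, j] <- sq_cosine(vars[[i]], vars[[j]])
  }
  S
}

vars <- data.frame(
  sl = iris$Sepal.Length,
  pl = iris$Petal.Length,
  species = iris$Species,
  big_sepal = factor(iris$Sepal.Length > median(iris$Sepal.Length))
)
S <- sim_matrix(vars)
print(round(S, 3))
```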
Chavent, M., Liquet, B., Kuentz, V., Saracco, J. (2012), ClustOfVar: An R Package for the Clustering of Variables. Journal of Statistical Software, Vol. 50, pp. 1-16.
splitmix, summary.clustvar, predict.clustvar
library(ClustOfVar)
data(decathlon)
# choice of the number of clusters
tree <- hclustvar(X.quanti = decathlon[, 1:10])
stab <- stability(tree, B = 60)
# a random set of variables is chosen as the initial cluster centers, nstart = 10 times
part1 <- kmeansvar(X.quanti = decathlon[, 1:10], init = 5, nstart = 10)
summary(part1)
# the partition from the hierarchical clustering is chosen as the initial partition
part_init <- cutreevar(tree, 5)$cluster
part2 <- kmeansvar(X.quanti = decathlon[, 1:10], init = part_init, matsim = TRUE)
summary(part2)
part2$sim