For a given data matrix A and cluster centers (prototypes) collected in the matrix P, the functions described here select a subset of variables Q that best explains or justifies P as prototypes. The functions are useful for reducing the dimension of the data for classification and for discarding masking variables for clustering.
Package: qVarSel
Type: Package
Version: 1.0
Date: 2014-05-27
License: GPL-2
The package is useful for reducing the number of variables used for clustering. The example below shows the sequence of operations. First, k-means is applied to the whole data set to calculate the prototypes P. Then, the distances between the units U and P are calculated and stored in D (a three-dimensional array in the example). Next, the package routine qVarSelH is applied to D to select the most important variables. Finally, EM optimization is applied to the data restricted to the selected variables to estimate the full set of clustering parameters.
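To make the distance step more concrete, the following base R sketch builds a three-dimensional array of per-variable squared differences between each unit and each prototype. The function name `per_variable_dist`, the toy data, and the layout (units x prototypes x variables) are assumptions made for illustration only; this is not the package's `PrtDist` and may differ from what it actually computes.

```r
# Illustrative sketch only: per-variable squared differences between each
# observation and each prototype, stored as a 3-D array. The layout
# (units x prototypes x variables) is an assumption, not the PrtDist() source.
per_variable_dist <- function(a, p) {
  stopifnot(is.matrix(a), is.matrix(p), ncol(a) == ncol(p))
  n <- nrow(a)
  k <- nrow(p)
  m <- ncol(a)
  d <- array(0, dim = c(n, k, m))
  for (j in seq_len(k)) {
    # sweep() subtracts prototype j from every row of a, column by column
    d[, j, ] <- sweep(a, 2, p[j, ])^2
  }
  d
}

set.seed(1)
a_toy <- matrix(rnorm(20), nrow = 5, ncol = 4)  # hypothetical data: 5 units, 4 variables
p_toy <- a_toy[1:2, , drop = FALSE]             # pretend the first two rows are prototypes
d_toy <- per_variable_dist(a_toy, p_toy)

# Summing over the variable dimension recovers squared Euclidean distances
sq_euclid <- apply(d_toy, c(1, 2), sum)
```

Keeping the contribution of each variable separate, instead of summing it away, is what would let a selection routine judge how much each variable contributes to assigning units to prototypes.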
Stefano Benati
Maintainer: Stefano Benati <stefano.benati@unitn.it>
S. Benati, S. Garcia Quiles, J. Puerto "Optimization Methods to Select Variables for Clustering", Working Paper, Universidad de Sevilla, 2014
# Generate random cluster with masking variables
library(qVarSel)            # provides PrtDist() and qVarSelH()
library(clusterGeneration)  # provides genRandomClust()
tmp1 <- genRandomClust(numClust = 4, sepVal = 0.2, clustszind = 2,
                       rangeN = c(100, 150),
                       numNonNoisy = 5, numNoisy = 5, numReplicate = 1,
                       fileName = "chk1")
a <- tmp1$datList$chk1_1
ass <- tmp1$memList$chk1_1
numunits <- length(ass)
noiseindex <- tmp1$noisyList$chk1_1
a <- scale(a)  # Standardization of columns
# calculate data prototypes using k-means
sl2 <- kmeans(a, 4, iter.max = 200,
              nstart = 10, algorithm = "Lloyd")
prototype = sl2$centers
# calculate distances between observations and prototypes
# Remark: d is a 3-dimensional array
d = PrtDist(a, prototype)
# Select 5 most representative variables, use 200 iterations
lsH <- qVarSelH(d, 5, maxit = 200)
# reduce the dimension of a
sq = 1:(dim(a)[2])
vrb = sq[lsH$x > 0.01]
a_reduced = a[ ,vrb]
# use the EM methodology for clustering on the reduced data
library(mclust)
sl1 <- Mclust(a_reduced, G = 4, modelNames = "VVV")
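The example stores the true memberships (`ass`) and the noisy-variable indices (`noiseindex`) but does not use them afterwards. A minimal sketch of how they could be used to check the result follows, assuming `noiseindex` holds column indices of the noisy variables. All values below are hypothetical stand-ins so that the snippet runs on its own; in the example they would be `vrb`, `noiseindex`, `ass`, and `sl1$classification`.

```r
# Stand-in values so the sketch runs on its own (all hypothetical)
vrb_toy        <- c(1, 2, 4, 6, 7)     # columns kept by the selection step
noiseindex_toy <- c(3, 5, 8, 9, 10)    # columns known to be noise
ass_toy        <- c(1, 1, 2, 2, 3, 3)  # true cluster memberships
cls_toy        <- c(2, 2, 1, 1, 3, 3)  # estimated cluster labels

# Selected variables that are actually noise (ideally none)
wrongly_kept <- intersect(vrb_toy, noiseindex_toy)

# Cross-tabulate true and estimated clusters; labels may be permuted,
# so look for one dominant cell per row rather than a clean diagonal
agreement <- table(truth = ass_toy, estimate = cls_toy)
print(wrongly_kept)
print(agreement)
```

For a single agreement score that ignores label permutations, mclust also provides `adjustedRandIndex()`.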