qVarSel-package: Selecting Variables for Clustering and Classification


Description

For a given data matrix A and cluster centers (prototypes) collected in the matrix P, the functions described here select the subset of variables Q that best explains or justifies P as prototypes. The functions are useful for reducing the dimension of the data for classification and for discarding masking variables in clustering.

Details

Package: qVarSel
Type: Package
Version: 1.0
Date: 2014-05-27
License: GPL-2

The package is useful for reducing the number of variables used in clustering. The example below shows the sequence of operations. First, k-means is applied to the whole data set to calculate the prototypes P. Then, the distances between the units U and the prototypes P are calculated and stored in an array D. Next, the package routine qVarSelH is applied to D to select the most important variables. Finally, EM optimization is applied to the data restricted to the selected variables to estimate the full set of clustering parameters.
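The distance step can be pictured with a small base-R sketch. It is only an illustration under stated assumptions: `prototype_distances` is a hypothetical helper, not part of qVarSel, and the squared per-variable difference and the units x prototypes x variables layout are assumptions rather than the documented behaviour of PrtDist. The built-in iris data and 3 clusters are arbitrary choices.

```r
# Toy data: the four numeric columns of the built-in iris data, standardized
a <- scale(as.matrix(iris[, 1:4]))

# Prototypes from k-means (3 clusters chosen arbitrarily for the illustration)
set.seed(1)
km <- kmeans(a, centers = 3, nstart = 10)
p <- km$centers

# Hypothetical helper (NOT the qVarSel implementation): squared difference
# between every unit and every prototype, kept separate for each variable
prototype_distances <- function(a, p) {
  n <- nrow(a)
  k <- nrow(p)
  m <- ncol(a)
  d <- array(0, dim = c(n, k, m))
  for (j in seq_len(k)) {
    # sweep() subtracts prototype j from every row of a
    d[, j, ] <- sweep(a, 2, p[j, ])^2
  }
  d
}

d <- prototype_distances(a, p)
dim(d)  # units x prototypes x variables

# Summing over the variable dimension gives the squared Euclidean distance
full <- apply(d, c(1, 2), sum)
```

Keeping the variable dimension separate is what lets a selection routine ask how much each variable contributes to the fit between units and prototypes; summing it out gives the ordinary unit-to-prototype distance.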

Author(s)

Stefano Benati

Maintainer: Stefano Benati <stefano.benati@unitn.it>

References

S. Benati, S. Garcia Quiles, J. Puerto, "Optimization Methods to Select Variables for Clustering", Working Paper, Universidad de Sevilla, 2014.

Examples

  # Generate random clusters with masking (noisy) variables
  
  library(qVarSel)
  library(clusterGeneration)
  tmp1 <- genRandomClust(numClust = 4, sepVal = 0.2, clustszind = 2,
                         rangeN = c( 100, 150 ),
                         numNonNoisy = 5, numNoisy = 5, numReplicate = 1, 
                         fileName = "chk1")
  a <- tmp1$datList$chk1_1
  ass <- tmp1$memList$chk1_1
  numunits <- length(ass)
  noiseindex <- tmp1$noisyList$chk1_1
  a <- scale(a)  # Standardize each column
  
  # Calculate data prototypes using k-means

  sl2 <- kmeans(a, 4, iter.max = 200, 
                      nstart = 10, algorithm = "Lloyd")
  prototype <- sl2$centers
  
  # Calculate distances between observations and prototypes
  # Remark: d is a 3-dimensional array
  
  d <- PrtDist(a, prototype)
  
  # Select the 5 most representative variables, using 200 iterations
  
  lsH <- qVarSelH(d, 5, maxit = 200)
  
  # Reduce the dimension of a: keep only the selected columns
  
  vrb <- which(lsH$x > 0.01)
  a_reduced <- a[, vrb, drop = FALSE]
  
  # Use the EM methodology for clustering on the reduced data

  library(mclust)
  sl1 <- Mclust(a_reduced, G = 4, modelNames = "VVV") 
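The example stores `noiseindex` but does not use it. One possible check, sketched here with placeholder values, is to compare the selected variables with the known noisy ones. The vectors `x` and `noiseindex` below are made up for illustration. They stand in for `lsH$x` and `tmp1$noisyList$chk1_1`, on the assumption that the former is a per-variable indicator and the latter a vector of column indices.

```r
# Placeholder values, NOT output of qVarSelH or genRandomClust
x <- c(1, 1, 0, 1, 0, 0, 1, 1, 0, 0)   # stands in for lsH$x
noiseindex <- c(3, 5, 6, 8, 10)        # stands in for tmp1$noisyList$chk1_1

# Indices of the selected variables, using the same threshold as the example
vrb <- which(x > 0.01)

# Noisy variables that were selected anyway
selected_noisy <- intersect(vrb, noiseindex)

# Non-noisy variables that were left out of the selection
non_noisy <- setdiff(seq_along(x), noiseindex)
missed_signal <- setdiff(non_noisy, vrb)

print(selected_noisy)
print(missed_signal)
```

With real output from the example, empty results for both sets would mean the selection recovered exactly the non-noisy variables.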

qVarSel documentation built on May 2, 2019, 9:28 a.m.