qVarSelH: Selection of Variables for Clustering or Data Dimension...

Description

The function implements the q-Vars heuristic described in the reference below. Given a 3-dimensional array d, where d[i,j,k] is the distance between statistical unit i and prototype j measured through variable k, the function computes the set of q variables that best explains the prototypes.

Usage

qVarSelH(d,
         q,
         maxit = 100)

Arguments

d

A numeric 3-dimensional array whose elements d[i,j,k] are the distances between observation i and cluster center/prototype j, measured through variable k.

q

A positive integer: the number of variables to select.

maxit

A positive integer: the maximum number of iterations allowed.
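
To illustrate the layout expected for d, the sketch below builds such an array by hand for a tiny data set. It assumes that the per-variable distance is the squared difference between an observation and a prototype; the package's own PrtDist function may use a different distance, so treat this only as a picture of the indexing.

```r
# Toy data: 3 units measured on 2 variables, and 2 prototypes.
a <- matrix(c(0, 0,
              1, 2,
              4, 4), ncol = 2, byrow = TRUE)
p <- matrix(c(0, 0,
              4, 4), ncol = 2, byrow = TRUE)

# d[i, j, k]: distance between unit i and prototype j through variable k.
# ASSUMPTION: squared difference per variable (not confirmed by the package docs).
d <- array(0, dim = c(nrow(a), nrow(p), ncol(a)))
for (j in seq_len(nrow(p))) {
  d[, j, ] <- sweep(a, 2, p[j, ])^2  # subtract prototype j from every row
}

dim(d)  # units x prototypes x variables
```

The first index runs over units, the second over prototypes, and the third over variables, which is the order used throughout this page.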

Details

The heuristic alternates between selecting a set of variables and allocating units to prototypes until a local optimum is reached. Random restarts then continue the search until the maximum number of iterations is reached.
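
The alternating scheme can be sketched in plain R as follows. This is not the package's implementation: the function name alt_select is hypothetical, and the sketch assumes that the objective is the total distance from each unit to its assigned prototype, summed over the selected variables, and that it is minimized. It is meant only to show how selection, allocation, and random restarts fit together.

```r
# Hypothetical sketch of an alternating heuristic; NOT the qVarSel code.
# ASSUMED objective: minimise the total distance from each unit to its
# assigned prototype, summed over the q selected variables.
alt_select <- function(d, q, maxit = 100) {
  n <- dim(d)[1]
  K <- dim(d)[3]
  best <- list(obj = Inf, x = NULL, ass = NULL, bestit = NA_integer_)
  it <- 0L
  while (it < maxit) {
    sel <- sort(sample(K, q))  # random (re)start
    repeat {
      it <- it + 1L
      # Allocation step: assign each unit to its closest prototype,
      # with distances summed over the currently selected variables.
      dsel <- apply(d[, , sel, drop = FALSE], c(1, 2), sum)
      ass <- max.col(-dsel, ties.method = "first")
      # Selection step: keep the q variables with the lowest total
      # distance under this assignment.
      cost <- vapply(seq_len(K),
                     function(k) sum(d[cbind(seq_len(n), ass, k)]),
                     numeric(1))
      new_sel <- sort(order(cost)[seq_len(q)])
      obj <- sum(cost[new_sel])
      if (obj < best$obj) {
        best <- list(obj = obj, x = as.integer(seq_len(K) %in% new_sel),
                     ass = ass, bestit = it)
      }
      # Stop this descent at a local optimum or when the budget is spent.
      if (identical(new_sel, sel) || it >= maxit) break
      sel <- new_sel
    }
  }
  best
}

# Toy data: variables 1-2 separate two groups, variables 3-4 are noise.
set.seed(1)
n <- 40
grp <- rep(1:2, each = n / 2)
a <- cbind(rnorm(n, mean = c(-3, 3)[grp], sd = 0.5),
           rnorm(n, mean = c(-3, 3)[grp], sd = 0.5),
           rnorm(n), rnorm(n))
p <- rbind(colMeans(a[grp == 1, ]), colMeans(a[grp == 2, ]))
d <- array(0, dim = c(n, 2, 4))
for (j in 1:2) d[, j, ] <- sweep(a, 2, p[j, ])^2  # assumed squared differences
res <- alt_select(d, q = 2, maxit = 50)
```

The returned list mirrors the fields documented under Value (obj, x, ass, bestit) so that the sketch can be compared with the output of qVarSelH on the same array.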

Value

obj

The value of the objective function

x

A 0-1 vector describing whether variable k is selected: if x[k] = 1 then variable k is selected

ass

A vector of assignment of units to clusters: if ass[i] = j then unit i is assigned to the cluster represented by center/prototype j

bestit

The iteration in which the best solution was found
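
As a reading aid, the snippet below shows one way to unpack a result with these fields. The list res is a mock object whose values are invented for illustration; it only assumes that the returned value can be accessed by the component names listed above.

```r
# Mock result with the documented fields; all values are invented.
res <- list(obj = 12.5,
            x = c(1, 0, 1, 0, 0),
            ass = c(1, 1, 2, 2, 1),
            bestit = 7)

selected <- which(res$x == 1)  # indices of the selected variables
sizes <- table(res$ass)        # number of units assigned to each prototype
```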

Note

The methodology is heuristic and some steps are random. It may be the case that different runs provide different solutions.

Author(s)

Stefano Benati

References

S. Benati, S. Garcia Quiles, J. Puerto "Optimization Methods to Select Variables for Clustering", Working Paper, Universidad de Sevilla, 2014

See Also

qVarSelLP

Examples

## Generate two random clusters with 10 true (non-noisy) variables
## and 20 masking (noisy) variables

library(clusterGeneration)
library(qVarSel)  # provides PrtDist() and qVarSelH()
tmp1 <- genRandomClust(numClust = 2, sepVal = 0.2, clustszind = 2,
                       rangeN = c(100, 150),
                       numNonNoisy = 10, numNoisy = 20, numReplicate = 1,
                       fileName = "chk1")
a <- tmp1$datList$chk1_1
a <- scale(a)  # Standardize each column

## Compute two prototypes using k-means
y <- kmeans(a, 2, iter.max = 200, nstart = 10)
p <- y$centers

## Compute the distances between units and prototypes
d <- PrtDist(a, p)

## Select the best 10 variables
lsH <- qVarSelH(d, 10, maxit = 200)

qVarSel documentation built on May 2, 2019, 9:28 a.m.