SCPP_cluster: Spectral Clustering using Projection Pursuit


Description

Generates a binary partitioning tree by recursively partitioning a dataset using spectral clustering within an optimal subspace.

Usage

SCPP_cluster(X, K, v0, ndim, nMicro, betamax, betamin, smult, minsize, minprop, omega, type)

Arguments

X

a numeric matrix (num_data x num_dimensions); the dataset to be clustered.

K

the number of clusters to extract.

v0

(optional) initialisations for projection pursuit. A function(X) of the data being split, which returns a matrix with ncol(X)*ndim rows. Each column of the output of v0(X) is used as an initialisation for projection pursuit. That is, the i-th initialisation is via the projection matrix matrix(v0(X)[,i], ncol = ndim). The solution with the minimum spectral connectivity is used within the final model. Initialisations are determined separately for each cluster being split. If omitted then a single initialisation is used: the first ndim principal components.

ndim

(optional) dimension of the subspace. If omitted then ndim = 2.

nMicro

(optional) number of microclusters. Running time is quadratic in the number of microclusters, while the approximation of the true objective improves as more microclusters are used. If omitted then nMicro = 200.

betamax

(optional) initial value of beta, which affects pairwise similarities by increasing the similarity of points outside beta standard deviations from the mean. This reduces the effect of outliers. if omitted then betamax = 5.

betamin

(optional) smallest value of beta considered. beta is reduced in steps of 0.1 from its initial value until a desired balance in cluster size is met. If omitted then betamin = 0.5.

smult

(optional) multiplicative factor applied to the scaling parameter used in pairwise similarities. The scaling parameter is determined for each cluster (C) being split as sqrt(mean(eigen(cov(C))$values[1:intr]))*smult*(4/3/n)^(1/(4+intr)), where intr is an estimate of the intrinsic dimensionality of C. If omitted then smult = 1.

minsize

(optional) the minimum cluster size allowable. if omitted then minsize = nrow(X)/K/5, i.e., 20 percent of the average cluster size.

minprop

(optional) the minimum cluster size expressed as a proportion of the cluster being split. If omitted then the minimum cluster size is determined by minsize.

omega

(optional) parameter controlling the orthogonality of the columns of the projection matrix. If omitted then omega = 1.

type

(optional) type of Laplacian ("standard" or "normalised") to use. if omitted then type = "normalised".
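
The v0 and smult entries above describe a matrix layout and a formula that may be easier to follow in code. The sketch below uses base R only. The names make_v0 and scale_param are hypothetical helpers, not part of the package. Two further assumptions are made: n in the scaling formula is taken to be the number of rows of the cluster being split, and intr is supplied by the caller, since this page does not say how it is estimated. The random orthonormal start is an illustrative choice, not a recommendation from the package.

```r
# Hypothetical helper: returns a function(X) suitable for the v0 argument.
# The returned function yields a matrix with ncol(X) * ndim rows, one
# initialisation per column, stored column-major so that
# matrix(V[, i], ncol = ndim) recovers the i-th projection matrix.
make_v0 <- function(ndim = 2) {
  function(X) {
    d <- ncol(X)
    # Initialisation 1: the first ndim principal component directions.
    pcs <- prcomp(X)$rotation[, 1:ndim, drop = FALSE]
    # Initialisation 2 (illustrative): a random orthonormal basis via QR.
    rnd <- qr.Q(qr(matrix(rnorm(d * ndim), nrow = d)))
    cbind(as.vector(pcs), as.vector(rnd))
  }
}

# Hypothetical helper: evaluates the scaling parameter formula quoted in
# the smult entry. Assumption: n is the number of observations in C.
scale_param <- function(C, intr, smult = 1) {
  n <- nrow(C)
  ev <- eigen(cov(C))$values  # eigenvalues in decreasing order
  sqrt(mean(ev[1:intr])) * smult * (4 / 3 / n)^(1 / (4 + intr))
}

set.seed(1)
X <- matrix(rnorm(200 * 5), ncol = 5)  # toy data: 200 points, 5 dimensions

V <- make_v0(ndim = 2)(X)
dim(V)                                  # 10 rows (5 * 2), 2 initialisations
P2 <- matrix(V[, 2], ncol = 2)          # second projection matrix, 5 x 2

scale_param(X, intr = 2)                # scaling parameter with smult = 1
scale_param(X, intr = 2, smult = 2)     # twice the value above
```

A function built this way could then be passed as v0 = make_v0(2) alongside ndim = 2; the two values of ndim must agree for the row count to match.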

Value

a named list containing

$cluster

cluster assignment vector.

$model

matrix containing the would-be location of each node (depth and position at depth) within a complete tree of appropriate depth.

$nodes

unnamed list, each element of which is a named list containing details of the binary partition at the corresponding node in the model.

$data

the data matrix being clustered.

$args

named list of arguments passed to SCPP_cluster.
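
As a rough illustration of how the returned list might be inspected, the sketch below fits the model from the Examples section and looks at each documented component. It assumes the package is installed under the name SCPP (inferred from the repository name, not confirmed on this page) and does nothing if it is not available. The internal layout of $model and of each element of $nodes is not described here, so str() is used rather than assuming particular fields.

```r
# Sketch only: assumes the package can be loaded as "SCPP".
if (requireNamespace("SCPP", quietly = TRUE)) {
  library(SCPP)
  data(chart)
  sol <- SCPP_cluster(chart$x, 6)

  print(table(sol$cluster))    # size of each cluster
  print(length(sol$nodes))     # number of node entries in the model
  str(sol$model)               # node locations within the complete tree
  str(sol$nodes[[1]], max.level = 1)  # details stored for the first node
  print(names(sol$args))       # arguments recorded by SCPP_cluster
}
```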

References

Hofmeyr, D., Pavlidis, N., Eckley, I. (2018) Minimum spectral connectivity projection pursuit. Statistics and Computing, to appear.

Examples

## load synthetic control chart dataset
data(chart)

## obtain clustering solution using SCPP
sol <- SCPP_cluster(chart$x, 6)

## plot cluster model
SCPP_plot(sol)

## evaluate performance using external cluster validity metrics
cluster_performance(sol$cluster, chart$c)
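
## a variant of the call above with some optional arguments set explicitly.
## The values nMicro = 300 and type = "standard" are illustrative only, not
## recommendations, and the package is assumed to be installed under the
## name SCPP; nothing is run if it is not available.

```r
if (requireNamespace("SCPP", quietly = TRUE)) {
  library(SCPP)
  data(chart)

  # more microclusters (slower, closer approximation of the objective)
  # and the standard rather than the normalised Laplacian
  sol_std <- SCPP_cluster(chart$x, 6, nMicro = 300, type = "standard")

  # compare against the known labels, as in the default example
  print(cluster_performance(sol_std$cluster, chart$c))
}
```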

DavidHofmeyr/SCPP documentation built on May 28, 2019, 12:25 p.m.