spuds: Spectral Partitioning Using Density Separation
In DavidHofmeyr/spuds: Spectral Partitioning Using Density Separation


Spectral clustering algorithm which selects the number of clusters based on validation using low density separation.

spuds(X, c0, scale, sigmult, cplus, cmax, lam, gam, intr)

`X`	a numeric matrix (num_data x num_dimensions); the dataset to be clustered.
`c0`	(optional) the initial number of clusters. Default is 20.
`scale`	(optional) either a numeric value for the scale parameter, or a function which takes the data matrix as input and returns a scalar. Default is the square root of the average of the first k eigenvalues of the covariance matrix, multiplied by (4/(2+k)/n)^(1/(k+4)), where n is the number of data points (rows of X) and k is an estimate of the intrinsic dimensionality.
`sigmult`	(optional) multiplier for the scale value. Default is 1.2.
`cplus`	(optional) increment in the number of clusters with each iteration. Default is 1. If the number of clusters may be large, or the data contain numerous outliers, then setting this to a larger value (e.g. 10) should reduce the runtime. In that case cmax should also be increased.
`cmax`	(optional) maximum number of clusters (including outlier clusters). Default is 50.
`lam`	(optional) density threshold. Default is 1.
`gam`	(optional) minimum non-outlier cluster size. Default is n/50.
`intr`	(optional) how to determine the intrinsic dimensionality. For values in (0, 1], chooses the number of dimensions which accounts for the corresponding proportion of total variability in the data. For values greater than 1, chooses those dimensions whose eigenvalues are greater than the largest eigenvalue divided by intr. Two string options are also implemented. If intr is set to "kaiser" then the dimension is the number of eigenvalues >= 1 (columns are standardised first). If intr is set to "elbow" then a simple elbow rule is used.
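
The default for `scale` and the numeric and "kaiser" rules for `intr` can be sketched in a few lines of base R. This is only an illustration of the descriptions above, not the package's internal code: the helper names `estimate_dim` and `default_scale` are hypothetical, n is assumed to be the number of rows of X, and the "elbow" rule is left out because it is not specified here.

```r
# Hypothetical helper: intrinsic dimension from covariance eigenvalues,
# following the documented rules for numeric `intr` and for "kaiser".
estimate_dim <- function(X, intr) {
  if (identical(intr, "kaiser")) {
    # Standardising the columns first is equivalent to using the correlation matrix.
    evals <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
    return(sum(evals >= 1))
  }
  stopifnot(is.numeric(intr), intr > 0)
  evals <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  if (intr <= 1) {
    # Smallest k whose leading eigenvalues explain at least `intr` of the variance.
    k <- sum(cumsum(evals) / sum(evals) < intr) + 1
    return(min(k, length(evals)))
  }
  # intr > 1: count eigenvalues larger than (largest eigenvalue) / intr.
  sum(evals > evals[1] / intr)
}

# Hypothetical helper: the documented default scale for a given dimension k.
default_scale <- function(X, k) {
  n <- nrow(X)  # assumption: n is the number of data points
  evals <- eigen(cov(X), symmetric = TRUE, only.values = TRUE)$values
  sqrt(mean(evals[seq_len(k)])) * (4 / (2 + k) / n)^(1 / (k + 4))
}

# Example on simulated data with one dominant direction.
set.seed(1)
Z <- matrix(rnorm(500 * 3), ncol = 3) %*% diag(c(5, 1, 0.1))
k <- estimate_dim(Z, 0.9)
default_scale(Z, k)
```

A function such as `function(X) default_scale(X, estimate_dim(X, 0.9))` has the form that the `scale` argument accepts (data matrix in, scalar out), though whether it reproduces the built-in default exactly has not been checked.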

a vector of cluster labels

### generate data set using the function provided

X <- spuds_datagen(3000, 10, 10)

### produce clustering solution using SPUDS algorithm

sol <- spuds(X$x)

### assess the quality of the solution

cluster_performance(sol, X$c)
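
### a variation with non-default arguments

The call below is a sketch of how the optional arguments might be combined, following the advice given for `cplus` and `cmax` when many clusters or outliers are expected. The function `median_sd_scale` is a hypothetical user-supplied scale function, included only to show the expected form (data matrix in, scalar out); it is not a recommended choice. The `spuds` call is guarded so that the snippet is skipped when the package is not installed.

```r
# Hypothetical scale function: median of the column standard deviations.
median_sd_scale <- function(X) median(apply(X, 2, sd))

if (requireNamespace("spuds", quietly = TRUE)) {
  library(spuds)
  X <- spuds_datagen(3000, 10, 10)
  # Larger increment per iteration, with the cluster cap raised accordingly.
  sol <- spuds(X$x, scale = median_sd_scale, cplus = 10, cmax = 100)
  cluster_performance(sol, X$c)
}
```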