spectral.clust: Spectral Clustering using Similarity or Distance Matrices
In kdml: Kernel Distance Metric Learning for Mixed-Type Data

spectral.clust

R Documentation

Description

This function performs spectral clustering, including the k-means step, on a precomputed similarity or distance matrix and returns the resulting cluster assignments.

Usage

spectral.clust(S, k, nstart = 10, iter.max = 1000, 
              is.sim = NULL, neighbours = 10)

Arguments

`S`	an `n \times n` numeric matrix of pairwise similarities or distances between observations. Whether it is treated as a similarity matrix or a distance matrix is set by the `is.sim` argument.
`k`	integer value specifying the number of clusters to form. This is passed to the `kmeans` algorithm.
`nstart`	integer value specifying the number of random starts for the `kmeans` algorithm. Defaults to `10`.
`iter.max`	integer value specifying the maximum number of iterations for the `kmeans` algorithm. Defaults to `1000`.
`is.sim`	logical value indicating whether the input matrix `S` is a similarity matrix. If `TRUE`, `S` is treated as a similarity matrix; if `FALSE`, `S` is treated as a distance matrix. The default is `NULL`, so this argument must be specified.
`neighbours`	integer value specifying the number of nearest neighbours to consider when constructing the graph Laplacian. This helps in determining the structure of the graph from the similarity or distance matrix. Defaults to `10`.

Details

`spectral.clust` implements spectral clustering on pairwise similarity or distance matrices, following the method described by Ng et al. (2001). The function first constructs an adjacency matrix from the input matrix `S`, using the `neighbours` parameter to define the nearest connections. If `S` is a similarity matrix (`is.sim = TRUE`), the function retains the largest values, corresponding to the `neighbours` nearest observations. If `S` is a distance matrix (`is.sim = FALSE`), it retains the smallest values for the nearest observations.

The adjacency matrix is symmetrized and used to compute the unnormalized Laplacian matrix. The eigenvectors corresponding to the smallest eigenvalues of the Laplacian are extracted and clustered using the `kmeans` algorithm. The number of clusters (`k`), the number of random starts (`nstart`), and the maximum number of iterations (`iter.max`) for the `kmeans` step are user-specified.
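The steps described above can be sketched in base R roughly as follows. This is an illustrative reading of the description, not the kdml source: the helper name `spectral_sketch`, the binary edge weights, the exclusion of each observation from its own neighbour list, and the "either endpoint" symmetrization are all assumptions made for the example.

```r
# Hypothetical helper; a sketch of the described steps, not the kdml implementation.
spectral_sketch <- function(S, k, is.sim, neighbours = 10,
                            nstart = 10, iter.max = 1000) {
  n <- nrow(S)
  A <- matrix(0, n, n)
  for (i in seq_len(n)) {
    # Rank the other observations: largest similarity or smallest distance first.
    others <- setdiff(seq_len(n), i)
    ord <- others[order(S[i, others], decreasing = is.sim)]
    nn <- ord[seq_len(min(neighbours, n - 1))]
    A[i, nn] <- 1  # assumption: unweighted (0/1) edges
  }
  # Symmetrize: keep an edge if either endpoint selected the other (assumption).
  A <- pmax(A, t(A))
  # Unnormalized Laplacian L = D - A, with D the diagonal degree matrix.
  L <- diag(rowSums(A)) - A
  # eigen() returns eigenvalues in decreasing order, so the eigenvectors for
  # the k smallest eigenvalues are the last k columns.
  U <- eigen(L, symmetric = TRUE)$vectors[, (n - k + 1):n, drop = FALSE]
  kmeans(U, centers = k, nstart = nstart, iter.max = iter.max)$cluster
}

# Toy usage: two well-separated groups of 20 points, clustered from distances.
set.seed(1)
X <- rbind(matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
           matrix(rnorm(40, mean = 5, sd = 0.3), ncol = 2))
D <- as.matrix(dist(X))
cl <- spectral_sketch(D, k = 2, is.sim = FALSE, neighbours = 10)
table(cl, rep(1:2, each = 20))
```

On this toy data the nearest-neighbour graph splits into two connected components, so the two retained eigenvectors are constant within each group and the k-means step should separate the groups cleanly. For real data, the choice of `neighbours` controls how connected the graph is and can change the result noticeably.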

Value

spectral.clust returns a list object with the following components:

`clusters`	an integer vector of length `n` giving the cluster assignment for each observation, as determined by the `kmeans` algorithm.
`S`	the original `n \times n` numeric matrix used as input, representing either pairwise similarities or distances between observations, depending on the `is.sim` argument.

Author(s)

John R. J. Thompson <john.thompson@ubc.ca>, Jesse S. Ghashti <jesse.ghashti@ubc.ca>

References

Ng, A., Jordan, M., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. Advances in Neural Information Processing Systems, 14.

Examples

# load the package and the Iris dataset (dropping the species column)
library(kdml)
dat <- iris[,-5]

# calculate pairwise similarities, with bandwidths selected by maximum likelihood cross-validation
S <- kss(dat, bw = "np", npmethod = "cv.ml", cFUN = "c_gaussian", verbose = TRUE)

# cluster points using spectral clustering and compare to true class labels
cl <- spectral.clust(S$similarities, 3, is.sim = TRUE)
table(cl$clusters, iris[,5])

# try a different number of neighbours
cl2 <- spectral.clust(S$similarities, 3, is.sim = TRUE, neighbours = 4)
table(cl2$clusters, iris[,5])

kdml documentation built on Sept. 21, 2024, 9:06 a.m.