Spectrum: Spectrum: Fast Adaptive Spectral Clustering for Single and...


View source: R/spectrum.R

Description

Spectrum is a self-tuning spectral clustering method for single or multi-view data. Spectrum uses a new type of adaptive density-aware kernel that strengthens connections between points that share common nearest neighbours in the graph. To integrate multi-view data and reduce noise, a tensor product graph data integration and diffusion procedure is used. Spectrum analyses the variance or distribution of the eigenvectors to determine the number of clusters. Spectrum is well suited to a wide range of data, including both Gaussian and non-Gaussian structures.

Usage

Spectrum(data, method = 1, silent = FALSE, showres = TRUE,
  diffusion = TRUE, kerneltype = c("density", "stsc"), maxk = 10,
  NN = 3, NN2 = 7, showpca = FALSE, frac = 2, thresh = 7,
  fontsize = 18, dotsize = 3, tunekernel = FALSE,
  clusteralg = "GMM", FASP = FALSE, FASPk = NULL, fixk = NULL,
  krangemax = 10, runrange = FALSE, diffusion_iters = 4,
  KNNs_p = 10, missing = FALSE)

Arguments

data

Data frame or list of data frames: the data to cluster, with the points (samples) to cluster as columns and the features as rows. For multi-view data, supply a list of data frames with the samples in the same order in every view.
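The expected layout can be sketched in base R as follows. The toy values and the names view1, view2 and multi_view are illustrative assumptions, not part of the package; the only point is that samples sit in columns and that every view lists them in the same order.

```r
# Hypothetical toy data: 4 features (rows) x 6 samples (columns).
set.seed(1)
view1 <- as.data.frame(matrix(rnorm(4 * 6), nrow = 4,
                              dimnames = list(paste0("feature", 1:4),
                                              paste0("sample", 1:6))))

# A second view with different features but the same samples, in the same order.
view2 <- as.data.frame(matrix(rnorm(3 * 6), nrow = 3,
                              dimnames = list(paste0("probe", 1:3),
                                              colnames(view1))))

# Multi-view input is a plain list of such data frames.
multi_view <- list(view1, view2)

# Sanity check before clustering: every view must list the samples identically.
same_order <- all(vapply(multi_view,
                         function(v) identical(colnames(v), colnames(view1)),
                         logical(1)))
stopifnot(same_order)
```

A check like this is cheap and catches the most common multi-view mistake, which is a view whose columns were silently reordered.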

method

Numerical value: 1 = default eigengap method (Gaussian clusters), 2 = multimodality gap method (Gaussian/non-Gaussian clusters), 3 = no automatic method (see the fixk parameter)
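To give a feel for what an eigengap criterion does, here is a minimal base R sketch of the generic heuristic: sort the eigenvalues and pick K at the largest drop between consecutive values. This is an illustration under stated assumptions, not Spectrum's internal code; the helper name eigengap_k and the toy eigenvalues are hypothetical, and the package's own rule may differ in detail.

```r
# Generic eigengap heuristic (illustration only, not Spectrum's implementation).
# `eigengap_k` is a hypothetical helper name.
eigengap_k <- function(evals, maxk = 10) {
  # Keep the largest maxk + 1 eigenvalues, in decreasing order.
  evals <- sort(evals, decreasing = TRUE)
  evals <- evals[seq_len(min(maxk + 1, length(evals)))]
  # gaps[i] is the drop from eigenvalue i to eigenvalue i + 1.
  gaps <- -diff(evals)
  # K is the position of the largest drop.
  which.max(gaps)
}

# Toy eigenvalues: three values near 1, then a sharp drop.
toy_evals <- c(1.00, 0.98, 0.97, 0.40, 0.35, 0.30)
k_hat <- eigengap_k(toy_evals)
print(k_hat)
```

In this toy case the largest drop follows the third eigenvalue, so the heuristic selects three clusters.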

silent

Logical flag: whether to turn off messages

showres

Logical flag: whether to show the results on the screen

diffusion

Logical flag: whether to perform graph diffusion to reduce noise (default=TRUE)

kerneltype

Character string: 'density' (default) = adaptive density aware kernel, 'stsc' = Zelnik-Manor self-tuning kernel

maxk

Numerical value: the maximum number of expected clusters (default=10). This is data dependent; do not set it excessively high.

NN

Numerical value: kernel param, the number of nearest neighbours to use for the sigma parameters (default=3)

NN2

Numerical value: kernel param, the number of nearest neighbours to use for the common nearest neighbours (default=7)

showpca

Logical flag: whether to show a PCA when running on one view

frac

Numerical value: optk search param, fraction to find the last substantial drop (multimodality gap method param)

thresh

Numerical value: optk search param, how many points ahead to keep searching (multimodality gap method param)

fontsize

Numerical value: controls font size of the ggplot2 plots

dotsize

Numerical value: controls the dot size of the ggplot2 plots

tunekernel

Logical flag: whether to tune the kernel, only applies for method 2 (default=FALSE)

clusteralg

Character string: clustering algorithm for eigenvector matrix (GMM or km)

FASP

Logical flag: whether to use Fast Approximate Spectral Clustering (for very high sample numbers)

FASPk

Numerical value: the number of centroids to compute when doing FASP

fixk

Numerical value: the number of clusters K to use when performing spectral clustering without automatic selection of K; set this parameter and set method to 3
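A fixed-K run might look like the sketch below. It uses only arguments from the Usage section and the brain data from the Examples section; the choice of four clusters and the name res_fixed are arbitrary assumptions for illustration, and the call is guarded so the snippet is harmless when the package is not installed.

```r
# Sketch: spectral clustering with a user-chosen K (no automatic selection).
if (requireNamespace("Spectrum", quietly = TRUE)) {
  library(Spectrum)
  # method = 3 disables automatic selection; fixk supplies K (4 is arbitrary here).
  res_fixed <- Spectrum(brain[[1]][, 1:50], method = 3, fixk = 4,
                        showres = FALSE, silent = TRUE)
}
```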

krangemax

Numerical value: the maximum K value to iterate towards when running a range of K

runrange

Logical flag: whether to run a range of K values (default=FALSE); the results for each K are put into the Kth element of the returned list
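Running a range of K could be sketched as follows, again using only arguments listed under Usage. The upper limit of 5 and the names res_range and k3 are assumptions for illustration; indexing by K follows the description of runrange above.

```r
# Sketch: cluster for a range of K and pull out the results for one K.
if (requireNamespace("Spectrum", quietly = TRUE)) {
  library(Spectrum)
  res_range <- Spectrum(brain[[1]][, 1:50], runrange = TRUE, krangemax = 5,
                        showres = FALSE, silent = TRUE)
  # Per the runrange description, the Kth element holds the results for K.
  k3 <- res_range[[3]]
  str(k3, max.level = 1)
}
```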

diffusion_iters

Numerical value: number of diffusion iterations for the graph (default=4)

KNNs_p

Numerical value: number of KNNs when making KNN graph (default=10, suggested=10-20)

missing

Logical flag: whether to impute missing data in multi-view analysis (default=FALSE)

Value

A list, containing:
1) cluster assignments, in the same order as input data columns
2) eigenvector analysis results (either eigenvalues or dip test statistics)
3) optimal K
4) final similarity matrix
5) eigenvectors and eigenvalues of graph Laplacian
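The returned list can be inspected by position, following the order given above. The sketch below deliberately avoids assuming element names and uses str() to reveal them; the variable names clusters and optimal_k are hypothetical.

```r
# Sketch: inspect the returned list by position (element names not assumed).
if (requireNamespace("Spectrum", quietly = TRUE)) {
  library(Spectrum)
  res <- Spectrum(brain[[1]][, 1:50], showres = FALSE, silent = TRUE)
  str(res, max.level = 1)   # shows the names and types of the elements
  clusters  <- res[[1]]     # 1) cluster assignments, one per input column
  optimal_k <- res[[3]]     # 3) optimal K
  print(table(clusters))    # cluster sizes
}
```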

Examples

res <- Spectrum(brain[[1]][,1:50])

Example output

***Spectrum***
detected views: 1
method: 1
kernel: density
calculating similarity matrix 1
done.
combining similarity matrices if > 1 and making kNN graph...
done.
diffusing on tensor product graph...
done.
calculating graph laplacian (L)...
getting eigensystem of L...
done.
examining eigenvalues to select K...
optimal K: 3
doing GMM clustering...
done.
finished.

Spectrum documentation built on Feb. 10, 2020, 9:07 a.m.