scca_compute: Spectral Clustering Correspondence Analysis

View source: R/scca_compute.R

Description

See van Dam et al. (2021) for a detailed description of the theory and mathematical foundation of Spectral Clustering Correspondence Analysis.

Usage

scca_compute(
  m,
  iter.max = 10,
  nstart = 25,
  disconnect.rm = TRUE,
  max_eigenvalues = 25,
  decomp = "svd",
  max_depth = Inf,
  heuristic = eigengap_heuristic
)

Arguments

m

A matrix representing a bipartite network. The matrix must have row names and column names.

iter.max

The maximum number of iterations kmeans is allowed to make. Default is 10.

nstart

Number of random cluster sets kmeans may choose to start with. Default is 25.

disconnect.rm

If TRUE (default) disconnected rows and columns in the input data will be removed.
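
As a minimal illustration of what this kind of filtering could look like, the base R sketch below builds a small named matrix and drops rows and columns without any links. The toy data and the reading of "disconnected" as "all entries zero" are assumptions made for this example; the package's own criterion may differ.

```r
# Toy bipartite incidence matrix with row and column names (hypothetical data).
m <- matrix(
  c(1, 0, 0,
    1, 1, 0,
    0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("r1", "r2", "r3"), c("c1", "c2", "c3"))
)

# Assumption: a row or column is "disconnected" when it has no non-zero entry.
keep_rows <- rowSums(m != 0) > 0
keep_cols <- colSums(m != 0) > 0

# drop = FALSE keeps the result a matrix even if one row or column remains.
m_connected <- m[keep_rows, keep_cols, drop = FALSE]
```

Here `m_connected` retains rows r1 and r2 and columns c1 and c2, with their names intact.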

max_eigenvalues

Restrict the number of computed eigenvalues to max_eigenvalues. The default is 25.

decomp

The decomposition function to use. Choices are svd (default) and svd

max_depth

The maximum allowed depth of the analysis process. If Inf (default) the analysis goes on until a stop condition has been met.

heuristic

The function used to calculate the number of clusters. The default is eigengap_heuristic.
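
To give a feel for what such a heuristic does, here is a generic eigengap rule in base R. The name `eigengap_sketch` and its signature (a vector of eigenvalues in, a single integer out) are hypothetical; this is not the package's eigengap_heuristic, whose interface and details may differ.

```r
# Hypothetical stand-in for a cluster-count heuristic; not the package's code.
# Input: eigenvalues sorted in descending order. Output: suggested k.
eigengap_sketch <- function(eigenvalues) {
  if (length(eigenvalues) < 2) return(1L)
  gaps <- -diff(eigenvalues)  # gaps[i] = eigenvalues[i] - eigenvalues[i + 1]
  which.max(gaps)             # number of eigenvalues before the largest gap
}

# The largest drop is between the third and fourth eigenvalue.
k <- eigengap_sketch(c(0.95, 0.90, 0.88, 0.30, 0.25))
```

With this input the largest gap follows the third eigenvalue, so the sketch suggests three clusters.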

Details

The function scca_compute performs a hierarchical Spectral Clustering Correspondence Analysis on a matrix M representing a bipartite network. The process consists of the following steps:

  1. Computation of the eigenvalues and eigenvectors of the similarity matrix derived from M.

  2. Determination of K, the number of relevant eigenvectors.

  3. Application of K-means to cluster the elements of M into K clusters.
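
The three steps above can be sketched for a single level in base R. This is only an illustration under stated assumptions: it uses the standard correspondence-analysis scaling of M by the square roots of its row and column sums, a simple largest-gap rule for K, and a toy matrix with two obvious blocks. The package's internal computations may differ.

```r
set.seed(1)

# Toy bipartite matrix with two blocks (hypothetical data).
m <- matrix(0, nrow = 6, ncol = 4,
            dimnames = list(paste0("r", 1:6), paste0("c", 1:4)))
m[1:3, 1:2] <- 1
m[4:6, 3:4] <- 1

# Step 1: scale M by row and column sums (assumed CA-style normalisation)
# and decompose. Squared singular values of S are the eigenvalues of the
# row similarity matrix S %*% t(S).
S <- diag(1 / sqrt(rowSums(m))) %*% m %*% diag(1 / sqrt(colSums(m)))
dec <- svd(S)
eigenvalues <- dec$d^2

# Step 2: choose K at the largest gap between consecutive eigenvalues.
K <- which.max(-diff(eigenvalues))

# Step 3: K-means on the first K left singular vectors (row coordinates).
# iter.max and nstart mirror the defaults shown under Usage.
fit <- kmeans(dec$u[, 1:K, drop = FALSE], centers = K,
              iter.max = 10, nstart = 25)
clusters <- fit$cluster
names(clusters) <- rownames(m)
```

In this toy case the rows r1 to r3 and r4 to r6 end up in separate clusters. A hierarchical analysis would apply the same steps again to each resulting subset.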

The process can be repeated hierarchically on the resulting clusters. The output of scca_compute is a tree in which every node represents one step in the process.

The hierarchical decomposition on a branch stops when the number of relevant eigenvalues equals 1 or the maximum depth has been reached. This is signaled by k = 0. When the subset is too small to decompose any further, processing on the branch also stops, a warning is raised, and the value of k is set to -1.

The function scca_compute is a wrapper around the workhorse function scca_compute_tree.

Value

A tree which describes the hierarchical SCCA process. Every node contains the following information:

depth

The depth of the node in the tree.

labels

The labels (rownames) of the subset in this node

n_labs

The number of labels (observations) in the subset

n_node

Depth-first, pre-order numbering of nodes in the scca tree

child

The number of this node among its siblings. The numbering implies no order.

spectrum

Vector of the eigenvalues found at this node, sorted by explained variance in descending order.

eigen_vec_1

The first eigenvector of the subset of this node

eigen_vec_2

The second eigenvector

eigen_vec_3

The third eigenvector

k

The number of relevant eigenvalues. This is the value for parameter k of 'kmeans'.

node_type

The value is 'leaf' if k equals -1, 0, or 1, else 'branch'

node

A list of k child nodes, if node_type == 'branch'
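
Given this node layout, the final clusters can be read off by walking the tree down to its leaves. The sketch below does so on a hand-built mock tree that carries only a few of the documented fields; it is not real scca_compute output, and the helper names `leaf` and `leaf_labels` as well as the depth values are made up for the illustration.

```r
# Hand-built mock of the documented node structure (not real output).
leaf <- function(labels, depth, k) {
  list(depth = depth, labels = labels, n_labs = length(labels),
       k = k, node_type = "leaf")
}
tree <- list(
  depth = 0, labels = c("a", "b", "c", "d"), n_labs = 4, k = 2,
  node_type = "branch",
  node = list(leaf(c("a", "b"), 1, 1), leaf(c("c", "d"), 1, 0))
)

# Recursively gather the label sets of all leaves, one list entry per leaf.
leaf_labels <- function(nd) {
  if (nd$node_type == "leaf") return(list(nd$labels))
  do.call(c, lapply(nd$node, leaf_labels))
}

clusters <- leaf_labels(tree)
```

Each element of `clusters` holds the labels of one leaf, here the sets a, b and c, d.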

References

van Dam, et al. (2021), Correspondence analysis, spectral clustering and graph embedding: applications to ecology and economic complexity, *name of journal*, DOI: <doi>.

Examples

## Not run: 
scca_compute(carnivora)

## End(Not run)

UtrechtUniversity/SCCA documentation built on April 16, 2021, 3:23 a.m.