clustering_analysis: Clustering, internal evaluation and batch effect estimation

View source: R/clustering.R

clustering_analysis    R Documentation

Clustering, internal evaluation and batch effect estimation

Description

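Clusters a single embedding or dataset, evaluates the resulting clusterings with internal metrics and, if batch labels are available, estimates batch effects.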

Usage

clustering_analysis(
  dat,
  n_clusters = 2:5,
  cluster_methods = c("hierarchical", "diana", "kmeans"),
  clustering_dissimilarity = NULL,
  distance_metric = "euclidean",
  correlation_method = "spearman",
  hierarchical_linkage = "complete",
  kmeans_num_init = 100,
  kmeans_max_iters = 100,
  kmeans_tol = 1e-08,
  gmm_modelNames = NULL,
  gmm_shrinkage = 0.01,
  knn_neighbours = 30,
  knn_jaccard = TRUE,
  kernel = "linear",
  kernel_gamma = 1,
  kernel_center = TRUE,
  kernel_normalize = TRUE,
  kkmeans_algorithm = "spectral",
  kkmeans_refine = FALSE,
  kkmeans_maxiter = 100,
  kkmeans_n_init = 100,
  kkmeans_tol = 1e-08,
  ...
)
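A minimal sketch of an input in the format described for dat below, together with dissimilarity matrices corresponding to distance_metric = "euclidean" and distance_metric = "correlation" (with correlation_method = "spearman"). The toy data are made up, and defining the correlation dissimilarity as 1 minus the between-sample correlation is an assumption; clustering_dissimilarity_from_data may compute it differently.

```r
# Hypothetical toy input in the documented format: feature columns named
# "dim1", "dim2", ... plus an "id" column identifying each sample.
set.seed(1)
n <- 20
dat <- data.frame(
  id   = paste0("sample", seq_len(n)),
  dim1 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 5)),
  dim2 = c(rnorm(n / 2, mean = 0), rnorm(n / 2, mean = 5)),
  dim3 = rnorm(n),
  dim4 = rnorm(n)
)

# Select the feature columns using the documented "dim[0-9]+" naming pattern.
feature_cols <- grep("^dim[0-9]+$", colnames(dat), value = TRUE)
x <- as.matrix(dat[, feature_cols])
rownames(x) <- dat$id

# distance_metric = "euclidean": pairwise Euclidean distances between samples.
d_euclidean <- dist(x, method = "euclidean")

# distance_metric = "correlation" (assumed form): 1 - Spearman correlation
# between samples, i.e. between the rows of x, so cor() is applied to t(x).
d_correlation <- as.dist(1 - cor(t(x), method = "spearman"))
```

Either dist object could then be supplied as clustering_dissimilarity; a correlation-based dissimilarity ignores differences in scale between samples, whereas the Euclidean one does not.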

Arguments

dat

A data.frame with features in columns whose names match "dim[0-9]+". It must also contain an "id" column.

n_clusters

An integer vector of the numbers of clusters to evaluate.

cluster_methods

A vector of clustering method names, see details for options.

clustering_dissimilarity

A dissimilarity matrix used by some methods, such as hierarchical clustering. If NULL, it is computed with clustering_dissimilarity_from_data.

distance_metric

Either "euclidean" or "correlation".

correlation_method

Method passed to cor, e.g. "pearson", "spearman" or "kendall".

hierarchical_linkage

See flashClust.

kmeans_num_init

See KMeans_rcpp.

kmeans_max_iters

See KMeans_rcpp.

kmeans_tol

See KMeans_rcpp.

gmm_modelNames

Specifies the model type for Mclust.

gmm_shrinkage

Shrinkage parameter for priorControl.

knn_neighbours

Number of nearest neighbours used for community detection.

knn_jaccard

If TRUE, computes shared-neighbour weights with the Jaccard index.

kernel

Kernel for kernel k-means; options are "linear", "gaussian", "rbf", "jaccard" and "tanimoto".

kernel_gamma

Gamma for the Gaussian/RBF kernel; higher values produce more complex cluster boundaries.

kernel_center

If TRUE, centers the kernel matrix.

kernel_normalize

If TRUE, normalizes kernels to unit L2 norm.

kkmeans_algorithm

Algorithm for kernel k-means; see the kernel_kmeans options.

kkmeans_refine

See kernel_kmeans.

kkmeans_maxiter

Maximum number of iterations for kernel k-means.

kkmeans_n_init

Number of random initializations for kernel k-means++.

kkmeans_tol

Convergence threshold on the change in error for spectral clustering.

...

Extra arguments are ignored.
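The kernel, kernel_gamma, kernel_center and kernel_normalize arguments can be illustrated with a small sketch. It assumes the RBF kernel has the form exp(-gamma * ||x_i - x_j||^2), treats "gaussian" and "rbf" as the same kernel, centers with H K H where H = I - 11^T/n, and then normalizes as K_ij / sqrt(K_ii K_jj) so every sample has unit norm in feature space. The helper make_kernel is hypothetical, and the order of centering and normalization used by COPS is not confirmed here.

```r
# Hypothetical helper: build a linear or Gaussian/RBF kernel matrix from the
# rows of x, optionally centering and normalizing it.
make_kernel <- function(x, kernel = c("linear", "gaussian", "rbf"), gamma = 1,
                        center = TRUE, normalize = TRUE) {
  kernel <- match.arg(kernel)
  if (kernel == "linear") {
    k <- tcrossprod(x)                       # K = X X^T
  } else {
    d2 <- as.matrix(dist(x))^2               # squared Euclidean distances
    k <- exp(-gamma * d2)                    # K_ij = exp(-gamma * ||x_i - x_j||^2)
  }
  if (center) {
    n <- nrow(k)
    h <- diag(n) - matrix(1 / n, n, n)       # centering matrix H = I - 11^T / n
    k <- h %*% k %*% h                       # center in feature space
  }
  if (normalize) {
    s <- sqrt(diag(k))
    k <- k / outer(s, s)                     # K_ij / sqrt(K_ii * K_jj): unit diagonal
  }
  k
}

set.seed(2)
x <- matrix(rnorm(30 * 5), nrow = 30)
k_lin <- make_kernel(x, "linear")
k_rbf <- make_kernel(x, "rbf", gamma = 0.1)
k_centered_only <- make_kernel(x, "linear", normalize = FALSE)
```

After centering, every row of the kernel sums to zero; after normalization, the diagonal is 1 and all entries lie in [-1, 1]. Larger gamma values make the RBF kernel more local, which corresponds to the more complex boundaries mentioned for kernel_gamma.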

Details

Supported clustering methods are:

  • "hierarchical" - agglomerative hierarchical clustering

  • "diana" - divisive hierarchical clustering analysis

  • "kmeans" - k-means++

  • "model" - Gaussian Mixture Models

  • "knn_communities" - Louvain community detection on shared k nearest neighbour graphs

  • "spectral" - spectral clustering

  • "SC3" - consensus clustering (http://dx.doi.org/10.1038/nmeth.4236); requires the SC3 package, which is not a required dependency and is therefore not installed by default

  • "kkmeans" - kernelized k-means initialized by a spectral approximation

  • "kkmeanspp" - kernelized k-means++ with random initializations
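A sketch of sweeping n_clusters with two of the listed methods and scoring each result with an internal metric. It uses base-R stand-ins: stats::hclust with complete linkage (flashClust::hclust shares this interface) and stats::kmeans with several random restarts, which is not k-means++ as provided by KMeans_rcpp. The mean silhouette width is chosen here as an example internal metric; which metrics COPS actually reports is not stated in this page.

```r
# Toy data: three well-separated 2-D groups of 20 points each.
set.seed(3)
x <- rbind(
  cbind(rnorm(20, mean = 0), rnorm(20, mean = 0)),
  cbind(rnorm(20, mean = 6), rnorm(20, mean = 0)),
  cbind(rnorm(20, mean = 0), rnorm(20, mean = 6))
)
d <- dist(x)

# Mean silhouette width: s(i) = (b - a) / max(a, b), where a is the mean
# distance to the own cluster and b the smallest mean distance to another one.
mean_silhouette <- function(cl, d) {
  dm <- as.matrix(d)
  s <- vapply(seq_along(cl), function(i) {
    own <- cl == cl[i]
    own[i] <- FALSE
    if (!any(own)) return(0)                 # singleton cluster: s(i) = 0
    a <- mean(dm[i, own])
    b <- min(vapply(setdiff(unique(cl), cl[i]),
                    function(k) mean(dm[i, cl == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(s)
}

n_clusters <- 2:5
hc <- hclust(d, method = "complete")         # agglomerative, complete linkage
results <- do.call(rbind, lapply(n_clusters, function(k) {
  cl_hc <- cutree(hc, k = k)                 # cut the tree into k clusters
  cl_km <- kmeans(x, centers = k, nstart = 25, iter.max = 100)$cluster
  data.frame(k = k,
             hierarchical = mean_silhouette(cl_hc, d),
             kmeans = mean_silhouette(cl_km, d))
}))
print(results)
```

A single hierarchical tree can be cut at every value of n_clusters, whereas k-means has to be rerun for each value, which is one reason hierarchical methods take a precomputed clustering_dissimilarity.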

Value

A list containing the clusters and evaluation metrics, plus chisq.test p-values if batch_label was supplied.
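The batch effect estimate can be illustrated with a chi-squared test of independence between cluster assignments and batch labels: a small p-value suggests that clusters and batches are associated. This is a sketch of the idea only; the vectors below are hypothetical, and the exact contingency tables that clustering_analysis tests are assumed rather than taken from the source.

```r
# Hypothetical cluster assignments for 60 samples.
clusters <- rep(1:2, each = 30)

# Batch labels that coincide with the clusters (strong batch effect) ...
batch_confounded <- ifelse(clusters == 1, "A", "B")

# ... and batch labels assigned at random (no built-in association).
set.seed(4)
batch_random <- sample(c("A", "B"), 60, replace = TRUE)

# Pearson's chi-squared test on the cluster-by-batch contingency tables.
p_confounded <- chisq.test(table(clusters, batch_confounded))$p.value
p_random <- chisq.test(table(clusters, batch_random))$p.value
```

For the confounded labels the p-value is very small, signalling that the clustering largely reproduces the batches; for randomly assigned labels no such association is built in.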


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.