clusterData: Cluster Data Based on Different Methods

View source: R/2.clusterData.R

clusterData R Documentation

Description

Cluster Data Based on Different Methods

Usage

clusterData(
  obj = NULL,
  scaleData = TRUE,
  cluster.method = c("mfuzz", "TCseq", "kmeans", "wgcna"),
  TCseq_params_list = list(),
  object = NULL,
  min.std = 0,
  cluster.num = NULL,
  subcluster = NULL,
  seed = 5201314,
  ...
)

Arguments

obj

An input object of one of two types:

  • A cell_data_set object for trajectory analysis.

  • A matrix or data.frame containing expression data.

scaleData

Logical. Whether to scale the data (e.g., z-score normalization).

cluster.method

Character. Clustering method to use. One of "mfuzz", "TCseq", "kmeans", or "wgcna".

TCseq_params_list

A list of additional parameters passed to the TCseq::timeclust function.

object

A pre-calculated object required when using "wgcna" as the clustering method.

min.std

Numeric. Minimum standard deviation for filtering expression data.

cluster.num

Integer. The number of clusters to identify.

subcluster

A numeric vector of specific cluster IDs to include in the results. If NULL, all clusters are included.

seed

An integer seed for reproducibility in clustering operations.

...

Additional arguments passed to internal functions such as pre_pseudotime_matrix.
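As a rough illustration of how cluster.num and seed relate to a k-means run, the base R sketch below clusters a simulated matrix twice with the same seed. The toy matrix and the helper run_kmeans are hypothetical; only set.seed and stats::kmeans are real functions, and clusterData may handle the seed differently internally.

```r
# Toy expression matrix: 60 genes (rows) x 6 samples (columns); simulated data.
set.seed(1)
toy <- matrix(rnorm(60 * 6), nrow = 60,
              dimnames = list(paste0("gene", 1:60), paste0("s", 1:6)))

# Hypothetical helper: fix the RNG seed, then cluster the rows with k-means.
run_kmeans <- function(mat, cluster.num, seed) {
  set.seed(seed)
  stats::kmeans(mat, centers = cluster.num)
}

km1 <- run_kmeans(toy, cluster.num = 4, seed = 5201314)
km2 <- run_kmeans(toy, cluster.num = 4, seed = 5201314)

# Same seed and same data give the same cluster assignments.
identical(km1$cluster, km2$cluster)
```

Fixing the seed matters because k-means starts from randomly chosen centers, so two unseeded runs can label or split the clusters differently.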

Details

Depending on the selected cluster.method, different clustering algorithms are used:

  • "mfuzz": Applies the Mfuzz soft clustering method, suitable for identifying overlapping clusters.

  • "TCseq": Uses TCseq clustering for time-series expression data with support for additional parameters.

  • "kmeans": Employs standard k-means clustering via base R's stats::kmeans.

  • "wgcna": Leverages pre-calculated WGCNA (Weighted Gene Co-expression Network Analysis) networks.

The function is designed to be flexible, allowing preprocessing (e.g., filtering by min.std), scaling the data (scaleData = TRUE), and generating results compatible with data visualization pipelines.
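The preprocessing mentioned above can be sketched in base R. This is a minimal illustration, not the package's internal code: it assumes that filtering drops genes (rows) whose standard deviation does not exceed min.std and that scaling is a row-wise z-score. The toy matrix is made up for the example.

```r
# Toy matrix: 5 genes x 4 samples; gene3 is constant (standard deviation 0).
mat <- rbind(
  gene1 = c(1, 2, 3, 4),
  gene2 = c(10, 8, 6, 4),
  gene3 = c(5, 5, 5, 5),
  gene4 = c(2, 9, 2, 9),
  gene5 = c(0, 1, 0, 3)
)
colnames(mat) <- paste0("t", 1:4)

min.std <- 0

# Assumed filtering rule: keep genes whose standard deviation is above min.std.
gene_sd <- apply(mat, 1, sd)
kept <- mat[gene_sd > min.std, , drop = FALSE]

# Row-wise z-score: scale() works on columns, so transpose before and after.
scaled <- t(scale(t(kept)))

rownames(scaled)              # gene3 has been removed
round(rowMeans(scaled), 10)   # each remaining gene is centered at 0
```

Filtering before scaling avoids dividing by a zero standard deviation, which would otherwise produce NaN values for constant genes.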

Value

A list containing the following clustering results:

  • wide.res: A wide-format data frame with clusters and normalized expression levels.

  • long.res: A long-format data frame for visualizations, containing cluster information, normalized values, cluster names, and memberships.

  • cluster.list: A list where each element contains genes belonging to a specific cluster.

  • type: The clustering method used ("mfuzz", "TCseq", "kmeans", or "wgcna").

  • geneMode: Currently set to "none" (reserved for future use).

  • geneType: Currently set to "none" (reserved for future use).
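To make the relationship between the wide, long, and list elements concrete, the sketch below builds small stand-ins in base R. Everything here is hypothetical: the column names (gene, cluster, sample, norm_value) and the values are assumptions chosen for illustration and may not match the columns clusterData actually returns.

```r
# Hypothetical stand-in for wide.res: one row per gene, a cluster column,
# and one column per sample.
wide <- data.frame(
  gene    = paste0("gene", 1:6),
  cluster = c(1, 2, 1, 3, 2, 1),
  t1      = c(-1.0, 0.5, -0.8, 1.2, 0.3, -1.1),
  t2      = c(0.0, -0.5, 0.1, -0.2, -0.6, 0.2),
  t3      = c(1.0, 0.0, 0.7, -1.0, 0.3, 0.9),
  stringsAsFactors = FALSE
)

# cluster.list analogue: gene names grouped by cluster ID.
cluster_list <- split(wide$gene, wide$cluster)

# long.res analogue: one row per gene/sample pair.
samples <- c("t1", "t2", "t3")
long <- data.frame(
  gene       = rep(wide$gene, times = length(samples)),
  cluster    = rep(wide$cluster, times = length(samples)),
  sample     = rep(samples, each = nrow(wide)),
  norm_value = unlist(wide[samples], use.names = FALSE),
  stringsAsFactors = FALSE
)

cluster_list[["1"]]  # genes assigned to cluster 1
nrow(long)           # 6 genes x 3 samples
```

The long layout repeats each gene once per sample, which is the shape most plotting functions expect for drawing one line per gene.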

WGCNA Clustering

If the "wgcna" method is selected, the object parameter must contain a pre-calculated WGCNA network object. This is typically obtained using functions from the WGCNA package.

Subsetting Clusters

Use the subcluster parameter to focus on specific clusters. Cluster IDs not included in the subcluster vector will be excluded from the final results.
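The effect of subcluster can be pictured as a row filter on the cluster IDs. The base R sketch below assumes a result table with a cluster column; the data frame and its column names are hypothetical and the package's internal filtering may differ in detail.

```r
# Hypothetical result table: gene names with assigned cluster IDs.
res <- data.frame(
  gene    = paste0("gene", 1:8),
  cluster = c(1, 2, 3, 1, 4, 2, 3, 4),
  stringsAsFactors = FALSE
)

subcluster <- c(1, 3)

# Keep every row when subcluster is NULL; otherwise keep only the listed IDs.
keep <- if (is.null(subcluster)) rep(TRUE, nrow(res)) else res$cluster %in% subcluster
subset_res <- res[keep, , drop = FALSE]

subset_res$gene
sort(unique(subset_res$cluster))
```

With the package itself, the corresponding request would be made by passing subcluster = c(1, 3) to clusterData.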

Author(s)

JunZhang

This function performs clustering on input data using one of four methods: mfuzz, TCseq, kmeans, or wgcna. The clustering results include metadata, normalized data, and cluster memberships.

Examples


library(ClusterGVis)

# Load the example expression data
data("exps")

# kmeans: partition the data into 8 clusters
ck <- clusterData(obj = exps,
                  cluster.method = "kmeans",
                  cluster.num = 8)


ClusterGVis documentation built on April 4, 2025, 2:27 a.m.