COPS: Clustering algorithms for Omics based Patient Stratification

View source: R/clustering_pipeline.R

COPS R Documentation

Description

Combines subsampling, subsample_pathway_enrichment, subsample_dimred, subsample_clustering_evaluation, stability_evaluation, subsample_survival_evaluation, subsample_module_evaluation and subsample_association_analysis to conveniently and comprehensively test clustering algorithms on a given set of input data.

Usage

COPS(
  dat,
  nfolds = 5,
  nruns = 1,
  association_data = NULL,
  survival_data = NULL,
  module_eigs = NULL,
  verbose = TRUE,
  parallel = 1,
  pathway_enrichment_method = "none",
  multi_omic_methods = NULL,
  vertical_parallelization = FALSE,
  internal_metrics = NULL,
  silhouette_dissimilarity = NULL,
  pre_compute_silhouette_dissimilarity = TRUE,
  ...
)

vertical_pipeline(
  dat_list,
  nfolds = 5,
  nruns = 1,
  survival_data = NULL,
  association_data = NULL,
  multi_omic_methods = NULL,
  parallel = 1,
  data_is_kernels = FALSE,
  silhouette_dissimilarities = NULL,
  by = c("run", "fold", "m", "k", "mkkm_mr_lambda"),
  verbose = TRUE,
  ...
)

Arguments

dat

A single matrix or a list of matrices, with patients in columns and features in rows.

nfolds

Number of cross-validation folds for stability evaluation and metric estimates.

nruns

Number of cross-validation replicates for stability evaluation and metric estimates.

association_data

Data for association tests, see cluster_associations for details.

survival_data

Data for survival analysis, see survival_preprocess for details.

module_eigs

Data for gene module correlation analysis, see gene_module_score for details.

verbose

If TRUE, prints progress messages and time taken.

parallel

Number of parallel threads for supported operations.

pathway_enrichment_method

Value passed as enrichment_method to genes_to_pathways.

multi_omic_methods

Character vector of multi-view clustering method names for multi_omic_clustering.

vertical_parallelization

(Experimental) If TRUE, all pipeline steps are evaluated in succession within each fold (instead of evaluating each step for all folds before moving on). Always TRUE for multi-view methods.

internal_metrics

Internal metric names passed to intCriteria. Computing these metrics slows the pipeline down considerably.

silhouette_dissimilarity

Dissimilarity matrix to use for computing silhouette indices.

pre_compute_silhouette_dissimilarity

If TRUE (the default) and silhouette_dissimilarity is NULL, the dissimilarity matrix is calculated automatically.

...

Extra arguments are passed to pipeline components where appropriate.

dat_list

List of data tables.

data_is_kernels

Whether dat_list should be treated as kernel matrices.

silhouette_dissimilarities

List of dissimilarity matrices used for silhouette calculations.

by

Column names used to split the work across threads.
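The nfolds, nruns and by arguments can be pictured with a small base R sketch. It is not the COPS implementation: the helper make_folds and the patient IDs are hypothetical. It assigns each patient to one of nfolds folds in each of nruns replicates, then splits the resulting table by the run and fold columns, roughly the way work could be divided among threads.

```r
# Hypothetical helper (not part of COPS): assign patients to
# cross-validation folds for several replicate runs.
make_folds <- function(patient_ids, nfolds = 5, nruns = 1, seed = 1) {
  set.seed(seed)
  runs <- lapply(seq_len(nruns), function(r) {
    # Shuffle a balanced vector of fold labels for this replicate
    fold <- sample(rep(seq_len(nfolds), length.out = length(patient_ids)))
    data.frame(run = r, fold = fold, id = patient_ids,
               stringsAsFactors = FALSE)
  })
  do.call(rbind, runs)
}

patients <- paste0("P", 1:20)
folds <- make_folds(patients, nfolds = 5, nruns = 2)

# Split the table by grouping columns, analogous to by = c("run", "fold")
by <- c("run", "fold")
chunks <- split(folds, folds[by], drop = TRUE)
length(chunks)  # one chunk per run/fold combination
```

Each element of chunks is an independent unit of work, so a coarser or finer choice of grouping columns trades scheduling overhead against load balance.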

Details

If multi_omic_methods is set, then the input matrices are treated as different views of the same patients. Available methods are listed in the documentation for multi_omic_clustering.
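Because multi-view methods treat the matrices as views of the same patients, the columns of every matrix presumably need to refer to the same patients. A minimal base R sketch of such a consistency step follows. The helper align_views and the toy matrices are hypothetical, and COPS may perform its own checks.

```r
# Hypothetical helper (not part of COPS): keep only the patients present
# in every view and put the columns in the same order.
align_views <- function(views) {
  shared <- Reduce(intersect, lapply(views, colnames))
  lapply(views, function(v) v[, shared, drop = FALSE])
}

set.seed(1)
# Toy views: features on rows, patients on columns
expr <- matrix(rnorm(12), nrow = 3,
               dimnames = list(paste0("gene", 1:3), c("P1", "P2", "P3", "P4")))
meth <- matrix(rnorm(8), nrow = 2,
               dimnames = list(paste0("cpg", 1:2), c("P4", "P2", "P1", "P5")))

views <- align_views(list(expression = expr, methylation = meth))
lapply(views, colnames)
```

The aligned list has the shape described for dat: a list of matrices sharing one set of patient columns.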

Value

Returns a list of pipeline component outputs covering each run, fold and method for the given settings and input data sets:

  • clusters: data.frame defining clusters

  • internal_metrics: data.frame of internal metrics

  • stability: data.frame of stability scores

  • survival: data.frame of survival analysis results

  • modules: data.frame of gene module association scores

  • association: data.frame of association results to variables of interest

  • cluster_sizes: data.frame giving the sizes of clusters

vertical_pipeline(): list of clustering analysis results
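To make the relationship between a table of cluster assignments and a table of cluster sizes concrete, here is a toy base R sketch. The data frame below is a stand-in, not actual COPS output, and every column name in it (run, fold, k, id, cluster) is an assumption made for illustration.

```r
# Toy stand-in for a table of cluster assignments; all column names
# are assumptions and may differ from what COPS returns.
clusters <- data.frame(
  run     = 1,
  fold    = rep(1:2, each = 6),
  k       = 2,
  id      = rep(paste0("P", 1:6), times = 2),
  cluster = c(1, 1, 1, 2, 2, 2,  1, 1, 2, 2, 2, 2)
)

# Count patients per cluster within each run/fold/k combination
sizes <- aggregate(
  x   = list(size = clusters$id),
  by  = clusters[c("run", "fold", "k", "cluster")],
  FUN = length
)
sizes
```

A table like this makes it easy to spot settings that yield very small clusters before interpreting the other metrics.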

Functions

  • vertical_pipeline(): pipeline vertical parallelization
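The difference between the default evaluation order and vertical parallelization, as described for vertical_parallelization, can be illustrated with a toy loop-order sketch. The step names are placeholders rather than COPS internals. Only the order of evaluation is the point.

```r
# Placeholder step names (hypothetical); each loop records what it evaluates.
steps <- c("dimred", "clustering", "evaluation")
folds <- 1:3

# Default order: each step is evaluated for all folds before moving on
horizontal <- character(0)
for (s in steps) for (f in folds) {
  horizontal <- c(horizontal, paste0(s, "@fold", f))
}

# Vertical order: all steps are evaluated in succession within each fold
vertical <- character(0)
for (f in folds) for (s in steps) {
  vertical <- c(vertical, paste0(s, "@fold", f))
}

head(horizontal, 3)  # first step across folds 1 to 3
head(vertical, 3)    # all three steps for fold 1
```

Both orders do the same work. In the vertical order each fold forms a self-contained unit, which is presumably what makes it convenient to hand whole folds to separate threads.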

Examples

library(COPS)

# Dimensionality reduction and clustering (DR-CL)
res <- COPS(ad_ge_micro_zscore,
            association_data = ad_studies,
            parallel = 1, nruns = 2, nfolds = 5,
            dimred_methods = c("pca", "umap", "tsne"),
            cluster_methods = c("hierarchical", "kmeans"),
            distance_metric = "euclidean",
            n_clusters = 2:4)

# Clustering (CL)
res <- COPS(ad_ge_micro_zscore,
            association_data = ad_studies,
            parallel = 1, nruns = 2, nfolds = 5,
            dimred_methods = c("none"),
            cluster_methods = c("hierarchical"),
            distance_metric = "correlation",
            n_clusters = 2:4)

# Biological knowledge integration and clustering (BK-CL)
res <- COPS(ad_ge_micro_zscore,
            association_data = ad_studies,
            pathway_enrichment_method = "DiffRank",
            gene_key_x = "ENSEMBL",
            gs_subcats = "CP:KEGG",
            parallel = 1, nruns = 2, nfolds = 5,
            dimred_methods = c("none"),
            cluster_methods = c("hierarchical"),
            distance_metric = "correlation",
            n_clusters = 2:4)


vittoriofortino84/COPS documentation built on Jan. 28, 2025, 3:16 p.m.