ATSC: Automated Trimmed & Sparse Clustering
In neobernad/evaluomeR: Evaluation of Bioinformatics Metrics

View source: R/metricsAnalysis.R

ATSC	R Documentation

Description

Automated Trimmed & Sparse Clustering. This method performs an optimal k value analysis with the evaluomeR methods stabilityRange, qualityRange and getOptimalKValue. The optimal k value is then used to automatically estimate an L1 bound and an alpha trimming proportion, which drive a trimmed and sparse clustering. This may result in the input dataset being trimmed, either by columns (determined by L1) or by rows (determined by alpha). A second optimal k value analysis is then executed over the trimmed dataset to obtain an optimal partition.

Usage

ATSC(
  data,
  k.range = c(2, 15),
  bs = 100,
  cbi = "kmeans",
  max_alpha = 0.1,
  all_metrics = TRUE,
  L1 = NULL,
  alpha = NULL,
  gold_standard = NULL,
  seed = NULL
)
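
A minimal call might look like the sketch below. It assumes an installed evaluomeR build that exports `ATSC` and that the package's `ontMetrics` example dataset is available; the argument values are illustrative choices, and the call is skipped when the package is not installed.

```r
# Argument values below are illustrative choices, not recommendations.
atsc_args <- list(
  k.range = c(2, 6),   # both bounds must lie within [2, 15]
  bs = 20,             # fewer resamples than the default 100, to keep the run short
  cbi = "kmeans",
  max_alpha = 0.1,
  seed = 100
)

# Run only if an evaluomeR build that exports ATSC is installed.
has_atsc <- requireNamespace("evaluomeR", quietly = TRUE) &&
  "ATSC" %in% getNamespaceExports("evaluomeR")

if (has_atsc) {
  library(evaluomeR)
  # Assumption: `ontMetrics` is an example SummarizedExperiment shipped with evaluomeR.
  data("ontMetrics", package = "evaluomeR")
  result <- do.call(ATSC, c(list(data = ontMetrics), atsc_args))

  print(result$optimalK)        # optimal k from the initial analysis
  print(result$optimalK_ATSC)   # optimal k after trimming
  print(result$trimmedColumns)  # metrics removed because of zero weights
}
```

Setting `seed` makes the internal bootstrap repeatable across runs.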

Arguments

`data`	A `SummarizedExperiment`. The SummarizedExperiment must contain an assay with the following structure: a valid header with names. The first column of the header is the ID or name of the instance of the dataset (e.g., ontology, pathway, etc.) on which the metrics are measured. The other columns of the header contain the names of the metrics. The rows contain the measurements of the metrics for each instance in the dataset.
`k.range`	Concatenation of two positive integers. The first value, `k.range[1]`, is the lower bound of the range, and the second, `k.range[2]`, is the upper bound. Both values must lie in the range [2, 15].
`bs`	Positive integer. Number of bootstrap iterations used for the resampling.
`cbi`	Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam". Any CBI suffixed with '_pam' makes use of `pam`. The method used in the 'hclust' CBI is "ward.D2".
`max_alpha`	Maximum value of alpha. Candidate alpha values are taken from `seq(0, max_alpha, 0.05)`.
`all_metrics`	Logical. If `TRUE`, clustering is performed on the whole dataset.
`L1`	A single L1 bound on weights (the feature weights), see `RSKC`.
`seed`	Positive integer. A seed for internal bootstrap.
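
The constraints on `data`, `k.range` and `max_alpha` can be illustrated in base R. The sketch below uses invented toy values, and `check_k_range` is a hypothetical helper rather than part of evaluomeR; wrapping the table into a `SummarizedExperiment` is not shown.

```r
# Toy input layout: first column is the instance ID, remaining columns are metrics.
# All values are invented for illustration.
metrics_df <- data.frame(
  Description = c("onto_A", "onto_B", "onto_C", "onto_D"),
  MetricX = c(0.12, 0.55, 0.31, 0.90),
  MetricY = c(10, 42, 17, 63),
  stringsAsFactors = FALSE
)
stopifnot(is.character(metrics_df[[1]]))                         # ID column
stopifnot(all(vapply(metrics_df[-1], is.numeric, logical(1))))   # metric columns

# Hypothetical helper (not an evaluomeR function): checks the documented
# constraints on k.range and returns it as an integer vector.
check_k_range <- function(k.range) {
  stopifnot(is.numeric(k.range), length(k.range) == 2)
  stopifnot(all(k.range == round(k.range)))          # whole numbers only
  stopifnot(k.range[1] <= k.range[2])                # lower bound first
  stopifnot(all(k.range >= 2), all(k.range <= 15))   # within [2, 15]
  as.integer(k.range)
}
k_range <- check_k_range(c(2, 15))

# Candidate alpha values as documented for max_alpha.
alpha_grid <- seq(0, 0.1, 0.05)
print(alpha_grid)   # 0.00 0.05 0.10

# A max_alpha that is not a multiple of 0.05 is not itself part of the grid.
alpha_grid_off <- seq(0, 0.12, 0.05)
print(alpha_grid_off)   # 0.00 0.05 0.10
```

With the default `max_alpha = 0.1`, the documented `seq` call therefore yields three candidate trimming proportions.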

Value

A list containing:

`stab`	A data frame containing standardized stability.
`qual`	A data frame containing standardized quality.
`optimalK`	The optimal k value representing the optimal number of clusters determined from the initial analysis.
`stab_ATSC`	A data frame containing standardized stability after applying ATSC.
`qual_ATSC`	A data frame containing standardized quality after applying ATSC.
`optimalK_ATSC`	The optimal k value representing the optimal number of clusters determined after applying ATSC.
`rskcOut`	An object returned by the RSKC function containing clustering results, including weights and trimmed observations.
`trimmedRows`	A vector of indices representing the rows that were trimmed from the dataset during the clustering process.
`trimmedColumns`	A vector of names representing the columns that were trimmed (i.e., removed) from the dataset due to zero weights.
`trimmedDataset`	A data frame containing the final processed dataset after trimming rows and columns.
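
How `trimmedRows`, `trimmedColumns` and `trimmedDataset` relate can be sketched in base R. The data and the trimmed indices below are invented, and the sketch assumes that `trimmedRows` holds row positions in the input table; it illustrates the relationship and is not the package's internal code.

```r
# Toy dataset: ID column plus three metrics (values invented for illustration).
original <- data.frame(
  ID = paste0("inst", 1:5),
  m1 = c(1.0, 9.5, 1.2, 0.9, 8.7),
  m2 = c(3.1, 0.2, 2.9, 3.3, 0.1),
  m3 = c(5, 5, 5, 5, 5),
  stringsAsFactors = FALSE
)

# Hypothetical trimming result: rows 2 and 5 trimmed, metric m3 given zero weight.
trimmedRows <- c(2, 5)
trimmedColumns <- c("m3")

# setdiff() is used instead of negative indexing so that an empty
# trimmedRows keeps every row (x[-integer(0), ] would drop all rows).
keep_rows <- setdiff(seq_len(nrow(original)), trimmedRows)
keep_cols <- setdiff(names(original), trimmedColumns)

trimmedDataset <- original[keep_rows, keep_cols, drop = FALSE]
print(dim(trimmedDataset))   # 3 3
print(trimmedDataset$ID)     # "inst1" "inst3" "inst4"
```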