ATSC: Automated Trimmed & Sparse Clustering

View source: R/metricsAnalysis.R

ATSC    R Documentation

Automated Trimmed & Sparse Clustering

Description

Automated Trimmed & Sparse Clustering. This method first performs an optimal k value analysis using the evaluomeR methods stabilityRange, qualityRange and getOptimalKValue. The resulting optimal k is then used to automatically estimate an L1 bound and an alpha trimming proportion, which are used to perform an automatic trimmed and sparse clustering. This may result in the input dataset being trimmed, either by columns (determined by L1) or by rows (determined by alpha). A second optimal k value analysis is then run on the trimmed dataset to obtain the final optimal partition.

Usage

ATSC(
  data,
  k.range = c(2, 15),
  bs = 100,
  cbi = "kmeans",
  max_alpha = 0.1,
  all_metrics = TRUE,
  L1 = NULL,
  alpha = NULL,
  gold_standard = NULL,
  seed = NULL
)
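A minimal call might look like the sketch below. It assumes that the installed evaluomeR version exports ATSC and ships the ontMetrics example dataset as a SummarizedExperiment; the reduced k.range and bs values and the seed are illustrative choices to shorten the run, not recommendations.

```r
# Sketch only: assumes evaluomeR is installed, exports ATSC, and provides
# the 'ontMetrics' example SummarizedExperiment.
k_range <- c(2, 6)   # narrower than the default c(2, 15) to keep the run short
n_boot  <- 20        # fewer bootstrap resamples than the default 100

if (requireNamespace("evaluomeR", quietly = TRUE) &&
    "ATSC" %in% getNamespaceExports("evaluomeR")) {
  data("ontMetrics", package = "evaluomeR")
  res <- evaluomeR::ATSC(
    data = ontMetrics,
    k.range = k_range,
    bs = n_boot,
    cbi = "kmeans",
    seed = 100         # arbitrary seed, for reproducibility only
  )
  print(res$optimalK)        # optimal k from the initial analysis
  print(res$optimalK_ATSC)   # optimal k after trimming
  print(res$trimmedColumns)  # metrics removed because of zero weight
}
```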

Arguments

data

A SummarizedExperiment. The SummarizedExperiment must contain an assay with the following structure: a valid header with names. The first column of the header is the ID or name of the instance of the dataset (e.g., ontology, pathway, etc.) on which the metrics are measured. The other columns of the header contain the names of the metrics. The rows contain the measurements of the metrics for each instance in the dataset.

k.range

A vector of two positive integers, e.g. c(2, 15). The first value, k.range[1], is the lower bound of the range and the second, k.range[2], is the upper bound. Both values must lie in the range [2, 15].
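The constraints on k.range can be checked before calling ATSC. The helper below, check_k_range, is a hypothetical illustration and is not part of evaluomeR; it only encodes the rules stated above, plus the assumption that the lower bound does not exceed the upper bound.

```r
# Hypothetical helper (not part of evaluomeR): validates a k.range vector
# against the documented rules.
check_k_range <- function(k.range, k.min = 2, k.max = 15) {
  stopifnot(
    is.numeric(k.range),
    length(k.range) == 2,
    all(k.range == floor(k.range)),  # whole numbers only
    k.range[1] <= k.range[2],        # lower bound first (assumed)
    k.range[1] >= k.min,
    k.range[2] <= k.max
  )
  invisible(TRUE)
}

check_k_range(c(2, 15))  # passes silently

# An out-of-range request raises an error, caught here for illustration.
ok <- tryCatch(check_k_range(c(1, 20)), error = function(e) FALSE)
print(ok)
```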

bs

Positive integer. Number of bootstrap resamples to perform.

cbi

Clusterboot interface name (default: "kmeans"): "kmeans", "clara", "clara_pam", "hclust", "pamk", "pamk_pam". Any CBI suffixed with '_pam' makes use of pam. The method used in the 'hclust' CBI is "ward.D2".

max_alpha

Maximum value of alpha. Candidate alpha values are taken from seq(0, max_alpha, 0.05).
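For reference, the candidate grid produced by the seq call above can be inspected directly in base R. With the default max_alpha = 0.1 it contains three values (0, 0.05 and 0.10); the second call is an illustrative case showing that a max_alpha that is not a multiple of 0.05 is not itself part of the grid.

```r
max_alpha <- 0.1                       # default from the Usage section
alpha_grid <- seq(0, max_alpha, 0.05)  # candidate trimming proportions
print(alpha_grid)

# Illustrative non-multiple of the 0.05 step: the grid stops at 0.10.
alpha_grid_off <- seq(0, 0.12, 0.05)
print(alpha_grid_off)
```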

all_metrics

Boolean. If TRUE, clustering is performed on the whole dataset.

L1

A single L1 bound on the feature weights; see RSKC.
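To illustrate what the L1 bound and the trimming proportion control, the sketch below calls RSKC directly on simulated data. It assumes the RSKC package from CRAN is installed; the data, the choice of L1 = 2 and alpha = 0.1 are arbitrary, and this is not how ATSC itself selects these values.

```r
set.seed(1)
# Simulated data: 2 informative features followed by 4 noise features.
n <- 60
group <- rep(1:2, each = n / 2)
d <- cbind(
  matrix(rnorm(n * 2, mean = group * 4), ncol = 2),  # group-dependent means
  matrix(rnorm(n * 4), ncol = 4)                     # pure noise
)

# Sketch only: requires the RSKC package.
if (requireNamespace("RSKC", quietly = TRUE)) {
  fit <- RSKC::RSKC(d, ncl = 2, alpha = 0.1, L1 = 2)
  print(round(fit$weights, 3))  # one weight per feature; smaller L1 gives sparser weights
  print(fit$oW)                 # cases trimmed in the weighted distances
  print(fit$oE)                 # cases trimmed in the Euclidean distances
}
```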

seed

Positive integer. A seed for internal bootstrap.

Value

A list containing:

stab

A data frame containing standardized stability.

qual

A data frame containing standardized quality.

optimalK

The optimal k value representing the optimal number of clusters determined from the initial analysis.

stab_ATSC

A data frame containing standardized stability after applying ATSC.

qual_ATSC

A data frame containing standardized quality after applying ATSC.

optimalK_ATSC

The optimal k value representing the optimal number of clusters determined after applying ATSC.

rskcOut

An object returned by the RSKC function containing clustering results, including weights and trimmed observations.

trimmedRows

A vector of indices representing the rows that were trimmed from the dataset during the clustering process.

trimmedColumns

A vector of names representing the columns that were trimmed (i.e., removed) from the dataset due to zero weights.

trimmedDataset

A data frame containing the final processed dataset after trimming rows and columns.
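The relationship between trimmedRows, trimmedColumns and trimmedDataset can be pictured with the base R sketch below. The toy data frame, its column names and the trimmed indices are hypothetical, and the subsetting shown is an assumption about how the three values relate, not the package's actual code.

```r
# Toy data standing in for the assay (hypothetical names and values).
dataset <- data.frame(
  Description = paste0("inst", 1:5),
  m1 = c(1.0, 1.2, 9.9, 1.1, 0.9),
  m2 = c(5, 5, 5, 5, 5),
  m3 = c(0.2, 0.1, 0.3, 0.2, 0.4)
)
trimmedRows    <- c(3)     # row indices flagged as outliers
trimmedColumns <- c("m2")  # metrics that received zero weight

# setdiff() is used instead of negative indexing so that an empty
# trimmedRows keeps every row (dataset[-integer(0), ] would keep none).
keep_rows <- setdiff(seq_len(nrow(dataset)), trimmedRows)
keep_cols <- setdiff(names(dataset), trimmedColumns)
trimmedDataset <- dataset[keep_rows, keep_cols, drop = FALSE]
print(dim(trimmedDataset))
```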


neobernad/evaluomeR documentation built on Dec. 18, 2024, 5:34 a.m.