spectrace_cluster_spectra: Cluster spectrace data


spectrace_cluster_spectra    R Documentation

Cluster spectrace data

Description

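This function clusters spectrace data using k-means or k-medoids (PAM or CLARA), optionally after normalizing the spectra and encoding them with PCA.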

Usage

spectrace_cluster_spectra(
  lightData,
  method = c("kmeans", "kmedoids-pam", "kmedoids-clara"),
  encoding = c("PCA", "none"),
  normalization = c("AUC", "peak"),
  n.clusters,
  n.init = 100,
  n.samples = 100,
  samplesize = 100 * n.clusters,
  classify = FALSE,
  referenceData = NULL,
  clusters.only = FALSE,
  return.sil = FALSE,
  return.encoded = FALSE,
  return.plot = FALSE,
  return.classification = FALSE
)

Arguments

lightData

Data frame containing the (calibrated) light data. The data must be in wide format (see spectrace_to_wide).

method

Clustering method. Must be one of ['kmeans', 'kmedoids-pam', 'kmedoids-clara']. Defaults to 'kmeans'.

encoding

Encoding method. Must be one of ['PCA', 'none']. Defaults to 'PCA'.

normalization

Normalization method (see spectrace_normalize_spectra). Defaults to 'AUC'.

n.clusters

Integer, indicating the number of clusters.

n.init

Integer, indicating the number of random initializations for kmeans. Defaults to 100.

n.samples

Integer, indicating the number of random samples for the clara algorithm and bootstrapped silhouette score for the kmeans algorithm. Defaults to 100.

samplesize

Integer, indicating the size of each sample for the clara algorithm and bootstrapped silhouette score for the kmeans algorithm. Can be between 1 and N, with N being the number of observations. Defaults to 'min(N, 100 * n.clusters)'.

classify

Logical, indicating whether the cluster medians should be classified. See spectrace_classify_spectra. Defaults to FALSE.

referenceData

Data frame with the reference data for classification. See spectrace_classify_spectra. If not provided (default), the in-built reference data will be used.

clusters.only

Logical, indicating whether only a vector with the cluster ids should be returned. Defaults to FALSE.

return.sil

Logical, indicating whether average silhouette scores per cluster should be returned. Defaults to FALSE.

return.encoded

Logical, indicating whether the encoded data should be returned. Defaults to FALSE.

return.plot

Logical, indicating whether the plot should be returned. Defaults to FALSE.

return.classification

Logical, indicating whether the classification should be returned. Defaults to FALSE.
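
To make the arguments above more concrete, the following is a minimal base-R sketch of the kind of pipeline they describe: normalize, encode, cluster, then score. It is not the package's source code. The synthetic spectra, the helper make_spectra, the choice of three principal components, the use of row sums as a stand-in for AUC normalization, and the subsampling scheme for the silhouette score are all assumptions made for illustration.

```r
# Illustrative sketch only; NOT the implementation of spectrace_cluster_spectra().
library(cluster)  # recommended package shipped with R: clara(), silhouette()

set.seed(1)

# Hypothetical wide-format spectra: one row per spectrum, one column per wavelength.
wl <- seq(380, 780, by = 5)
make_spectra <- function(n, peak) {
  t(replicate(n, exp(-((wl - peak) / 40)^2) * runif(1, 0.5, 2) +
                 abs(rnorm(length(wl), sd = 0.01))))
}
spectra <- rbind(make_spectra(150, 450), make_spectra(150, 600))

# normalization = "AUC" (assumed here: divide each spectrum by its summed intensity)
spectra_norm <- spectra / rowSums(spectra)

# encoding = "PCA" (keeping 3 components is an arbitrary choice for this sketch)
encoded <- prcomp(spectra_norm, center = TRUE, scale. = FALSE)$x[, 1:3]

n.clusters <- 2

# method = "kmeans": n.init plays the role of nstart in stats::kmeans
km <- kmeans(encoded, centers = n.clusters, nstart = 100)

# method = "kmedoids-clara": n.samples and samplesize play the roles of
# samples and sampsize in cluster::clara
samplesize <- min(nrow(encoded), 100 * n.clusters)
cl <- clara(encoded, k = n.clusters, samples = 100, sampsize = samplesize)

# Silhouette score for the k-means result, averaged over random subsets
# (assumed scheme: sampling without replacement, 20 repetitions)
sil_means <- replicate(20, {
  idx <- sample(nrow(encoded), samplesize)
  sil <- silhouette(km$cluster[idx], dist(encoded[idx, ]))
  mean(sil[, "sil_width"])
})
mean_sil <- mean(sil_means)
```

Computing the silhouette on subsets avoids building the full N x N distance matrix, which is the likely reason the n.samples and samplesize arguments also apply to kmeans.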

Value

The original 'lightData' with an additional column named 'cluster_id' indicating the cluster of each spectrum in the data. If 'clusters.only=TRUE', a vector with the cluster_ids is returned. If 'return.sil=TRUE', a named list is returned with two entries: 'data', holding the data frame or cluster_id vector, and 'sil_scores', holding the silhouette scores.
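
Because the shape of the return value depends on the flags, a short usage sketch may help. The object light_wide (calibrated light data already converted with spectrace_to_wide), the value n.clusters = 4, and the helper get_cluster_ids are hypothetical and not part of the package. The calls only run if the package is installed and light_wide exists.

```r
# Hypothetical helper: pull the cluster ids out of any of the documented return shapes.
get_cluster_ids <- function(res) {
  # return.sil = TRUE: named list with entries 'data' and 'sil_scores'
  if (is.list(res) && !is.data.frame(res)) res <- res$data
  # default: data frame with a 'cluster_id' column; clusters.only = TRUE: plain vector
  if (is.data.frame(res)) res$cluster_id else res
}

# Guarded usage; `light_wide` is an assumed, user-supplied wide-format data frame.
if (requireNamespace("spectrace", quietly = TRUE) && exists("light_wide")) {
  library(spectrace)

  # Default return: lightData plus a cluster_id column
  clustered <- spectrace_cluster_spectra(light_wide, method = "kmeans",
                                         encoding = "PCA", normalization = "AUC",
                                         n.clusters = 4)
  print(table(get_cluster_ids(clustered)))

  # Vector of cluster ids only
  ids <- spectrace_cluster_spectra(light_wide, n.clusters = 4,
                                   clusters.only = TRUE)

  # Named list with the data and the silhouette scores
  res <- spectrace_cluster_spectra(light_wide, method = "kmedoids-clara",
                                   n.clusters = 4, return.sil = TRUE)
  print(res$sil_scores)
  print(table(get_cluster_ids(res)))
}
```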


steffenhartmeyer/spectrace documentation built on Dec. 4, 2024, 4:13 p.m.