ssmf: Simplex-structured matrix factorisation algorithm (SSMF).
In MetabolSSMF: Simplex-Structured Matrix Factorisation for Metabolomics Analysis

View source: R/Functions.R

ssmf	R Documentation

Simplex-structured matrix factorisation algorithm (SSMF).

Description

This function performs SSMF on a data matrix or data frame.

Usage

ssmf(
  data,
  k,
  H = NULL,
  W = NULL,
  meth = c("kmeans", "uniform", "dirichlet", "nmf"),
  lr = 0.01,
  nruns = 50
)

Arguments

`data`	Data matrix or data frame.
`k`	The number of prototypes/clusters.
`H`	Matrix; an optional user-supplied `H` matrix from which to start the algorithm. If `NULL` (the default), the function initialises `H` automatically.
`W`	Matrix; an optional user-supplied `W` matrix from which to start the algorithm. If `NULL` (the default), the function initialises `W` automatically.
`meth`	Method used to initialise the `W` and `H` matrices; see the `method` argument of `init()`.
`lr`	Optimisation learning rate.
`nruns`	The maximum number of times the algorithm is run.
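For intuition about random initialisation, the sketch below shows one way a soft membership matrix could be drawn for `meth = "dirichlet"`. This is an assumption made here for illustration: the documentation does not state how `init()` generates its starting values, and `rdirichlet_rows()` is a hypothetical helper, not part of MetabolSSMF. It samples each row of an n × k matrix from a symmetric Dirichlet distribution by normalising independent Gamma draws, so every row is non-negative and sums to one.

```r
# Hypothetical helper (not part of MetabolSSMF): draw an n x k matrix whose
# rows are independent samples from a symmetric Dirichlet(alpha, ..., alpha).
rdirichlet_rows <- function(n, k, alpha = 1) {
  # Independent Gamma(alpha, 1) draws, one per matrix entry
  g <- matrix(rgamma(n * k, shape = alpha, rate = 1), nrow = n, ncol = k)
  # Dividing each row by its sum yields a Dirichlet sample on the simplex
  g / rowSums(g)
}

set.seed(42)
H0 <- rdirichlet_rows(n = 5, k = 3)
print(round(H0, 3))
print(rowSums(H0))  # each row sums to 1
```

A matrix built this way, with `n` equal to the number of observations in `data`, is one possible value for the `H` argument.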

Details

Let X \in \mathbb{R}^{n \times p} be the data set with n observations and p variables. Given an integer k \ll \min(n,p), simplex-structured matrix factorisation (SSMF) performs soft clustering: it partitions the observations into k fuzzy clusters such that the sum of squares from the observations to their assigned cluster prototypes is minimised. SSMF finds H_{n \times k} and W_{k \times p} such that

X \approx HW.

A cluster prototype is a vector that represents the characteristics of a particular cluster; the prototype of the r^{th} cluster is denoted by w_r \in \mathbb{R}^{p}. A cluster membership vector h_i \in \mathbb{R}^{k} gives the weights with which the k cluster prototypes are combined to approximate the i^{th} observation, i.e. its soft cluster memberships. W is the prototype matrix, whose r^{th} row is the prototype w_r, and H is the soft membership matrix, whose i^{th} row is the membership vector h_i of the i^{th} observation. The approximate matrix factorisation is found by minimising the residual sum of squares (RSS), that is

\mathrm{RSS} = \| X-HW \|_F^2 = \sum_{i=1}^{n}\sum_{j=1}^{p} \left\{ X_{ij}-(HW)_{ij}\right\}^2,

subject to \sum_{r=1}^k h_{ir}=1 and h_{ir}\geq 0 for all i = 1, \dots, n and r = 1, \dots, k.
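The objective above does not by itself say how it is optimised. As a hedged illustration only, the R sketch below minimises the same RSS by alternating two steps; this scheme is an assumption made here and is not taken from the package source. With H fixed, W is set to the least-squares solution (H^T H)^{-1} H^T X; with W fixed, H takes a gradient step of size `lr` and each row is then projected back onto the probability simplex with the standard sort-based Euclidean projection, so the constraints \sum_r h_{ir} = 1 and h_{ir} \geq 0 hold after every iteration. The functions `project_simplex()`, `update_W()` and `ssmf_sketch()` are hypothetical names, and the small ridge term in `update_W()` is an extra safeguard not mentioned in the documentation.

```r
# Hypothetical sketch of an SSMF-style solver; NOT the MetabolSSMF code.

# Euclidean projection of a vector v onto the probability simplex
# {h : h >= 0, sum(h) = 1}, using the standard sort-based algorithm.
project_simplex <- function(v) {
  u <- sort(v, decreasing = TRUE)
  css <- cumsum(u)
  j <- seq_along(u)
  rho <- max(which(u - (css - 1) / j > 0))
  theta <- (css[rho] - 1) / rho
  pmax(v - theta, 0)
}

# Residual sum of squares ||X - HW||_F^2
rss <- function(X, H, W) sum((X - H %*% W)^2)

# Least-squares W for fixed H; the tiny ridge term (an assumption made here)
# guards against a singular t(H) %*% H.
update_W <- function(X, H, ridge = 1e-8) {
  solve(crossprod(H) + ridge * diag(ncol(H)), crossprod(H, X))
}

ssmf_sketch <- function(X, H, lr = 0.01, nruns = 50) {
  X <- as.matrix(X)
  for (iter in seq_len(nruns)) {
    W <- update_W(X, H)
    # Gradient of the RSS with respect to H is -2 (X - HW) W^T
    grad <- -2 * (X - H %*% W) %*% t(W)
    # Gradient step of size lr, then project each row back onto the simplex
    H <- t(apply(H - lr * grad, 1, project_simplex))
  }
  W <- update_W(X, H)
  list(W = W, H = H, SSE = rss(X, H, W))
}

# Toy usage: data generated as X = H_true %*% W_true plus small noise
set.seed(1)
n <- 30; p <- 4; k <- 3
W_true <- matrix(c(1, 0, 0, 1,
                   0, 1, 0, 1,
                   0, 0, 1, 1), nrow = k, byrow = TRUE)
g <- matrix(rgamma(n * k, shape = 1), nrow = n)
H_true <- g / rowSums(g)
X <- H_true %*% W_true + matrix(rnorm(n * p, sd = 0.01), nrow = n)

# Random simplex-valued starting point for H
g0 <- matrix(rgamma(n * k, shape = 1), nrow = n)
H0 <- g0 / rowSums(g0)

fit <- ssmf_sketch(X, H0, lr = 0.01, nruns = 200)
print(fit$SSE)
```

The projection step is what enforces the simplex structure: the gradient step alone would let memberships become negative or stop summing to one.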

Value

W The optimised W matrix, whose rows are the cluster prototypes.

H The optimised H matrix, whose rows are the soft cluster memberships of the observations.

SSE The residual sum of squares of the fitted factorisation.
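Each row of the returned `H` lies on the simplex, so it can be read as a vector of membership proportions. If a hard partition is needed, one simple convention (an assumption made here, not something `ssmf()` is documented to do) is to assign each observation to the cluster with its largest membership. The sketch below uses a small made-up matrix as a stand-in for the `H` component of a fit; assuming the result is a list with the components listed above, the same code would apply to `fit$H`.

```r
# Stand-in for the soft membership matrix H returned by ssmf()
# (made-up numbers; each row is non-negative and sums to 1).
H <- matrix(c(0.70, 0.20, 0.10,
              0.05, 0.90, 0.05,
              0.30, 0.30, 0.40,
              0.50, 0.45, 0.05), ncol = 3, byrow = TRUE)

# Hard assignment: index of the largest membership in each row
hard <- apply(H, 1, which.max)
print(hard)

# Largest membership per observation; values close to 1/k flag ambiguous cases
certainty <- apply(H, 1, max)
print(certainty)
```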

Author(s)

Wenxuan Liu

References

Abdolali, Maryam & Gillis, Nicolas. (2020). Simplex-Structured Matrix Factorization: Sparsity-based Identifiability and Provably Correct Algorithms. <doi:10.1137/20M1354982>

Examples



library(MetabolSSMF)

# Initialisation by the user
data <- SimulatedDataset
k <- 4

## Initialise from a k-means fit (mclust::unmap() requires the mclust package)
set.seed(1)  # k-means uses random starts; fix the seed for reproducibility
fit.km <- kmeans(data, centers = k)

H <- mclust::unmap(fit.km$cluster)  # n x k hard (0/1) membership matrix
W <- fit.km$centers                 # k x p matrix of cluster centres

fit1 <- ssmf(data, k = k, H = H)  # start the algorithm from H
fit2 <- ssmf(data, k = k, W = W)  # start the algorithm from W

# Initialisation inside the function
fit3 <- ssmf(data, k = k, meth = "dirichlet")
fit4 <- ssmf(data, k = k)

MetabolSSMF documentation built on April 3, 2025, 5:44 p.m.