ssmf | R Documentation |
This function performs simplex-structured matrix factorisation (SSMF) on a data matrix or data frame.
ssmf(
data,
k,
H = NULL,
W = NULL,
meth = c("kmeans", "uniform", "dirichlet", "nmf"),
lr = 0.01,
nruns = 50
)
data |
Data matrix or data frame. |
k |
The number of prototypes/clusters. |
H |
Matrix, optional user-supplied initial soft membership matrix from which to start the algorithm. |
W |
Matrix, optional user-supplied initial prototype matrix from which to start the algorithm. |
meth |
Specification of method to initialise the |
lr |
Optimisation learning rate. |
nruns |
The maximum number of times the algorithm is run. |
Let X \in \mathbb{R}^{n \times p} be the data set with n observations and p variables. Given an integer k \ll \min(n,p), the data set is clustered by simplex-structured matrix factorisation (SSMF), which performs soft clustering and partitions the observations into k fuzzy clusters such that the sum of squares from observations to the assigned cluster prototypes is minimised.
SSMF finds H_{n \times k} and W_{k \times p} such that
X \approx HW.
A cluster prototype is a vector that represents the characteristics of a particular cluster, denoted by w_r \in \mathbb{R}^{p}, where r indexes the cluster. A cluster membership vector h_i \in \mathbb{R}^{k} describes the proportions of the cluster prototypes that make up the i^{th} observation. W is the prototype matrix, where each row is a cluster prototype, and H is the soft membership matrix, where each row gives the soft cluster memberships of one observation.
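The roles of H and W can be illustrated with a small base R sketch. The values below are made up for illustration and do not come from the package; the example only shows how a data matrix arises as a product of a row-stochastic membership matrix and a prototype matrix.

```r
# Toy illustration with hypothetical values (not taken from MetabolSSMF).
n <- 5; k <- 2; p <- 3

# W: k x p prototype matrix, one cluster prototype per row.
W <- matrix(c( 0,  0,  0,
              10, 20, 30), nrow = k, byrow = TRUE)

# H: n x k soft membership matrix; each row is non-negative and sums to 1.
h1 <- c(1, 0.75, 0.5, 0.25, 0)
H <- cbind(h1, 1 - h1)

# Each observation (row of X) is a convex combination of the prototypes.
X <- H %*% W

# The third observation has memberships (0.5, 0.5), so it lies halfway
# between the two prototypes.
X[3, ]
```

Rows of H equal to (1, 0) or (0, 1) reproduce a prototype exactly, which is the hard-clustering special case.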
The approximate matrix factorisation is found by minimising the residual sum of squares (RSS), that is
\mathrm{RSS} = \| X-HW \|^2 = \sum_{i=1}^{n}\sum_{j=1}^{p} \left\{ X_{ij}-(HW)_{ij}\right\}^2,
such that \sum_{r=1}^k h_{ir}=1 and h_{ir}\geq 0.
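As a minimal sketch of the objective and its constraints, the RSS and the simplex conditions on the rows of H can be evaluated directly in base R. The helper names `rss` and `on_simplex` are hypothetical and are not part of the package; the matrices are toy values.

```r
# Hypothetical helpers for illustration; not part of MetabolSSMF.

# Residual sum of squares between the data X and the reconstruction HW.
rss <- function(X, H, W) sum((as.matrix(X) - H %*% W)^2)

# TRUE if every row of H is non-negative and sums to 1 (up to a tolerance).
on_simplex <- function(H, tol = 1e-8) {
  all(H >= -tol) && all(abs(rowSums(H) - 1) <= tol)
}

# Toy example: k = 2 prototypes in p = 2 dimensions, n = 3 observations.
W <- matrix(c(0, 0,
              4, 2), nrow = 2, byrow = TRUE)
H <- matrix(c(1.0, 0.0,
              0.5, 0.5,
              0.0, 1.0), ncol = 2, byrow = TRUE)
X <- H %*% W
X[2, 1] <- X[2, 1] + 1   # perturb a single entry by 1

rss(X, H, W)     # only the perturbed entry contributes a squared residual
on_simplex(H)    # rows of H satisfy both constraints
```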
W
The optimised W matrix, containing the values of the prototypes.
H
The optimised H matrix, containing the values of the soft memberships.
SSE
The residual sum of squares.
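If hard cluster labels are needed, one common option is to assign each observation to the cluster with its largest soft membership. The sketch below uses a toy matrix standing in for the returned H element; it is an illustration in base R, not a function provided by the package.

```r
# Toy membership matrix standing in for the H element of a fitted object
# (hypothetical values; rows are observations, columns are clusters).
H <- matrix(c(0.7, 0.2, 0.1,
              0.1, 0.1, 0.8,
              0.3, 0.6, 0.1), ncol = 3, byrow = TRUE)

# Index of the largest membership in each row; ties go to the first column.
hard <- max.col(H, ties.method = "first")

# Largest membership per observation, a rough indicator of how clear-cut
# each assignment is.
top <- apply(H, 1, max)
```

Observations whose largest membership is close to 1/k sit between clusters, so a hard label discards most of what H says about them.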
Wenxuan Liu
Abdolali, Maryam & Gillis, Nicolas. (2020). Simplex-Structured Matrix Factorization: Sparsity-based Identifiability and Provably Correct Algorithms. <doi:10.1137/20M1354982>
library(MetabolSSMF)
# Initialisation by the user
data <- SimulatedDataset
k <- 4
# Build initial H and W from a k-means fit (unmap() requires the mclust package)
fit.km <- kmeans(data, centers = k)
H <- mclust::unmap(fit.km$cluster)  # indicator (one-hot) matrix from the hard labels
W <- fit.km$centers                 # cluster centres as initial prototypes
fit1 <- ssmf(data, k = k, H = H)  # start the algorithm from H
fit2 <- ssmf(data, k = k, W = W)  # start the algorithm from W
# Initialisation inside the function
fit3 <- ssmf(data, k = k, meth = 'dirichlet')
fit4 <- ssmf(data, k = k)  # meth left at its default