Doubly-enhanced EM algorithm for tensor clustering
X |
A list of n input tensors (or matrices), where n is the number of observations. Each element of the list is a tensor or matrix. The tensor order can be any integer not less than 2. |
nclass |
Number of clusters. |
niter |
Maximum number of iterations for the EM algorithm. Default value is 100. |
lambda |
A user-specified |
dfmax |
The maximum number of selected variables in the model. Default is the number of observations |
pmax |
The maximum number of potential selected variables during iteration. In middle step, the algorithm can select at most |
pf |
Weight of lasso penalty. Default is a vector of value |
eps |
Convergence threshold for coordinate descent difference between iterations. Default value is |
maxit |
Maximum number of coordinate descent iterations over all lambda. Default value is
sml |
Threshold for ratio of loss function change after each iteration to old loss function value. Default value is |
verbose |
Indicates whether to print lambda during iteration. Default value is
ceps |
Convergence threshold for cluster mean difference between iterations. Default value is |
initial |
Whether to initialize algorithm with K-means clustering. Default value is |
vec_x |
Vectorized tensor data. Default value is |
The DEEM function implements the Doubly-Enhanced EM algorithm (DEEM) for tensor clustering. The observations \mathbf{X}_i are assumed to follow the tensor normal mixture model (TNMM) with common covariances across clusters:
\mathbf{X}_i\sim∑_{k=1}^Kπ_k \mathrm{TN}(\bm{μ}_k;\bm{Σ}_1,…,\bm{Σ}_M),\quad i=1,…,n,
where 0<π_k<1 is the prior probability for \mathbf{X} to be in the k-th cluster such that ∑_{k=1}^{K}π_k=1, \bm{μ}_k is the mean of the k-th cluster, and \bm{Σ}_1,…,\bm{Σ}_M are the covariances shared by all clusters. Under the TNMM framework, the optimal clustering rule can be shown to be
\widehat{Y}^{opt}=\arg\max_k\{\logπ_k+\langle\mathbf{X}-(\bm{μ}_1+\bm{μ}_k)/2,\mathbf{B}_k\rangle\},
where \mathbf{B}_k=[\![\bm{μ}_k-\bm{μ}_1;\bm{Σ}_1^{-1},…,\bm{Σ}_M^{-1}]\!]. In the enhanced E-step, DEEM imposes sparsity directly on the optimal clustering rule, as a flexible alternative to the popular low-rank assumptions on the tensor coefficients \mathbf{B}_k, by solving
\min_{\mathbf{B}_2,…,\mathbf{B}_K}\bigg[∑_{k=2}^K(\langle\mathbf{B}_k,[\![\mathbf{B}_k;\widehat{\bm{Σ}}_1^{(t)},…,\widehat{\bm{Σ}}_M^{(t)}]\!]\rangle-2\langle\mathbf{B}_k,\widehat{\bm{μ}}_k^{(t)}-\widehat{\bm{μ}}_1^{(t)}\rangle) +λ^{(t+1)}∑_{\mathcal{J}}\sqrt{∑_{k=2}^Kb_{k,\mathcal{J}}^2}\bigg],
where λ^{(t+1)} is a tuning parameter. In the enhanced M-step, DEEM employs a new estimator for the tensor correlation structure, which facilitates both the computation and the theoretical studies.
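To make the optimal clustering rule concrete, the following is a minimal base-R sketch that evaluates it for a single observation when the TNMM parameters are known. It is not how DEEM is implemented internally, and the names `tnmm_rule`, `mu`, `Sigma`, and `prior` are hypothetical helpers introduced only for illustration. The sketch relies on the identity vec([\![\mathbf{A};\bm{Σ}_1^{-1},…,\bm{Σ}_M^{-1}]\!]) = (\bm{Σ}_M^{-1}⊗…⊗\bm{Σ}_1^{-1}) vec(\mathbf{A}) for column-major vectorization, so it is only practical for small tensors.

```r
# Sketch of the TNMM optimal clustering rule with KNOWN parameters.
# tnmm_rule, mu, Sigma and prior are hypothetical names, not part of DEEM's API.
tnmm_rule <- function(X, mu, Sigma, prior) {
  # Invert each mode-wise covariance matrix
  Sigma_inv <- lapply(Sigma, solve)
  # Build Sigma_M^{-1} %x% ... %x% Sigma_1^{-1}; the reversed order matches
  # R's column-major as.vector() of an array
  Omega <- Reduce(function(acc, S) kronecker(S, acc), Sigma_inv)
  scores <- vapply(seq_along(mu), function(k) {
    # vec(B_k) = Omega %*% vec(mu_k - mu_1); B_1 is identically zero
    B_k <- as.vector(Omega %*% as.vector(mu[[k]] - mu[[1]]))
    # log(pi_k) + <X - (mu_1 + mu_k)/2, B_k>
    log(prior[k]) + sum((as.vector(X) - as.vector(mu[[1]] + mu[[k]]) / 2) * B_k)
  }, numeric(1))
  which.max(scores)
}

# Toy check: two clusters, identity covariances, sparse mean difference
dimen <- c(3, 3, 3)
mu <- list(array(0, dim = dimen), array(0, dim = dimen))
mu[[2]][1:2, 1, 1] <- 2
Sigma <- list(diag(3), diag(3), diag(3))
prior <- c(0.5, 0.5)
label_1 <- tnmm_rule(mu[[1]], mu, Sigma, prior)  # observation at the first mean
label_2 <- tnmm_rule(mu[[2]], mu, Sigma, prior)  # observation at the second mean
```

In practice the parameters are unknown, which is why DEEM estimates the sparse coefficients \mathbf{B}_k iteratively rather than forming the full Kronecker product as this sketch does.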
pi |
A vector of estimated prior probabilities for clusters. |
mu |
A list of estimated cluster means. |
sigma |
A list of estimated covariance matrices. |
gamma |
A |
y |
A vector of estimated labels. |
iter |
Number of iterations until convergence. |
df |
Average number of zero elements in beta over iterations. |
beta |
A matrix of vectorized |
Kai Deng, Yuqing Pan, Xin Zhang and Qing Mai
Mai, Q., Zhang, X., Pan, Y. and Deng, K. (2021). A Doubly-Enhanced EM Algorithm for Model-Based Tensor Clustering. Journal of the American Statistical Association.
dimen = c(5,5,5)
nvars = prod(dimen)
K = 2
n = 100
sigma = array(list(),3)
sigma[[1]] = sigma[[2]] = sigma[[3]] = diag(5)
B2=array(0,dim=dimen)
B2[1:3,1,1]=2
y = c(rep(1,50),rep(2,50))
M = array(list(),K)
M[[1]] = array(0,dim=dimen)
M[[2]] = B2
vec_x=matrix(rnorm(n*prod(dimen)),ncol=n)
X=array(list(),n)
for (i in 1:n){
X[[i]] = array(vec_x[,i],dim=dimen)
X[[i]] = M[[y[i]]] + X[[i]]
}
myfit = DEEM(X, nclass=2, lambda=0.05)
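Because cluster labels are only identified up to permutation, comparing the estimated labels with the true ones requires matching the two labelings first. The helper below is a small sketch of one way to do that in base R; `all_perms` and `cluster_error` are hypothetical names, not functions from the package, and the sketch assumes both label vectors are coded as integers 1, ..., K. It enumerates all K! permutations, so it is only suitable for small K.

```r
# Hypothetical helper: all permutations of a vector (fine for small K)
all_perms <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in all_perms(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Hypothetical helper: misclassification rate minimized over label permutations.
# Assumes est and truth take values in 1..K.
cluster_error <- function(est, truth, K) {
  errs <- vapply(all_perms(seq_len(K)),
                 function(p) mean(p[est] != truth),
                 numeric(1))
  min(errs)
}

# Toy checks that do not require a fitted model
err_swapped <- cluster_error(c(2, 2, 1, 1), c(1, 1, 2, 2), K = 2)
err_one_off <- cluster_error(c(1, 1, 1, 2), c(1, 1, 2, 2), K = 2)
```

With the fit from the example, the estimated labels are in the y component of the returned object, so a call such as cluster_error(myfit$y, y, K) would give the clustering error for the simulated data.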