Perform overlapping (variable) clustering of a p-dimensional feature vector generated from the latent factor model
X = AZ + E
with identifiability conditions on A and Cov(Z).
X
A n by p data matrix.
lbd
The grid of leading constants of λ.
mu
The leading constant used for thresholding the loading matrix.
est_non_pure_row
String. Procedure used for estimating the non-pure rows. One of {"HT", "ST", "Dantzig"}.
verbose
Logical. Set FALSE to suppress printing the progress.
pure_homo
Logical. TRUE if the pure loadings have the same magnitude.
diagonal
Logical. If TRUE, the covariance matrix of Z is diagonal; else FALSE.
delta
The grid of leading constants of δ.
merge
Logical. If TRUE, take the union of all candidate pure variables; otherwise, take the intersection.
rep_CV
The number of repetitions used for cross validation.
ndelta
Integer. The length of the grid of
q
Either
exact
Logical. Only active for compute the
max_pure
A numeric value in (0, 1] specifying the maximal proportion of pure variables. Default is NULL. When not specified,
nfolds
The number of folds. Default is 10.
LOVE performs overlapping clustering of the feature variables X generated from the latent factor model
X = AZ + E
where the loading matrix A and the covariance matrix of Z satisfy certain identifiability conditions. The main goal is to estimate the loading matrix A, whose support is used to form overlapping groups of X.
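As a rough illustration of how the support of A can define overlapping groups, the sketch below reads groups off the nonzero pattern of a known loading matrix. The helper support_groups and its tolerance argument are hypothetical names introduced here for illustration; they are not part of the package, and the structure of the group element returned by LOVE may differ.

```r
# Illustrative loading matrix: p = 6 variables, K = 2 latent factors.
A <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, 1), c(1/3, 2/3), c(1/2, -1/2))

# Hypothetical helper (not a package function): variable j is placed in
# group k whenever the loading A[j, k] is nonzero up to a tolerance.
support_groups <- function(A, tol = 1e-8) {
  lapply(seq_len(ncol(A)), function(k) which(abs(A[, k]) > tol))
}

groups <- support_groups(A)
groups[[1]]                                   # variables loading on factor 1
groups[[2]]                                   # variables loading on factor 2
shared <- intersect(groups[[1]], groups[[2]]) # variables in both groups
```

Variables with more than one nonzero loading appear in several groups, which is what makes the clustering overlapping.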
The first step estimates the pure loadings, defined as the rows of A that are proportional to canonical vectors. When the pure loadings are expected to have the same magnitudes (up to the sign), for instance,
A_{1.} = (1, 0, 0), A_{2.} = (-1, 0, 0),
the estimation of pure loadings is done by setting pure_homo to TRUE. When different magnitudes are expected for the pure loadings, such as
A_{1.} = (1, 0, 0), A_{2.} = (-0.5, 0, 0),
the estimation uses a different approach, selected by setting pure_homo to FALSE.
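To make the definition of a pure loading concrete, the following sketch flags the pure rows of a known loading matrix and checks whether they share one magnitude. It only illustrates the definition on a matrix that is already known; it is not the estimation procedure used by LOVE, and is_pure_row is a hypothetical helper.

```r
# Illustrative loading matrix with K = 3; the last row is mixed (non-pure).
A <- rbind(c(1, 0, 0), c(-0.5, 0, 0), c(0, 1, 0), c(0, 0, 1), c(0.5, 0.5, 0))

# Hypothetical helper: a row is pure if exactly one entry is nonzero,
# i.e. the row is proportional to a canonical vector.
is_pure_row <- function(A, tol = 1e-8) rowSums(abs(A) > tol) == 1

pure <- which(is_pure_row(A))

# Magnitude of the single nonzero entry in each pure row.
pure_magnitudes <- apply(abs(A[pure, , drop = FALSE]), 1, max)

# TRUE would correspond to the pure_homo = TRUE setting; here the
# magnitudes differ, which matches the pure_homo = FALSE setting.
same_magnitude <- length(unique(round(pure_magnitudes, 8))) == 1
```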
The second step estimates the non-pure (mixed) loadings of A. Three procedures are available, as specified by est_non_pure_row. The choice "HT" specifies estimation via hard-thresholding, which is computationally fast, while "ST" uses soft-thresholding instead. Both "ST" and "Dantzig" resort to solving linear programs. A further difference is that "Dantzig", unlike "HT" and "ST", does not require estimating the precision matrix of Z.
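For readers unfamiliar with the two thresholding rules, the sketch below shows the generic entrywise hard- and soft-thresholding operators. This is an assumption-laden illustration of the operators only: the functions hard_threshold and soft_threshold are hypothetical, and the "ST" option in LOVE is described above as solving linear programs, so it is not simply this entrywise rule.

```r
# Generic entrywise operators (hypothetical helpers, not package functions).
# Hard thresholding keeps entries larger than t in absolute value unchanged.
hard_threshold <- function(M, t) M * (abs(M) > t)

# Soft thresholding additionally shrinks the surviving entries toward zero by t.
soft_threshold <- function(M, t) sign(M) * pmax(abs(M) - t, 0)

B <- rbind(c(0.9, 0.05), c(-0.4, 0.3))
B_hard <- hard_threshold(B, 0.1)  # small entry set to 0, others untouched
B_soft <- soft_threshold(B, 0.1)  # small entry set to 0, others shrunk by 0.1
```

Hard thresholding leaves large entries unbiased but is discontinuous at the threshold; soft thresholding is continuous at the cost of shrinking every retained entry.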
A list of objects including:
K
The estimated number of clusters.
pureVec
The estimated set of pure variables.
pureInd
The estimated partition of pure variables.
group
The estimated clusters (indices of each cluster).
A
The estimated p by K assignment matrix.
C
The covariance matrix of Z.
Omega
The precision matrix of Z.
Gamma
The diagonal of the covariance matrix of E.
optDelta
The selected value of δ.
Bing, X., Bunea, F., Ning, Y. and Wegkamp, M. (2020) Adaptive estimation in structured factor models with applications to overlapping clustering. Annals of Statistics, 48(4), 2055-2081. https://projecteuclid.org/journals/annals-of-statistics/volume-48/issue-4/Adaptive-estimation-in-structured-factor-models-with-applications-to-overlapping/10.1214/19-AOS1877.short
Bing, X., Bunea, F. and Wegkamp, M. (2021) Detecting approximate replicate components of a high-dimensional random vector with latent structure. https://arxiv.org/abs/2010.02288.
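# Simulate data from X = AZ + E with p = 6 variables and K = 2 factors
p <- 6
n <- 100
K <- 2
# Rows 1-4 of A are pure loadings; rows 5-6 are mixed
A <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, 1), c(1/3, 2/3), c(1/2, -1/2))
Z <- matrix(rnorm(n * K, sd = sqrt(2)), n, K)
E <- matrix(rnorm(n * p), n, p)
X <- Z %*% t(A) + E
# Pure loadings allowed to have different magnitudes; delta left as NULL
res_LOVE <- LOVE(X, pure_homo = FALSE, delta = NULL)
# Pure loadings assumed to share one magnitude; delta given as a grid
res_LOVE <- LOVE(X, pure_homo = TRUE, delta = seq(0.1, 1.1, 0.1))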