LOVE: Latent-model based OVErlapping clustering


View source: R/LOVE.R

Description

Perform overlapping (variable) clustering of a p-dimensional feature generated from the latent factor model

X = AZ + E

with identifiability conditions on A and Cov(Z).

Usage

LOVE(
  X,
  lbd = 0.5,
  mu = 0.5,
  est_non_pure_row = "HT",
  verbose = FALSE,
  pure_homo = FALSE,
  diagonal = FALSE,
  delta = NULL,
  merge = FALSE,
  rep_CV = 50,
  ndelta = 50,
  q = 2,
  exact = FALSE,
  max_pure = NULL,
  nfolds = 10
)

Arguments

X

A n by p data matrix.

lbd

The grid of leading constants for λ.

mu

The leading constant used for thresholding the loading matrix.

est_non_pure_row

String. Procedure used for estimating the non-pure rows. One of {"HT", "ST", "Dantzig"}.

verbose

Logical. Set to FALSE to suppress printing of progress.

pure_homo

Logical. TRUE if the pure loadings have the same magnitude.

diagonal

Logical. If TRUE, the covariance matrix of Z is diagonal; else FALSE.

delta

The grid of leading constants for δ.

merge

Logical. If TRUE, take the union of all candidate pure variables; otherwise, take the intersection.

rep_CV

The number of repetitions used for cross validation.

ndelta

Integer. The length of the grid of delta.

q

Either 2 or Inf to specify the type of score.

exact

Logical. Only used when computing the Inf score (q = Inf). If TRUE, compute the Inf score exactly by solving a linear program; otherwise, use an approximation.

max_pure

A numeric value in (0, 1] specifying the maximal proportion of pure variables. Default is NULL, in which case max_pure = 1 if n > p and max_pure = 0.8 otherwise.

nfolds

The number of folds. Default is 10.
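
The default rule for max_pure described above can be written out as a one-line helper. This is only a sketch of the documented rule; the function name default_max_pure is hypothetical and is not part of the package.

```r
# Hypothetical helper mirroring the documented default for max_pure:
# 1 when there are more observations than variables, 0.8 otherwise.
default_max_pure <- function(n, p) {
  if (n > p) 1 else 0.8
}

default_max_pure(n = 100, p = 6)   # low-dimensional case
default_max_pure(n = 50, p = 200)  # high-dimensional case
```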

Details

LOVE performs overlapping clustering of the feature variables X generated from the latent factor model

X = AZ + E

where the loading matrix A and the covariance matrix of Z satisfy certain identifiability conditions. The main goal is to estimate the loading matrix A whose support is used to form overlapping groups of X.
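
To illustrate how the support of A yields overlapping groups, the following base-R sketch takes a known toy loading matrix (an assumption for illustration, not output of LOVE) and collects, for each latent factor, the variables with a non-zero loading on it.

```r
# Toy loading matrix (illustrative): 6 variables, 2 latent factors.
A <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, 1), c(1/3, 2/3), c(1/2, -1/2))

# Group k collects the variables whose loading on factor k is non-zero.
groups <- lapply(seq_len(ncol(A)), function(k) which(A[, k] != 0))
names(groups) <- paste0("group", seq_len(ncol(A)))

# Variables loading on both factors belong to both groups (the overlap).
overlap <- intersect(groups$group1, groups$group2)
```

Here variables 5 and 6 load on both factors, so they appear in both groups, while variables 1 to 4 each belong to a single group.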

The first step estimates the pure loadings, defined as the rows of A that are proportional to canonical vectors. When the pure loadings are expected to have the same magnitudes (up to the sign), for instance,

A_{1.} = (1, 0, 0), A_{2.} = (-1, 0, 0),

the estimation of pure loadings is done via setting pure_homo to TRUE. When different magnitudes are expected for the pure loadings, such as

A_{1.} = (1, 0, 0), A_{2.} = (-0.5, 0, 0),

a different estimation approach is used, selected by setting pure_homo to FALSE.
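
The definition of pure loadings can be checked directly on a known loading matrix. The sketch below uses a made-up A (an assumption for illustration) and flags the rows with exactly one non-zero entry, then checks whether their magnitudes coincide, which is the situation pure_homo = TRUE is meant for. It does not reproduce how LOVE estimates pure rows from data.

```r
# Made-up loading matrix: rows 1-3 are pure, row 4 is mixed.
A <- rbind(c(1, 0, 0), c(-0.5, 0, 0), c(0, 2, 0), c(0.3, 0.7, 0))

# A row is pure if it is proportional to a canonical vector,
# i.e. it has exactly one non-zero entry.
pure_rows <- which(rowSums(A != 0) == 1)

# Magnitude of the single non-zero entry of each pure row.
pure_mag <- apply(abs(A[pure_rows, , drop = FALSE]), 1, max)

# TRUE only if all pure loadings share the same magnitude (up to sign).
same_magnitude <- length(unique(pure_mag)) == 1
```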

The second step estimates the non-pure (mixed) loadings of A. Three procedures are available, selected via est_non_pure_row. The choice "HT" estimates via hard-thresholding, which is computationally fast, while "ST" uses soft-thresholding instead. Both "ST" and "Dantzig" solve linear programs. Unlike "HT" and "ST", "Dantzig" does not require estimating the precision matrix of Z.
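
For readers unfamiliar with the distinction between hard- and soft-thresholding, the generic elementwise operators are sketched below. These are the textbook definitions only; the function names and the threshold value are assumptions, and this is not the package's internal estimator (as noted above, "ST" in LOVE involves solving linear programs).

```r
# Hard-thresholding: keep entries whose magnitude exceeds t, zero out the rest.
hard_threshold <- function(x, t) ifelse(abs(x) > t, x, 0)

# Soft-thresholding: shrink every entry towards zero by t, zeroing small ones.
soft_threshold <- function(x, t) sign(x) * pmax(abs(x) - t, 0)

x <- c(-1.2, -0.3, 0.1, 0.6, 2.0)
ht <- hard_threshold(x, 0.5)  # surviving entries keep their original values
st <- soft_threshold(x, 0.5)  # surviving entries are shrunk by 0.5
```

Hard-thresholding leaves the retained entries untouched, whereas soft-thresholding also shrinks them, which is the qualitative difference between the two choices.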

Value

A list of objects including:

References

Bing, X., Bunea, F., Ning, Y. and Wegkamp, M. (2020) Adaptive estimation in structured factor models with applications to overlapping clustering. Annals of Statistics, 48(4), 2055-2081. https://projecteuclid.org/journals/annals-of-statistics/volume-48/issue-4/Adaptive-estimation-in-structured-factor-models-with-applications-to-overlapping/10.1214/19-AOS1877.short

Bing, X., Bunea, F. and Wegkamp, M. (2021) Detecting approximate replicate components of a high-dimensional random vector with latent structure. https://arxiv.org/abs/2010.02288.

Examples

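set.seed(1)  # for reproducibility
p <- 6    # number of observed variables
n <- 100  # sample size
K <- 2    # number of latent factors
# Loading matrix: rows 1-4 are pure, rows 5-6 are mixed
A <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, 1), c(1/3, 2/3), c(1/2, -1/2))
Z <- matrix(rnorm(n * K, sd = sqrt(2)), n, K)  # latent factors
E <- matrix(rnorm(n * p), n, p)                # noise
X <- Z %*% t(A) + E                            # n by p data matrix
# Pure loadings may differ in magnitude; delta left at its default (NULL)
res_LOVE <- LOVE(X, pure_homo = FALSE, delta = NULL)
# Pure loadings share the same magnitude; delta supplied as a grid
res_LOVE <- LOVE(X, pure_homo = TRUE, delta = seq(0.1, 1.1, 0.1))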

bingx1990/LOVE documentation built on Jan. 23, 2022, 1:30 a.m.