test_indep_com_clust: Pseudo pseudolikelihood ratio test for association between...

View source: R/test_indep_com_clust.R

Description

Implements the pseudo pseudolikelihood ratio test described in Section 4 of Gao et al. (2019), "Testing for Association in Multi-View Network Data", for testing for dependence between communities in a network data view and clusters in a multivariate view. Fits a stochastic block model in the network view and a Gaussian mixture model in the multivariate view.
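
To make the two-view setup concrete, the following base-R sketch simulates one network view from a stochastic block model and one multivariate view from a spherical Gaussian mixture, with community and cluster labels drawn jointly from a matrix Pi. It is an illustration only, not the package's own generator. The helper name simulate_two_views and the labels z1 and z2 are hypothetical, and two conventions are assumed: entry (k, l) of Pi is the probability of community k together with cluster l, and column l of mu2 is the mean of cluster l.

```r
# Hypothetical helper: simulate an undirected network view (SBM) and a
# multivariate view (spherical Gaussian mixture) with joint label matrix Pi.
simulate_two_views <- function(n, Pi, theta1, mu2, sigma2 = 0.5) {
  K1 <- nrow(Pi)
  # Draw a (community, cluster) pair for each node from the joint distribution Pi
  cell <- sample.int(length(Pi), n, replace = TRUE, prob = as.vector(Pi))
  z1 <- (cell - 1) %% K1 + 1      # community label (row index of Pi)
  z2 <- (cell - 1) %/% K1 + 1     # cluster label (column index of Pi)

  # View 1: symmetric 0/1 adjacency matrix with no self-loops
  P <- theta1[z1, z1]             # n x n matrix of edge probabilities
  A <- matrix(0L, n, n)
  upper <- upper.tri(A)
  A[upper] <- rbinom(sum(upper), 1, P[upper])
  A <- A + t(A)

  # View 2: n x p matrix of cluster means plus spherical Gaussian noise
  p <- nrow(mu2)
  Y <- t(mu2[, z2, drop = FALSE]) +
    matrix(rnorm(n * p, sd = sqrt(sigma2)), n, p)

  list(data = list(A, Y), communities = z1, clusters = z2)
}

set.seed(1)
sim <- simulate_two_views(
  n = 50,
  Pi = tcrossprod(c(0.5, 0.5), c(0.5, 0.5)),
  theta1 = rbind(c(0.5, 0.1), c(0.1, 0.5)),
  mu2 = cbind(c(2, 2), c(-2, 2))
)
dim(sim$data[[1]])  # n x n adjacency matrix
dim(sim$data[[2]])  # n x p data matrix
```

Because Pi here is the outer product of its marginals, the simulated communities and clusters are independent, which corresponds to the null hypothesis of the test.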

Usage

test_indep_com_clust(
  X,
  K1 = NULL,
  K2 = NULL,
  model2 = "EII",
  init2 = NULL,
  nperm = 200,
  step = 0.001,
  maxiter = 1000,
  parallel = FALSE
)

Arguments

X

Multi-view data with two views; a list containing an n x n adjacency matrix (View 1, the network view) and an n x p data matrix (View 2, the multivariate view).

K1

An optional argument containing the number of communities in View 1. If left out, then the number of communities is chosen with the method of Le and Levina (2015).

K2

An optional argument containing the number of clusters in View 2. If left out, then the number of clusters is chosen with BIC.

model2

A character string indicating the model to be fitted for Gaussian model-based clustering in the multivariate view using the function Mclust. The default is "EII" (spherical, equal volume). The help file for mclustModelNames describes the available model options.

init2

An optional argument containing the model to be fitted in the hierarchical clustering initialization of Gaussian model-based clustering in the multivariate view. The default is "VVV" (ellipsoidal, varying volume, shape, and orientation). The help file for hc describes the available model options.

nperm

An integer specifying the number of permutations to use for the permutation procedure. The default number is 200.

step

A numeric value containing the fixed step size to be used in the optimization algorithm for estimating Pi. The default step size is 0.001.

maxiter

A numeric value containing the maximum number of iterations to run in the optimization algorithm. The default maximum is 1000.

parallel

An optional logical argument allowing for parallel computing using the doParallel package. The default is FALSE.
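
The K2, model2, and init2 arguments correspond to options of the mclust package. The sketch below shows roughly how such settings can be passed to Mclust and hc directly. It is an assumption about the mapping, not the package's internal code, and the toy matrix Y is made up for illustration.

```r
library(mclust)

set.seed(1)
# Toy multivariate view: two well-separated spherical clusters in 2 dimensions
Y <- rbind(
  matrix(rnorm(50, mean = 2, sd = 0.7), ncol = 2),
  matrix(rnorm(50, mean = -2, sd = 0.7), ncol = 2)
)

# Fixed number of clusters (like K2 = 2), "EII" covariance structure (like
# model2 = "EII"), and hierarchical initialization under "VVV" (like init2 = "VVV")
fit_fixed <- Mclust(
  Y,
  G = 2,
  modelNames = "EII",
  initialization = list(hcPairs = hc(Y, modelName = "VVV")),
  verbose = FALSE
)

# Supplying several candidate values in G lets Mclust choose among them by BIC
fit_bic <- Mclust(Y, G = 1:5, modelNames = "EII", verbose = FALSE)

fit_fixed$G                      # number of mixture components fitted
table(fit_fixed$classification)  # estimated cluster sizes
fit_bic$G                        # number of components selected by BIC
```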

Value

A list containing the following output components:

K1

The number of communities in View 1

K2

The number of clusters in View 2

Pi.est

The estimated Pi matrix

P2LRstat

The pseudo pseudolikelihood ratio test statistic

pval

The p-value

modelfit1

The parameter estimates and community assignment estimates from View 1.

modelfit2

The parameter estimates and cluster assignment estimates from View 2.
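
The pval component is obtained from a permutation procedure controlled by nperm. As a rough illustration of how a permutation p-value works in general, the base-R sketch below permutes one set of labels to break any association with the other and compares the observed statistic with the permuted ones. The helpers perm_pvalue and chisq_stat are hypothetical, the chi-squared statistic is only a stand-in for the pseudo pseudolikelihood ratio statistic, and the p-value convention shown is one common choice rather than a statement of what the package computes.

```r
# Generic permutation p-value: permute the observations of one view to break
# any association with the other view, then recompute the statistic each time.
perm_pvalue <- function(x, y, stat, nperm = 200) {
  observed <- stat(x, y)
  permuted <- replicate(nperm, stat(x, y[sample.int(length(y))]))
  # Proportion of permuted statistics at least as large as the observed one
  mean(permuted >= observed)
}

# Toy statistic (hypothetical stand-in for the test statistic):
# Pearson chi-squared statistic of the cross-tabulation of two label vectors
chisq_stat <- function(a, b) {
  tab <- table(a, b)
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

set.seed(1)
labels1 <- sample(1:2, 50, replace = TRUE)
labels2_indep <- sample(1:2, 50, replace = TRUE)  # independent of labels1
labels2_dep <- labels1                            # perfectly dependent

p_indep <- perm_pvalue(labels1, labels2_indep, chisq_stat, nperm = 200)
p_dep <- perm_pvalue(labels1, labels2_dep, chisq_stat, nperm = 200)
c(independent = p_indep, dependent = p_dep)
```

A larger nperm gives a finer-grained p-value at the cost of recomputing the statistic more times.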

References

Amini, A. A., Chen, A., Bickel, P. J., & Levina, E. (2013). Pseudo-likelihood methods for community detection in large sparse networks. The Annals of Statistics, 41(4), 2097-2122.

Fraley C. and Raftery A. E. (2002) Model-based clustering, discriminant analysis and density estimation, Journal of the American Statistical Association, 97/458, pp. 611-631.

Gao, L.L., Witten, D., Bien, J. Testing for Association in Multi-View Network Data, preprint.

Le, C. M., & Levina, E. (2015). Estimating the number of communities in networks by spectral methods. arXiv preprint arXiv:1507.00827.

Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, The R Journal, 8/1, pp. 205-233.

Examples

library(multiviewtest)

# Simulate n = 50 nodes with a network view (SBM) and a multivariate
# view (GMM), where the communities and the clusters are independent
n <- 50
Pi <- tcrossprod(c(0.5, 0.5), c(0.5, 0.5))
theta1 <- rbind(c(0.5, 0.1), c(0.1, 0.5))
mu2 <- cbind(c(2, 2), c(-2, 2))
Sigma2 <- diag(rep(0.5, 2))

dat <- mv_sbm_gmm_gen(n, Pi, theta1, mu2, Sigma2)


# Test H0: communities and clusters are independent
# The data were generated under the null hypothesis
results <- test_indep_com_clust(dat$data, nperm=25)
results$pval
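
The matrix Pi in this example is the outer product of two marginal probability vectors, which is what independence of communities and clusters looks like at the level of the joint label distribution. The base-R sketch below, using hypothetical names, contrasts that case with a dependent Pi that has the same marginals.

```r
# Independent case: joint probabilities factor into the product of the marginals
Pi_indep <- tcrossprod(c(0.5, 0.5), c(0.5, 0.5))

# Dependent case: same marginals, but more mass on the diagonal
Pi_dep <- rbind(c(0.4, 0.1),
                c(0.1, 0.4))

# Hypothetical helper: does Pi equal the outer product of its margins?
is_independent <- function(Pi, tol = 1e-8) {
  max(abs(Pi - tcrossprod(rowSums(Pi), colSums(Pi)))) < tol
}

is_independent(Pi_indep)  # TRUE
is_independent(Pi_dep)    # FALSE
```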

multiviewtest documentation built on Oct. 13, 2021, 5:08 p.m.