test_indep_clust: Pseudo likelihood ratio test for dependent clusterings
In multiviewtest: Hypothesis Tests for Association Between Subgroups in Two Data Views


Implements the pseudo likelihood ratio test described in Section 3 of Gao et al. (2019), "Are Clusterings of Multiple Data Views Independent?", for testing for dependence between clusterings of two data views. Fits Gaussian mixture models in each view.

test_indep_clust(
  x,
  model1 = "EII",
  model2 = "EII",
  K1 = NULL,
  K2 = NULL,
  init1 = NULL,
  init2 = NULL,
  B = 200,
  step = 0.001,
  maxiter = 1000
)

`x`	Multi-view data with two views; a list of two numeric vectors (in the case of univariate data) or matrices containing the two data views. In matrix format, rows correspond to observations and columns correspond to variables.
`model1`	A character string indicating the model to be fitted for Gaussian model-based clustering in view 1 using the function `Mclust`. The default is `"EII"` (spherical, equal volume). The help file for `mclustModelNames` describes the available model options.
`model2`	A character string indicating the model to be fitted for Gaussian model-based clustering in view 2 using the function `Mclust`. The default is `"EII"` (spherical, equal volume). The help file for `mclustModelNames` describes the available model options.
`K1`	An optional argument containing the number of clusters in View 1. If left out, then the number of clusters is chosen with BIC as described in Section 2.3.3 of "Are Clusterings of Multiple Data Views Independent?"
`K2`	An optional argument containing the number of clusters in View 2. If left out, then the number of clusters is chosen with BIC as described in Section 2.3.3 of "Are Clusterings of Multiple Data Views Independent?"
`init1`	An optional argument containing the model to be fitted in the hierarchical clustering initialization in Gaussian model-based clustering in view 1. The default is `"VVV"` (ellipsoidal, varying volume, shape, and orientation). The help file for `hc` describes the available model options.
`init2`	An optional argument containing the model to be fitted in the hierarchical clustering initialization in Gaussian model-based clustering in view 2. The default is `"VVV"` (ellipsoidal, varying volume, shape, and orientation). The help file for `hc` describes the available model options.
`B`	An integer specifying the number of permutations to use for the permutation procedure. The default number is 200.
`step`	A numeric value containing the fixed step size to be used in the optimization algorithm for estimating Pi. The default step size is 0.001. See Supplement C of "Are Clusterings of Multiple Data Views Independent?" for details.
`maxiter`	A numeric value containing the maximum number of iterations to run in the optimization algorithm. The default maximum is 1000.
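The role of `B` can be illustrated with a generic permutation p-value in base R. This is only a sketch under stated assumptions: `toy_stat` is a hypothetical stand-in statistic (a chi-squared statistic on made-up cluster labels), not the pseudo likelihood ratio statistic, and the exact p-value formula used inside `test_indep_clust` is not stated on this page.

```r
# Toy illustration of a permutation p-value; NOT the package's internal code.
set.seed(1)
n <- 100
labels1 <- sample(1:3, n, replace = TRUE)  # hypothetical cluster labels, view 1
labels2 <- sample(1:3, n, replace = TRUE)  # hypothetical cluster labels, view 2

# Hypothetical stand-in statistic: chi-squared statistic of the label cross-table.
toy_stat <- function(a, b) {
  tab <- table(factor(a, levels = 1:3), factor(b, levels = 1:3))
  expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
  sum((tab - expected)^2 / expected)
}

B <- 200  # number of permutations, mirroring the default of the B argument
observed <- toy_stat(labels1, labels2)

# Permuting one view's labels breaks any pairing between the two views,
# which mimics the null hypothesis of independent clusterings.
permuted <- replicate(B, toy_stat(labels1, sample(labels2)))

# Proportion of permuted statistics at least as large as the observed one.
pval <- mean(permuted >= observed)
```

A larger `B` gives a finer p-value grid (steps of 1/B in this sketch) at the cost of more computation.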

A list containing the following output components:

`K1`	The number of clusters in view 1
`K2`	The number of clusters in view 2
`Pi.est`	The estimated Pi matrix
`PLRstat`	The pseudo likelihood ratio test statistic
`pval`	The p-value
`modelfit1`	The object of class '`Mclust`' corresponding to the model-based clustering fitted in view 1; contains, e.g., estimated parameters and cluster assignments. The help file for `Mclust` describes the components of the object.
`modelfit2`	The object of class '`Mclust`' corresponding to the model-based clustering fitted in view 2; contains, e.g., estimated parameters and cluster assignments. The help file for `Mclust` describes the components of the object.
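To help read `Pi.est`, the following base R sketch assumes, following the cited paper rather than anything stated on this page, that Pi is a K1 x K2 matrix of joint cluster membership probabilities across the two views. Under that assumption, independent clusterings correspond to Pi being the outer product of the two views' marginal cluster probabilities. All numbers below are made up for illustration.

```r
# Hypothetical marginal cluster probabilities in each view (illustrative values).
pi1 <- c(0.5, 0.3, 0.2)  # view 1, K1 = 3
pi2 <- c(0.6, 0.4)       # view 2, K2 = 2

# Under independence (assumed structure), Pi factorizes as an outer product.
Pi_indep <- outer(pi1, pi2)

# A dependent example with the same margins that does not factorize.
Pi_dep <- matrix(c(0.45, 0.05,
                   0.05, 0.25,
                   0.10, 0.10), nrow = 3, byrow = TRUE)

# Largest absolute departure from the product of the margins;
# zero (up to rounding) for Pi_indep, clearly positive for Pi_dep.
departure <- function(Pi) max(abs(Pi - outer(rowSums(Pi), colSums(Pi))))
departure(Pi_indep)
departure(Pi_dep)
```

The `departure` helper is hypothetical and only a descriptive summary; the test itself relies on `PLRstat` and `pval`.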

Fraley C. and Raftery A. E. (2002) Model-based clustering, discriminant analysis and density estimation, Journal of the American Statistical Association, 97/458, pp. 611-631.

Gao, L.L., Bien, J., Witten, D. (2019) Are Clusterings of Multiple Data Views Independent? Biostatistics, DOI: 10.1093/biostatistics/kxz001.

Scrucca L., Fop M., Murphy T. B. and Raftery A. E. (2016) mclust 5: clustering, classification and density estimation using Gaussian finite mixture models, The R Journal, 8/1, pp. 205-233.

set.seed(1)
n <- 50
sig <- 2
p <- 2
K <- 3
mu1 <- cbind(c(2, 0), c(0, 2),  c(2, -2), c(-2, 0), c(0, -2), c(-2, 2))
mu2 <- cbind(c(-2, 0), c(0, -2), c(-2, 2), c(2, 0), c(0, 2), c(2, -2))
# Generate two-view data where the clusters are independent.
x1 <- list(matrix(sig * rnorm(n * p), n, p) + t(mu1)[sample(1:K, n, replace = TRUE), ],
           matrix(sig * rnorm(n * p), n, p) + t(mu2)[sample(1:K, n, replace = TRUE), ])

# Generate two-view data where the clusters are identical.
n <- 70
cl <- sample(1:K, n, replace=TRUE)
x2 <- list(matrix(sig * rnorm(n * p), n, p) + t(mu1)[cl, ],
           matrix(sig * rnorm(n * p), n, p) + t(mu2)[cl, ])

# Run the function on independent data views; we do not reject the null hypothesis.
# By default, not specifying K1 and K2 means the number of clusters
# to use in the test in each view is chosen via BIC.
# The covariance model is a shared sigma^2 I covariance matrix in view 1 ("EII")
# and a shared diagonal covariance matrix in view 2 ("EEI").
# B specifies the number of permutations for the permutation test.
# The hierarchical clustering initialization in view 1 uses
# a shared sigma^2 I covariance matrix ("EII").
indep1 <- test_indep_clust(x1, model1 = "EII", model2 = "EEI",
                           init1 = "EII", B = 52)
# The estimated cluster parameters in view 1
indep1$modelfit1$parameters
# The cluster assignments in view 2
indep1$modelfit2$classification

# Run the function on data views with identical clusterings; we reject the null hypothesis.
# We specify the number of clusters in each view to use in the test.
# The covariance model is a shared covariance matrix in view 1 ("EEE")
# and a shared diagonal covariance matrix in view 2 ("EEI").
# See the mclust documentation for more covariance model specification options.
identical2 <- test_indep_clust(x2, model1 = "EEE", model2 = "EEI", K1 = 2, K2 = 3, B = 51)
# P-value
identical2$pval
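The examples above pass `x` as a list of two matrices with rows as observations. A small base R check of that input shape might look like the sketch below. The helper `check_two_views` is hypothetical and not part of multiviewtest, and the requirement that both views have the same number of observations is an assumption based on the two views describing the same set of observations.

```r
# Hypothetical helper (not part of multiviewtest): validate two-view input.
check_two_views <- function(x) {
  # x must be a list holding exactly two views.
  stopifnot(is.list(x), length(x) == 2)
  # Each view must be a numeric vector (univariate) or numeric matrix.
  stopifnot(all(vapply(x, is.numeric, logical(1))))
  # Number of observations: rows for a matrix, length for a vector.
  n_obs <- vapply(x, function(v) if (is.matrix(v)) nrow(v) else length(v),
                  integer(1))
  # Assumption: both views are measured on the same observations.
  stopifnot(n_obs[1] == n_obs[2])
  invisible(n_obs[1])
}

set.seed(1)
x_ok <- list(matrix(rnorm(20), 10, 2), rnorm(10))  # matrix view plus vector view
n_checked <- check_two_views(x_ok)
```

Running such a check before the test surfaces shape problems early, instead of inside the model-fitting step.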

multiviewtest documentation built on Oct. 13, 2021, 5:08 p.m.
