UNPaC_Copula: Unimodal Non-Parametric Cluster (UNPaC) Significance Test

In UNPaC: Non-Parametric Cluster Significance Testing with Reference to a Unimodal Null Distribution

View source: R/UNPaC_Copula.R

UNPaC_Copula

R Documentation

Unimodal Non-Parametric Cluster (UNPaC) Significance Test

Description

The UNPaC test assesses the significance of clusters by comparing the cluster index (CI) from the data to the CI from ortho-unimodal reference data generated using a Gaussian copula. This method is described in Helgeson, Vock, and Bair (2021). The CI is defined as the within-cluster sum of squares about the cluster means, summed over all clusters, divided by the total sum of squares. Smaller values of the CI indicate stronger clustering.
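
The CI definition above can be illustrated with a minimal base R sketch. The helper name `cluster_index` is hypothetical and is not part of the UNPaC package; it only follows the verbal definition given here.

```r
# Hypothetical helper (not exported by UNPaC): computes the cluster index
# as within-cluster sum of squares divided by total sum of squares.
cluster_index <- function(x, cluster) {
  x <- as.matrix(x)
  # Total sum of squares about the overall feature means
  total_ss <- sum(scale(x, center = TRUE, scale = FALSE)^2)
  # Within-cluster sum of squares about each cluster's own feature means
  rows_by_cluster <- split(seq_len(nrow(x)), cluster)
  within_ss <- sum(vapply(rows_by_cluster, function(idx) {
    sum(scale(x[idx, , drop = FALSE], center = TRUE, scale = FALSE)^2)
  }, numeric(1)))
  within_ss / total_ss
}

set.seed(1)
x <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
km <- kmeans(x, centers = 2)
ci <- cluster_index(x, km$cluster)
```

For a k-means fit, this ratio should agree with `km$tot.withinss / km$totss`, which gives a quick sanity check on the definition.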

Usage

UNPaC_Copula(
  x,
  cluster,
  cluster.fun,
  nsim = 100,
  var_selection = FALSE,
  gamma = 0.1,
  p.adjust = "fdr",
  k = 2,
  rho = 0.02,
  cov = "glasso",
  center = TRUE,
  scale = FALSE
)

Arguments

`x`	a dataset with n observations (rows) and p features (columns)
`cluster`	labels generated by clustering method
`cluster.fun`	function used to cluster data. The function should return a list containing a component named "cluster". Examples include `kmeans` and `pam`.
`nsim`	a numeric value specifying the number of unimodal reference distributions used for testing (default=100)
`var_selection`	should dimension be reduced using feature filtering procedure? See description below. (default=FALSE)
`gamma`	threshold for feature filtering procedure. See description below. Not used if var_selection=FALSE (default=0.10)
`p.adjust`	p-value adjustment method for additional feature filtering. See `p.adjust` for options. No adjustment is applied if p.adjust="none" (default="fdr")
`k`	integer value specifying the number of clusters to test (default=2)
`rho`	a regularization parameter used in the implementation of the graphical lasso. See documentation for lambda in `huge`. Not used if `cov="est"` or `cov="banded"` (default=0.02)
`cov`	method used for approximating the covariance structure. Options include "glasso" (see `huge`), "banded" (see `band.chol.cv`), and "est" (default="glasso")
`center`	should data be centered such that each feature has mean equal to zero prior to clustering (default=TRUE)
`scale`	should data be scaled such that each feature has variance equal to one prior to clustering (default=FALSE)

Details

There are three options for the covariance matrix used in generating the Gaussian copula: sample covariance estimation, cov="est", which should be used if n>p; the graphical lasso, cov="glasso", which should be used if n<p; and k-banded covariance, cov="banded", which can be used if n<p and it can be assumed that features farther away in the ordering have weaker covariance. The graphical lasso is implemented using the huge function. When cov="banded" is selected the k-banded covariance Cholesky factor of Rothman, Levina, and Zhu (2010) is used to estimate the covariance matrix. Cross-validation is used for selecting the banding parameter. See documentation in band.chol.cv.
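
As a rough illustration of the copula idea, the sketch below draws Gaussian copula samples using the sample covariance, which corresponds to the n>p case described above. The function names `choose_cov` and `gaussian_copula_uniforms` are hypothetical, and the sketch stops at uniform margins: the unimodal marginal transformation used by UNPaC is not reproduced here, nor are the graphical lasso or banded estimators. Treating n equal to p like the n<p case is also an assumption, since the text does not cover it.

```r
# Hypothetical helper: pick a covariance option from the dimensions of x,
# following the n > p versus n < p guidance in the Details text.
choose_cov <- function(x) {
  if (nrow(x) > ncol(x)) "est" else "glasso"  # n == p handling is an assumption
}

# Hypothetical sketch: Gaussian copula draws with uniform margins,
# using the correlation implied by the sample covariance (n > p only).
gaussian_copula_uniforms <- function(x, n_draw = nrow(x)) {
  x <- as.matrix(x)
  stopifnot(nrow(x) > ncol(x))
  r <- cov2cor(cov(x))                  # sample correlation matrix
  z <- matrix(rnorm(n_draw * ncol(x)), nrow = n_draw) %*% chol(r)
  pnorm(z)                              # each column is Uniform(0, 1)
}

set.seed(3)
base <- rnorm(200)
x <- cbind(base + rnorm(200, sd = 0.5),
           base + rnorm(200, sd = 0.5),
           rnorm(200))
u <- gaussian_copula_uniforms(x)
```

The columns of `u` keep roughly the dependence structure of `x`; applying a quantile function to each column would then give reference data with the chosen marginals.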

In high-dimensional (n<p) settings, a dimension reduction step can be applied that selects features based on an F-test for a difference in means across clusters. Features with a p-value less than the threshold gamma are retained. For additional feature filtering, a p-value adjustment procedure (such as p.adjust="fdr") can be used. If no features are retained, the p-value for the cluster significance test is reported as 1.
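
The filtering step described above might look roughly like the following base R sketch. The function `filter_features` is hypothetical and is not the package's internal code; it uses `oneway.test` with equal variances as the per-feature F-test, which is an assumption about the exact test variant.

```r
# Hypothetical sketch of F-test feature filtering (not UNPaC internals).
filter_features <- function(x, cluster, gamma = 0.1, method = "fdr") {
  x <- as.matrix(x)
  g <- factor(cluster)
  # One-way ANOVA F-test per feature for a difference in cluster means
  pvals <- apply(x, 2, function(feature) {
    oneway.test(feature ~ g, var.equal = TRUE)$p.value
  })
  # Keep features whose (adjusted) p-value falls below the threshold
  which(p.adjust(pvals, method = method) < gamma)
}

set.seed(2)
x <- matrix(rnorm(60 * 10), nrow = 60, ncol = 10)
x[1:30, 1:3] <- x[1:30, 1:3] + 3      # only features 1-3 separate the groups
cluster <- rep(1:2, each = 30)
selected <- filter_features(x, cluster, gamma = 0.1, method = "fdr")
```

An empty `selected` vector would correspond to the case in which the test p-value is reported as 1.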

Value

The function returns a list with the following components:

selected_features: A vector of integers indicating the features retained by the feature filtering process.
sim_CI: vector containing the cluster indices for each generated unimodal reference distribution
pvalue_emp: the empirical p-value: the proportion of times the cluster index from the reference data is smaller than the cluster index from the observed data
pvalue_norm: the normalized p-value: the simulated p-value based on comparison to a standard normal distribution
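
To make the two p-values concrete, here is a hedged sketch of how they could be derived from an observed CI and a vector of simulated CIs. The function `ci_pvalues` is hypothetical; in particular, the normal approximation shown (a z-score against the mean and standard deviation of the simulated CIs) is an assumption about what "comparison to a standard normal distribution" means, and the handling of ties is not specified in this documentation.

```r
# Hypothetical sketch (not UNPaC internals): p-values from simulated CIs.
ci_pvalues <- function(obs_CI, sim_CI) {
  # Proportion of reference CIs smaller than the observed CI
  pvalue_emp <- mean(sim_CI < obs_CI)
  # Assumed normal approximation: standardize the observed CI against
  # the simulated CIs and take the lower-tail probability
  z <- (obs_CI - mean(sim_CI)) / sd(sim_CI)
  pvalue_norm <- pnorm(z)
  list(pvalue_emp = pvalue_emp, pvalue_norm = pvalue_norm)
}

sim_CI <- c(0.80, 0.82, 0.85, 0.78, 0.81)  # made-up illustrative values
res <- ci_pvalues(obs_CI = 0.60, sim_CI = sim_CI)
```

Because smaller CI values indicate stronger clustering, an observed CI well below the simulated values yields small p-values under both definitions.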

Author(s)

Erika S. Helgeson, David Vock, Eric Bair

References

Helgeson, E. S., Vock, D. M., and Bair, E. (2021). "Nonparametric cluster significance testing with reference to a unimodal null distribution." Biometrics 77: 1215-1226. <https://doi.org/10.1111/biom.13376>
Rothman, A. J., Levina, E., and Zhu, J. (2010). "A new approach to Cholesky-based covariance regularization in high dimensions." Biometrika 97(3): 539-550.

Examples

# K-means example
test1 <- matrix(rnorm(100*50), nrow=100, ncol=50)
test1[1:30,1:50] <- rnorm(30*50, 2)
test.data<-scale(test1,scale=FALSE,center=TRUE)
cluster<-kmeans(test.data,2)$cluster
UNPaCResults <- UNPaC_Copula(test.data,cluster,kmeans, nsim=100,cov="est")

# Hierarchical clustering example
test <- matrix(nrow=1200, ncol=75)
theta <- rep(NA, 1200)
theta[1:500] <- runif(500, 0, pi)
theta[501:1200] <- runif(700, pi, 2*pi)
test[1:500,seq(from=2,to=50,by=2)] <- -2+5*sin(theta[1:500])
test[501:1200,seq(from=2,to=50,by=2)] <- 5*sin(theta[501:1200])
test[1:500,seq(from=1,to=49,by=2)] <- 5+5*cos(theta[1:500])
test[501:1200,seq(from=1,to=49,by=2)] <- 5*cos(theta[501:1200])
test[,1:50] <- test[,1:50] + rnorm(50*1200, 0, 0.2)
test[,51:75] <- rnorm(25*1200, 0, 1)
test.data<-scale(test,center=TRUE,scale=FALSE)
# Define a clustering function that returns a list with a "cluster" component
hclustFunction <- function(x, k) {
  D <- stats::dist(x)
  xn.hc <- hclust(D, method = "single")
  list(cluster = cutree(xn.hc, k))
}

cluster <- hclustFunction(test.data, 2)$cluster
UNPaCResults <- UNPaC_Copula(test.data, cluster, hclustFunction, nsim = 100, cov = "est")

UNPaC documentation built on June 10, 2022, 1:06 a.m.
