test_clusters_approx: Monte Carlo significance test for any clustering method
In lucylgao/clusterpval: Clusterpval: P-Values for Differences in Means after Clustering

test_clusters_approx

R Documentation

Monte Carlo significance test for any clustering method

Description

This function performs a user-specified clustering method cl_fun on the rows of a data matrix to obtain K clusters, and tests the null hypothesis of no difference in means between clusters k1 and k2.

Usage

test_clusters_approx(
  X,
  k1,
  k2,
  iso = TRUE,
  sig = NULL,
  SigInv = NULL,
  ndraws = 2000,
  cl_fun,
  cl = NULL
)

Arguments

`X`	`n` by `p` matrix containing numeric data.
`k1, k2`	Integers selecting the clusters to test.
`iso`	Boolean. If `TRUE`, isotropic covariance matrix model, otherwise not.
`sig`	Optional scalar specifying `\sigma`, relevant if `iso` is `TRUE`.
`SigInv`	Optional matrix specifying `\Sigma^{-1}`, relevant if `iso` is `FALSE`.
`ndraws`	Integer selecting the number of importance samples, default of 2000.
`cl_fun`	Function returning assignments to clusters 1 through `K`.
`cl`	Optionally pass in the results of calling `cl_fun` on your data. This is for efficiency and reproducibility (when the clustering function is non-deterministic).

Details

In order to account for the fact that the clusters have been estimated from the data, the p-values are computed conditional on the fact that those clusters were estimated. This function approximates p-values via importance sampling.

This function assumes that cl_fun takes a n \times p numeric data matrix as input and outputs integer assignments to clusters 1 through K.

Thank you to August Guang for providing code to speed-up the function by parallelizing via the future package.

Value

`stat`	the test statistic: the Euclidean distance between the mean of cluster `k1` and the mean of cluster `k2`
`pval`	the approximate p-value
`stderr`	standard error of the p-value estimate
`clusters`	the estimated cluster assignments

References

Lucy L. Gao et al. "Selective inference for hierarchical clustering".

Examples

# Simulates a 100 x 2 data set with three clusters
set.seed(123)
dat <- rbind(c(-1, 0), c(0, sqrt(3)), c(1, 0))[rep(1:3, length=100), ] + 
matrix(0.2*rnorm(200), 100, 2)

# Function to run k-means clustering w/ k = 3 and 50 random starts
km_cluster <- function(X) { 
 km <- kmeans(X, 3, nstart=50)
 return(km$cluster)
}

# Cluster data using k-means 
clusters <- km_cluster(dat)
table(rep(1:3, length=100), clusters)

# tests for a difference in means between clusters 1 and 2
# We pass in earlier k-means clustering results from earlier
results <- test_clusters_approx(dat, k1=1, k2=2, cl_fun=km_cluster, ndraws=500, cl=clusters)
results$stat
results$pval
results$stderr

lucylgao/clusterpval documentation built on July 4, 2023, 4:40 p.m.

lucylgao/clusterpval index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

lucylgao/clusterpval
Clusterpval: P-Values for Differences in Means after Clustering

test_clusters_approx: Monte Carlo significance test for any clustering method
In lucylgao/clusterpval: Clusterpval: P-Values for Differences in Means after Clustering

Monte Carlo significance test for any clustering method

Description

Usage

Arguments

Details

Value

References

Examples

Related to test_clusters_approx in lucylgao/clusterpval...

R Package Documentation

Browse R Packages

We want your feedback!

lucylgao/clusterpval Clusterpval: P-Values for Differences in Means after Clustering

test_clusters_approx: Monte Carlo significance test for any clustering method In lucylgao/clusterpval: Clusterpval: P-Values for Differences in Means after Clustering

Monte Carlo significance test for any clustering method

Description

Usage

Arguments

Details

Value

References

Examples

Related to test_clusters_approx in lucylgao/clusterpval...

R Package Documentation

Browse R Packages

We want your feedback!

lucylgao/clusterpval
Clusterpval: P-Values for Differences in Means after Clustering

test_clusters_approx: Monte Carlo significance test for any clustering method
In lucylgao/clusterpval: Clusterpval: P-Values for Differences in Means after Clustering