test_clusters_approx | R Documentation |
This function performs a user-specified clustering method cl_fun
on the rows of a
data matrix to obtain K
clusters, and tests the null hypothesis of no difference in means
between clusters k1
and k2
.
test_clusters_approx(
X,
k1,
k2,
iso = TRUE,
sig = NULL,
SigInv = NULL,
ndraws = 2000,
cl_fun,
cl = NULL
)
X |
|
k1, k2 |
Integers selecting the clusters to test. |
iso |
Boolean. If |
sig |
Optional scalar specifying |
SigInv |
Optional matrix specifying |
ndraws |
Integer selecting the number of importance samples, default of 2000. |
cl_fun |
Function returning assignments to clusters 1 through |
cl |
Optionally pass in the results of calling |
In order to account for the fact that the clusters have been estimated from the data, the p-values are computed conditional on the fact that those clusters were estimated. This function approximates p-values via importance sampling.
This function assumes that cl_fun
takes a n \times p
numeric data matrix as input
and outputs integer assignments to clusters 1 through K
.
Thank you to August Guang for providing code to speed-up the function by
parallelizing via the future
package.
stat |
the test statistic: the Euclidean distance between the mean of cluster |
pval |
the approximate p-value |
stderr |
standard error of the p-value estimate |
clusters |
the estimated cluster assignments |
Lucy L. Gao et al. "Selective inference for hierarchical clustering".
# Simulates a 100 x 2 data set with three clusters
set.seed(123)
dat <- rbind(c(-1, 0), c(0, sqrt(3)), c(1, 0))[rep(1:3, length=100), ] +
matrix(0.2*rnorm(200), 100, 2)
# Function to run k-means clustering w/ k = 3 and 50 random starts
km_cluster <- function(X) {
km <- kmeans(X, 3, nstart=50)
return(km$cluster)
}
# Cluster data using k-means
clusters <- km_cluster(dat)
table(rep(1:3, length=100), clusters)
# tests for a difference in means between clusters 1 and 2
# We pass in earlier k-means clustering results from earlier
results <- test_clusters_approx(dat, k1=1, k2=2, cl_fun=km_cluster, ndraws=500, cl=clusters)
results$stat
results$pval
results$stderr
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.