Compute.Chi.sq: Function to evaluate if a set of similarity distributions...

Compute.Chi.sq    R Documentation

Function to evaluate whether a set of similarity distributions differ significantly, using the chi-squared test.

Description

For each value of k (number of clusters), the set of similarity values is split into two groups by a threshold on the similarity value (default 0.9): values above the threshold and values not above it. The sets obtained for the different values of k are then compared using the chi-squared test for multiple proportions. The number of degrees of freedom is equal to the number of sets minus 1. This function is called iteratively by Chi.square.compute.pvalues.
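The procedure described above can be sketched in base R as follows. This is only an illustration of a chi-squared test for multiple proportions, not the mosclust source code: the helper name chi_sq_sketch, the strict comparison with the threshold, and the return value of 1 in the degenerate case are all assumptions made for the example.

```r
# Illustrative sketch only; NOT the mosclust implementation.
# chi_sq_sketch is a hypothetical helper name.
chi_sq_sketch <- function(M, s0 = 0.9) {
  n <- nrow(M)                # number of sets (one per number of clusters)
  K <- ncol(M)                # number of similarity values in each set
  x <- rowSums(M > s0)        # assumption: "above threshold" is a strict comparison
  theta <- sum(x) / (n * K)   # pooled proportion under the null hypothesis
  # Assumption: if no value or every value exceeds s0, the sets cannot
  # be told apart, so the sketch returns a p-value of 1.
  if (theta == 0 || theta == 1) return(1)
  stat <- sum((x - K * theta)^2) / (K * theta * (1 - theta))
  pchisq(stat, df = n - 1, lower.tail = FALSE)
}

# Toy matrix: 3 sets of 4 similarity values each
M_toy <- rbind(rep(0.95, 4), c(0.95, 0.95, 0.5, 0.5), rep(0.5, 4))
p_toy <- chi_sq_sketch(M_toy)

# Cross-check with the Pearson test on the 3 x 2 table of counts
# (warning about small expected counts suppressed for the toy data)
x_toy <- rowSums(M_toy > 0.9)
p_ref <- suppressWarnings(
  chisq.test(cbind(x_toy, 4 - x_toy), correct = FALSE)$p.value
)
```

For the toy matrix the statistic is 8 with 2 degrees of freedom, so both p_toy and p_ref equal exp(-4), about 0.018.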

Usage

Compute.Chi.sq(M, s0 = 0.9)

Arguments

M

matrix of similarity values for different numbers of clusters. Each row holds the similarity values obtained for one number of clusters. The number of rows is the number of different cluster counts considered; the number of columns is the number of similarity values available for each cluster count

s0

threshold for the similarity value (default 0.9)
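As a small illustration of the layout expected for M and of the role of s0, the following base R sketch builds a hand-made matrix. The similarity values and the row labels are invented for the example and do not come from the package; the strict comparison with the threshold is also an assumption.

```r
# Hypothetical similarity values: 3 candidate numbers of clusters (rows),
# 5 similarity values for each of them (columns).
M <- rbind(
  "k=2" = c(0.98, 0.97, 0.99, 0.95, 0.96),
  "k=3" = c(0.91, 0.88, 0.93, 0.85, 0.92),
  "k=4" = c(0.70, 0.65, 0.92, 0.60, 0.75)
)
dim(M)  # 3 rows (numbers of clusters), 5 columns (similarity values)

# Count of values above the threshold in each row, for two choices of s0
# (assumption: strict comparison with the threshold)
above_default <- rowSums(M > 0.9)
above_strict  <- rowSums(M > 0.95)
```

With s0 = 0.9 the per-row counts are 5, 3 and 1; with s0 = 0.95 they drop to 4, 0 and 0, so the choice of threshold changes the proportions that the test compares.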

Value

p-value of the test, i.e. the probability of a type I error when rejecting the null hypothesis (no difference between the considered sets of k-clusterings)

Author(s)

Giorgio Valentini valentini@di.unimi.it

References

A. Bertoni, G. Valentini, Model order selection for clustered bio-molecular data, In: Probabilistic Modeling and Machine Learning in Structural and Systems Biology, J. Rousu, S. Kaski and E. Ukkonen (Eds.), Tuusula, Finland, 17-18 June 2006

See Also

Chi.square.compute.pvalues

Examples

library("mosclust")
library("clusterv")
# Synthetic data set generation
M <- generate.sample6 (n=10, m=15, dim=800, d=3, s=0.2)
# Computing the similarity matrix using random projections and hierarchical clustering
Sim <- do.similarity.projection(M, c=6, nprojections=20, dim=JL.predict.dim(60,epsilon=0.2))
# Evaluating the p-value for the group of the 5 clusterings (from 2 to 6 clusters)
Compute.Chi.sq(Sim)
# The same, considering only the clusterings with 2 and 6 clusters:
Compute.Chi.sq(Sim[c(1,5),])

mosclust documentation built on June 8, 2025, 11:23 a.m.