Hierarchical.sim.resampling: Function to compute similarity indices using resampling...

Hierarchical.sim.resampling    R Documentation

Function to compute similarity indices using resampling techniques and hierarchical clustering.

Description

Computes similarity indices using resampling techniques and hierarchical clustering. For a given number of clusters, the function returns a vector of similarity measures between pairs of clusterings obtained from resampled versions of the data. The fraction of the data resampled (without replacement), the similarity measure, the distance and the type of hierarchical clustering may be selected.
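The procedure described above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the names fm_index and resampling_similarity_sketch are hypothetical, the argument k stands in for c, only the Euclidean distance is used, and the similarity of each pair of clusterings is assumed to be computed on the examples shared by the two subsamples.

```r
# Hypothetical helper: Fowlkes-Mallows index between two label vectors
# defined on the same examples (standard pair-counting definition).
fm_index <- function(a, b) {
  tab <- unclass(table(a, b))
  pairs_both <- sum(choose(tab, 2))        # pairs co-clustered in both
  pairs_a <- sum(choose(rowSums(tab), 2))  # pairs co-clustered in a
  pairs_b <- sum(choose(colSums(tab), 2))  # pairs co-clustered in b
  if (pairs_a == 0 || pairs_b == 0) return(0)
  pairs_both / sqrt(pairs_a * pairs_b)
}

# Hypothetical sketch of the resampling scheme; examples are columns of X.
resampling_similarity_sketch <- function(X, k = 2, nsub = 100, f = 0.8,
                                         hmethod = "ward.D") {
  n <- ncol(X)
  m <- ceiling(f * n)                      # size of each subsample
  sims <- numeric(nsub)
  for (i in seq_len(nsub)) {
    # Two independent subsamples, each drawn without replacement
    idx1 <- sample(n, m)
    idx2 <- sample(n, m)
    # Hierarchical clustering of each subsample, cut into k clusters
    cl1 <- cutree(hclust(dist(t(X[, idx1, drop = FALSE])), method = hmethod), k = k)
    cl2 <- cutree(hclust(dist(t(X[, idx2, drop = FALSE])), method = hmethod), k = k)
    # Compare the two clusterings on the examples present in both subsamples
    common <- intersect(idx1, idx2)
    sims[i] <- fm_index(cl1[match(common, idx1)], cl2[match(common, idx2)])
  }
  sims
}

# Usage on toy data: 50 variables (rows), two groups of 10 examples (columns)
set.seed(1)
X <- cbind(matrix(rnorm(50 * 10, mean = 0), nrow = 50),
           matrix(rnorm(50 * 10, mean = 5), nrow = 50))
v <- resampling_similarity_sketch(X, k = 2, nsub = 20, f = 0.8)
```

Values of v close to 1 indicate that the clusterings are stable under resampling for the chosen number of clusters.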

Usage

Hierarchical.sim.resampling(X, c = 2, nsub = 100, f = 0.8, s = sFM, 
                            distance = "euclidean", hmethod = "ward.D")

Arguments

X

matrix of data (variables are rows, examples columns)

c

number of clusters

nsub

number of subsamples

f

fraction of the data resampled without replacement

s

similarity function to be used. It may be one of the following:
- sFM (Fowlkes and Mallows, default)
- sJaccard (Jaccard)
- sM (matching coefficient)
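For orientation, the three indices can be written in terms of pair counts between two clusterings of the same examples. The sketch below uses the standard textbook definitions; the function names (pair_counts, fowlkes_mallows, jaccard, matching) are hypothetical, and the signatures of sFM, sJaccard and sM in the package may differ.

```r
# Count pairs of examples by co-membership in two label vectors a and b:
# n11 = together in both, n10 = together only in a,
# n01 = together only in b, n00 = separated in both.
pair_counts <- function(a, b) {
  stopifnot(length(a) == length(b))
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  up <- upper.tri(same_a)                  # each unordered pair once
  c(n11 = as.numeric(sum(same_a[up] & same_b[up])),
    n10 = as.numeric(sum(same_a[up] & !same_b[up])),
    n01 = as.numeric(sum(!same_a[up] & same_b[up])),
    n00 = as.numeric(sum(!same_a[up] & !same_b[up])))
}

fowlkes_mallows <- function(n) {
  unname(n["n11"] / sqrt((n["n11"] + n["n10"]) * (n["n11"] + n["n01"])))
}
jaccard <- function(n) unname(n["n11"] / (n["n11"] + n["n10"] + n["n01"]))
matching <- function(n) unname((n["n11"] + n["n00"]) / sum(n))

# Toy example: the two clusterings disagree only on the last example
n <- pair_counts(c(1, 1, 2, 2), c(1, 1, 2, 3))
fm <- fowlkes_mallows(n)   # 1 / sqrt(2)
jc <- jaccard(n)           # 0.5
mc <- matching(n)          # 5 / 6
```

Unlike the other two indices, the matching coefficient also counts pairs separated in both clusterings as agreements.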

distance

distance measure to be used. It must be either "euclidean" (default) or "pearson" (that is, 1 - Pearson correlation)
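As an illustration of the "pearson" option, a distance of the form 1 - Pearson correlation between examples can be built in base R as follows. This is a sketch assuming examples are stored in columns, as for X; the helper name pearson_distance is hypothetical and the package may compute the distance differently.

```r
# Hypothetical helper: 1 - Pearson correlation between the columns of X
pearson_distance <- function(X) {
  # cor() correlates columns, which are the examples in this layout
  as.dist(1 - cor(X))
}

set.seed(42)
X <- matrix(rnorm(30 * 6), nrow = 30, ncol = 6)  # 30 variables, 6 examples

d_pearson <- pearson_distance(X)
d_euclid <- dist(t(X))                     # Euclidean distance for comparison

# Either distance object can be passed to hclust() from the stats package
hc <- hclust(d_pearson, method = "average")
labels <- cutree(hc, k = 2)
```

The resulting distance ranges from 0 (perfectly correlated examples) to 2 (perfectly anti-correlated examples).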

hmethod

the agglomeration method to be used. This parameter is used only by the hierarchical clustering algorithm. It should be one of the following: "ward.D", "single", "complete", "average", "mcquitty", "median" or "centroid", as accepted by the hclust function of the stats package.

Value

vector of the computed similarity measures (length equal to nsub)

Author(s)

Giorgio Valentini valentini@di.unimi.it

See Also

Hierarchical.sim.projection, Hierarchical.sim.noise

Examples

library("clusterv")
library("mosclust")
# Synthetic data set generation
M <- generate.sample6(n = 20, m = 10, dim = 600, d = 3, s = 0.2)
# Computing a vector of similarity indices with 2 clusters
v2 <- Hierarchical.sim.resampling(M, c = 2, nsub = 20, f = 0.8, s = sFM)
# Computing a vector of similarity indices with 3 clusters
v3 <- Hierarchical.sim.resampling(M, c = 3, nsub = 20, f = 0.8, s = sFM)
# Computing a vector of similarity indices with 2 clusters using the Jaccard index
v2J <- Hierarchical.sim.resampling(M, c = 2, nsub = 20, f = 0.8, s = sJaccard)
# 2 clusters using the Jaccard index and the Pearson correlation distance
v2JP <- Hierarchical.sim.resampling(M, c = 2, nsub = 20, f = 0.8, s = sJaccard,
                                    distance = "pearson")

mosclust documentation built on June 8, 2025, 11:23 a.m.