clest: Clest: A prediction-based resampling method for estimating...


View source: R/clest.R

Description

This method measures the similarity of two clusterings computed on two non-overlapping subsamples of the dataset. The resulting similarity measures are then compared with the same measures computed on generated reference datasets. The reference datasets are generated as in the gap statistic method (Tibshirani et al., 2001).
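As an illustration of the reference datasets mentioned above, here is a minimal sketch of the simplest scheme from the gap statistic paper, in which each feature is drawn uniformly over its observed range. The helper name `make_reference` is hypothetical, and the package may generate its reference data differently.

```r
# Draw one reference dataset: each feature is sampled uniformly over the
# range observed for that feature in X.
# `make_reference` is a hypothetical helper, not part of the package.
make_reference <- function(X) {
  X <- as.matrix(X)
  n <- nrow(X)
  apply(X, 2, function(col) runif(n, min = min(col), max = max(col)))
}

set.seed(1)
# Toy data: two Gaussian groups of 20 points in 2 dimensions
X <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
           matrix(rnorm(40, mean = 5), ncol = 2))
ref <- make_reference(X)
dim(ref)  # same shape as X
```

Because such a reference dataset has no cluster structure by construction, similarity scores computed on it give a baseline against which the scores on the real data can be judged.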

Usage

clest(X, maxK, clusterAlg = myKmean, similarity = adj.rand.index,
  pmax = 0.05, dmin = 0.05, B = 50, B0 = 20, rho = 0.6,
  verbose = TRUE, ...)

Arguments

X

data matrix or data frame of size n x d, n observations and d features

maxK

maximum number of clusters to evaluate.

clusterAlg

clustering algorithm. Its output must be a list with a component "cluster" containing the cluster assignment of each observation. For more details, see the format of the function myKmean.

similarity

function measuring the similarity between two partitions.

pmax

threshold for the p-value

dmin

threshold for the d-value

B

number of resampling iterations

B0

number of reference datasets to generate

rho

proportion of observations used when splitting the data into a train set and a test set

verbose

logical, if TRUE, plots the evolution of the algorithm

...

additional parameters for the clustering algorithm
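The clusterAlg and similarity arguments accept user-supplied functions. The sketch below shows what such functions might look like. Both `myHclust` and `rand_index` are hypothetical names, and the assumed signatures (`(X, K, ...)` for the clustering function, two label vectors for the similarity function) are guesses that should be checked against myKmean and adj.rand.index.

```r
# Hypothetical clustering wrapper: hierarchical clustering cut into K groups.
# Assumed signature (X, K, ...); the package's myKmean may differ.
myHclust <- function(X, K, ...) {
  tree <- hclust(dist(X), ...)
  list(cluster = cutree(tree, k = K))  # required "cluster" component
}

# Hypothetical similarity: plain Rand index, i.e. the fraction of observation
# pairs on which two partitions agree (together in both, or apart in both).
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs <- upper.tri(same_a)           # each unordered pair counted once
  mean(same_a[pairs] == same_b[pairs])
}

set.seed(1)
# Toy data: two well-separated groups of 10 points
X <- rbind(matrix(rnorm(20, mean = 0), ncol = 2),
           matrix(rnorm(20, mean = 10), ncol = 2))
fit <- myHclust(X, K = 2)
truth <- rep(1:2, each = 10)
rand_index(fit$cluster, truth)
```

If those signatures match, the functions could be passed as `clest(X, maxK = 5, clusterAlg = myHclust, similarity = rand_index)`.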

Value

list with 3 components:

p

vector of p-values

d

vector of d-values

kopt

optimal number of clusters
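To show how p, d, pmax and dmin could combine into kopt, the sketch below applies the selection rule described in the original Clest paper (Dudoit and Fridlyand, 2002): keep the candidate values of K whose p-value is at most pmax and whose d-value is at least dmin, pick the one with the largest d-value, and fall back to a single cluster when none qualifies. The helper `select_k` and the `ks` argument are hypothetical, and this page does not confirm that the package implements the rule in exactly this way.

```r
# Hypothetical helper applying the Clest selection rule.
# Assumes p and d hold one value per candidate K listed in `ks`.
select_k <- function(p, d, ks, pmax = 0.05, dmin = 0.05) {
  keep <- which(p <= pmax & d >= dmin)
  if (length(keep) == 0) return(1L)   # no K passes: report a single cluster
  ks[keep[which.max(d[keep])]]        # largest d among the qualifying K
}

# Made-up values for K = 2, 3, 4, 5
p <- c(0.30, 0.00, 0.04, 0.50)
d <- c(0.01, 0.20, 0.08, 0.02)
select_k(p, d, ks = 2:5)
```

Here K = 3 and K = 4 both pass the two thresholds, and K = 3 is selected because it has the larger d-value.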

References


mattmail/clusterAnalysis documentation built on Nov. 4, 2019, 6:18 p.m.