Hopkins: Hopkins statistic for cluster tendency


Description

Computes the Hopkins statistic for a dataset at a given sample size. The statistic indicates the cluster tendency of the data.

Usage

Hopkins(dataset, sample_size=0.1)

Arguments

dataset

The dataset for which a Hopkins statistic is returned

sample_size

The sample size as a proportion of the total number of observations in dataset. A larger sample gives a more accurate Hopkins statistic, but at a sharply higher computational cost.

Details

The Hopkins statistic is a test for cluster tendency in data. Points are drawn from a uniform distribution over the data space, and the distance from each of them to its nearest original data point is calculated. The sum of these distances is compared with the sum of nearest-neighbour distances between original data points. The function returns an index between 0 and 1: values near 1 indicate data partitioned into clusters, values near 0.5 indicate randomly (uniformly) distributed data, and values near 0 indicate regularly spaced data.
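The comparison of distances described here can be sketched in base R. This is a minimal illustration, not the ClustTools implementation: the name hopkins_sketch is hypothetical, the uniform points are assumed to be drawn from the bounding box of the data, and plain Euclidean distances are summed (some formulations raise the distances to the power of the number of dimensions).

```r
# Hypothetical helper for illustration; not the ClustTools source.
hopkins_sketch <- function(dataset, sample_size = 0.1) {
  X <- as.matrix(dataset)
  n <- nrow(X)
  m <- max(1, floor(sample_size * n))  # number of sampled points

  # Euclidean distance from a point p to every row of X
  dist_to_rows <- function(p) sqrt(rowSums(sweep(X, 2, p)^2))

  # u: distance from each of m uniform random points (assumed to be
  # drawn in the bounding box of the data) to its nearest data point
  lo <- apply(X, 2, min)
  hi <- apply(X, 2, max)
  U <- sapply(seq_len(ncol(X)), function(j) runif(m, lo[j], hi[j]))
  U <- matrix(U, nrow = m)
  u <- apply(U, 1, function(p) min(dist_to_rows(p)))

  # w: distance from each of m sampled data points to its nearest
  # other data point (the point itself is excluded)
  idx <- sample(n, m)
  w <- sapply(idx, function(i) min(dist_to_rows(X[i, ])[-i]))

  # Near 1 when uniform points lie far from the data (clusters),
  # near 0.5 when both sums are similar (random data)
  sum(u) / (sum(u) + sum(w))
}
```

Calling hopkins_sketch(iris[, 1:4], 0.4) mirrors the usage shown for Hopkins. Because both the uniform points and the sampled observations are random, the result varies between runs unless set.seed() is called first.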

Value

The Hopkins statistic, an index between 0 and 1

Author(s)

Jacob H. Madsen

References

Hopkins, B. (1954). A New Method for determining the Type of Distribution of Plant Individuals. Annals of Botany. Vol. 18, pp. 213–227

Examples

## Select some arbitrary dataset
X <- iris[,1:4]

## Run Hopkins statistic
Hopkins(X, 0.4)

jhmadsen/ClustTools documentation built on May 24, 2019, 9:54 p.m.