get_clustering_stats: Get clustering statistics

In scclust: Size-Constrained Clustering

Description

`get_clustering_stats` calculates statistics of a clustering.

Usage

```
get_clustering_stats(distances, clustering)
```

Arguments

| Argument | Description |
|---|---|
| `distances` | a `distances` object describing the distances between the data points in `clustering`. |
| `clustering` | a `scclust` object containing a non-empty clustering. |

Details

The function reports the following measures:

| Measure | Description |
|---|---|
| `num_data_points` | total number of data points |
| `num_assigned` | number of points assigned to a cluster |
| `num_clusters` | number of clusters |
| `min_cluster_size` | size of the smallest cluster |
| `max_cluster_size` | size of the largest cluster |
| `avg_cluster_size` | average cluster size |
| `sum_dists` | sum of all within-cluster distances |
| `min_dist` | smallest within-cluster distance |
| `max_dist` | largest within-cluster distance |
| `avg_min_dist` | average of the clusters' smallest distances |
| `avg_max_dist` | average of the clusters' largest distances |
| `avg_dist_weighted` | average of the clusters' average distances, weighted by cluster size |
| `avg_dist_unweighted` | average of the clusters' average distances (unweighted) |
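The size-based measures can be reproduced with base R alone. The following is a minimal sketch, not the package's implementation: the `labels` vector is illustrative (it repeats the labels used in the Examples section), and marking unassigned points with `NA` is an assumption of this sketch.

```r
# Illustrative cluster labels; NA would mark an unassigned point
# (an assumption of this sketch, not taken from the package source).
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Number of points per cluster; table() ignores NA entries by default.
cluster_sizes <- as.vector(table(labels))

size_stats <- list(
  num_data_points  = length(labels),
  num_assigned     = sum(!is.na(labels)),
  num_clusters     = length(cluster_sizes),
  min_cluster_size = min(cluster_sizes),
  max_cluster_size = max(cluster_sizes),
  avg_cluster_size = mean(cluster_sizes)
)
str(size_stats)
```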

Let d(i,j) denote the distance between data points i and j. Let c be a cluster containing the indices of points assigned to the cluster. Let

D(c) = { d(i,j) : i,j in c and i > j }

be a function returning all within-cluster distances in c. Let C be a set containing all clusters.
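As a rough illustration of D(c), the sketch below extracts the within-cluster distances of one cluster using only base R. It assumes Euclidean distances and uses `stats::dist` as a stand-in for a `distances` object; the helper `within_cluster_dists` and the variable names are hypothetical and not part of scclust.

```r
# Illustrative data (the same points and labels as in the Examples section).
points <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                     y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Full Euclidean distance matrix (stand-in for a `distances` object).
dist_matrix <- as.matrix(dist(points))

# D(c): the distances d(i, j) for all i, j in the cluster with i > j,
# i.e. the strict lower triangle of the cluster's distance submatrix.
within_cluster_dists <- function(members) {
  sub <- dist_matrix[members, members, drop = FALSE]
  sub[lower.tri(sub)]
}

D_B <- within_cluster_dists(which(labels == "B"))
length(D_B)  # a cluster with 4 points has choose(4, 2) = 6 distances
```

Restricting to i > j counts each unordered pair once and leaves out the zero self-distances d(i,i).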

`sum_dists` is defined as:

∑_[c in C] sum(D(c))

`min_dist` is defined as:

min_[c in C] min(D(c))

`max_dist` is defined as:

max_[c in C] max(D(c))

`avg_min_dist` is defined as:

∑_[c in C] min(D(c)) / count(C)

`avg_max_dist` is defined as:

∑_[c in C] max(D(c)) / count(C)

Let:

AD(c) = sum(D(c)) / count(D(c))

be the average within-cluster distance in cluster c.

`avg_dist_weighted` is defined as:

∑_[c in C] count(c) * AD(c) / num_assigned

where num_assigned is the number of assigned data points (see above).

`avg_dist_unweighted` is defined as:

∑_[c in C] AD(c) / count(C)
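The definitions above translate fairly directly into base R. The sketch below is one way to compute the distance-based measures under stated assumptions, not the package's implementation: it assumes Euclidean distances (via `stats::dist`), that every cluster contains at least two points (so that D(c) is never empty), and that unassigned points, if any, are labelled `NA`. All variable names are illustrative.

```r
# Illustrative data (the same points and labels as in the Examples section).
points <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                     y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

dist_matrix <- as.matrix(dist(points))

# D(c) for every cluster c in C, as a named list of numeric vectors.
D <- lapply(split(seq_along(labels), labels), function(members) {
  sub <- dist_matrix[members, members, drop = FALSE]
  sub[lower.tri(sub)]
})

cluster_size <- sapply(split(labels, labels), length)  # count(c)
AD <- sapply(D, mean)                                  # AD(c)

dist_stats <- c(
  sum_dists           = sum(sapply(D, sum)),
  min_dist            = min(sapply(D, min)),
  max_dist            = max(sapply(D, max)),
  avg_min_dist        = mean(sapply(D, min)),
  avg_max_dist        = mean(sapply(D, max)),
  # sum(cluster_size) equals num_assigned because split() drops NA labels.
  avg_dist_weighted   = sum(cluster_size * AD) / sum(cluster_size),
  avg_dist_unweighted = mean(AD)
)
round(dist_stats, 7)
```

For this data the results should agree with the values printed in the Examples section, for instance `sum_dists` of about 18.2013097 and `avg_dist_weighted` of about 1.5575594.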

Value

Returns a list of class `clustering_stats` containing the statistics.

Examples

```
library(scclust)

# Ten two-dimensional data points
my_data_points <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5,
                                   0.6, 0.7, 0.8, 0.9, 1.0),
                             y = c(10, 9, 8, 7, 6,
                                   10, 9, 8, 7, 6))

# Distances between the data points
my_distances <- distances(my_data_points)

# A clustering of the ten points into three clusters
my_scclust <- scclust(c("A", "A", "B", "C", "B",
                        "C", "C", "A", "B", "B"))

get_clustering_stats(my_distances, my_scclust)
# >                          Value
# > num_data_points     10.0000000
# > num_assigned        10.0000000
# > num_clusters         3.0000000
# > min_cluster_size     3.0000000
# > max_cluster_size     4.0000000
# > avg_cluster_size     3.3333333
# > sum_dists           18.2013097
# > min_dist             0.5000000
# > max_dist             3.0066593
# > avg_min_dist         0.8366584
# > avg_max_dist         2.4148611
# > avg_dist_weighted    1.5575594
# > avg_dist_unweighted  1.5847484
```

scclust documentation built on May 2, 2019, 4:04 p.m.