
`get_clustering_stats` calculates statistics of a clustering.

```
get_clustering_stats(distances, clustering)
```

`distances` |
a |

`clustering` |
a |

The function reports the following measures:

| Measure | Description |
| --- | --- |
| `num_data_points` | total number of data points |
| `num_assigned` | number of points assigned to a cluster |
| `num_clusters` | number of clusters |
| `min_cluster_size` | size of the smallest cluster |
| `max_cluster_size` | size of the largest cluster |
| `avg_cluster_size` | average cluster size |
| `sum_dists` | sum of all within-cluster distances |
| `min_dist` | smallest within-cluster distance |
| `max_dist` | largest within-cluster distance |
| `avg_min_dist` | average of the clusters' smallest distances |
| `avg_max_dist` | average of the clusters' largest distances |
| `avg_dist_weighted` | average of the clusters' average distances, weighted by cluster size |
| `avg_dist_unweighted` | average of the clusters' average distances (unweighted) |
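
The size-related measures depend only on the cluster labels, so they can be illustrated without any distances. The following is a minimal base R sketch, not the package's implementation; `labels`, `cluster_sizes`, and `size_stats` are names made up for the illustration, and treating `NA` as "unassigned" is an assumption of the sketch.

```r
# Cluster labels for ten data points (same labels as in the example on this page).
# In this sketch, NA would mark an unassigned point (an assumption).
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Number of points per cluster; table() ignores NA by default.
cluster_sizes <- as.vector(table(labels))

size_stats <- list(
  num_data_points  = length(labels),
  num_assigned     = sum(!is.na(labels)),
  num_clusters     = length(cluster_sizes),
  min_cluster_size = min(cluster_sizes),
  max_cluster_size = max(cluster_sizes),
  avg_cluster_size = mean(cluster_sizes)
)
str(size_stats)
```

With these labels there are three clusters of sizes 3, 4, and 3, so the average cluster size is 10/3.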

Let *d(i,j)* denote the distance between data points *i*
and *j*. Let *c* be a cluster, given as the set of indices of the points
assigned to it. Let

*D(c) = { d(i,j) : i,j in c and i > j }*

be a function returning all within-cluster distances in *c*. Let
*C* be a set containing all clusters.

`sum_dists` is defined as:

*∑_[c in C] sum(D(c))*

`min_dist` is defined as:

*min_[c in C] min(D(c))*

`max_dist` is defined as:

*max_[c in C] max(D(c))*

`avg_min_dist` is defined as:

*∑_[c in C] min(D(c)) / count(C)*

`avg_max_dist` is defined as:

*∑_[c in C] max(D(c)) / count(C)*
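
The five definitions above can be sketched in base R. This only illustrates the formulas and is not the package's implementation: it assumes Euclidean distances computed with `stats::dist()`, and `pts`, `labels`, `d`, `within_dists`, and `D` are names made up for the sketch. The data and labels are the ones used in the example on this page.

```r
# Data points and cluster labels from the example on this page.
pts <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                  y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Full Euclidean distance matrix (an assumption); d[i, j] plays the role of d(i,j).
d <- as.matrix(dist(pts))

# D(c): the distances d(i,j) with i > j among the points of one cluster.
within_dists <- function(idx) {
  sub <- d[idx, idx, drop = FALSE]
  sub[lower.tri(sub)]
}

# One vector of within-cluster distances per cluster.
D <- lapply(split(seq_along(labels), labels), within_dists)

sum_dists    <- sum(sapply(D, sum))   # sum over clusters of sum(D(c))
min_dist     <- min(sapply(D, min))   # smallest within-cluster distance
max_dist     <- max(sapply(D, max))   # largest within-cluster distance
avg_min_dist <- mean(sapply(D, min))  # sum of min(D(c)) divided by count(C)
avg_max_dist <- mean(sapply(D, max))  # sum of max(D(c)) divided by count(C)
```

For this data the five values come out at roughly 18.2013, 0.5, 3.0067, 0.8367, and 2.4149, in line with the example output. A cluster with a single point has an empty *D(c)*, so `min()` and `max()` would warn and return `Inf` and `-Inf` in this sketch.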

Let:

*AD(c) = sum(D(c)) / count(D(c))*

be the average within-cluster distance in cluster *c*.

`avg_dist_weighted` is defined as:

*∑_[c in C] count(c) * AD(c) / num_assigned*

where *num_assigned* is the number of assigned data
points (see above).

`avg_dist_unweighted` is defined as:

*∑_[c in C] AD(c) / count(C)*
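
The difference between the weighted and unweighted averages is easiest to see side by side. The sketch below uses base R only and is an illustration of the two formulas, not the package's implementation; it assumes Euclidean distances from `stats::dist()`, and `pts`, `labels`, `d`, `clusters`, `avg_within`, and `sizes` are names made up for the sketch.

```r
# Data points and cluster labels from the example on this page.
pts <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0),
                  y = c(10, 9, 8, 7, 6, 10, 9, 8, 7, 6))
labels <- c("A", "A", "B", "C", "B", "C", "C", "A", "B", "B")

# Euclidean distance matrix (an assumption of this sketch).
d <- as.matrix(dist(pts))

# Indices of the points in each cluster.
clusters <- split(seq_along(labels), labels)

# AD(c): mean of the within-cluster distances d(i,j), i > j, of each cluster.
avg_within <- sapply(clusters, function(idx) {
  sub <- d[idx, idx, drop = FALSE]
  mean(sub[lower.tri(sub)])
})

sizes        <- lengths(clusters)  # count(c) for each cluster
num_assigned <- sum(sizes)         # every point is assigned in this example

# Each cluster's average counts in proportion to its size ...
avg_dist_weighted   <- sum(sizes * avg_within) / num_assigned
# ... or every cluster counts equally.
avg_dist_unweighted <- mean(avg_within)
```

For this data the weighted average is roughly 1.5576 and the unweighted average roughly 1.5847. The weighted value is lower here because the largest cluster (`B`, four points) also has the smallest average within-cluster distance.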

Returns a list of class `clustering_stats` containing the statistics.

```
library(distances)  # provides distances()
library(scclust)    # provides scclust() and get_clustering_stats()

# Ten two-dimensional data points
my_data_points <- data.frame(x = c(0.1, 0.2, 0.3, 0.4, 0.5,
                                   0.6, 0.7, 0.8, 0.9, 1.0),
                             y = c(10, 9, 8, 7, 6,
                                   10, 9, 8, 7, 6))

# Distances between the data points
my_distances <- distances(my_data_points)

# A clustering that assigns each point to one of three clusters
my_scclust <- scclust(c("A", "A", "B", "C", "B",
                        "C", "C", "A", "B", "B"))

get_clustering_stats(my_distances, my_scclust)
# >                          Value
# > num_data_points     10.0000000
# > num_assigned        10.0000000
# > num_clusters         3.0000000
# > min_cluster_size     3.0000000
# > max_cluster_size     4.0000000
# > avg_cluster_size     3.3333333
# > sum_dists           18.2013097
# > min_dist             0.5000000
# > max_dist             3.0066593
# > avg_min_dist         0.8366584
# > avg_max_dist         2.4148611
# > avg_dist_weighted    1.5575594
# > avg_dist_unweighted  1.5847484
```

