plan_distances | R Documentation |
Compute Distance between Partitions
plan_distances(plans, measure = "variation of information", ncores = 1)
redist.distances(plans, measure = "Hamming", ncores = 1, total_pop = NULL)
plans |
A matrix with one row for each precinct and one column for each map. Required. |
measure |
String vector indicating which distances to compute. Currently implemented are "Hamming", "Manhattan", "Euclidean", and "variation of information". Use "all" to return all implemented measures. Not case sensitive, and any unique substring is enough, e.g. "ham" for Hamming, or "info" for variation of information. |
ncores |
Number of cores to use for parallel computing. Default is 1. |
total_pop |
The vector of precinct populations. Used only if computing variation of information. If not provided, equal population of precincts will be assumed, i.e. the VI will be computed with respect to the precincts themselves, and not the population. |
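The matching rule described for measure can be pictured with a small base R sketch. The helper match_measure and the vector implemented are hypothetical names introduced here for illustration only; this is an assumption about how case-insensitive unique-substring matching could behave, not the package's own code.

```r
# Hypothetical illustration of the matching rule; not the package's code.
implemented <- c("Hamming", "Manhattan", "Euclidean", "variation of information")

match_measure <- function(x) {
  # "all" selects every implemented measure.
  if (tolower(x) == "all") return(implemented)
  # Case-insensitive literal substring match against the measure names.
  hits <- implemented[grepl(tolower(x), tolower(implemented), fixed = TRUE)]
  # The substring must identify exactly one measure.
  if (length(hits) != 1) stop("'", x, "' does not identify a unique measure")
  hits
}

match_measure("ham")
match_measure("INFO")
```

Under this sketch a substring such as "an" would be rejected, since it occurs in both "Manhattan" and "Euclidean".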
Hamming distance counts the precincts that are assigned to different districts in the two plans. Manhattan and Euclidean distances are the 1- and 2-norms of the difference between the assignment vectors. None of the Hamming, Manhattan, and Euclidean distances implemented here is invariant to permutations of the district labels; permuting the labels can cause large changes in measured distance, and maps which are identical up to a permutation may be computed to be maximally distant.
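The label sensitivity can be seen in a minimal base R sketch. The two vectors below are made-up toy plans, and the computations follow the textbook definitions; the package's own implementation may scale or normalize its output differently.

```r
# Two toy assignment vectors for six precincts and two districts.
# plan_b is plan_a with the district labels 1 and 2 swapped,
# so both describe the same partition of the precincts.
plan_a <- c(1, 1, 1, 2, 2, 2)
plan_b <- c(2, 2, 2, 1, 1, 1)

# Hamming: number of precincts whose labels differ.
hamming <- sum(plan_a != plan_b)

# Manhattan and Euclidean: 1- and 2-norms of the difference vector.
manhattan <- sum(abs(plan_a - plan_b))
euclidean <- sqrt(sum((plan_a - plan_b)^2))

print(c(hamming = hamming, manhattan = manhattan, euclidean = euclidean))
```

Every precinct changes label, so the Hamming count reaches its largest possible value for six precincts even though the two maps draw identical districts.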
Variation of Information is a metric on population partitions (i.e., districtings) which is invariant to permutations of the district labels, and arises out of information theory. It is calculated as
VI(\xi, \xi') = -\sum_{i=1}^n \sum_{j=1}^n \frac{pop(\xi_i \cap \xi'_j)}{P} \left( 2\log(pop(\xi_i \cap \xi'_j)) - \log(pop(\xi_i)) - \log(pop(\xi'_j)) \right)
where \xi and \xi' are the partitions, \xi_i and \xi'_j the individual districts, pop(\cdot) is the population, and P the total population of the state. VI is also expressible as the difference between the joint entropy and the mutual information (see references).
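The formula above can be transcribed into base R as a rough sketch. The function vi_sketch is a hypothetical helper, not part of the package; it assumes two assignment vectors of equal length and, mirroring the description of total_pop, treats every precinct as having population 1 when no populations are supplied.

```r
# Hypothetical helper that transcribes the VI formula; not the package's code.
vi_sketch <- function(a, b, pop = rep(1, length(a))) {
  stopifnot(length(a) == length(b), length(a) == length(pop))
  total <- sum(pop)  # P in the formula
  # Population in each intersection of a district of a with a district of b.
  joint <- tapply(pop, list(a, b), sum)
  joint[is.na(joint)] <- 0
  pop_a <- rowSums(joint)  # pop(xi_i)
  pop_b <- colSums(joint)  # pop(xi'_j)
  # Empty intersections contribute nothing, so sum over non-empty cells only.
  nz <- which(joint > 0, arr.ind = TRUE)
  r <- joint[nz]
  -sum(r / total * (2 * log(r) - log(pop_a[nz[, 1]]) - log(pop_b[nz[, 2]])))
}

a <- c(1, 1, 2, 2)
vi_sketch(a, c(2, 2, 1, 1))  # a with its district labels swapped
vi_sketch(a, c(1, 2, 1, 2))  # districts that cut across those of a
```

The first call should give zero up to floating point, reflecting invariance to relabeling; the second should give 2 * log(2), the sum of the two entropies, because the partitions share no information.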
distance_matrix returns a numeric distance matrix for the chosen metric.
redist.distances returns a named list of distance matrices, one for each distance measure selected.
Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. John Wiley & Sons, 2nd edition.
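library(redist)
# Load the fl25 example data and the fl25_enum plans.
data(fl25)
data(fl25_enum)
# Keep only the plans with population deviation of at most 0.05.
plans_05 <- fl25_enum$plans[, fl25_enum$pop_dev <= 0.05]
# With the default measure ("Hamming"), the result is a named list.
distances <- redist.distances(plans_05)
distances$Hamming[1:5, 1:5]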