summary.redist_plans: Diagnostic information on sampled plans

View source: R/diagnostics.R

summary.redist_plansR Documentation

Diagnostic information on sampled plans

Description

Prints diagnostic information, which varies by algorithm. All algorithms compute the plans_diversity() of the samples.

Usage

## S3 method for class 'redist_plans'
summary(object, district = 1L, all_runs = TRUE, vi_max = 100, ...)

Arguments

object

a redist_plans object

district

For R-hat values, which district to use for district-level summary statistics. We strongly recommend calling match_numbers() or number_by() before examining these district-level statistics.

all_runs

When there are multiple SMC runs, show detailed summary statistics for all runs (the default), or only the first run?

vi_max

The maximum number of plans to sample in computing the pairwise variation of information distance (sample diversity).

...

additional arguments (ignored)

Details

For SMC and MCMC, if there are multiple runs/chains, R-hat values will be computed for each summary statistic. These values should be close to 1. If they are not, then there is too much between-chain variation, indicating that there are not enough samples. R-hat values are calculated after rank-normalization and folding. MCMC chains are split in half before R-hat is computed. For summary statistics that vary across districts, R-hat is calculated for the first district only.

For SMC, diagnostics statistics include:

  • Effective samples: the effective sample size at each iteration, computed using the SMC weights. Larger is better. The percentage in parentheses is the ratio of the effective samples to the total samples.

  • Acceptance rate: the fraction of drawn spanning trees which yield a valid redistricting plan within the population tolerance. Very small values (< 1%) can indicate a bottleneck and may lead to a lack of diversity.

  • Standard deviation of the log weights: More variable weights (larger s.d.) indicate less efficient sampling. Values greater than 3 are likely problematic.

  • Maximum unique plans: an upper bound on the number of unique redistricting plans that survive each stage. The percentage in parentheses is the ratio of this number to the total number of samples. Small values (< 100) indicate a bottleneck, which leads to a loss of sample diversity and a higher variance.

  • Estimated k parameter: How many spanning tree edges were considered for cutting at each split. Mostly informational, though large jumps may indicate a need to increase adapt_k_thresh.

  • Bottleneck: An asterisk will appear in the right column if a bottleneck appears likely, based on the values of the other statistics.

In the event of problematic diagnostics, the function will provide suggestions for improvement.

Value

A data frame containing diagnostic information, invisibly.

Examples

data(iowa)
iowa_map <- redist_map(iowa, ndists = 4, pop_tol = 0.1)
plans <- redist_smc(iowa_map, 100)
summary(plans)


redist documentation built on June 24, 2024, 9:10 a.m.