xwalk_ags: Crosswalk Municipality or District Statistics

View source: R/xwalk_ags.R

xwalk_ags R Documentation

Description

This function constructs time series of counts for Germany's municipalities (Gemeinden) and districts (Kreise).

Usage

xwalk_ags(
  data,
  ags,
  time,
  xwalk,
  variables = NULL,
  strata = NULL,
  weight = NULL,
  fuzzy_time = FALSE,
  verbose = TRUE
)

Arguments

data

A data frame or a data frame extension (e.g. a tibble).

ags

Name of the character variable (quoted) with municipality AGS (Gemeinden, 8 digits) or district AGS (Kreise, 5 digits).

time

Name of the variable (quoted) identifying the year (YYYY format). Values will be coerced to integers.

xwalk

Name of the crosswalk. The following crosswalks are available:

  • xd19, xd20 for district-level data from 1990 to 2019 (xd19) or 2020 (xd20).

  • xm19, xm20 for municipality-level data from 1990 to 2019 (xm19) or 2020 (xm20).

variables

Either a vector of names (quoted) for variables to interpolate or NULL to disable interpolation and return the data matched with the xwalk.

strata

Vector of variable names (quoted) or NULL. See details.

weight

Name of the interpolation weight or NULL. The following are available:

  • pop: Population weights.

  • size: Area weights.

  • emp: Weights based on the number of employees (1998 onwards).

fuzzy_time

If FALSE, the crosswalk and the data are matched exactly by ags and time. If TRUE, they are matched exactly by ags and to the closest available crosswalk year on time. See details below.

verbose

If TRUE, the function outputs information on the number of matched and unmatched rows.

Details

This function facilitates the use of crosswalks constructed by the BBSR for municipalities and districts in Germany (Milbert 2010). The crosswalks map one year's set of district or municipality identifiers to a later year's identifiers and provide weights for area- or population-weighted interpolation.
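The idea of weighted interpolation can be sketched in base R. This is a minimal illustration, not the package's implementation: the identifiers, the column names (ags_old, ags_new, conv) and the counts are all hypothetical.

```r
# Hypothetical crosswalk: old unit "A" is split 60/40 between the new units
# "X" and "Y"; old unit "B" maps entirely to "Y".
xw <- data.frame(
  ags_old = c("A", "A", "B"),
  ags_new = c("X", "Y", "Y"),
  conv    = c(0.6, 0.4, 1.0)
)

# Hypothetical counts reported for the old units.
counts <- data.frame(ags_old = c("A", "B"), voters = c(1000, 500))

# Join each old unit to its target units, scale the count by the weight,
# and sum the weighted counts within each target unit.
m <- merge(counts, xw, by = "ags_old")
m$voters_w <- m$voters * m$conv
result <- aggregate(voters_w ~ ags_new, data = m, FUN = sum)
result
```

Because the weights of each old unit sum to 1, the total count is preserved in this sketch.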

All data rows with NAs in either the ags or time variable are excluded. The same applies to all rows with a value in ags or time that never appears in the crosswalk.
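To see in advance how many rows would be excluded because of missing identifiers, a check along the following lines may help. The data frame and its column names are hypothetical.

```r
# Hypothetical input with one missing AGS and one missing year.
df <- data.frame(
  ags   = c("14511", NA, "14521"),
  year  = c(2017L, 2017L, NA),
  votes = c(10, 5, 7)
)

# Flag rows with an NA in either the ags or the time variable.
dropped <- !complete.cases(df[c("ags", "year")])
n_dropped <- sum(dropped)
n_dropped
```

This covers only the NA rule. Rows whose ags or time value never appears in the crosswalk would need a separate comparison against the crosswalk itself.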

Fuzzy matching uses the absolute difference between the year reported in the data and a crosswalk year. If there is a tie, crosswalk years from before the year reported in the data are preferred.
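The matching rule can be illustrated with a small base R helper. This is a sketch of the rule as described above, not the package's code: nearest_xwalk_year is a hypothetical name and the crosswalk years are made up.

```r
# Hypothetical helper: pick the crosswalk year closest to a data year,
# preferring the earlier crosswalk year when two are equally close.
nearest_xwalk_year <- function(year, xwalk_years) {
  d <- abs(xwalk_years - year)
  candidates <- xwalk_years[d == min(d)]
  min(candidates)  # on a tie, the earlier year wins
}

xwalk_years <- c(1990L, 1994L, 1998L)  # illustrative, not the real crosswalk years

y1 <- nearest_xwalk_year(1991L, xwalk_years)  # closest year is 1990
y2 <- nearest_xwalk_year(1996L, xwalk_years)  # tie between 1994 and 1998, so 1994
c(y1, y2)
```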

If area- or population-weighted interpolation is requested (i.e., when variables are supplied), the combination of the variables named in ags, time, and strata must uniquely identify each row in data.
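Before requesting interpolation, the uniqueness requirement can be checked with duplicated(). The data and column names below are hypothetical; party stands in for a strata variable.

```r
# Hypothetical data: rows 2 and 3 share the same ags/year/party combination.
df <- data.frame(
  ags   = c("14511", "14511", "14511", "14521"),
  year  = c(2013L, 2017L, 2017L, 2017L),
  party = c("a", "a", "a", "a"),
  votes = c(10, 12, 13, 7)
)

key <- c("ags", "year", "party")  # the ags, time, and strata columns

# Flag every row whose key occurs more than once (both directions, so the
# first occurrence is flagged as well).
dup <- duplicated(df[key]) | duplicated(df[key], fromLast = TRUE)
df[dup, ]  # rows that violate the uniqueness requirement
```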

Caution: Data from https://www.regionalstatistik.de/ sometimes include annual values both for merged units (e.g., Städteregion Aachen, 05334) and for their former parts (Kreis Aachen, 05354, and Stadt Aachen, 05313). When such data are crosswalked with fuzzy_time = TRUE and interpolated, the final counts will be too large by a factor of approximately 2, because the output is the sum of the interpolated counts for the parts and the measured count of the merged unit.
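The arithmetic behind the doubling can be shown with made-up numbers. The counts are hypothetical, and conversion weights of 1 are assumed for simplicity; only the three AGS codes come from the text above.

```r
# Hypothetical counts for one year: the merged unit and its two former parts
# are all reported, and the parts add up to the merged unit's value.
raw <- data.frame(
  ags   = c("05334", "05354", "05313"),
  year  = 2012L,
  count = c(100, 60, 40)
)

# After crosswalking, every row points to the current identifier 05334.
raw$ags_new <- "05334"
total <- sum(raw$count)                               # 100 + 60 + 40
inflation <- total / raw$count[raw$ags == "05334"]    # total relative to the true count

# A simple pre-check: are the merged unit and any former part both present?
parts <- c("05354", "05313")
has_both <- any(raw$ags == "05334") && any(raw$ags %in% parts)
```

If has_both is TRUE for a given year, either the merged unit's rows or the former parts' rows would need to be removed before crosswalking. Which to keep depends on the data at hand.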

Value

If interpolation is requested, the crosswalked and interpolated data are returned. If interpolation is not requested, the data matched with the crosswalk are returned. The following variables are added:

  • row_id: Row number of data before matching.

  • ags[*]: The crosswalked AGS.

  • year_xw: The matched year from the crosswalk.

  • [*]_conv: The interpolation weight.

  • diff: The absolute difference between year_xw and time.

References

Milbert, Antonia. 2010. "Gebietsreformen – politische Entscheidungen und Folgen für die Statistik." BBSR-Berichte kompakt 6/2010. Bundesinstitut für Bau-, Stadt- und Raumforschung.

Examples


data(btw_sn)

btw_sn_ags20 <- xwalk_ags(
    data = btw_sn,
    ags = "district",
    time = "year",
    xwalk = "xd20",
    variables = c("voters", "valid"),
    weight = "pop"
)

head(btw_sn_ags20)


sumtxt/ags documentation built on April 10, 2024, 7:20 p.m.