knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(dkstat)
library(geodk)
library(tidyverse)
This documents the technical specification for the integration between {dkstat} and {geodk}. It can be found on this page and in the {geodk} documentation: vignette("tech-specs-for-dkstat", package = "geodk").
I honestly don't know. This document is written mostly for me (Aleksander) and contributors to use as a reference when maintaining this integration. If you have any interest in the inner workings of how these two packages interact, then please do read on.
A thorough usage guide can be found in vignette("geodk", package = "dkstat"), but below I will provide a technical walkthrough of the integration.
The main goal of the integration between {dkstat} and {geodk} is to be able to run the following code and have meaningful geographic information added to the statistics.
dkstat::dst_get_all_data("laby04") |> geodk::geodk_enrich() # This function name is still debatable.
When accessing data from a table, e.g. "laby04", there is no obvious way to know which column is geographic.
dkstat::dst_get_all_data("laby01") |>
  dplyr::distinct(KOMGRP, .keep_all = TRUE) |>
  tail()
#>                  KOMGRP           BEVÆGELSE        TID value
#> 100   846 Mariagerfjord B04 Fødselsoverskud 2007-01-01  -0.3
#> 101           773 Morsø B04 Fødselsoverskud 2007-01-01  -1.3
#> 102          840 Rebild B04 Fødselsoverskud 2007-01-01   1.7
#> 103         787 Thisted B04 Fødselsoverskud 2007-01-01  -1.6
#> 104 820 Vesthimmerlands B04 Fødselsoverskud 2007-01-01  -0.8
#> 105         851 Aalborg B04 Fødselsoverskud 2007-01-01   1.6
When looking at the above tail, I (and probably you as well) can recognise that the KOMGRP column is the geographic one.
Some tables, e.g. "laby04", have multiple geographic levels. This table has a grouping of municipalities and then the individual municipalities. To ensure that the individuals and groups are enriched properly, the method has to take the different levels into account.
Below you can find the description of each S3 class that is used (/abused) to enrich the statistical data with geographic information. It is sectioned by the geographic grouping of the dataset. Before we dive into the specific classes, I will first outline the general idea. For more information on the specific terminology used, please consult Advanced R (@adv-r).
Each type of geographic variable has its own S3 class. The class is determined by which observations are included in the variable. The class assignment is done by a series of custom class constructors called new_dkstat_*(). One example is new_dkstat_Denmark_municipality_07(), which assigns the S3 class dkstat_Denmark_municipality_07 to a dataset. The class is assigned "after the fact", as Wickham calls it, which ensures that the usual behaviour of a data.frame is preserved through inheritance for all the functions that don't know about these special classes (e.g. the {dplyr} functions). Thus, the dkstat classes are subclasses of data.frame. The class names are derived from the API and map 1:1 to the map value that is returned for geographic variables.
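To make the "after the fact" assignment concrete, here is a minimal sketch of what such a constructor could look like. Only the function and class names come from the text above; the body and the toy data are my assumptions, not the actual {dkstat} source.

```r
# Assumed body for illustration; the real constructor in {dkstat} may differ.
new_dkstat_Denmark_municipality_07 <- function(x) {
  stopifnot(is.data.frame(x))
  # Prepend the subclass so that data.frame stays in the class vector
  class(x) <- c("dkstat_Denmark_municipality_07", class(x))
  x
}

# Toy data using two rows from the output shown earlier
stats <- data.frame(
  KOMGRP = c("851 Aalborg", "840 Rebild"),
  value = c(1.6, 1.7)
)
classed <- new_dkstat_Denmark_municipality_07(stats)

class(classed)

# Functions that only know about data.frame keep working through inheritance
subset_rows <- classed[classed$value > 1.6, ]
nrow(subset_rows)
```

Because the new class is put in front of the existing class vector, S3 dispatch tries the dkstat method first and falls back to the data.frame method when none exists.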
The S3 generic can be found in geodk::geodk_enrich(). The individual methods also live in {geodk}. This avoids taking on {geodk} as a dependency in {dkstat}. The S3 method for the municipality group (Denmark_municipality_07) from above is called geodk_enrich.dkstat_Denmark_municipality_07(). Please open an issue if you would like to help add a method from another data source.
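The dispatch mechanism can be sketched in a standalone script as below. This is a toy re-definition for illustration only: the generic and method names follow the text, but the method body, the default method and the marker column are assumptions and do not reflect the real {geodk} implementation.

```r
# Toy generic, defined locally so the sketch runs without {geodk} installed
geodk_enrich <- function(x, ...) {
  UseMethod("geodk_enrich")
}

# Assumed fallback for objects without a dkstat class
geodk_enrich.default <- function(x, ...) {
  stop("No enrich method for class '", class(x)[1], "'.", call. = FALSE)
}

# Method name = generic name + "." + class name
geodk_enrich.dkstat_Denmark_municipality_07 <- function(x, ...) {
  # Placeholder for the real work (joining geometries); here we only
  # record which method was dispatched to.
  x$matched_method <- "Denmark_municipality_07"
  x
}

x <- structure(
  data.frame(KOMGRP = "851 Aalborg", value = 1.6),
  class = c("dkstat_Denmark_municipality_07", "data.frame")
)
enriched <- geodk_enrich(x)

# A plain data.frame has no dkstat class, so it hits the default method
plain_error <- tryCatch(
  geodk_enrich(data.frame(a = 1)),
  error = function(e) conditionMessage(e)
)
```

Adding support for a new map level then amounts to writing one more method with the matching class name.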
The API provides 14 different map levels, on which I have based the classes. This makes it very easy to add the right one. In data-raw/dst_map.R in {dkstat} you can find a script that checks all tables for map variables and adds each new one to a list. This gives the below vector.
#>  [1] "Denmark_municipality_07"             "Verden_dk"
#>  [3] "denmark_cities_19"                   "denmark_parish_23_4c"
#>  [5] "denmark_municipalitygroups_24"       "Denmark_region_07"
#>  [7] "Denmark_rural_07"                    "denmark_multimember_constituency_23"
#>  [9] "denmark_deanary_23"                  "europe_dk"
#> [11] "Verden_dk"                           "Europa_DK3"
#> [13] "Denmark_county"                      "Verden_dk4"
Some of the geographic levels, such as "Verden_dk", include other countries. This data is not available from {geodk}, so the user gets a message informing them of this and asking whether geometry should be added for Denmark only.
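One way the message-then-ask behaviour could look is sketched below. The helper name should_add_denmark(), its ask argument and the message wording are all hypothetical; utils::askYesNo() is the only non-base function used.

```r
# Hypothetical helper: inform the user and decide whether to fall back to
# geometry for Denmark only. `ask` defaults to interactive() so that
# scripts never block on a prompt.
should_add_denmark <- function(map_level, ask = interactive()) {
  message(
    "Map level '", map_level, "' includes areas outside Denmark, ",
    "which {geodk} has no geometry for."
  )
  if (!ask) {
    return(FALSE)
  }
  # askYesNo() returns TRUE, FALSE or NA (cancel); NA is treated as "no"
  isTRUE(utils::askYesNo("Add geometry for Denmark only?"))
}

decision <- should_add_denmark("Verden_dk", ask = FALSE)
```

Keeping the prompt behind an argument makes the behaviour testable and safe in non-interactive sessions such as R CMD check.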
The municipality groupings specified by Statistics Denmark are described on this website. The grouping includes both the individual municipalities and five groups. Take a look at the link if you would like to learn more. In the {geodk} backend I have created a list containing all the municipality names with both the Statistics Denmark naming and the geodk naming. In addition, the list is ordered by municipality group.
This grouping is assigned the dkstat_Denmark_municipality_07 class in addition to the data.frame class it already has.
The method for dkstat_Denmark_municipality_07 first filters the groupings from the individual municipalities. A list of the individual municipalities and groupings, along with their specific ids, is stored in {geodk}. It is not exported to the user.
After filtering, it assigns the individual municipality geometries. The grouping geometries are also assigned, as well as a column indicating the geographic nature of the observation: is it an overall grouping or an individual municipality?
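The filtering step and the indicator column can be sketched with a small lookup table. Everything here is illustrative: the lookup layout, the geo_level column name and the value for the group row are made up, and the internal list in {geodk} is not exported, so its real structure may differ.

```r
# Hypothetical stand-in for the internal lookup stored in {geodk}
lookup <- data.frame(
  name = c("Landkommuner", "Aalborg", "Rebild"),
  is_group = c(TRUE, FALSE, FALSE)
)

# Toy statistics; the value for the group row is invented for the example
stats <- data.frame(
  KOMGRP = c("Landkommuner", "Aalborg", "Rebild"),
  value = c(0.1, 1.6, 1.7)
)

# Look up each observation and flag its geographic level
is_group <- lookup$is_group[match(stats$KOMGRP, lookup$name)]
stats$geo_level <- ifelse(is_group, "group", "municipality")

# Split so that each part can be given its own kind of geometry
groups <- stats[is_group, ]
municipalities <- stats[!is_group, ]
```

After the two parts have received their geometries, they can be bound back together, with geo_level telling the user which rows are groupings.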
The municipalities that make up the groupings are split per group and run through sf::st_union() to be returned as one sf geometry per group.
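A minimal sketch of the split-and-union step, assuming {sf} is installed. The unit squares, the names and the group labels are toy data standing in for real municipality polygons; only sf::st_union() is taken from the text, and the surrounding helper code is my own.

```r
library(sf)

# Helper to build a unit square with its lower-left corner at (xmin, ymin)
square <- function(xmin, ymin) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + 1, ymin), c(xmin + 1, ymin + 1),
    c(xmin, ymin + 1), c(xmin, ymin)
  )))
}

# Three toy "municipalities": a and b share an edge, c stands alone
municipalities <- st_sf(
  name = c("a", "b", "c"),
  group = c("g1", "g1", "g2"),
  geometry = st_sfc(square(0, 0), square(1, 0), square(5, 5))
)

# Split per group, then dissolve each part into a single geometry
parts <- split(municipalities, municipalities$group)
unioned <- lapply(parts, function(part) st_union(st_geometry(part)))

# Recombine into one sf object with one row per group
groups <- st_sf(
  group = names(unioned),
  geometry = do.call(c, unname(unioned))
)
```

The same result could be reached with dplyr::group_by() followed by summarise() on an sf object; the base R version is used here to keep the steps visible.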