statficlassifications
accesses the open classifications API of
Statistics Finland at https://data.stat.fi/api/classifications/v2.
statficlassifications
contains tools for searching and importing
classifications and classification conversion keys in an R-friendly
format. The package also provides special tools for (re)classifying
regions.
For further information about open classifications API of Statistics Finland, see https://www.stat.fi/en/luokitukset/info/.
To install and start using the package:
install.packages("devtools")
devtools::install_github("pttry/statficlassifications")
library(statficlassifications)
There are two types of classification objects of interest: keys and classifications. Keys, or conversion keys or correspondence tables, are used to convert between different classifications and classifications can be used to convert names to codes and vice versa.
To list all available classification keys (correspondence tables), use
search_keys()
without arguments:
head(search_keys())
#> [1] "kunta 2019 -> ely 2019" "kunta 2022 -> hyvinvointialue 2022"
#> [3] "kunta 2019 -> kuntaryhmitys 2019" "kunta 2022 -> ely 2022"
#> [5] "ammatti 1987 -> ammatti 2001" "kunta 2019 -> seutukunta 2019"
Plug in search terms to search for available keys:
search_keys("maakunta")
#> [1] "kunta 2007 -> maakunta 2007" "kunta 2008 -> maakunta 2008"
#> [3] "kunta 2009 -> maakunta 2009" "kunta 2010 -> maakunta 2010"
#> [5] "kunta 2011 -> maakunta 2011" "kunta 2012 -> maakunta 2012"
#> [7] "kunta 2013 -> maakunta 2013" "kunta 2014 -> maakunta 2014"
#> [9] "kunta 2015 -> maakunta 2015" "maakunta 2007 -> suuralue 2003"
#> [11] "kunta 2016 -> maakunta 2016" "maakunta 2010 -> avi 2010"
#> [13] "seutukunta 2009 -> maakunta 2009" "seutukunta 2007 -> maakunta 2007"
#> [15] "maakunta 2012 -> seutukunta 2012" "maakunta 2012 -> suuralue 2012"
#> [17] "maakunta 2011 -> suuralue 2011" "seutukunta 2010 -> maakunta 2010"
#> [19] "seutukunta 2011 -> maakunta 2011" "maakunta 2016 -> suuralue 2016"
#> [21] "kunta 2017 -> maakunta 2017" "kunta 2018 -> maakunta 2018"
#> [23] "kunta 2019 -> maakunta 2019" "kunta 2020 -> maakunta 2020"
#> [25] "kunta 2021 -> maakunta 2021" "maakunta 2021 -> suuralue 2021"
#> [27] "kunta 2022 -> maakunta 2022" "maakunta 2022 -> suuralue 2022"
#> [29] "kunta 2023 -> maakunta 2023"
search_keys("kunta", "suuralue", 2020)
#> [1] "kunta 2020 -> suuralue 2020"
The open classifications API uniquely identifies each classification key
by a local ID. Having found the suitable key, you can use
search_keys()
with an argument as_localId = TRUE
to print the
localId of the key:
localId <- search_keys(source = "kunta", target = "suuralue", year = 2020, as_localId = TRUE)
localId
#> [1] "kunta_1_20200101%23suuralue_1_20200101"
Then you can get the key by:
key <- get_key(localId)
#> Vuoden 2020 kuntien ja suuralueiden välinen luokitusavain
head(key)
#> source_code source_name target_code target_name
#> 1 710 Raasepori 1 Helsinki-Uusimaa
#> 2 755 Siuntio 1 Helsinki-Uusimaa
#> 3 753 Sipoo 1 Helsinki-Uusimaa
#> 4 858 Tuusula 1 Helsinki-Uusimaa
#> 5 927 Vihti 1 Helsinki-Uusimaa
#> 6 018 Askola 1 Helsinki-Uusimaa
To list all available classifications, use search_classifications()
without arguments:
head(search_classifications())
#> [1] "verolaji 2019" "coicop 2018" "nettovaral_luok 2023"
#> [4] "koulutusaste 2018" "suuruusluokka 2018" "erva 2020"
Plug in search terms to search for available classifications:
search_classifications("ammatti")
#> [1] "ammatti 1987" "ammatti 1980" "ammatti 2001" "ammatti 1997" "ammatti 2010"
#> [6] "ammatti 2018" "ammatti 2021"
search_classifications("ammatti", 2021)
#> [1] "ammatti 2021"
To print the localId of the desired classification, use
as_localId = TRUE
.
localId <- search_classifications("ammatti", year = 2021, as_localId = TRUE)
Use localId to get the classification
head(get_classification(localId))
#> Tulorekisterin TK10-ammattiluokitus
#> code name
#> 1 01100 Upseerit
#> 2 02100 Aliupseerit
#> 3 03100 Sotilasammattihenkilöstö
#> 4 11110 Lainsäätäjät
#> 5 11121 Valtion keskushallinnon johtajat
#> 6 11122 Alue- ja paikallishallinnon johtajat ja ylimmät virkamiehet
For regional classifications, statficlassifications
has tailored
tools.
To load regional classifications, use get_regionclassification
:
region_classification <- get_regionclassification()
head(region_classification)
#> alue_code alue_name
#> 1 KU020 Akaa
#> 2 KU004 Alahärmä
#> 3 KU005 Alajärvi
#> 4 KU006 Alastaro
#> 5 KU009 Alavieska
#> 6 KU010 Alavus
seutukunta_classification <- get_regionclassification("seutukunta")
head(seutukunta_classification)
#> seutukunta_code seutukunta_name
#> 1 SK063 Etelä-Pirkanmaa
#> 2 SK143 Eteläiset seinänaapurit
#> 3 SK053 Forssa
#> 4 SK072 Heinola
#> 5 SK011 Helsinki
#> 6 SK051 Hämeenlinna
kunta_2010_classification <- get_regionclassification("kunta", year = 2010)
#> Overriding default option for offline when specific year is required.
head(kunta_2010_classification)
#> kunta_code kunta_name
#> 1 KU020 Akaa
#> 2 KU005 Alajärvi
#> 3 KU009 Alavieska
#> 4 KU010 Alavus
#> 5 KU015 Artjärvi
#> 6 KU016 Asikkala
To load regional correspondence tables / classification keys, use
get_regionkey
:
regionkey <- get_regionkey(only_names = TRUE)
head(regionkey)
#> kunta_name seutukunta_name maakunta_name
#> 1 Akaa Etelä-Pirkanmaa Pirkanmaa
#> 2 Urjala Etelä-Pirkanmaa Pirkanmaa
#> 3 Valkeakoski Etelä-Pirkanmaa Pirkanmaa
#> 4 Forssa Forssa Kanta-Häme
#> 5 Tammela Forssa Kanta-Häme
#> 6 Humppila Forssa Kanta-Häme
regionkey <- get_regionkey(only_codes = TRUE)
head(regionkey)
#> kunta_code seutukunta_code maakunta_code
#> 1 KU020 SK063 MK06
#> 2 KU887 SK063 MK06
#> 3 KU908 SK063 MK06
#> 4 KU061 SK053 MK05
#> 5 KU834 SK053 MK05
#> 6 KU103 SK053 MK05
You can also get more specialised regional classification keys by setting arguments:
kunta_maakunta_key <- get_regionkey("kunta", "maakunta")
head(kunta_maakunta_key)
#> kunta_name maakunta_name kunta_code maakunta_code
#> 1 Akaa Pirkanmaa KU020 MK06
#> 2 Urjala Pirkanmaa KU887 MK06
#> 3 Valkeakoski Pirkanmaa KU908 MK06
#> 4 Forssa Kanta-Häme KU061 MK05
#> 5 Tammela Kanta-Häme KU834 MK05
#> 6 Humppila Kanta-Häme KU103 MK05
statficlassifications
also gives you a selection of ways and functions
that help you to standardize and manipulate the regional information in
your data. For examples, generate random municipal data:
data <- get_regionkey() |> dplyr::select(kunta_name) |> dplyr::mutate(values = rnorm(dplyr::n()))
head(data)
#> kunta_name values
#> 1 Akaa 1.60532403
#> 2 Urjala -0.59541349
#> 3 Valkeakoski 0.86430228
#> 4 Forssa 0.26600502
#> 5 Tammela -0.09058066
#> 6 Humppila -0.24403987
You can use regional classification tables to add regions to your data:
dplyr::left_join(data, get_regionkey(only_names = TRUE), by = "kunta_name") |> head()
#> kunta_name values seutukunta_name maakunta_name
#> 1 Akaa 1.60532403 Etelä-Pirkanmaa Pirkanmaa
#> 2 Urjala -0.59541349 Etelä-Pirkanmaa Pirkanmaa
#> 3 Valkeakoski 0.86430228 Etelä-Pirkanmaa Pirkanmaa
#> 4 Forssa 0.26600502 Forssa Kanta-Häme
#> 5 Tammela -0.09058066 Forssa Kanta-Häme
#> 6 Humppila -0.24403987 Forssa Kanta-Häme
For a shortcut, use add_region
:
data |> add_region("maakunta") |> head()
#> The region classification in data seems to fit year(s) 2021. A key corresponding to this year is used.
#> kunta_name values maakunta
#> 1 Akaa 1.60532403 Pirkanmaa
#> 2 Urjala -0.59541349 Pirkanmaa
#> 3 Valkeakoski 0.86430228 Pirkanmaa
#> 4 Forssa 0.26600502 Kanta-Häme
#> 5 Tammela -0.09058066 Kanta-Häme
#> 6 Humppila -0.24403987 Kanta-Häme
It is also straightforward to compute, say, maakunta-level means give the municipal data.
data |> add_region("maakunta") |>
dplyr::group_by(maakunta) |>
dplyr::summarize(maakunta_mean = mean(values)) |> head()
#> The region classification in data seems to fit year(s) 2021. A key corresponding to this year is used.
#> # A tibble: 6 × 2
#> maakunta maakunta_mean
#> <fct> <dbl>
#> 1 Ahvenanmaa -0.363
#> 2 Etelä-Karjala -0.201
#> 3 Etelä-Pohjanmaa 0.227
#> 4 Etelä-Savo 0.0455
#> 5 Kainuu 0.0385
#> 6 Kanta-Häme -0.0827
statficlassifications
aims to impose the use of prefixed region codes.
Prefixes are useful in making sure codes do not map to multiple names.
There are, for instance, kuntia and seutukuntia with identical numbers.
Prefixes help distinguish between these. The correpondence of region
names and prefixes is
#> prefix name
#> 1 SSS KOKO MAA
#> 2 KU kunta
#> 3 SK seutukunta
#> 4 MK maakunta
#> 5 SA suuralue
#> 6 ELY ely
#> 7 MA manner_suomi_ahvenanmaa
#> 8 TK kuntaryhmitys
Function set_region_codes
helps you set the prefixes of you region
codes:
v <- c("191", "047", "063")
set_region_codes(v)
#> [1] "SK191" "KU047" "SK063"
If your input vector has codes that may match to multiple region codes,
you get an error without giving more information. If you know the region
level to which your codes correspond to you can submit the region level
to region_level
-argument.
v <- c("5", "47", "20")
set_region_codes(v, region_level = "kunta")
#> [1] "KU005" "KU047" "KU020"
Often you find data where kunta codes are not prefixed but the codes of
other regions are. With your information on to which region level codes
the codes without prefixes should be mapped, you can restrict the domain
of this mapping by providing the region_level
-argument:
v <- c("020", "047", "005", "MK01", "MK02")
set_region_codes(v, region_level = "kunta")
#> [1] "KU020" "KU047" "KU005" "MK01" "MK02"
Again sometimes you have codes of multiple region levels mixed in the
data without prefixes but the charachter length of the codes might make
a difference. In these cases you can set use_char_length_info = TRUE
.
The default is way to use character length information is that three
characters incidate kuntia and two characters indicate maakuntia.
v <- c("020", "047", "005", "01", "02")
set_region_codes(v, use_char_length_info = TRUE)
#> [1] "KU020" "KU047" "KU005" "MK01" "MK02"
Prefixed region codes map unambiguosly to region names so it is then
easy to map the codes to names. Note, however, that some region names do
not map uniquely to region codes. In these cases you have to supply the
region_level
-argument again.
v <- c("020", "047", "005", "MK01", "MK02")
v <- set_region_codes(v, region_level = "kunta")
codes_to_names(v)
#> [1] "Akaa" "Enontekiö" "Alajärvi" "Uusimaa"
#> [5] "Varsinais-Suomi"
v <- codes_to_names(v)
names_to_codes(v)
#> [1] "KU020" "KU047" "KU005" "MK01" "MK02"
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.