NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true") eval_packages <- requireNamespace(c("kableExtra", "ggplot2")) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", purl = NOT_CRAN, eval = NOT_CRAN && eval_packages )
message("Packages `kableExtra` and `ggplot2` required for building the vignette. Code chunks will not be evaluated.")
This sotkanet R package provides access to data from the Sotkanet portal. Your contributions and bug reports and other feedback are welcome.
The Sotkanet portal provides over 2000 demographic indicators across Finland and Europe. It is maintained by the National Institute for Health and Welfare (THL). For more information, see Information about Sotkanet and API description.
The sotkanet R package enables access to the Sotkanet API using R facilitating the use of the data from the API. This package is part of rOpenGov.
To install latest release version from CRAN, use:
install.packages("sotkanet")
To install development version from GitHub, use:
library(remotes) remotes::install_github("ropengov/sotkanet")
Test the installation by loading the package:
library(sotkanet)
Load sotkanet and other packages used in the vignette.
library(sotkanet) library(kableExtra) library(ggplot2)
List available Sotkanet indicators using sotkanet_indicators():
# Using a preset list of indicators to avoid a large download indicators <- sotkanet_indicators(id = c(4, 5, 6, 127, 10012, 10027), type = "table", lang = "en") kable(head(indicators))
List geographical regions with available indicators using sotkanet_regions():
# List of the first few regions regions <- sotkanet_regions(type = "table", lang = "en") kable(head(regions))
To download the data, we need to know the indicator for it. You can look for the right indicator using aforementioned sotkanet_indicators() or by browsing the Sotkanet website. For example, the indicator no. 10012 responds to the (EU) GPD per capita in Purchasing Power Standards (PPS) dataset. The data can be downloaded with get_sotkanet() function. If we want, for example, the GPD data from Finland for 2000-2010, the function call is:
# Get the indicator data dat <- get_sotkanet(indicators = 10012, years = 2000:2010, genders = c("total"), lang = "en", regions = "Finland") # The first few lines of the data kable(head(dat)) %>% kable_styling() %>% scroll_box(width = "100%")
The data can also be downloaded by using interactive function sotkanet_interactive(). It gives user interactive alternative for downloading the dataset. This function can also print dataset citation, code for the get_sotkanet() call and fixity checksum. 
Dataset citation can be printed for any indicator using the function sotkanet_cite(). The citation for the GPD data is:
sotkanet_cite(10012, lang = "en")
Let's now demonstrate the use of the package with two examples. For the first example we will use the GPD data from Nordic countries (Pohjoismaat) for 2000-2010 and draw a time series of the data comparing the countries.
# Get indicator data dat <- get_sotkanet(indicators = 10012, years = 2000:2010, genders = "total", lang = "en", region.category = "POHJOISMAAT") indicator_name <- as.character(unique(dat$indicator.title)) indicator_source <- as.character(unique(dat$indicator.organization.title)) # Retrive metadata dat_meta <- sotkanet_indicator_metadata(id = 10012) # Visualize library(ggplot2) p <- ggplot(dat, aes(x = year, y = primary.value, group = region.title, color = region.title)) + geom_line() + ggtitle(paste0(indicator_name, " \n", indicator_source)) + labs(x = "Year", y = "Value",caption = paste0( "Data source: https://sotkanet.fi/sotkanet", "\n", "Data date: ", dat_meta$`data-updated`)) + scale_x_continuous(breaks = seq(2000,2010, by = 1)) + theme(title = element_text(size = 10)) + theme(axis.title.x = element_text(size = 15)) + theme(axis.title.y = element_text(size = 15)) + theme(legend.title = element_text(size = 15)) print(p)
For the second example we will plot the population of Finnish municipalities against a measure of educational level.
# Get the data for the two indicators dat <- get_sotkanet(indicators = c(127, 180), years = 2022, lang = "en", genders = c("total"), region.category = c("KUNTA")) # Pick the fields of interest and remove duplicates datf <- dat[,c("region.title", "indicator.title", "primary.value")] datf <- datf[!duplicated(datf),] dw <- reshape(datf, idvar = "region.title", timevar = "indicator.title", direction = "wide") names(dw) <- c("Municipality", "Population", "Education_level") # Vizualise p <- ggplot(dw, aes(x = log(Population), y = Education_level)) + geom_point(size = 3) + ggtitle("Education level vs. population size") + theme(title = element_text(size = 10)) + labs(y = "Education level", caption = "Data source: https://sotkanet.fi/sotkanet") + theme(axis.title.x = element_text(size = 15)) + theme(axis.title.y = element_text(size = 15)) + theme(legend.title = element_text(size = 15)) plot(p)
Cite Sotkanet and link to https://sotkanet.fi/sotkanet/fi/index. Also mention indicator provider.
Central points:
This work can be freely used, modified and distributed under the Two-clause BSD license.
citation("sotkanet")
You can check the package GitHub page for known issues. You can can also use it to report new bugs and to make suggestions for improving the package.
This vignette was created with
sessionInfo()
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.