Datasets in geofi-package

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE, 
  warning = FALSE,
  fig.height = 7, 
  fig.width = 7,
  dpi = 75
)

The geofi package provides access to multiple datasets of different types and for different uses. In this vignette we introduce the datasets and explain their use cases. The vignette Making maps using geofi-package provides multiple real-world examples of their usage.

Package installation

geofi can be installed from CRAN, or the development version from GitHub, using

# install from CRAN
install.packages("geofi")

# Install development version from GitHub
remotes::install_github("ropengov/geofi")

# Helper that checks whether the suggested packages are available
check_namespaces <- function(pkgs) {
  all(vapply(pkgs, requireNamespace, logical(1), quietly = TRUE))
}
apiacc <- geofi::check_api_access()
pkginst <- check_namespaces(c("sotkanet", "geofacet", "ggplot2", "dplyr"))
apiacc_pkginst <- all(apiacc, pkginst)

Municipality keys

Official administrative regions in Finland are based on municipalities. In 2021 there were 309 municipalities in Finland, and the number decreases over time through mergers. Each municipality belongs to higher-level regional classifications such as regions (maakunta) or health care districts (sairaanhoitopiiri). The municipality_key_ datasets (one per year, for example municipality_key_2023) are based on the Statistics Finland statistical classification API, with a few modifications, and are provided on a yearly basis.

library(geofi)
library(dplyr)
d <- data(package = "geofi")
as_tibble(d$results) %>% 
  select(Item, Title) %>% 
  filter(grepl("municipality_key", Item))

Looking at the names of `municipality_key_2023`, there are 69 different variables for each municipality.

names(geofi::municipality_key_2023)

With these municipality keys you can easily aggregate municipalities for plotting, or list the different regional breakdowns.

geofi::municipality_key_2023 %>% 
  count(maakunta_code,maakunta_name_fi,maakunta_name_sv,maakunta_name_en)

Municipality keys are joined with the municipality spatial data by default, meaning that the data returned by get_municipalities() can be aggregated as it is.
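
As a minimal sketch of the aggregation idea in this section, the following base R example joins a small table of municipality-level values to a miniature, made-up key table and sums the values by region. Only the column names municipality_code and maakunta_name_fi come from the key datasets; the rows, the region labels and the value column are assumptions for illustration.

```r
# Hypothetical miniature key table; the real municipality_key_2023
# has one row per municipality and many more columns.
key <- data.frame(
  municipality_code = c(1L, 2L, 3L, 4L),
  maakunta_name_fi  = c("Region A", "Region A", "Region B", "Region B"),
  stringsAsFactors  = FALSE
)

# Hypothetical user data identified by municipality code
own <- data.frame(
  municipality_code = c(1L, 2L, 3L, 4L),
  value             = c(10, 20, 30, 40)
)

# Attach the regional classification to each municipality
joined <- merge(key, own, by = "municipality_code")

# Sum the municipality values within each region
by_region <- aggregate(value ~ maakunta_name_fi, data = joined, FUN = sum)
print(by_region)
```

With the real data the same pattern applies, with geofi::municipality_key_2023 in place of the toy key table.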

Spatial data

Spatial data is provided as administrative regions (polygons), population and statistical grids (polygons) and municipality centers (points).

Municipality borders

Municipality borders are provided yearly from 2013 onwards and in two scales, 1:1 000 000 and 1:4 500 000. Use 1000 or 4500, respectively, as the value of the scale argument.

municipalities <- get_municipalities(year = 2023, scale = 4500)
plot(municipalities["municipality_name_fi"], border = NA)

Municipality borders with population

In 2022 a new data source was introduced that provides municipality borders together with municipality population data. The spatial data is provided in 1:4 500 000 scale.

Calling the function with year = 2019 returns population data from 2019-12-31 with spatial data on borders from 2020.

The statistical variables in the data are: total population (vaesto), share of the total population (vaesto_p), number of men (miehet), men's share of the population in the area (miehet_p), number of women (naiset), women's share (naiset_p), those aged under 15: number (ika_0_14), share (ika_0_14p), those aged 15 to 64: number (ika_15_64), share (ika_15_64p), and those aged 65 or over: number (ika_65_), share (ika_65_p).

To plot men's share at the municipality level in 2022 (2023 municipality borders) you can simply do this.

get_municipality_pop(year = 2022) %>%  
  subset(select = miehet_p) %>% 
  plot()

Aggregating the absolute population numbers is straightforward. To plot the population at the wellbeing services county level you can do the following.

get_municipality_pop(year = 2022) %>%  
  group_by(hyvinvointialue_name_fi) %>%  
  summarise(vaesto = sum(vaesto)) %>%  
  select(vaesto) %>% 
  plot()

To plot the men's share at the wellbeing services county level you have to add one more step: sum the counts first and then compute the share from the sums.

get_municipality_pop(year = 2022) %>%  
  dplyr::group_by(hyvinvointialue_name_fi) %>% 
  summarise(vaesto = sum(vaesto),
            miehet = sum(miehet)) %>% 
  mutate(share = miehet/vaesto*100) %>% 
  select(share) %>% 
  plot()
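
The reason for the extra step can be seen in a small base R sketch with made-up numbers: a share has to be recomputed from the summed counts, because averaging the municipality-level shares ignores the differing population sizes. The column names follow the population data described above; the area labels and the figures are hypothetical.

```r
# Hypothetical municipality-level counts for two areas
pop <- data.frame(
  hyvinvointialue_name_fi = c("Area X", "Area X", "Area Y"),
  vaesto = c(1000, 3000, 2000),  # total population
  miehet = c(400, 1800, 1000)    # number of men
)
pop$miehet_p <- pop$miehet / pop$vaesto * 100  # municipality-level share

# Correct: sum the counts per area, then compute the share
sums <- aggregate(cbind(vaesto, miehet) ~ hyvinvointialue_name_fi,
                  data = pop, FUN = sum)
sums$share <- sums$miehet / sums$vaesto * 100

# Naive: unweighted mean of the municipality shares
naive <- aggregate(miehet_p ~ hyvinvointialue_name_fi, data = pop, FUN = mean)

print(sums)
print(naive)
```

For Area X the share computed from the sums is 2200 / 4000 = 55 %, whereas the unweighted mean of 40 % and 60 % is 50 %.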

Zipcodes

Zipcodes are provided in a single resolution from 2015.

zipcodes <- get_zipcodes(year = 2023) 
plot(zipcodes["nimi"], border = NA)

Statistical grid

The grid net for statistics, available in both 1 km x 1 km and 5 km x 5 km resolutions, covers the whole of Finland and includes all grid squares in the country.

Statistics Finland's proprietary grid database provides the statistical attribute data for these grid nets.

stat_grid <- get_statistical_grid(resolution = 5, auxiliary_data = TRUE)
plot(stat_grid["euref_x"], border = NA)

Population grid

The population grid gives the population in both 1 km x 1 km and 5 km x 5 km grid squares, by age group, on the last day of the reference year (31 December). The data includes only inhabited grid squares. The statistical variables of the data are:

Total population (vaesto), number of men (miehet) and women (naiset), those aged under 15 (ika_0_14), those aged 15 to 64 (ika_15_64), and those aged 65 or over (ika_65_). For grid squares with fewer than 10 inhabitants only the total population is reported. See Population grid data.

The data describes the population distribution independently of administrative areas (such as municipal borders). It is suitable for examining the population distribution and for various kinds of spatial analysis.

pop_grid <- get_population_grid(year = 2018, resolution = 5)
plot(pop_grid["kunta"], border = NA)

Central localities of municipalities

The National Land Survey of Finland maintains the Topographic Database, which contains a wide range of layers, including the locations of the central localities of each municipality in Finland.

plot(municipality_central_localities["teksti"])

Custom geofacet grid data

From Ryan Hafen's blog:

The geofacet package extends ggplot2 in a way that makes it easy to create geographically faceted visualizations in R. To geofacet is to take data representing different geographic entities and apply a visualization method to the data for each entity, with the resulting set of visualizations being laid out in a grid that mimics the original geographic topology as closely as possible.

The geofi package contains custom grids to be used with various Finnish administrative breakdowns, as listed below.

d <- data(package = "geofi")
as_tibble(d$results) %>% 
  select(Item, Title) %>% 
  filter(grepl("grid", Item)) %>% 
  print(n = 100)

Here is an example where municipality-level population data from 2000 to 2022 is pulled from THL, aggregated to the level of regions (maakunta), and then plotted with ggplot2 using the grid geofi::grid_maakunta.

# Let's pull population data from THL
library(sotkanet)
sotkadata <- GetDataSotkanet(indicators = 127, years = 2000:2022) %>% 
  filter(region.category == "KUNTA") %>% 
  mutate(municipality_code = as.integer(region.code))

# Let's aggregate the population data to the regional level
dat <- left_join(geofi::municipality_key_2023 %>% select(-year),
                 sotkadata) %>% 
  group_by(maakunta_code, maakunta_name_fi,year) %>% 
  summarise(population = sum(primary.value, na.rm = TRUE)) %>% 
  na.omit() %>% 
  ungroup() %>% 
  rename(code = maakunta_code, name = maakunta_name_fi)

library(geofacet)
library(ggplot2)

ggplot(dat, aes(x = year, y = population/1000, group = name)) + 
  geom_line() + 
  facet_geo(facets = ~name, grid = grid_maakunta, scales = "free_y") +
  theme(axis.text.x = element_text(size = 6)) +
  scale_x_discrete(breaks = seq.int(from = 2000, to = 2023, by = 5)) +
  labs(title = unique(sotkadata$indicator.title.fi), y = "Population (thousands)")
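
The rename to code and name in the aggregation step matters because a geofacet grid is a plain data frame with the columns code, name, row and col, and the faceting variable has to match one of the first two. The base R sketch below builds a hypothetical three-cell grid (not one of the grids shipped with geofi) and prints its layout as a matrix, to illustrate how row and col position each facet.

```r
# Hypothetical grid in the layout geofacet expects: one row per entity,
# with its identifier, label and cell position.
toy_grid <- data.frame(
  code = c("01", "02", "03"),
  name = c("North", "West", "East"),
  row  = c(1L, 2L, 2L),
  col  = c(2L, 1L, 2L),
  stringsAsFactors = FALSE
)

# Each (row, col) cell may hold at most one entity
stopifnot(!any(duplicated(toy_grid[c("row", "col")])))

# Show the layout as a character matrix; empty cells stay blank
layout <- matrix("", nrow = max(toy_grid$row), ncol = max(toy_grid$col))
layout[cbind(toy_grid$row, toy_grid$col)] <- toy_grid$name
print(layout)
```

A data frame of this shape could be passed as the grid argument of facet_geo() in place of grid_maakunta, assuming the name values match the faceting variable in the plotted data.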



geofi documentation built on Nov. 2, 2023, 5:54 p.m.