Helsinki open data R tools

NOT_CRAN <- identical(tolower(Sys.getenv("NOT_CRAN")), "true")
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE, 
  warning = FALSE,
  fig.height = 7, 
  fig.width = 7,
  dpi = 75,
  purl = NOT_CRAN,
  eval = NOT_CRAN
)

helsinki - tutorial

The helsinki R package provides tools to access open data from the Helsinki region in Finland.

For contact information, source code and bug reports, see the project's GitHub page. For other similar packages and related blog posts, see the rOpenGov project website.

Installation

Release version for most users:

install.packages("helsinki")

Development version for developers and other interested parties:

library(remotes)
remotes::install_github("ropengov/helsinki")

Load the package.

library(helsinki)

API Access

The package has basic functions for interacting with WFS APIs, courtesy of the FMI2 package: wfs_api() returns objects of class "wfs_api" and to_sf() turns these objects into sf objects.

All available features of a given API can be easily listed with the get_feature_list() function. The API functions are not tied to a single service: they can be used with a wide variety of base.url values.

input_url <- "https://kartta.hsy.fi/geoserver/wfs"

hsy_features <- get_feature_list(base.url = input_url)
# Select only features which are related to water utilities and services
hsy_vesihuolto <- hsy_features[which(hsy_features$Namespace == "vesihuolto"),]
hsy_vesihuolto
# We select our feature of interest from this list: Location of waterposts
feature_of_interest <- "vesihuolto:VH_Vesipostit_HSY"
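A feature's Name combines a namespace and a title separated by a colon, which is why the filter above compares against the Namespace column. If you need the two parts separately, a small base-R sketch could look like the following. split_feature_name() is a hypothetical helper written for illustration, not a helsinki function.

```r
# Hypothetical helper (not part of the helsinki package): split a feature
# Name of the form "Namespace:Title" into its namespace and title parts.
split_feature_name <- function(name) {
  parts <- strsplit(name, ":", fixed = TRUE)[[1]]
  if (length(parts) != 2) {
    stop("expected a single 'Namespace:Title' string, got: ", name)
  }
  c(namespace = parts[1], title = parts[2])
}

parts <- split_feature_name("vesihuolto:VH_Vesipostit_HSY")
parts[["namespace"]]  # "vesihuolto"
parts[["title"]]      # "VH_Vesipostit_HSY"
```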

When the desired feature and its Name (that is, its Namespace:Title combination) are known, the feature can be downloaded with get_feature() by passing the correct base.url and giving the Name as the typename parameter.

input_url <- "https://kartta.hsy.fi/geoserver/wfs"
feature_of_interest <- "vesihuolto:VH_Vesipostit_HSY"

# downloading a feature
waterposts <- get_feature(base.url = input_url, typename = feature_of_interest)

# Visualizing the location of waterposts
if (exists("waterposts")) {
  if (!is.null(waterposts)) {
    plot(waterposts$geom)
  }
}

Dots on a blank canvas do not convey much on their own, so the helsinki package provides the get_city_map() function for downloading city district boundaries. An example of this is provided in the Helsinki region district maps section of this vignette.

The helsinki package also provides an easy-to-use, menu-driven select_feature() function that effectively combines get_feature_list() and get_feature(). By default it only returns the Name of the selected feature, but if the get parameter is set to TRUE, it returns an sf object which can be easily visualized.

input_url <- "https://kartta.hsy.fi/geoserver/wfs"

# Interactive example with select_feature
selected_feature <- select_feature(base.url = input_url)
feature <- get_feature(base.url = input_url, typename = selected_feature)

# Skipping a redundant step with parameter get = TRUE
feature <- select_feature(base.url = input_url, get = TRUE)

Helsinki Region Environmental Services HSY open data

The above example shows a general use case which can easily be applied to Helsinki Region Environmental Services (HSY) WFS API as well as other service providers' APIs.

For legacy reasons, the helsinki package also has some specialized functions that aim to make downloading frequently used data as easy as possible.

Specifically, two new functions replace deprecated functionality of the get_hsy() function: get_vaestotietoruudukko() (population grid) and get_rakennustietoruudukko() (building information grid).

library(ggplot2)

pop_grid <- get_vaestotietoruudukko(year = 2018)
building_grid <- get_rakennustietoruudukko(year = 2020)

# Logarithmic scales make the visualizations more discernible.
# Each plot is drawn only if its own download succeeded (did not return NULL).
if (!is.null(pop_grid)) {
  ggplot(pop_grid) + geom_sf(aes(colour = log(asukkaita), fill = log(asukkaita)))
}
if (!is.null(building_grid)) {
  ggplot(building_grid) + geom_sf(aes(colour = log(kerala_yht), fill = log(kerala_yht)))
}
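One caveat with the logarithmic scale: in R, log(0) is -Inf, so any grid cell with a value of zero would get a non-finite colour value. A common alternative is log1p(), which computes log(1 + x) and maps zero to zero. The sketch below uses made-up counts rather than real grid data.

```r
# Made-up resident counts for illustration only (not real grid data)
asukkaita <- c(0, 1, 9, 99, 999)

log(asukkaita)    # first element is -Inf because log(0) = -Inf
log1p(asukkaita)  # log(1 + x): zero maps to 0, every value stays finite

# In the plot above this would read, for example:
# geom_sf(aes(colour = log1p(asukkaita), fill = log1p(asukkaita)))
```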

With the previous version of the helsinki package, years 2015 to 2020 were supported. In 2022 a new year, 2021, was added, demonstrating how the API may be updated more regularly than the package. The get_feature_list() and get_feature() functions can be used to find and download datasets that are not built into the specialized functions.

Besides the API datasets being updated, there are also legacy datasets that were never included in the API. We have added the functionality to download datasets from a wider selection of years as zip files from a different file repository. These files may differ slightly from those downloaded via the API, for example in their column names and their larger grid squares.

library(ggplot2)

pop_grid2 <- get_vaestotietoruudukko(year = 2011)
building_grid2 <- get_rakennustietoruudukko(year = 2011)

# Legacy zip datasets use upper-case column names
if (!is.null(pop_grid2)) {
  ggplot(pop_grid2) + geom_sf(aes(colour = log(ASUKKAITA), fill = log(ASUKKAITA)))
}
if (!is.null(building_grid2)) {
  ggplot(building_grid2) + geom_sf(aes(colour = log(ASVALJYYS), fill = log(ASVALJYYS)))
}
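Because the API data uses lower-case names such as asukkaita while the legacy zip data uses ASUKKAITA, code that should work with both can look columns up case-insensitively. get_column_ci() below is a hypothetical helper, shown on toy data frames with made-up values; it only helps where the columns differ by letter case, not where a variable is missing or named differently.

```r
# Hypothetical helper (not part of the helsinki package): return a column
# by name regardless of letter case, e.g. "asukkaita" vs "ASUKKAITA".
get_column_ci <- function(df, name) {
  hit <- which(tolower(names(df)) == tolower(name))
  if (length(hit) == 0) stop("no column matching '", name, "'")
  df[[hit[1]]]
}

# Toy stand-ins for the two sources (made-up values)
api_style <- data.frame(asukkaita = c(10, 20))
zip_style <- data.frame(ASUKKAITA = c(30, 40))

get_column_ci(api_style, "ASUKKAITA")  # 10 20
get_column_ci(zip_style, "asukkaita")  # 30 40
```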

While easy enough to build, specialized functions such as these are probably not something that power users want to rely on in their workflows. They also add manual steps to package maintenance and are therefore probably not the direction we are heading in the future. If you feel differently, or there is a dataset that gets a lot of use, feel free to drop us a suggestion on GitHub.

Service and event information

The get_servicemap() function retrieves regional service data from the City of Helsinki Service Map API, which contains data from the Service Map.

# Search for "puisto" (park) (specify q="query")
search_puisto <- get_servicemap(query="search", q="puisto")
# Study the structure of the results (top level only)
str(search_puisto, max.level = 1)

We can see that this search returns a large number of results, over 2000. The results are returned as pages, where each page has 20 results by default. By giving no additional search parameters, we get 20 results from the first page of search results.
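The paging arithmetic is simple: the number of pages is the total number of results divided by the page size, rounded up, and page p covers results (p - 1) * page_size + 1 to p * page_size. The sketch below uses 2000 as an illustrative total (the real count comes from the API and changes over time); page_range() is a hypothetical helper.

```r
# Illustrative numbers: the real total comes from the API and changes over time
total_results <- 2000
page_size <- 20

# Number of pages needed to cover every result
n_pages <- ceiling(total_results / page_size)

# Hypothetical helper: 1-based result positions covered by a page
# (the last page may contain fewer results than page_size)
page_range <- function(page, page_size) {
  c(first = (page - 1) * page_size + 1, last = page * page_size)
}

n_pages            # 100
page_range(2, 30)  # results 31 to 60
```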

# Get names for the first 20 results
search_puisto$results$name.fi

# See what kind of data is given for services
names(search_puisto$results)

More results could be retrieved and viewed by giving additional search parameters.

search_puisto <- get_servicemap(query="search", q="puisto", page_size = 30, page = 2)

str(search_puisto)
search_puisto$results$name.fi

As the example above shows, the returned data frame has 30 observations of 29 variables. At full width, this output can be messy to handle in the R console. One option is to convert it into a more manageable tibble (which is often a good idea); another is to limit the scope of the query from the start.
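A related base-R approach is to keep only the columns of interest once the results are in. The sketch below uses a made-up data frame standing in for search_puisto$results; select_existing() is a hypothetical helper, and the columns other than name.fi are assumptions for illustration.

```r
# Hypothetical helper: keep only the requested columns that actually exist,
# so the call does not fail if a field is absent from the results.
select_existing <- function(df, cols) {
  df[, intersect(cols, names(df)), drop = FALSE]
}

# Made-up stand-in for search_puisto$results (real results have many more columns)
results <- data.frame(
  id = 1:3,
  name.fi = c("Puisto A", "Puisto B", "Puisto C"),
  extra = c(TRUE, FALSE, TRUE)
)

select_existing(results, c("name.fi", "id", "not_a_column"))
```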

The get_linkedevents() function retrieves regional event data from the new Linked Events API.

# Search for current events
events <- get_linkedevents(query="event")
# Get names for the first 20 results
events$data$name$fi
# See what kind of data is given for events
names(events$data)

Helsinki region district maps

Helsinki region geographic data can be accessed from a WFS API by using the get_city_map() function. Data is available for all 4 cities in the capital region: Helsinki, Espoo, Vantaa and Kauniainen.

Administrative divisions can be accessed on three distinct levels: "suuralue", "tilastoalue" and "pienalue". Literal, completely unofficial translations for these could be "grand district", "statistical area" and "(minor) district". The naming of these levels is sometimes confusing even in Finnish documents, and the names can vary by city and over time.

The main takeaway is that "suuralue" is the highest-level (coarsest) division and "pienalue" is the most granular one, with "tilastoalue" somewhere between the two. These are the names to use even if the city of interest does not use them on its Finnish or English website.
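If your own code needs to reason about which level is finer, one way to encode the hierarchy described above is an ordered factor. This is a sketch only; get_city_map() itself simply takes the level as a string.

```r
# Administrative levels ordered from coarsest to most granular
level_order <- c("suuralue", "tilastoalue", "pienalue")

division <- factor(c("pienalue", "suuralue", "tilastoalue"),
                   levels = level_order, ordered = TRUE)

sort(division)             # suuralue < tilastoalue < pienalue
division[1] > division[2]  # TRUE: "pienalue" is finer than "suuralue"
```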

As promised in the API Access section, the following example shows how to visualize waterpost locations (and, of course, other types of spatial data) on a capital region map.

helsinki <- get_city_map(city = "helsinki", level = "suuralue")
espoo <- get_city_map(city = "espoo", level = "suuralue")
vantaa <- get_city_map(city = "vantaa", level = "suuralue")
kauniainen <- get_city_map(city = "kauniainen", level = "suuralue")

library(ggplot2)

# Draw the map only if every download succeeded (none of them returned NULL)
if (exists("waterposts") &&
    !any(is.null(helsinki), is.null(espoo), is.null(vantaa),
         is.null(kauniainen), is.null(waterposts))) {
  ggplot() +
    geom_sf(data = helsinki) +
    geom_sf(data = espoo) +
    geom_sf(data = vantaa) +
    geom_sf(data = kauniainen) +
    geom_sf(data = waterposts)
}

In addition, it is possible to download "aanestysalue" (voting district) divisions for the city of Helsinki. Currently this data is not available for other cities and it must be accessed from other sources.

map <- get_city_map(city = "helsinki", level = "suuralue")
voting_district <- get_city_map(city = "helsinki", level = "aanestysalue")
library(sf)
plot(sf::st_geometry(map))
plot(sf::st_geometry(voting_district))
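Since "aanestysalue" is only available for Helsinki, a script that loops over cities may want to check the city and level combination before downloading anything. check_city_level() below is a hypothetical helper that merely mirrors the availability described in this vignette; the package's own documentation is authoritative and availability may change.

```r
# Hypothetical pre-check (not part of the helsinki package): all four capital
# region cities offer three administrative levels, "aanestysalue" is Helsinki-only.
check_city_level <- function(city, level) {
  stopifnot(length(city) == 1, length(level) == 1)
  valid_cities <- c("helsinki", "espoo", "vantaa", "kauniainen")
  valid_levels <- c("suuralue", "tilastoalue", "pienalue", "aanestysalue")
  city <- tolower(city)
  if (!city %in% valid_cities || !level %in% valid_levels) return(FALSE)
  level != "aanestysalue" || city == "helsinki"
}

check_city_level("helsinki", "aanestysalue")  # TRUE
check_city_level("espoo", "aanestysalue")     # FALSE
check_city_level("vantaa", "pienalue")        # TRUE
```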


Helsinki Region Infoshare statistics API

The get_hri_stats() function retrieves data from the Helsinki Region Infoshare statistics API. Specify a dataset to retrieve; in this example we download the first item in the stats_list object. The output is a three-dimensional array.

# Retrieve list of available data
stats_list <- get_hri_stats(query="")
# Show first results
head(stats_list)

if (!is.null(stats_list)) {
  # Retrieve a specific dataset
  stats_res <- get_hri_stats(query=stats_list[1])
  # Show the structure of the results
  str(stats_res)
}
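Working with the three-dimensional array output means indexing with one position per dimension; leaving a position empty keeps that whole dimension. The dimension names depend on the dataset, so the sketch below uses a made-up array with illustrative dimension names and values, not real HRI data.

```r
# Made-up 3D array: area x year x variable (names and values are illustrative)
stats <- array(
  1:12,
  dim = c(2, 3, 2),
  dimnames = list(
    area = c("A", "B"),
    year = c("2019", "2020", "2021"),
    variable = c("v1", "v2")
  )
)

dim(stats)                # c(2, 3, 2)
stats["A", "2020", "v1"]  # a single value
stats[, , "v2"]           # 2 x 3 matrix: every area and year for one variable
```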

Licensing and Citations

Citing the data

See help() to get citation information for each function and related data sources.

If no such information is explicitly stated, see the data provider's website for more information.

Citing the R package

citation("helsinki")

Session info

This vignette was created with

sessionInfo()



helsinki documentation built on Dec. 2, 2022, 5:09 p.m.