library(dplyr)

Statistics Netherlands (CBS) is the office that produces all official statistics of the Netherlands.

For long SN has put its data on the web in its online database StatLine. Since 2014 this data base has an open data web API based on the OData protocol. The cbsodataR package allows for retrieving data right into R.

Table of Contents

A list of tables can be retrieved using the cbs_get_datasets (cbs_get_toc) function.

library(dplyr) # not needed, but used in examples below
library(cbsodataR)

datasets <- cbs_get_datasets() 

datasets |> 
  filter(Language == "en") |> # only English tables
  select(Identifier, ShortTitle) 

Search for tables

Tables can be searched for using the cbs_search function.

toc_apples <- cbs_search(c("elstar", "apple"), language = "en")
toc_apples[, c("Identifier", "ShortTitle", "score")]

Other catalogs

Other catalogs with data are available:

catalogs <- cbs_get_catalogs()
catalogs$Identifier

Metadata

Using an "Identifier" from cbs_get_datasets or cbs_search information on the table can be retrieved with cbs_get_meta

apples <- cbs_get_meta('71509ENG')
apples

The meta object contains all metadata properties of cbsodata (see the original documentation) in the form of data.frames. Each data.frame describes properties of the SN table.

names(apples)

Data download

With cbs_get_data data can be retrieved. By default all data for this table will be downloaded in a temporary directory.

cbs_get_data('71509ENG') |> 
  select(1:4) |>  # demonstration purpose
  head()

Data download from a link

The opendata portal of CBS (https://opendata.cbs.nl/dataportaal/#/CBS/en) allows for finding a table and making a selection within this table manually. Such a selection can be stored in a hyperlink (click the "share" button). This link can also be used with cbsodataR using the cbs_get_data_from_link function.

cbs_get_data_from_link("https://opendata.cbs.nl/dataportaal/#/CBS/en/dataset/71509ENG/table?dl=193CB") |> 
  select(1:4) |> 
  head()

Adding category labels

The first columns are categorical columns: they contain codes. The labels for these columns can be added with the function cbs_add_label_columns.

cbs_get_data('71509ENG') |>
  cbs_add_label_columns() |> 
  select(1:4) |> 
  head()

Adding Date column

The period/time columns of Statistics Netherlands (CBS) contain coded time periods: e.g. 2018JJ00 (i.e. 2018), 2018KW03 (i.e. 2018 Q3), 2016MM04 (i.e. 2016 April). With cbs_add_date_column the time periods will be converted and added to the data:

cbs_get_data('71509ENG') |>
  cbs_add_date_column() |> 
  select(2:4) |> 
  head()

This can be useful for further time series analysis, but also for displaying. It is also possible to convert the dates to numbers:

cbs_get_data('71509ENG') |>
  cbs_add_date_column(date_type = "numeric") |> 
  select(2:4) |> 
  head()

Select and filter

It is possible restrict the download using filter statements. This may shorten the download time considerably.

Filter

Filter statements for the columns can be used to restrict the download. Note the following:

apples <- cbs_get_meta('71509ENG')
names(apples)
# meta data for column Periods
head(apples$Periods[,1:2])
#meta data for column FruitFarmingRegions
head(apples$FruitFarmingRegions[,1:2 ])
  cbs_get_data( '71509ENG'
              , Periods=c('2000JJ00','2001JJ00') # selection on Periods column
              , FruitFarmingRegions = "1" # selection on FruitFarmingRegions
              #
              # restrict the columns to the following as found in
              # apples$DataProperties with "select"
              , select = c("FruitFarmingRegions", "Periods", "TotalAppleVarieties_1")  
              ) |> 
  cbs_add_label_columns()
  cbs_get_data( '71509ENG'
              , Periods = has_substring('2000') # selection on Periods column
              , FruitFarmingRegions = "1" # selection on FruitFarmingRegions
              #
              # restrict the columns to the following as found in
              # cbs_get_meta("71509ENG")$DataProperties with "select"
              , select = c("FruitFarmingRegions", "Periods", "TotalAppleVarieties_1")  
              ) |> 
    cbs_add_label_columns()
  cbs_get_data( '71509ENG'
              , Periods = eq("2010JJ00") | has_substring('2000') # selection on Periods column
              , FruitFarmingRegions = "1" # selection on FruitFarmingRegions
              #
              # restrict the columns to the following as found in
              # cbs_get_meta("71509ENG")$DataProperties with "select"
              , select = c("FruitFarmingRegions", "Periods", "TotalAppleVarieties_1")  
              ) |> 
    cbs_add_label_columns()

Download data

Data can also be downloaded explicitly by using cbs_download_table



edwindj/cbsodataR documentation built on April 28, 2024, 7:02 p.m.