knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
Statistics Netherlands (CBS) is the office that produces all official statistics of the Netherlands.
For long CBS has put its data on the web in its online database StatLine. Since 2014 this data base has an open data web API based on the OData protocol. The cbsodataR package allows for retrieving data right into R using this API.
A new version of the web api has been developed which is based on the OData4
protocol. This OData4 API contains major changes in how the metadata and data
is transported, hence this package cbsodata4
. Since the old web api will be
phased out in due time, cbsodata4
is the successor of cbsodataR
.
This document describes how to use cbsodata4
to download data (and meta data)
from Statistics Netherlands. It offers very similar functions as cbsodataR
,
so should be familiar to users of cbsodataR
, but there are differences so
carefully check your code.
library(cbsodata4)
A list of datasets that are available can be loaded with with cbs4_get_datasets()
(cbs4_get_toc()
is an alias)
datasets <- cbs4_get_datasets() head(datasets[,c("Identifier", "Title", "Modified")])
Using cbs4_search
a list of tables can be found that contain desired search
terms, e.g. "Diesel":
datasets <- cbs4_search("Diesel") head(datasets[,c("Identifier", "Title", "rel")])
Using an “Identifier” from cbs4_get_datasets
information on the table can be
retrieved with cbs4_get_metadata
meta_petrol <- cbs4_get_metadata("80416ned") meta_petrol
The meta object contains all metadata properties of cbsodata
in the form of data.frame
s. Each data.frame
describes properties of the
CBS table: "Dimensions", "MeasureCodes" and one ore more "\<Dimension>Codes" describing the meta data of the borders of a SN table.
names(meta_petrol) meta_petrol$MeasureCodes[, 1:3] meta_petrol$Dimensions # just 1 dimension with the following categories: head(meta_petrol$PeriodenCodes)
With cbs4_get_observations
and cbs4_get_data
data can be retrieved. By
default this will be downloaded in a temporary directory, but this
can be set explicitly with the argument download_dir
.
cbs4_get_observations
is the format in which the data is downloaded from Statistic
Netherlands (CBS).
It is in so-called long format. It contains
one Measure
column, describing the topics/variable, one Value
column
describing the statistical value, one or more Dimension columns and some extra
columns with value specific metadata.
obs <- cbs4_get_observations("80416ned") head(obs)
cbs4_get_data
returns the data in so-called wide format in which each Measure
has its own column. For many
uses this is a more natural format. It is a pivoted version of cbs4_get_observations()
.
# same data, but pivoted data <- cbs4_get_data("80416ned", name_measure_columns = FALSE) head(data, 2)
By default the names of the columns are more readable with cbs4_get_data
# same data, but pivoted data <- cbs4_get_data("80416ned") head(data, 2)
The Dimension and Measure columns use codes/keys/identifiers to describe categories. These can be found in the metadata, but can also be automatically added using cbs4_add_label_columns
.
obs <- cbs4_get_observations("80416ned") obs <- cbs4_add_label_columns(obs) head(obs)
or
data <- cbs4_get_data("80416ned") data <- cbs4_add_label_columns(data) head(data, 2)
The period/time columns of Statistics Netherlands (CBS) contain coded
time periods: e.g. 2018JJ00 (i.e. 2018), 2018KW03 (i.e. 2018 Q3),
2016MM04 (i.e. 2016 April). With cbs4_add_date_column
the time periods
will be converted and added to the data:
obs <- cbs4_get_observations("80416ned") obs <- cbs4_add_date_column(obs) head(obs) data <- cbs4_get_data("80416ned") data <- cbs4_add_date_column(data) head(data)
Each Measure
has a measure unit, which
can be added to observations with cbs4_add_unit_column()
obs <- cbs4_get_observations("80416ned") obs <- cbs4_add_unit_column(obs) head(obs)
It is possible to restrict the download using filter statements. This may shorten the download time considerably.
Filter statements for the columns can be used to restrict the download. Note the following:
Identifier
column in
the cbs4_get_metadata
objects. e.g. for year 2020, the code is "2020JJ00".meta <- cbs4_get_metadata("60006") tail(meta$PeriodenCodes) meta$MeasureCodes[,c("Identifier","Title")]
<column_name> = values
to
cbs4_get_observations
(or cbs4_get_data
) e.g. Perioden = c("2019KW04", "2020KW01")
obs <- cbs4_get_observations("60006" , Measure = c("M003026","M003019") # selection on Measures , Perioden = c("2019KW04", "2020KW01") # selection on Perioden ) cbs4_add_label_columns(obs)
<column_name> = contains(<substring>)
to cbs4_get_data
e.g.
Perioden = contains("JJ")
data <- cbs4_get_data("60006" , Measure = c("M003026","M003019") # selection on Measures , Perioden = contains("2019") # retrieve all periods with 2019 ) data
Periods = contains("2019") | "2020KW01"
data <- cbs4_get_data("60006" , Measure = c("M003026","M003019") # selection on Measures , Perioden = contains("2019") | "2020KW01" # retrieve all periods with 2019 ) data
For the adventurous, it is possible to specify a odata v4 query themselves.
# supply your own odata 4 query cbs4_get_data("84287NED", query = "$filter=Perioden eq '2019MM12'")
Data and metadata of a table can also be downloaded explicitly by using cbs4_download
. This can be
an option if you don't want to load the data into memory (which both cbs4_get_data
and cbs4_get_observations
do), but only store it on disk.
cbs4_download("60006", download_dir = "./60006") # will download data and metadata in csv format.
CBS / Statistics Netherlands also offers collections of datasets that are not part of the main
collections: so-called catalogs. These can be retrieved with cbs4_get_catalogs()
.
catalogs <- cbs4_get_catalogs() catalogs[,1:2]
Another options is to set the catalog
argument in cbs4_get_datasets
to NULL
ds <- cbs4_get_datasets() nrow(ds) ds_all <- cbs4_get_datasets(catalog = NULL) nrow(ds_all) ds_asd <- cbs4_get_datasets(catalog = "CBS-asd") nrow(ds_asd)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.