In elben10/rdst: Download data from Statistics Denmark

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "figures/README-"
)
options(tibble.print_min = 5, tibble.print_max = 5)

Overview

rdst is a data package, which allows the user to retrieve data from Statistics Denmark through R. The package provide statistics coded in both english as well as danish. This allows users outside Denmark to investigate and explore high quality data.

Installation

# The only way to get rdst right now is to install the github version using the devtools package. This can be done with the line of code beneath.
devtools::install_github("elben10/rdst")

Usage

library(rdst)

The aim of the package is to make it easier to access data from Statistics Denmark. To access an overview of the collection of data sets we could use the function dst_tables(). It returns a tibble containing information about all data sets available at Statistics Denmark.

dst_tables()

As default, the function will only return two columns containing a short description of the data set, and the data set's ID. Though, it is possible to get more information about the data set using the column argument. For instance, we could be interested in when some data sets was last updated, or just be interested in new data from Statistics Denmark. To see when a data set most recently was updated we could use the following piece of code.

dst_tables(columns = c("id", "text", "updated"))

There is also more information available such as the first and the last period of the data sets. To see all the available columns see dst_tables().

It is crucial to know which variables is contained in a data set. To that purpose we can use the dst_variables() function. If we provide the function with a table ID provided as a charactervector it will return a tibble which has two columns. A column with the variable ID, and column with a short description of the variable. If we are interessted in the population growth in Denmark, we could find, which variables that are contained in the data set FOLK1A.

dst_variables("FOLK1A")

Now where we know which variables is contained in the data set, we can move forward and begin with the more interesting thing. Getting the data and explore it. To download data from Statistics Denmark we can use the function dst_download(). It suprisingly allows us to get data from Statistics Denmark. For instance if we still are interessted in how the danish population has developed recently we could download the data set FOLK1A.

dst_download("FOLK1A")

To illustrate the population growth we could use Hadley Wickham's package ggplot2 and Stefan Bache's package magrittr to enable piping.

library(ggplot2)
library(magrittr)

dst_download("FOLK1A") %>%
  ggplot(aes(TID, INDHOLD)) +
  geom_line() +
  xlab("Time") +
  ylab("Count")

We could make the same figure, but where we include the gender. The variable KØN means gender in danish. To see the translations see dst_variables(). We use the package dplyr to exclude all total observations.

library(dplyr)
dst_download("FOLK1a", "KØN") %>%
  filter(KØN != "Total") %>%
  ggplot(aes(TID, INDHOLD, colour = KØN)) +
  geom_line() +
  xlab("Time") +
  ylab("Count")