hdng is a package that facilitates use of the HDNG Database, a database containing quantitative information on various aspects of Dutch municipalities from about 1800 to the 1980s. The package helps users find their way through the database: they can search for variables, browse within categories, and finally run a query to extract the data they want. Below is a short demonstration of the package's functionality. The package consists of three basic functions:
hdng_names_cats: lets the user browse through categories. An empty function call returns the basic categories; a call naming one or more specific categories returns a data frame with all variables in them and the times at which they are available. Several aspects are customizable (see the function documentation).
closest_matches: returns the closest matches (by string distance) to one or more search terms, so the user can quickly find variables of interest without browsing through categories. It also supports 'within-category' search, and the kind of string distance can be customized.
hdng_get_data: extracts the data. It takes variable codes as input and returns a data.frame containing those variables. Many options are customizable.
The package can be installed and loaded via:
devtools::install_github("basm92/hdng")
library(hdng)
Depending on your machine, installation might take a while, because the package's data is large and lazy-loaded.
Generally, you can use hdng_names_cats() to look for specific data. An empty call gives you all available categories, whereas a call with one of the categories gives you a data.frame containing the availability of all variables within the time frame you specified:
hdng_names_cats()
library(magrittr)  # provides %>%; only needed if the pipe is not already available
hdng_names_cats("Beroepen") %>% head()
hdng_names_cats("Beroepen", from = 1870, to = 1899) %>% head()
You can also apply a less strict filter to the data. By default, the query keeps only variables that are available at least once in the indicated period. You can override this by specifying show.only.available = FALSE.
hdng_names_cats("Welvaart", from = 1880, to = 1940)
hdng_names_cats("Welvaart", from = 1880, to = 1940, show.only.available = FALSE)
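To picture the difference between the two filters, here is a minimal sketch on invented data. It is not hdng's code: the helper filter_available, its argument show_only_available, and the toy table are all hypothetical, and the package's real output may be structured differently.

```r
# Toy availability table: one row per (variable, year) observation.
# Invented data for illustration only, not taken from the HDNG Database.
availability <- data.frame(
  var  = c("A1", "A1", "B2", "C3"),
  year = c(1869, 1889, 1930, 1950),
  stringsAsFactors = FALSE
)

# Hypothetical helper that mimics the documented behaviour.
filter_available <- function(tbl, from, to, show_only_available = TRUE) {
  # Variables observed at least once within [from, to]
  in_period <- tbl$year >= from & tbl$year <= to
  available <- unique(tbl$var[in_period])

  all_vars <- unique(tbl$var)
  out <- data.frame(
    var = all_vars,
    available = all_vars %in% available,
    stringsAsFactors = FALSE
  )

  # Strict mode drops variables with no observation in the period
  if (show_only_available) out <- out[out$available, , drop = FALSE]
  out
}

strict <- filter_available(availability, from = 1880, to = 1940)
loose  <- filter_available(availability, from = 1880, to = 1940,
                           show_only_available = FALSE)
print(strict)
print(loose)
```

In this toy case the strict call keeps only the variables observed between 1880 and 1940, while the less strict call also lists the variable that has no observation in that period.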
You can also search for a specific term using the function closest_matches. By default, it returns variable codes. If you want to see names instead, set variable.names to TRUE.
closest_matches("aantal autos", "Welvaart", n = 20)
closest_matches("aantal autos", "Welvaart", variable.names = TRUE)
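To give a feel for what ranking by string distance means, here is a minimal sketch in base R. It uses utils::adist (generalized Levenshtein distance), which may or may not be the distance the package uses. The helper rank_by_distance and the toy labels are hypothetical and are not part of hdng.

```r
# Hypothetical helper: rank candidate labels by edit distance to a query.
# Not hdng's implementation; relies only on base R's utils::adist.
rank_by_distance <- function(query, candidates, n = 3) {
  # adist returns a matrix with one row per query and one column per candidate;
  # drop() turns the single-row matrix into a plain vector.
  d <- drop(utils::adist(query, candidates, ignore.case = TRUE))
  ord <- order(d)
  ranked <- data.frame(
    candidate = candidates[ord],
    distance = d[ord],
    stringsAsFactors = FALSE
  )
  head(ranked, n)
}

# Toy labels, invented for illustration only
toy_labels <- c("aantal auto's", "aantal fietsen", "aantal inwoners", "belasting")
result <- rank_by_distance("aantal autos", toy_labels, n = 2)
print(result)
```

The label that differs from the query by a single character ends up at the top of the ranking, which is why a slightly misspelled search term can still find the intended variable.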
Finally, pass a set of variable codes to hdng_get_data to retrieve a data.frame containing the actual variables. Please note that these identifiers are case-sensitive.
# Example 1
variables <- closest_matches("aantal autos", "Welvaart", n = 3)
hdng_get_data(variables, gemeenten = c("Amsterdam", "Rotterdam", "Utrecht", "'s Gravenhage"), col.names = TRUE)

# Example 2
query <- hdng_names_cats("Beroepen")
variables2 <- query$var[1:10]  # select only the first 10 variables
hdng_get_data(variables2, gemeenten = c("Groningen", "Leeuwarden", "Assen", "Zwolle"), col.names = TRUE)
I plan to include a function that takes variable names (or closest matches to them) as input and returns variable codes, so that the user does not always have to look up variable codes to pass to hdng_get_data. Alternatively, I might start work on an extract function that takes names, not codes, as input. I also want to integrate searches on the basis of province. Let me know if you have any suggestions. Thank you for reading!