#| label: setup #| include: false # set RNG seeds set.seed(42L) htmlwidgets::setWidgetIdSeed(seed = 42L) knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) # disable cache expiry options(rdb.global_max_cache_age = Inf) # set pkg metadata pkg_metadata <- pal::desc_list()
rdb is not on CRAN yet[^1], but you can install the package directly from the development source code repository's master
branch, which we try to keep in a working state at all times.
#| label: pkg-instl-dev #| child: !expr pkgsnip::snip_path("pkg-instl-dev-gitlab.Rmd")
[^1]: We will probably release the package on CRAN once the current database and backend overhaul is complete and the rdb R package is properly adapted to use the new infrastructure, as well as sufficiently tested. This will most likely not happen before the end of 2024.
You can download RDB referendum data via the two functions rdb::rfrnd()
and rdb::rfrnds()
. The former one fetches the data of a single referendum only, of
which you must already know its uniqe RDB id
. The latter function allows to retrieve data for an arbitrary
number of referendums, depending on the conditions you specify via the function's various arguments.
To simply retrieve all referendums in the database (excluding draft entries), run
rdb::rfrnds()
which should output a tibble like this one:
#| echo: false rdb::rfrnds(quiet = TRUE)
The RDB referendum data's individual variables (columns) are documented in the codebook. It is also available as
a dataset via rdb::data_codebook
.
Results of rdb::rfrnds()
and some other functions in this package are by default cached on disk using pkgpins[^2]. You can define
the maximum age of cached results you're willing to tolerate via the argument max_cache_age
(defaults to a week). It accepts anything that can be successfully
converted to a lubridate duration -- e.g. a string like "3 hours"
, "2 days"
or "1 week"
, or
a number which will simply be interpreted as number of seconds.
To only re-download RDB data once every 4 hours and 48 minutes for example, use
rdb::rfrnds(max_cache_age = "4 hours 48 minutes")
Although we usually advise against it, you can also completely opt out of caching by specifying use_cache = FALSE
. However, please make sure to not run such
code in excess, as it creates additional (and most likely unnecessary) load on our servers.
[^2]: Caching happens per unique combination of evaluated function arguments (excluding any of the arguments incl_archive
, use_cache
, max_cache_age
and
quiet
). This means that when rdb::rfrnds()
is invoked two times in a row with the exact same set of arguments, say country_code = "CH"
and
level = c("national", "subnational")
, the second invocation will read the result from disk and thus is considerably faster (as long as use_cache
is set
to TRUE
and max_cache_age
is set to a timespan greater than the delay between invocations).
Since every argument combination is cached separately, altering the arguments will inevitably result in a new cache entry, regardless whether the actual data would technically already be present in an existing cache entry. E.g. `rdb::rfrnds(country_code = "CH", level = "national")` returns a perfect subset of the just mentioned invocation, nevertheless the data is freshly fetched the first time it is run.
rdb includes various functions to augment the referendum data by additional information which wouldn't make sense to be stored in the RDB itself.
For example, you can add the period (week, month, quarter, year, decade or century) in which a referendum took place using rdb::add_period()
. By default, the
recurring numeric week number of the year is added (i.e. period = "week"
):
#| echo: true rdb::rfrnds() |> rdb::add_period() |> dplyr::select(id, date, week)
Another frequently required augmentation is rdb::add_country_code_long()
which adds an additional column country_code_long
containing the ISO 3166-1
alpha-3 code. These three-letter codes are often required to join RDB referendum data with data from other
sources.
See the package reference for all available data augmentation functions.
For certain analyses, it might come in handy to transform the referendum data to a different shape beforehand. For a few such transformations, rdb provides ready-made functions.
rdb::as_ballot_dates()
for example transforms the default referendum-level observations to ones on the level of ballot date and jurisdiction:
#| echo: true rdb::rfrnds() |> nrow() rdb::rfrnds() |> rdb::as_ballot_dates() |> nrow()
Noteworthy is also rdb::unnest_var()
which provides a convenient and standardized way to unnest a multi-value variable of type list like the topics_tier_*
variables to long format.
#| echo: true rdb::rfrnds() |> nrow() rdb::rfrnds() |> rdb::unnest_var(topics_tier_1) |> nrow()
See the package reference for all available data transformation functions.
rdb also includes some ready-made convenience functions to create tables and (interactive) plots.
If you'd like a tabular overview of the top-ten countries by number of ballot dates per political level for example, you could simply run
rdb::rfrnds() |> rdb::as_ballot_dates() |> rdb::tbl_n_rfrnds(by = c(country_name, level), n_rows = 10L, order = "descending")
and you'd get the following nicely formatted gt table:
#| echo: false rdb::rfrnds(quiet = TRUE) |> rdb::as_ballot_dates() |> rdb::tbl_n_rfrnds(by = c(country_name, level), n_rows = 10L, order = "descending")
A table of the number of referendums in the UN subregion Polynesia since 2010 per a certain period, say years, can be generated via
rdb::rfrnds() |> rdb::add_world_regions() |> dplyr::filter(un_subregion == "Polynesia" & date > "2009-12-31") |> rdb::tbl_n_rfrnds_per_period(period = "year")
#| echo: false rdb::rfrnds(quiet = TRUE) |> rdb::add_world_regions() |> dplyr::filter(un_subregion == "Polynesia" & date > "2009-12-31") |> rdb::tbl_n_rfrnds_per_period(period = "year")
Or a stacked area chart visualizing the worldwide share of referendums per year since 1950, grouped by political level:
rdb::rfrnds() |> dplyr::filter(date >= "1950-01-01") |> rdb::tbl_n_rfrnds_per_period(period = "year", by = "level")
#| echo: false rdb::rfrnds(quiet = TRUE) |> dplyr::filter(date >= "1950-01-01") |> rdb::plot_rfrnd_share_per_period(period = "year", by = "level") |> plotlee::simplify_trace_ids()
Or, as a final example, the overall (hierarchical) segmentation of the political topics all the referendums in the RDB were about:
rdb::rfrnds() |> rdb::plot_topic_segmentation(method = "per_topic_lineage")
#| echo: false rdb::rfrnds(quiet = TRUE) |> rdb::plot_topic_segmentation(method = "per_topic_lineage") |> plotlee::simplify_trace_ids()
Again, see the package reference for all available data visualization and tabulation functions. More will likely be added in the future.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.