This article covers the most important aspects of the $tabulate() method, which aggregates sc_data objects.
The first part uses the cancer dataset from the od_table() article.
After that, other features of $tabulate() are demonstrated with data from the structure of earnings survey (SES).
cancer <- od_table("OGD_krebs_ext_KREBS_1")
earnings <- od_table("OGD_veste309_Veste309_1")
Notice that these tabulation methods can also be used with the STATcube REST API.
This means that objects created by sc_table() also have a $tabulate() method.
Calling the $tabulate() method with no arguments produces a table with the same dimensions as $data.
cancer$tabulate()
identical(dim(cancer$tabulate()), dim(cancer$data))
Instead of cancer$tabulate(...), it is also possible to use sc_tabulate(cancer, ...).
All available parameters for the $tabulate() method are documented in ?sc_tabulate.
To get the number of cases by reporting year and sex, use the labels of those variables as arguments.
cancer$tabulate("Reporting year", "Sex")
If more than one measure is included in the dataset, all measures will be aggregated.
STATcubeR uses rowsum() to ensure good performance with big datasets.
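To make the aggregation step concrete, here is a minimal sketch of what rowsum() from base R does on a small made-up table. The data frame and its column names are illustrative assumptions, and the snippet does not reproduce STATcubeR's internal code.

```r
# Toy data standing in for a dataset with two classification fields
# and one count measure (all names and values are invented).
toy <- data.frame(
  year  = c(2019, 2019, 2020, 2020, 2020),
  sex   = c("male", "female", "male", "female", "female"),
  cases = c(10, 12, 11, 13, 2)
)

# Build one grouping key per row and sum the measure within each key.
key <- paste(toy$year, toy$sex)
agg <- rowsum(toy$cases, group = key)

# The result is a one-column matrix with one row per unique key.
print(agg)
```

Because rowsum() sums within groups, it only makes sense for additive measures such as counts.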
It is also possible to use partial matching or use codes.
cancer$tabulate("Reporting", "C-KRE")
STATcubeR uses pmatch() to match the supplied strings against the metadata and identify the variables that should be used for aggregation.
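The following sketch shows how pmatch() from base R resolves partial strings. The label vector is a made-up stand-in for field labels and is not taken from the metadata of any real dataset.

```r
# Hypothetical field labels (illustrative only).
labels <- c("Reporting year", "Province", "Sex", "Tumor type")

# Each input is matched to the position of the unique label it abbreviates.
matched <- pmatch(c("Reporting", "Sex", "Prov"), labels)
print(labels[matched])

# An abbreviation that fits several candidates is ambiguous and yields NA.
ambiguous <- pmatch("S", c("Sex", "Status"))
print(ambiguous)
```

A consequence of this behavior is that an abbreviation must identify exactly one candidate; otherwise no match is returned.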
In some cases, datasets cannot be aggregated using the rowsum() approach.
As an example, take the structure of earnings survey.
earnings <- od_table("OGD_veste309_Veste309_1")
earnings
As we can see from the print() output, the measures contain means and quartiles.
Therefore, aggregating the data via rowsum() is not meaningful.
However, this dataset contains a "total code" for every field.
options(tibble.print_min = 10)
earnings$tabulate()
These total codes can be used to aggregate the data with $tabulate().
In order to do that, the total codes need to be specified using $total_codes().
earnings$total_codes(Sex = "Sum total", Citizenship = "Total", Region = "Total", `Form of employment` = "Total")
Now $tabulate() will use these total codes to form aggregates of the data.
earnings$tabulate("Form of employment")
As we can see, the method extracted rows 2 to 7 from the data. The logic for selecting those rows is equivalent to the following {dplyr} expression.
earnings$data %>%
  dplyr::filter(Sex == "Sum total" & Citizenship == "Total" &
                `Region (NUTS2)` == "Total" & `Form of employment` != "Total") %>%
  dplyr::select(-Sex, -Citizenship, -`Region (NUTS2)`)
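The same selection logic can be tried without downloading any data. The sketch below applies it to a tiny invented table in base R; the field names, codes, and values are assumptions made for illustration only.

```r
# Toy table in which every field carries a "Total" row
# (field names, codes, and values are invented).
toy <- data.frame(
  sex         = c("Total", "Total", "Total", "male", "male", "female"),
  region      = c("Total", "East", "West", "Total", "East", "Total"),
  mean_income = c(15, 14, 16, 17, 16, 13)
)

# Tabulating by region: keep rows where every other field is at its
# total code, and drop the total row of the tabulated field itself.
keep <- toy$sex == "Total" & toy$region != "Total"
by_region <- toy[keep, c("region", "mean_income")]
print(by_region)
```

No arithmetic is performed here: the values are looked up from precomputed rows, which is why this approach also works for means and quartiles.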
The $tabulate() method also works with more than one variable.
options(tibble.print_min = 12, tibble.print_max = 12)
earnings$tabulate("Sex", "Form of employment")
earnings$tabulate("Sex", "Citizenship")
earnings$tabulate("Sex", "Region")
earnings$tabulate("Citizenship", "Region")
The last call returns an empty table because this cross tabulation is not included in the OGD dataset. The same happens for Citizenship & Form of employment as well as Region & Form of employment.
earnings$tabulate("Citizenship", "Form of employment") %>% dim()
earnings$tabulate("Region", "Form of employment") %>% dim()
By default, STATcubeR always adds totals for datasets from the REST API and uses those totals to aggregate the datasets.
x <- sc_table(sc_example("accomodation"))
x$meta$fields
Not all fields need to have totals.
For example, suppose we want to include the totals for Sex in the output table.
We can just remove the total code before running sc_tabulate().
The special symbol NA can be used to unset a total code.
earnings$total_codes(Sex = NA)
earnings$tabulate("Sex")
It is possible to switch the language used for labeling the data.
This can be done by setting $language to "de" or "en".
earnings$language <- "de"
earnings$tabulate("Geschlecht")
To skip labeling altogether and use variable codes in the output, use raw = TRUE.
earnings$tabulate("Geschlecht", raw = TRUE)
Switching languages is always available for od_table() objects.
For sc_table(), it depends on which languages were requested.
# default: get labels in German and English
x <- sc_table(sc_example("accomodation"))
# only get English labels
x <- sc_table(sc_example("accomodation"), lang = "en")
# only get German labels
x <- sc_table(sc_example("accomodation"), lang = "de")
In the previous examples, we only supplied names and/or codes of fields to sc_tabulate().
It is also possible to include measures, in which case the unlisted measures will be omitted.
earnings$tabulate("Geschlecht", "Arithmetisches Mittel", "2. Quartil")
Just like for fields, measures also support partial matching and codes.
In the above example, "2. Quartil" was matched to "2. Quartil (Median)".
Notice that we used the German label for the column "Sex" in the last calls to $tabulate(). This is necessary because only the "active" labels are available to define the tabulation. If you want to use STATcubeR programmatically, always use codes to define the tabulation, and use the .list parameter if you want to pass several codes.
options(tibble.print_min = 7, tibble.print_max = 7)
earnings$field("C-A11-0")
earnings$total_codes(`C-A11-0` = "A11-1")
vars_to_tabulate <- c("C-A11-0", "C-BESCHV-0")
earnings$tabulate(.list = vars_to_tabulate)
$total_codes() currently uses an ellipsis (...) parameter to define total codes.
In the future, programmatic updates of sc_data objects should be defined in $recodes.
See #17.