```r
source("R/setup.R")$value
table <- od_table("OGD_krebs_ext_KREBS_1")
```
`od_table()` makes it easy to import datasets from `r ogd_portal` into your R sessions.
This function downloads CSV sources from the file server.
This means that no API key is required to use STATcubeR with datasets from the OGD portal.
In this example, we will use a `r tippy_dataset(table, "data set about cancer statistics")`.
The dataset id `"OGD_krebs_ext_KREBS_1"` can be extracted from the URL and will be used in the data import.
```{js, echo = FALSE}
let url = "https://data.statistik.gv.at/"
tippy("#ogd", {allowHTML: true, interactive: true, theme: 'light rounded', content:
  "<b>Open Government Data from Statistics Austria</b><br/>" +
  "The open data portal provides datasets from Statistics Austria" +
  " according to open data guidelines<br/>" +
  `${url}`
})
```
## Import and overview

To import a dataset, provide the dataset id as an argument.

```r
table <- od_table("OGD_krebs_ext_KREBS_1")
```
This returns an object of class `od_table`, which bundles all the data from the OGD portal that corresponds to this dataset.
Printing the object will show a summary of the contents.

```r
table
```

The dataset contains the number of cancer patients by several classification fields.
The method `$tabulate()` can be used to turn the object into a `data.frame` in long format, which contains labeled data.

```r
table$tabulate()
```
The dataset contains `r nrow(table$data)` rows.
If every combination of tumor type, year, region and sex had its own row, the number of rows would be the following.

$$ 95 \times 37 \times 9 \times 2 = 63270 $$

This means that the table is fairly dense. However, this might not be the case for other OGD datasets.
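The row-count comparison above can be sketched in a few lines of R. This is a minimal, self-contained illustration: `table_density()` is a hypothetical helper, not part of STATcubeR, and the row count passed to it is a placeholder and not the actual size of the dataset.

```r
# Number of elements in each classification field, as stated in the text
n_levels <- c(tumor_type = 95, year = 37, region = 9, sex = 2)

# Size of the full grid: one row per combination of classification elements
full_grid <- prod(n_levels)
full_grid
#> [1] 63270

# Hypothetical helper (not part of STATcubeR): share of the grid cells
# that are actually present as rows in the data
table_density <- function(n_rows, n_levels) {
  n_rows / prod(n_levels)
}

# 50000 is a placeholder value, not the real row count of the dataset
table_density(50000, n_levels)
```

With a real table, `nrow(table$data)` would take the place of the placeholder. A value close to 1 indicates a dense table.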
This section will show the different metadata components contained in the table object and how they relate to the resources on the OGD server.

```r
table$resources$name
```

The labels for the columns of the `data.frame` representation are generated from `r style_resource("OGD_krebs_ext_KREBS_1", "HEADER")` and can be extracted from the table object via `$header`.

```r
table$header
```

Additional metadata for the columns can be obtained via `$meta`.
See the `r ticle("sc_data")` for more details.

```r
options(tibble.print_max = 5)
options(tibble.print_min = 5)
```
The method `table$field()` can be used to get information about specific classification fields.
These contain data from `{dataset_id}_{field_code}.csv`.
Unlike the metadata in `sc_table`, the `od_table` class always contains German and English labels.
Both can be used to label the dataset.

The following call gives access to the German and English labels for the 95 different tumor types in the `"cancer type"` classification.
Click `"Year"` above to see information about the years.

```r
table$field("C-TUM_ICD10_3ST-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-TUM_ICD10_3ST-0")`
The reporting period spans 37 years (1983 to 2019).
The classification elements are parsed into a `<date>` format for the `<data.frame>` representation.

```r
table$field("C-BERJ-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-BERJ-0")`

The regional classification contains 9 elements which correspond to the NUTS2 regions ("Bundesländer") of Austria.

```r
table$field("C-BUNDESLAND-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-BUNDESLAND-0")`

Sex is coded as a dichotomous variable with the classification elements `"male"` and `"female"`.

```r
table$field("C-KRE_GESCHLECHT-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-KRE_GESCHLECHT-0")`
The JSON metadata file `r style_resource("OGD_krebs_ext_KREBS_1", ext = "json")` is available via the `$json` binding.

```r
table$json
```

```r
od_json("OGD_veste309_Veste309_1")
od_json("OGD_konjunkturmonitor_KonMon_1")
```

This print method only shows part of the metadata. More information can be extracted by using the keys of the JSON object.

```r
table$json$extras$publisher
table$json$extras$update_frequency
table$json$resources[[1]]$url
```
To get the raw microdata from `r style_resource("OGD_krebs_ext_KREBS_1")`, use `table$data`.
The output is similar to what is returned from `read.csv2("OGD_krebs_ext_KREBS_1.csv")`.

```r
table$data
```

`od_table()` makes sure that the levels of all factor columns are in the same order as in the metadata.

```r
levels(table$data$`C-BUNDESLAND-0`) == table$field("C-BUNDESLAND-0")$code
```
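To illustrate why the level order matters, here is a small base R sketch with made-up codes. The vectors `field_codes` and `raw` are hypothetical and do not come from the OGD portal. The sketch only shows the general technique of passing explicit levels to `factor()`, which may differ from what `od_table()` does internally.

```r
# Hypothetical metadata: codes in the order defined by the classification
field_codes <- c("C", "A", "B")

# Codes as they might appear in a raw CSV column, in arbitrary order
raw <- c("B", "C", "A", "B")

# Without explicit levels, factor() sorts the levels alphabetically
default_order <- factor(raw)
levels(default_order)

# With explicit levels, the metadata order is preserved
aligned <- factor(raw, levels = field_codes)
levels(aligned)

# The same kind of check as in the text: all comparisons are TRUE
levels(aligned) == field_codes
```

Keeping the levels aligned with the metadata means that sorting or plotting by a factor column follows the order of the classification instead of the alphabetical order of the codes.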
As mentioned above, a labeled version of the data can be obtained via `table$tabulate()`.
The labeling is done by taking the raw dataset and then joining the labels from `$header` and `$field()`.

```r
table$tabulate()
```

Time variables are converted into a `<date>` format if they satisfy certain STATcube standards.
You can read more about `$tabulate()` in the `r ticle('sc_tabulate')`.
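The join between raw codes and labels can be sketched with base R and toy data. All three data frames below (`raw`, `field`, `header`) are hypothetical stand-ins with invented codes. They only mimic the roles of `table$data`, `table$field()` and `table$header`, and the code is not the implementation used by `$tabulate()`.

```r
# Hypothetical raw data with coded columns (not the actual dataset)
raw <- data.frame(
  region = c("R-2", "R-1", "R-2"),
  value  = c(10, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical field metadata: one label per classification code
field <- data.frame(
  code  = c("R-1", "R-2"),
  label = c("Burgenland", "Carinthia"),
  stringsAsFactors = FALSE
)

# Hypothetical header metadata: one label per column
header <- data.frame(
  code  = c("region", "value"),
  label = c("Region", "Number of cases"),
  stringsAsFactors = FALSE
)

labeled <- raw
# Replace the codes by labels; match() keeps the row order of the raw data
labeled$region <- field$label[match(raw$region, field$code)]
# Rename the columns using the header labels
names(labeled) <- header$label[match(names(raw), header$code)]
labeled
```

Using `match()` instead of `merge()` avoids reordering the rows, so the labeled output lines up row by row with the raw data.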
It is possible to switch the language used for labeling the dataset using the `$language` field.
This field can be used to get and set the language.
Allowed options are `"en"` for English and `"de"` for German.

```r
table$language
table$language <- "de"
table$language
```

This option affects the `print()` method as well as the output of `$tabulate()`.
If no English labels are available, the German labels are used as a fallback.

```r
table
```

```r
table$tabulate()
```
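The fallback from English to German labels can be pictured with a small base R sketch. The `labels` data frame and the `pick_labels()` helper are hypothetical and only illustrate the idea. How STATcubeR resolves missing labels internally is not shown in this article, so treat this as an assumption.

```r
# Hypothetical label table; NA marks a missing English label
labels <- data.frame(
  code     = c("A", "B", "C"),
  label_de = c("Insgesamt", "Wien", "Tirol"),
  label_en = c("Total", NA, "Tyrol"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick the labels for a language and fall back
# to the German label wherever the English one is missing
pick_labels <- function(labels, language = c("en", "de")) {
  language <- match.arg(language)
  if (language == "de") {
    return(labels$label_de)
  }
  ifelse(is.na(labels$label_en), labels$label_de, labels$label_en)
}

pick_labels(labels, "en")
pick_labels(labels, "de")
```

With `"en"`, the second element falls back to the German label because no English label exists for it.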
- `r ticle('od_list')` to list all datasets that are compatible with `od_table()`.
- `r ticle('sc_tabulate')` to see how they can be summarized into a more compact form.
- `r STATcubeR` caches all files requested from the server under the hood.
  The `r ticle('od_resources')` explains where and how those caches are stored.