```r
source("R/setup.R")$value
table <- od_table("OGD_krebs_ext_KREBS_1")
```

`od_table()` makes it easy to import datasets from the `r ogd_portal` into your R session. The function downloads CSV sources from the file server, so no API key is required to use STATcubeR with datasets from the OGD portal.

In this example, we will use a `r tippy_dataset(table, "data set about cancer statistics")`. The dataset id `OGD_krebs_ext_KREBS_1` can be extracted from the URL and will be used in the data import.

The OGD portal (Open Government Data from Statistics Austria) at <https://data.statistik.gv.at/> provides datasets from Statistics Austria according to open data guidelines.

## Import and overview

To import a dataset, provide the dataset id as an argument.

```r
table <- od_table("OGD_krebs_ext_KREBS_1")
```

This returns an object of class `od_table`, which bundles all the data from the OGD portal that corresponds to this dataset. Printing the object shows a summary of its contents.

```r
table
```

The dataset contains the number of cancer patients by several classification fields.

## Convert to a data frame

The method `$tabulate()` can be used to turn the object into a `data.frame` in long format, which contains labeled data.

```r
table$tabulate()
```

The dataset contains `r nrow(table$data)` rows. If every combination of tumor type, year, region and sex had its own row, the number of rows would be as follows.

$$95 \times 37 \times 9 \times 2 = 63270$$

This means that the table is fairly dense. This might not be the case for other OGD datasets.
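The size of the full grid can be checked with a few lines of base R. This is a minimal sketch: the level counts are taken from the text above, and `density()` is a hypothetical helper introduced only for illustration, not part of STATcubeR.

```r
# Number of classification elements per field, as stated in the text
n_levels <- c(tumor_type = 95, year = 37, region = 9, sex = 2)

# Size of the full cross product of all classification elements
full_grid <- prod(n_levels)
full_grid

# Hypothetical helper: share of the full grid that is present in a dataset
density <- function(n_rows, n_levels) n_rows / prod(n_levels)
```

With the imported table, `density(nrow(table$data), n_levels)` would give the share of combinations that actually occur. Values close to 1 indicate a dense table.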

## Metadata

This section shows the different metadata components contained in the table object and how they relate to the resources on the OGD server.

```r
table$resources$name
```

### Header

The labels for the columns of the `data.frame` representation are generated from `r style_resource("OGD_krebs_ext_KREBS_1", "HEADER")` and can be extracted from the table object via `$header`.

```r
table$header
```

Additional metadata for the columns can be obtained via `$meta`. See the `r ticle("sc_data")` for more details.

### Field infos {.unlisted .unnumbered .tabset .tabset-pills}

```r
options(tibble.print_max = 5)
options(tibble.print_min = 5)
```

The method `table$field()` can be used to get information about specific classification fields. These contain data from `{dataset_id}_{field_code}.csv`. Unlike the metadata in `sc_table`, the `od_table` class always contains German and English labels. Both can be used to label the dataset.

#### Tumor type

The following call gives access to the German and English labels for the 95 different tumor types in the "cancer type" classification. Click "Year" above to see information about the years.

```r
table$field("C-TUM_ICD10_3ST-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-TUM_ICD10_3ST-0")`

#### Year

The reporting period spans 37 years (1983 to 2019). The classification elements are parsed into a `<date>` format for the `data.frame` representation.

```r
table$field("C-BERJ-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-BERJ-0")`

#### Province

The regional classification contains 9 elements which correspond to the NUTS2 regions ("Bundesländer") of Austria.

```r
table$field("C-BUNDESLAND-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-BUNDESLAND-0")`

#### Sex

Sex is coded as a dichotomous variable with the classification elements "male" and "female".

```r
table$field("C-KRE_GESCHLECHT-0")
```

`r style_resource("OGD_krebs_ext_KREBS_1", "C-KRE_GESCHLECHT-0")`

### json Metadata {.tabset .tabset-pills}

The JSON metadata file `r style_resource("OGD_krebs_ext_KREBS_1", ext = "json")` is available via the `$json` binding.

#### Cancer

```r
table$json
```

#### Earnings

```r
od_json("OGD_veste309_Veste309_1")
```

#### Economic Trend Monitor

```r
od_json("OGD_konjunkturmonitor_KonMon_1")
```

This print method only shows part of the metadata. More information can be extracted by using the keys of the JSON object.

```r
table$json$extras$publisher
table$json$extras$update_frequency
table$json$resources[[1]]$url
```

## Table Contents

To get the raw microdata from `r style_resource("OGD_krebs_ext_KREBS_1")`, use `table$data`. The output is similar to what is returned from `read.csv2("OGD_krebs_ext_KREBS_1.csv")`.

```r
table$data
```
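For readers unfamiliar with `read.csv2()`: it is the base R reader for semicolon-separated files that use a comma as decimal mark. The following sketch uses made-up inline data. The column names and values are invented for illustration and are not taken from the real CSV file.

```r
# Illustrative CSV text; column names and values are invented for this sketch
csv_text <- "C-SEX-0;F-VALUE\nSEX-1;1,5\nSEX-2;2,5"

# read.csv2() defaults to sep = ";" and dec = ","
# check.names = FALSE keeps the hyphenated column names unchanged
raw <- read.csv2(text = csv_text, check.names = FALSE)
str(raw)
```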

`od_table()` makes sure that the levels of all factor columns are in the same order as in the metadata.

```r
levels(table$data$`C-BUNDESLAND-0`) == table$field("C-BUNDESLAND-0")$code
```
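The idea of metadata-driven level ordering can be illustrated with base R alone. This is a sketch with hypothetical codes and labels, not the code STATcubeR uses internally.

```r
# Hypothetical metadata: codes in the order given by the metadata file
field <- data.frame(
  code  = c("B-3", "B-1", "B-2"),
  label = c("Region C", "Region A", "Region B")
)

# Raw column as it might arrive from the CSV file
raw_codes <- c("B-1", "B-2", "B-3", "B-1")

# Passing `levels` explicitly fixes the level order.
# Without it, factor() would sort the codes alphabetically.
ordered_by_meta <- factor(raw_codes, levels = field$code)
levels(ordered_by_meta)
all(levels(ordered_by_meta) == field$code)
```

Keeping the level order aligned with the metadata means that sorting or plotting by the factor follows the order of the classification rather than the alphabetical order of the codes.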

As mentioned above, a labeled version of the data can be obtained via `table$tabulate()`. The labeling is done by taking the raw dataset and then joining the labels from `$header` and `$field()`.

```r
table$tabulate()
```
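The joining step described above can be sketched in base R with `match()`. All codes, labels and column names below are hypothetical, and the snippet only illustrates the principle. It is not the implementation of `$tabulate()`.

```r
# Hypothetical raw data with coded values
raw <- data.frame(
  `C-SEX-0` = c("SEX-1", "SEX-2", "SEX-1"),
  `F-VALUE` = c(10, 20, 30),
  check.names = FALSE
)

# Hypothetical header (column labels) and field metadata (element labels)
header <- data.frame(code = c("C-SEX-0", "F-VALUE"), label = c("Sex", "Cases"))
field_sex <- data.frame(code = c("SEX-1", "SEX-2"), label = c("male", "female"))

labeled <- raw
# Replace element codes by labels via a lookup into the field metadata
labeled[["C-SEX-0"]] <- field_sex$label[match(raw[["C-SEX-0"]], field_sex$code)]
# Replace column codes by the labels from the header
names(labeled) <- header$label[match(names(raw), header$code)]
labeled
```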

Time variables are converted into a `<date>` format if they satisfy certain STATcube standards. You can read more about `$tabulate()` in the `r ticle('sc_tabulate')`.
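As a rough illustration of what such a conversion can look like, the sketch below turns year values into `Date` objects. Two assumptions are made here that the source does not confirm: the year codes are plain four-digit strings, and each year is represented by its first day. The actual parsing rules of STATcubeR may differ.

```r
# Hypothetical year codes; the real codes in the dataset may look different
year_codes <- c("1983", "2019")

# Assumption: represent each year by its first day
dates <- as.Date(paste0(year_codes, "-01-01"))
class(dates)
format(dates)
```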

## A Trip to Germany {#sauerkraut}

The language used for labeling the dataset can be switched with the `$language` field. This field can be used to get and set the language. Allowed options are `"en"` for English and `"de"` for German.

```r
table$language
table$language <- "de"
table$language
```

This option affects the `print()` method as well as the output of `$tabulate()`. If no English labels are available, the German labels are used as a fallback.

```r
table
table$tabulate()
```
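The fallback behaviour can be sketched as follows. The column names `label_de` and `label_en` and the helper `pick_labels()` are assumptions made for this illustration and do not claim to mirror the internals of STATcubeR.

```r
# Hypothetical field metadata with one missing English label
field <- data.frame(
  code     = c("A", "B"),
  label_de = c("Wien", "Tirol"),
  label_en = c("Vienna", NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pick labels for a language and
# fall back to German wherever the English label is missing
pick_labels <- function(field, language = c("en", "de")) {
  language <- match.arg(language)
  if (language == "de") return(field$label_de)
  ifelse(is.na(field$label_en), field$label_de, field$label_en)
}

pick_labels(field, "en")
pick_labels(field, "de")
```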

## Further reading



statistikat/STATcubeR documentation built on Dec. 3, 2024, 8:04 p.m.