In statistikat/STATcubeR: R Interface for the 'STATcube' REST API and Open Government Data

source("R/setup.R")$value

tmp_cache_dir <- file.path(tempdir(), "STATcubeR/open_data")
dir.create(tmp_cache_dir, recursive = TRUE)
od_cache_dir(tmp_cache_dir)

This article explains how and where r STATcubeR caches resources from r ogd_portal in the local file system. Understanding this behavior will allow you to enable persistent caches and directly use the cached resources.

Overview

By default, r STATcubeR caches all accessed resources from r ogd_portal in the temporary directory of the current R session.

od_cache_dir()

Let's examine for example what happens when the data from the structure of earnings survey (SES) is requested.

earnings <- od_table("OGD_veste309_Veste309_1")

First STATcubeR will grab a json with metadata about this dataset from https://data.statistik.gv.at/ogd/json?dataset=OGD_veste309_Veste309_1 and check which resources belong to it. For any resource, the attributes name and last_modified are extracted from the json. They are also included in the od_table object under $resources.

earnings$resources

last_modified tells us when the resource was changed on the fileserver. If a resource does not exist in the cache or if the last modified entry in the json is newer than the cached file, it will be downloaded from the server. Otherwise, the cached version is reused.

Access and Updates

Cached files can be accessed with od_cache_file(). If the specified file exists in the cache, a path to the file will be returned. Otherwise, the file is downloaded to the cache and then the path is returned. The files use the same naming conventions as the open data fileserver.

od_cache_file("OGD_veste309_Veste309_1")

od_cache_file("OGD_veste309_Veste309_1", "C-A11-0")

To read files from the cache as data.frames, use od_resource() with same parameters as in od_cache_file(). This will apply a special parser to the dataset which drops unneeded columns and normalizes column names.

od_resource("OGD_veste309_Veste309_1", "C-A11-0")

The parser behaves differently for header files, data files and fields. Json files can be accessed with od_json().

json <- od_json("OGD_veste309_Veste309_1")
unlist(json$tags)

Clearing and Changing

od_cache_clear(id) can be used to clear the cache from all files belonging to the passed dataset id. We saw that earnings$resources contains 7 rows, therefore 7 files will be deleted during cleanup.

od_cache_clear("OGD_veste309_Veste309_1")

If you want to use a persistent directory like ~/.cache/STATcubeR/open_data/ for caching, the directory can be changed with od_cache_dir(new).

od_cache_dir("~/.cache/STATcubeR/open_data/")

The resources field

Let's go back to the $resources field of earnings.

earnings$resources

We already looked at name and last_modified. The remaining columns can be interpreted as follows

cached tells us the last time the cache file for the resource was modified.
size is the file size in bytes
download contains the amount of milliseconds used to retrieve the resource when it was last updated.
parsed reports the amount of milliseconds it took od_resource() to convert the file contents into a data.frame() format. For the json file, the parsing time is always reported as NA.

What's in the cache?

od_cache_summary() will give an overview about all files that are available in the cache directory. The returned table contains one row for every dataset.

The column updated contains the last modified date for the datasets json file.
json, data and header give the file sizes in bytes for the corresponding files.
fields is the total size of all fields and n_fields is the number of classification files available.

We can get a clear picture on how much disk space is used for each dataset.

od_cache_summary()

Note that od_cache_summary() only gathers information from the local file system based on filenames, file.mtime() and file.size().

od_cache_dir(tmp_cache_dir)

Download history

To get a history of all files that have been downloaded from the server, use od_downloads(). For each file, a timestamp for the download is recorded as well as the download time in milliseconds.

od_downloads()

unlink(tmp_cache_dir)

statistikat/STATcubeR documentation built on Dec. 3, 2024, 8:04 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

statistikat/STATcubeR
R Interface for the 'STATcube' REST API and Open Government Data

In statistikat/STATcubeR: R Interface for the 'STATcube' REST API and Open Government Data

Overview

Access and Updates

Clearing and Changing

The resources field

What's in the cache?

Download history

R Package Documentation

Browse R Packages

We want your feedback!

statistikat/STATcubeR R Interface for the 'STATcube' REST API and Open Government Data

In statistikat/STATcubeR: R Interface for the 'STATcube' REST API and Open Government Data

Overview

Access and Updates

Clearing and Changing

The resources field

What's in the cache?

Download history

R Package Documentation

Browse R Packages

We want your feedback!

statistikat/STATcubeR
R Interface for the 'STATcube' REST API and Open Government Data