source("R/setup.R")$value
tmp_cache_dir <- file.path(tempdir(), "STATcubeR/open_data") dir.create(tmp_cache_dir, recursive = TRUE) od_cache_dir(tmp_cache_dir)
This article explains how and where r STATcubeR
caches resources from r ogd_portal
in the local file system.
Understanding this behavior will allow you to enable persistent caches and directly use the cached resources.
By default, r STATcubeR
caches all accessed resources from r ogd_portal
in the temporary directory of the current R session.
od_cache_dir()
Let's examine for example what happens when the data from the structure of earnings survey (SES) is requested.
earnings <- od_table("OGD_veste309_Veste309_1")
First STATcubeR
will grab a json with metadata about this dataset from https://data.statistik.gv.at/ogd/json?dataset=OGD_veste309_Veste309_1 and check which resources belong to it.
For any resource, the attributes name
and last_modified
are extracted from the json.
They are also included in the od_table
object under $resources
.
earnings$resources
last_modified
tells us when the resource was changed on the fileserver.
If a resource does not exist in the cache or if the last modified entry in the json is newer than the cached file, it will be downloaded from the server.
Otherwise, the cached version is reused.
Cached files can be accessed with od_cache_file()
.
If the specified file exists in the cache, a path to the file will be returned.
Otherwise, the file is downloaded to the cache and then the path is returned.
The files use the same naming conventions as the open data fileserver.
od_cache_file("OGD_veste309_Veste309_1")
od_cache_file("OGD_veste309_Veste309_1", "C-A11-0")
To read files from the cache as data.frame
s, use od_resource()
with same parameters as in od_cache_file()
.
This will apply a special parser to the dataset which drops unneeded columns and normalizes column names.
od_resource("OGD_veste309_Veste309_1", "C-A11-0")
The parser behaves differently for header files, data files and fields.
Json files can be accessed with od_json()
.
json <- od_json("OGD_veste309_Veste309_1") unlist(json$tags)
od_cache_clear(id)
can be used to clear the cache from all files belonging to the passed dataset id.
We saw that earnings$resources
contains 7 rows, therefore 7 files will be deleted during cleanup.
od_cache_clear("OGD_veste309_Veste309_1")
If you want to use a persistent directory like ~/.cache/STATcubeR/open_data/
for caching, the directory can be changed with od_cache_dir(new)
.
od_cache_dir("~/.cache/STATcubeR/open_data/")
Let's go back to the $resources
field of earnings
.
earnings$resources
We already looked at name
and last_modified
.
The remaining columns can be interpreted as follows
cached
tells us the last time the cache file for the resource was modified.size
is the file size in bytesdownload
contains the amount of milliseconds used to retrieve the resource when it was last updated.parsed
reports the amount of milliseconds it took od_resource()
to convert the file contents into a data.frame()
format.
For the json file, the parsing time is always reported as NA
.od_cache_summary()
will give an overview about all files that are available in the cache directory.
The returned table contains one row for every dataset.
updated
contains the last modified date for the datasets json file.json
, data
and header
give the file sizes in bytes for the corresponding files.fields
is the total size of all fields and n_fields
is the number of classification files available.We can get a clear picture on how much disk space is used for each dataset.
od_cache_summary()
Note that od_cache_summary()
only gathers information from the local file system based on filenames, file.mtime()
and file.size()
.
od_cache_dir(tmp_cache_dir)
To get a history of all files that have been downloaded from the server, use od_downloads()
.
For each file, a timestamp for the download is recorded as well as the download time in milliseconds.
od_downloads()
unlink(tmp_cache_dir)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.