cache: Utilities for cache management

cache_dataset    R Documentation

Utilities for cache management

Description

The dataverse package uses disk and session caches to improve network performance. Use of the cache is described on this page.

Usage

cache_dataset(version)

cache_path()

cache_info()

cache_reset()

Arguments

version

A character string specifying a version of the dataset. It can take the form "1.1" or "1" (in "x.y", x is the major version and y is an optional minor version). As of v0.3.14, specifying a version in this way caches the dataset (see Examples), so a second call reads the file from the cache instead of downloading it again. Set use_cache = "none" to bypass the cache and download the file afresh even when version is provided. If a key is supplied (through the key argument or the DATAVERSE_KEY environment variable), the draft version can be accessed with ":draft" (the current draft) or ":latest" (which prioritizes the draft over the latest published version).

Details

Use of the cache is determined by the value of the use_cache = argument to dataset and other API calls, or by the environment variable DATAVERSE_USE_CACHE. Possible values are:

  • "none": do not use the cache. This is the default for datasets that are versioned with ":draft", ":latest", and ":latest-published".

  • "session": cache API requests for the duration of the R session. This is the default for API calls that do not involve file or dataset retrieval.

  • "disk": use a permanent disk cache. This is the default for files and explicitly versioned datasets.
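As a minimal sketch of the environment-variable route, the snippet below sets DATAVERSE_USE_CACHE for the current R session using only base R and then restores the previous state. It assumes the variable accepts the same three values listed above; how a conflict between the variable and an explicit use_cache = argument is resolved is not stated on this page, so the sketch makes no claim about that.

```r
# Remember the current setting (NA if the variable is not set).
old <- Sys.getenv("DATAVERSE_USE_CACHE", unset = NA)

# Ask for session-level caching for the rest of this R session.
Sys.setenv(DATAVERSE_USE_CACHE = "session")
current <- Sys.getenv("DATAVERSE_USE_CACHE")
print(current)

# Restore the previous state so the sketch has no lasting side effect.
if (is.na(old)) {
  Sys.unsetenv("DATAVERSE_USE_CACHE")
} else {
  Sys.setenv(DATAVERSE_USE_CACHE = old)
}
```

To make a setting persist across sessions, the same variable could instead be placed in a .Renviron file.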

cache_dataset() determines whether a dataset or file should be cached based on the version specification.
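The decision described above can be illustrated with a small base R sketch. The helper sketch_cache_policy() is hypothetical and is not the package's implementation; it only mirrors the documented defaults, assuming that the moving targets ":draft", ":latest", and ":latest-published" are never cached and that any other version string counts as explicitly versioned.

```r
# Hypothetical helper (NOT part of dataverse): mirrors the documented defaults.
sketch_cache_policy <- function(version) {
  # Versions that can point to different content over time are not cached.
  moving_targets <- c(":draft", ":latest", ":latest-published")
  if (version %in% moving_targets) "none" else "disk"
}

sketch_cache_policy(":latest")  # "none"
sketch_cache_policy("1.2")      # "disk"
```

The reasoning behind this split is that an explicit version such as "1.2" refers to fixed content, so a cached copy stays valid, whereas ":latest" may resolve to new content on the next call.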

cache_path() finds or creates the location (directory) on the file system containing the cache.

cache_info() queries the cache for information about the name, size, and other attributes of files in the cache. The file name is a 'hash' of the function used to retrieve the file; it is not useful for identifying specific files.

cache_reset() clears all downloaded files from the disk cache.
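A possible way to combine these helpers is sketched below: locate the cache, summarize its contents, and optionally empty it. The sketch assumes an installed version of dataverse that exports cache_path(), cache_info(), and cache_reset() as documented here. The reset_now flag is a hypothetical safeguard added for illustration, and the column names of the cache_info() result are deliberately not assumed.

```r
# Hypothetical safeguard: set to TRUE only to really empty the disk cache.
reset_now <- FALSE

if (requireNamespace("dataverse", quietly = TRUE)) {
  cache_dir <- dataverse::cache_path()   # finds or creates the cache directory
  info      <- dataverse::cache_info()   # data.frame describing cached files

  cat("Cache directory:", cache_dir, "\n")
  cat("Cached files:   ", nrow(info), "\n")
  str(info)                              # inspect whatever columns are returned

  if (reset_now) {
    # Deletes all downloaded files; the (now empty) path is returned invisibly.
    emptied <- dataverse::cache_reset()
  }
} else {
  message("The dataverse package is not installed; nothing to inspect.")
}
```

Because file names in the cache are hashes, the summary is mainly useful for checking how much disk space the cache occupies before deciding whether to reset it.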

Value

cache_dataset() returns "disk" if the dataset version is to be cached to disk, "none" otherwise.

cache_path() returns the file path to the directory containing the cache.

cache_info() returns a data.frame containing names and sizes of files in the cache.

cache_reset() returns the path to the (now empty) cache, invisibly.

Examples

cache_dataset(":latest")  # "none"
cache_dataset("1.2")      # "disk"

## Not run: 
# Specifying a version stores the download in the disk cache by default.
# Add `use_cache = "none"` to turn caching off.
df_tab <-
  get_dataframe_by_name(
    filename = "roster-bulls-1996.tab",
    dataset  = "doi:10.70122/FK2/HXJVJU",
    server   = "demo.dataverse.org",
    version  = "3"
  )

## End(Not run)

cache_path()

cache_info()

dataverse documentation built on June 10, 2025, 9:13 a.m.