cache_htids    R Documentation
This function takes a set of Hathi Trust IDs (whose JSON Extracted Features files have usually already been downloaded via rsync_from_hathi) and caches those JSON files to another format (e.g., csv, rds, or parquet) alongside them. A typical workflow with this package involves selecting an appropriate set of Hathi Trust IDs (via workset_builder), downloading their Extracted Features files to your local machine (via rsync_from_hathi), caching these slow-to-load JSON files in a faster-loading format using cache_htids, and then using read_cached_htids to read them into a single data frame or arrow Dataset for further work.
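The idea behind caching is that a table parsed once from JSON can be reloaded later from a simpler format instead of being re-parsed each time. The base-R sketch below illustrates only that round trip for two of the formats mentioned above, csv and rds. The toy data frame, its column names, and the file names are hypothetical stand-ins, not the actual layout of the files cache_htids writes. Parquet would additionally need the arrow package (e.g. arrow::write_parquet()) and is left out to keep the sketch dependency-free.

```r
# Hypothetical toy table standing in for one volume's parsed Extracted Features;
# the column names are illustrative, not the package's actual cache layout.
ef <- data.frame(
  page  = c(1L, 1L, 2L),
  token = c("the", "whale", "sea"),
  count = c(3L, 1L, 2L),
  stringsAsFactors = FALSE
)

cache_dir <- tempdir()
csv_path <- file.path(cache_dir, "toy_ef.csv")
rds_path <- file.path(cache_dir, "toy_ef.rds")

# Write the same table to two cache formats.
write.csv(ef, csv_path, row.names = FALSE)
saveRDS(ef, rds_path)

# Read both caches back.
# csv is plain text, so column types are re-guessed when it is read;
# rds serializes the R object itself, so types are preserved exactly.
from_csv <- read.csv(csv_path, stringsAsFactors = FALSE)
from_rds <- readRDS(rds_path)
```

The trade-off this shows: rds keeps R types exactly but is R-specific, while csv can be read by other tools at the cost of re-inferring types on load.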
cache_htids(
htids,
dir = getOption("hathiTools.ef.dir"),
cache_type = c("ef", "meta", "pagemeta"),
cache_format = getOption("hathiTools.cacheformat"),
keep_json = TRUE,
attempt_rsync = FALSE,
attempt_parallel = FALSE
)
htids:
A character vector of Hathi Trust ids, a workset created with
workset_builder, or a data frame with a column named "htid" containing
the Hathi Trust ids that require caching. If the JSON Extracted Features
files for these htids have not been downloaded to dir via rsync_from_hathi
or get_hathi_counts, nothing is cached (unless attempt_rsync is TRUE).
dir:
The directory where the downloaded Extracted Features files are to
be found. Defaults to getOption("hathiTools.ef.dir").
cache_type:
Type of information cached. The default is c("ef", "meta",
"pagemeta"), which refers to the extracted features, the volume metadata,
and the page metadata. Omitting one of these caches only the rest
(e.g., cache_type = "ef" caches only the extracted features, not the
volume or page metadata).
cache_format:
File format of the cache for Extracted Features files.
Defaults to getOption("hathiTools.cacheformat").
keep_json:
Whether to keep the downloaded JSON files. Default is TRUE.
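attempt_rsync:
If TRUE, and the JSON Extracted Features files for some htids are not
found in dir, the function first attempts to download them via
rsync_from_hathi and then caches them. Default is FALSE.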
attempt_parallel:
Default is FALSE. If TRUE, the function attempts to cache the files in
parallel.
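The htids and cache_type arguments each accept more than one form. The base-R sketch below shows one way a function could normalize them; normalize_htids() and normalize_cache_type() are hypothetical helpers written for illustration, not functions exported by hathiTools, and the sketch assumes a workset can be treated like any data frame with an "htid" column.

```r
# Hypothetical helper: accept a character vector of htids, or any data frame
# (such as a workset) with an "htid" column, and return a character vector.
normalize_htids <- function(htids) {
  if (is.data.frame(htids)) {
    if (!"htid" %in% names(htids)) {
      stop("data frame input must have a column named 'htid'")
    }
    htids <- htids[["htid"]]
  }
  as.character(htids)
}

# Hypothetical helper: the default c("ef", "meta", "pagemeta") means
# "all three types"; passing a subset restricts which types are handled.
# match.arg() with several.ok = TRUE rejects values outside the defaults.
normalize_cache_type <- function(cache_type = c("ef", "meta", "pagemeta")) {
  match.arg(cache_type, several.ok = TRUE)
}

normalize_htids(c("mdp.39015008706338", "mdp.39015058109706"))
normalize_htids(data.frame(htid = "mdp.39015008706338"))
normalize_cache_type()      # all three types
normalize_cache_type("ef")  # extracted features only
```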
Returns a tibble with the paths of the cached files and an indicator of whether each htid was successfully cached.
htids <- c("mdp.39015008706338", "mdp.39015058109706")
dir <- tempdir()
# Caches nothing (nothing has been downloaded to `dir`):
cache_htids(htids, dir = dir, cache_type = "ef")
# Tries to rsync first, then caches
cache_htids(htids, dir = dir, cache_type = "ef", attempt_rsync = TRUE)