find_cached_htids: Finds cached Extracted Features files for a set of HT ids
In xmarquez/hathiTools: Access the Hathi Trust Bookworm and Extracted Features Files from R

find_cached_htids

R Documentation

Finds cached Extracted Features files for a set of HT ids

Description

Finds cached Extracted Features files for a set of HT ids

Usage

find_cached_htids(
  htids,
  dir = getOption("hathiTools.ef.dir"),
  cache_type = c("ef", "meta", "pagemeta"),
  cache_format = getOption("hathiTools.cacheformat"),
  existing_only = TRUE
)

Arguments

`htids`	A character vector of Hathi Trust ids, a workset created with workset_builder, or a data frame with a column named "htid" containing the Hathi Trust ids that require caching. If the JSON Extracted Features files for these htids have not been downloaded to `dir` via rsync_from_hathi or get_hathi_counts, nothing will be cached (unless `attempt_rsync` is `TRUE` in a call to cache_htids; `find_cached_htids` itself has no such argument).
`dir`	The directory where the downloaded Extracted Features files are to be found. Defaults to `getOption("hathiTools.ef.dir")`, which is "hathi-ef" on load.
`cache_type`	Type of information cached. The default is c("ef", "meta", "pagemeta"), which refers to the extracted features, the volume metadata, and the page metadata. Omitting any of these restricts caching or finding to the remaining types (e.g., `cache_type = "ef"` caches or finds only the EF files, not their associated volume metadata or page metadata).
`cache_format`	File format of the cache for Extracted Features files. Defaults to `getOption("hathiTools.cacheformat")`, which is "csv.gz" on load. Allowed cache formats are: "csv.gz" (compressed csv, the default), "none" (no local caching of the JSON download; only the JSON file is kept), "rds", "feather" and "parquet" (suitable for use with arrow; needs the arrow package installed), or "text2vec.csv" (a csv suitable for use with the package text2vec).
`existing_only`	Whether to return only file paths to files that actually exist. Default is `TRUE`. Use `FALSE` to find whether some files still need to be cached.
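
As a rough illustration of how these arguments combine, the sketch below passes a data frame with an "htid" column instead of a character vector, sets the two options that `dir` and `cache_format` default to, and restricts the search to EF files. It assumes hathiTools is installed (the call is skipped otherwise); the `ef_dir` location and the variable names are hypothetical and chosen only for this example.

```r
# Illustrative sketch only; `ef_dir`, `htid_df` and `ef_only` are
# hypothetical names, not part of the package.
htid_df <- data.frame(
  htid = c("mdp.39015008706338", "mdp.39015058109706"),
  stringsAsFactors = FALSE
)
ef_dir <- file.path(tempdir(), "hathi-ef")
dir.create(ef_dir, showWarnings = FALSE)

if (requireNamespace("hathiTools", quietly = TRUE)) {
  # Set the session-wide options that the `dir` and `cache_format`
  # arguments read their defaults from, remembering the old values.
  old_opts <- options(
    hathiTools.ef.dir = ef_dir,
    hathiTools.cacheformat = "csv.gz"
  )

  # Look only for cached EF files, ignoring volume and page metadata.
  ef_only <- hathiTools::find_cached_htids(htid_df, cache_type = "ef")
  print(ef_only)

  # Restore the previous option values.
  options(old_opts)
}
```

Setting the options once avoids repeating `dir` and `cache_format` in every call; passing them explicitly, as in the Examples section, works equally well.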

Value

A tibble with the paths of the cached files and an indicator of whether each htid has an existing cached file.
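
One way to see what this return value looks like is to call the function with `existing_only = FALSE` and inspect the result. This is a minimal sketch, assuming hathiTools is installed (the call is skipped otherwise). The column names of the tibble are not listed on this page, so the example prints the structure rather than assuming them; `status` is a hypothetical variable name.

```r
# Illustrative sketch only; `status` is a hypothetical name.
htids <- c("mdp.39015008706338", "mdp.39015058109706")

if (requireNamespace("hathiTools", quietly = TRUE)) {
  # existing_only = FALSE keeps htids that have no cached file yet,
  # so the result can be used to see what still needs caching.
  status <- hathiTools::find_cached_htids(
    htids,
    dir = tempdir(),
    cache_type = "ef",
    existing_only = FALSE
  )

  # Inspect the columns instead of assuming their names.
  str(status)
}
```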

Examples


htids <- c("mdp.39015008706338", "mdp.39015058109706")
dir <- tempdir()

# Finds nothing (nothing has been downloaded or cached to `dir`):

find_cached_htids(htids, cache_format = c("none", "csv"), dir = dir)

cache_htids(htids, dir = dir, cache_type = "ef", attempt_rsync = TRUE)

# Finds the cached files and their JSON ef files

find_cached_htids(htids, cache_format = c("none", "csv"), dir = dir)

xmarquez/hathiTools documentation built on June 2, 2025, 5:12 a.m.
