get_hathi_counts: Reads the downloaded extracted features file for a given...
In xmarquez/hathiTools: Access the Hathi Trust Bookworm and Extracted Features Files from R

get_hathi_counts

R Documentation

Reads the downloaded extracted features file for a given Hathi Trust id

Description

Given a single Hathi Trust ID, this function returns a tibble with the volume's per-page word counts and part-of-speech information, and caches the results in the getOption("hathiTools.ef.dir") directory (by default "./hathi-ef"). If the file has not already been cached, the function first attempts to download it directly from the Hathi Trust server. This function uses code authored by Ben Schmidt, from his Hathidy package (https://github.com/HumanitiesDataAnalysis/hathidy).
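The caching described above follows a common "use the cache if present, otherwise download and save" pattern. The sketch below illustrates that pattern in base R only; it is not the package's implementation. `fetch_with_cache()` and `fake_fetch()` are hypothetical names, the rds file layout is an assumption, and the stand-in fetcher returns dummy data instead of contacting the Hathi Trust server.

```r
# Hypothetical sketch of a cache-then-download pattern (not hathiTools' code).
# `fetch` stands in for whatever actually downloads and parses a volume.
fetch_with_cache <- function(id, dir, fetch) {
  dir.create(dir, showWarnings = FALSE, recursive = TRUE)
  # Hathi Trust ids can contain ":" and "/", which are unsafe in file names,
  # so replace any such characters before building the cache path.
  safe_id <- gsub("[^A-Za-z0-9._-]", "_", id)
  path <- file.path(dir, paste0(safe_id, ".rds"))
  if (file.exists(path)) {
    return(readRDS(path))   # cache hit: read from disk, no download
  }
  result <- fetch(id)       # cache miss: fetch, then store for next time
  saveRDS(result, path)
  result
}

# Stand-in fetcher that counts how often it is called and returns dummy data.
calls <- 0
fake_fetch <- function(id) {
  calls <<- calls + 1
  data.frame(x = 1:2)
}

cache_dir <- file.path(tempdir(), "ef-cache-sketch")
unlink(cache_dir, recursive = TRUE)  # start from an empty cache

first  <- fetch_with_cache("aeu.ark:/13960/t3qv43c3w", cache_dir, fake_fetch)
second <- fetch_with_cache("aeu.ark:/13960/t3qv43c3w", cache_dir, fake_fetch)
```

The second call is served from the file written by the first, so the stand-in fetcher runs only once.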

Usage

get_hathi_counts(
  htid,
  dir = getOption("hathiTools.ef.dir"),
  cache_format = getOption("hathiTools.cacheformat")
)

Arguments

`htid`	The Hathi Trust id of the item whose extracted features file is to be loaded into memory. If the file has not been downloaded yet, the function will try to download it first.
`dir`	The directory where the downloaded extracted features files are to be found. Defaults to `getOption("hathiTools.ef.dir")`, which is just "hathi-ef" on load.
`cache_format`	File format of the cache for Extracted Features files. Defaults to `getOption("hathiTools.cacheformat")`, which is "csv.gz" on load. Allowed cache formats are: compressed csv ("csv.gz", the default), "none" (no local caching of the JSON download; only the JSON file is kept), "rds", "feather" and "parquet" (suitable for use with arrow; requires the arrow package), or "text2vec.csv" (a csv suitable for use with the text2vec package).
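Because `dir` and `cache_format` default to package options, both can be changed once per session rather than on every call. A minimal sketch, assuming you want an "rds" cache in a temporary directory (the directory name "my-hathi-cache" is only an example):

```r
# Point the cache at a different directory and switch the cache format from
# the default compressed csv to rds. options() returns the previous values,
# so they can be restored afterwards.
old <- options(
  hathiTools.ef.dir = file.path(tempdir(), "my-hathi-cache"),
  hathiTools.cacheformat = "rds"
)

getOption("hathiTools.ef.dir")
getOption("hathiTools.cacheformat")

# While these options are set, calls such as get_hathi_counts(htid) that do
# not pass `dir` or `cache_format` explicitly will use these values.

# Restore the previous settings when done.
options(old)
```

Passing `dir` or `cache_format` directly to `get_hathi_counts()` still overrides the options for that one call.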

Value

A tibble with the extracted features.

Author(s)

Ben Schmidt

Examples


# Download the 1863 version of "Democracy in America" by Tocqueville and get
# its extracted features

library(hathiTools)

tmp <- tempdir()

get_hathi_counts("aeu.ark:/13960/t3qv43c3w", dir = tmp)

xmarquez/hathiTools documentation built on June 2, 2025, 5:12 a.m.
