zenodo: Download corpus tarball from Zenodo

zenodo_get_tarball R Documentation

Download corpus tarball from Zenodo

Description

Download a corpus tarball from Zenodo. Both open access data and data with restricted access can be downloaded.

Usage

zenodo_get_tarball(
  url,
  destfile = tempfile(fileext = ".tar.gz"),
  checksum = TRUE,
  verbose = TRUE,
  progress = TRUE
)

gparlsample_url_restricted

Arguments

url

URL of the Zenodo landing page for the resource. Can also be a URL for restricted access, i.e. a URL with ?token= and a long key appended.

destfile

A character vector with the file path where the downloaded file is to be saved. Tilde-expansion is performed. Defaults to a temporary file.

checksum

A logical value, whether to check the MD5 checksum of the download.

verbose

A logical value, whether to output progress messages.

progress

A logical value, whether to report progress during download.
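
To illustrate what an MD5 check such as the one enabled by the checksum argument involves, here is a minimal sketch in base R. How zenodo_get_tarball performs the check internally is not described on this page; the helper md5_matches is hypothetical and not part of cwbtools, and it only assumes that an expected MD5 hex string is known for the file.

```r
# Hypothetical helper (not part of cwbtools): compare the MD5 digest of a
# file with an expected hex string, ignoring case.
md5_matches <- function(file, expected) {
  actual <- unname(tools::md5sum(file))  # NA if the file cannot be read
  !is.na(actual) && identical(tolower(actual), tolower(expected))
}

# Self-contained demonstration with a small temporary file
tmp <- tempfile(fileext = ".txt")
writeBin(charToRaw("hello\n"), tmp)

md5_matches(tmp, unname(tools::md5sum(tmp)))  # digest compared with itself
md5_matches(tmp, strrep("0", 32))             # deliberately wrong digest
```

A mismatch usually indicates an incomplete or corrupted download, so a caller would typically discard the file and retry rather than pass it on.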

Format

gparlsample_url_restricted is an object of class character of length 1.

Details

A sample subset of the GermaParl corpus is deposited at Zenodo for testing purposes. There are identical open access and restricted versions of GermaParlSample to test different flavours of downloading a resource from Zenodo. The URL for restricted access includes a very lengthy access token. This URL is included as a dataset in the package to avoid excessively long lines in sample code. Note that URLs that give access to restricted data should usually not be shared.
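
Because a restricted-access URL carries its token as a query parameter, it can be useful to mask the token before printing or logging the URL. The following is a minimal sketch in base R; redact_token is a hypothetical helper, not part of cwbtools, and the URL and token shown are made up for illustration.

```r
# Hypothetical helper (not part of cwbtools): mask the value of the
# 'token' query parameter so the URL can be printed or logged safely.
redact_token <- function(url) {
  sub("([?&]token=)[^&#]+", "\\1<redacted>", url)
}

# Made-up URL and token, for illustration only
restricted_url <- "https://zenodo.org/record/0000000?token=abc.DEF-123"
redact_token(restricted_url)

# URLs without a token are returned unchanged
redact_token("https://doi.org/10.5281/zenodo.3823245")
```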

Value

The filename of the downloaded corpus tarball, designed to serve as input for corpus_install (as argument tarball). If the resource is not available, NULL is returned.
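
Since NULL is returned when the resource is not available, calling code may want to check the return value before installing. A minimal sketch of such a guard follows; install_if_downloaded is a hypothetical wrapper, not part of cwbtools, and the installer is passed in as a function so that the sketch runs without the package.

```r
# Hypothetical wrapper (not part of cwbtools): call `installer` only if a
# tarball path was actually returned by the download step.
install_if_downloaded <- function(tarball, installer) {
  if (is.null(tarball)) {
    message("Resource not available, skipping installation.")
    return(invisible(FALSE))
  }
  installer(tarball)
  invisible(TRUE)
}

# Intended usage, assuming cwbtools is attached and `gparl_url_pub`
# holds the landing page URL:
# tarball_tmp <- zenodo_get_tarball(url = gparl_url_pub)
# install_if_downloaded(tarball_tmp, function(x) corpus_install(tarball = x))
```

The wrapper returns TRUE or FALSE invisibly, so a script can branch on whether the installation step was attempted.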

Examples


# Temporary directory structure as a preparatory step
Sys.setenv(CORPUS_REGISTRY = "")
cwb_dirs <- create_cwb_directories(
  prefix = tempdir(),
  ask = FALSE,
  verbose = FALSE
)
Sys.setenv(CORPUS_REGISTRY = cwb_dirs[["registry_dir"]])

# Download and install open access resource
gparl_url_pub <- "https://doi.org/10.5281/zenodo.3823245"
tarball_tmp <- zenodo_get_tarball(url = gparl_url_pub)
corpus_install(tarball = tarball_tmp)

# Download and install resource with restricted access
tarball_tmp <- zenodo_get_tarball(url = gparlsample_url_restricted)
corpus_install(tarball = tarball_tmp)


cwbtools documentation built on June 2, 2022, 5:06 p.m.