build_corpus: Build a Corpus of Works from the Internet Archive


Source: R/build_corpus.R

Description

build_corpus searches the Internet Archive's metadata for the specified 'keywords' within a given 'date_range' (in the format "yyyy TO yyyy"), downloads the OCR text versions of the matching works, and returns a data frame containing the Internet Archive's metadata about the retrieved works along with the locations of the corresponding text files.
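The search step described above can be pictured as a query against the Internet Archive's advanced search endpoint. The following is a minimal sketch under stated assumptions, not the package's implementation: the helper name ia_search_url is hypothetical, and the query fields (mediatype:texts, year:[...]) and the use of https://archive.org/advancedsearch.php with output=json are illustrative choices. Note that the "yyyy TO yyyy" format of 'date_range' matches Lucene range syntax, which is why it can be dropped into a bracketed range.

```r
# Hypothetical helper (NOT part of historicalnetworks): builds an Internet
# Archive advanced-search URL for texts matching `keywords` in `date_range`.
ia_search_url <- function(keywords, date_range = "1700 TO 1899",
                          max_results = "10000") {
  # Assumed query: quoted keywords, restricted to texts and a year range
  query <- sprintf('"%s" AND mediatype:texts AND year:[%s]',
                   keywords, date_range)
  # reserved = TRUE percent-encodes spaces, quotes, colons and brackets
  paste0("https://archive.org/advancedsearch.php",
         "?q=", utils::URLencode(query, reserved = TRUE),
         "&fl[]=identifier",
         "&rows=", max_results,
         "&output=json")
}

url <- ia_search_url("yellow fever")
```

With the defaults, the q parameter of the resulting URL encodes "yellow fever" AND mediatype:texts AND year:[1700 TO 1899], and rows is set to the default max_results of "10000".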

Usage

build_corpus(keywords, date_range = "1700 TO 1899",
  download_dir = "data-raw/corpus", max_results = "10000", chime = TRUE)

Arguments

keywords

The keywords to search for in the metadata of the Internet Archive's text collection.

date_range

The desired date range to search, specified in the format "yyyy TO yyyy".

download_dir

The directory (relative to your working directory) to which files from the Internet Archive will be downloaded.

max_results

The maximum number of text results to retrieve (note that the default is given as a character string, "10000").

chime

Should the function chime on completion?
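The arguments above carry format expectations that are easy to get wrong, especially 'date_range'. Below is a minimal sketch of checks one might run before calling build_corpus. The helper check_corpus_args is hypothetical and not part of the package, and accepting 'max_results' as either a number or a numeric string (like the "10000" default) is an assumption.

```r
# Hypothetical argument checks (NOT part of historicalnetworks).
check_corpus_args <- function(keywords, date_range = "1700 TO 1899",
                              max_results = "10000") {
  # keywords: a single non-empty string
  ok_keywords <- is.character(keywords) && length(keywords) == 1 &&
    nzchar(keywords)
  # date_range: exactly "yyyy TO yyyy", with the start year not after the end
  ok_range <- is.character(date_range) && length(date_range) == 1 &&
    grepl("^[0-9]{4} TO [0-9]{4}$", date_range)
  if (ok_range) {
    years <- as.integer(strsplit(date_range, " TO ", fixed = TRUE)[[1]])
    ok_range <- years[1] <= years[2]
  }
  # max_results: a positive whole number, given as a number or a string
  n <- suppressWarnings(as.numeric(max_results))
  ok_max <- length(n) == 1 && !is.na(n) && n >= 1 && n == round(n)
  c(keywords = ok_keywords, date_range = ok_range, max_results = ok_max)
}

check_corpus_args("yellow fever", date_range = "1700-1899")
```

The result is a named logical vector, one element per checked argument, so a malformed 'date_range' such as "1700-1899" shows up as date_range = FALSE while the other checks still pass.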

Details

Details needed

Value

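A data frame representing the corpus of downloaded texts: the Internet Archive's metadata for each retrieved work, along with the location of its downloaded text file.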

Examples

## Not run: 
 yf_corpus <- build_corpus(keywords = "yellow fever")

## End(Not run)

mariolaespinosa/historicalnetworks documentation built on Dec. 9, 2017, 2:04 p.m.