Authors: David Robinson
License: GPL-2
Download and process public domain works from the Project Gutenberg collection. The package includes:

- gutenberg_download(), which downloads one or more works from Project Gutenberg by ID; e.g., gutenberg_download(84) downloads the text of Frankenstein.
- gutenberg_metadata, which contains information about each work, pairing Gutenberg ID with title, author, language, etc.
- gutenberg_authors, which contains information about each author, such as aliases and birth/death year.
- gutenberg_subjects, which contains pairings of works with Library of Congress subjects and topics.

Install the package with:
install.packages("gutenbergr")
Or install the development version using devtools with:
devtools::install_github("ropensci/gutenbergr")
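The datasets listed above share ID columns, so they can be combined. The base-R sketch below is an illustration only. It runs on small hypothetical stand-in tables rather than the real data, and it assumes that gutenberg_metadata carries a gutenberg_author_id column matching one in gutenberg_authors, which stores the years in birthdate and deathdate columns. author_for_book() is a hypothetical helper, and the author IDs are invented.

```r
# Hypothetical stand-ins for rows of gutenberg_metadata and gutenberg_authors.
# Column names are ASSUMED to match the bundled datasets; author IDs are made up.
metadata <- data.frame(
  gutenberg_id = c(84, 768, 1260),
  title = c("Frankenstein", "Wuthering Heights", "Jane Eyre"),
  gutenberg_author_id = c(1, 2, 3),
  stringsAsFactors = FALSE
)
authors <- data.frame(
  gutenberg_author_id = c(1, 2, 3),
  author = c("Shelley, Mary Wollstonecraft", "Bronte, Emily", "Bronte, Charlotte"),
  birthdate = c(1797L, 1818L, 1816L),
  deathdate = c(1851L, 1848L, 1855L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: book ID -> author ID (via metadata) -> author row.
author_for_book <- function(id, metadata, authors) {
  author_id <- metadata$gutenberg_author_id[match(id, metadata$gutenberg_id)]
  authors[match(author_id, authors$gutenberg_author_id),
          c("author", "birthdate", "deathdate")]
}

author_for_book(768, metadata, authors)
```

Using match() keeps the result in the same order as the requested IDs, which is convenient when looking up several books at once.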
The gutenberg_works() function retrieves, by default, a table of metadata for all unique English-language Project Gutenberg works that have text associated with them. (The gutenberg_metadata dataset has all Gutenberg works, unfiltered.)
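The default filtering can be approximated in a few lines of base R. This is a rough sketch, not gutenberg_works() itself. It assumes metadata columns named language and has_text, keeps one row per title, and runs on a hypothetical stand-in table whose last three rows are invented to show what gets dropped. The real function's deduplication and language options may differ.

```r
# Hypothetical stand-in for gutenberg_metadata. The first three rows use IDs and
# titles from this README; the last three are invented for illustration.
metadata <- data.frame(
  gutenberg_id = c(84, 768, 1260, 90001, 90002, 90003),
  title = c("Frankenstein", "Wuthering Heights", "Jane Eyre",
            "Example (French)", "Example (no text)", "Frankenstein"),
  language = c("en", "en", "en", "fr", "en", "en"),
  has_text = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Rough approximation of the default: works in the requested language(s) that
# have text, keeping only the first row for each title.
works_sketch <- function(metadata, languages = "en") {
  # %in% treats NA values as "not matching" instead of propagating NA
  keep <- metadata$language %in% languages & metadata$has_text %in% TRUE
  out <- metadata[keep, , drop = FALSE]
  out[!duplicated(out$title), , drop = FALSE]
}

works_sketch(metadata)$gutenberg_id  # IDs 84, 768 and 1260
```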
Suppose we wanted to download Emily Bronte's "Wuthering Heights." We could find the book's ID by filtering:
library(dplyr)
library(gutenbergr)

gutenberg_works() %>%
  filter(title == "Wuthering Heights")

# or just:
gutenberg_works(title == "Wuthering Heights")
Since we see that it has gutenberg_id 768, we can download it with the gutenberg_download() function:
wuthering_heights <- gutenberg_download(768)
wuthering_heights
gutenberg_download() can download multiple books when given multiple IDs. It also takes a meta_fields argument that adds variables from the metadata.
# 1260 is the ID of Jane Eyre
books <- gutenberg_download(c(768, 1260), meta_fields = "title")
books

books %>% count(title)
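The meta_fields behaviour can be thought of as a join on gutenberg_id. The base-R sketch below illustrates that idea only and is not gutenbergr's implementation. add_meta_fields() is a hypothetical helper, and both data frames are small stand-ins with placeholder text lines.

```r
# Hypothetical stand-ins: downloaded text (one row per line) and a metadata
# table keyed by gutenberg_id.
text_df <- data.frame(
  gutenberg_id = c(768, 768, 1260),
  text = c("WUTHERING HEIGHTS", "", "JANE EYRE"),
  stringsAsFactors = FALSE
)
metadata <- data.frame(
  gutenberg_id = c(768, 1260),
  title = c("Wuthering Heights", "Jane Eyre"),
  stringsAsFactors = FALSE
)

# Copy the requested metadata columns onto each text row by matching
# gutenberg_id; match() preserves the row order of text_df.
add_meta_fields <- function(text_df, metadata, meta_fields) {
  idx <- match(text_df$gutenberg_id, metadata$gutenberg_id)
  for (field in meta_fields) {
    text_df[[field]] <- metadata[[field]][idx]
  }
  text_df
}

books <- add_meta_fields(text_df, metadata, meta_fields = "title")
table(books$title)  # base-R analogue of count(title)
```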
It can also take the output of gutenberg_works() directly. For example, we could get the text of all of Aristotle's works, each annotated with both gutenberg_id and title, using:
aristotle_books <- gutenberg_works(author == "Aristotle") %>%
  gutenberg_download(meta_fields = "title")
aristotle_books
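Piping gutenberg_works() into gutenberg_download() works because the IDs can be read from the gutenberg_id column of the data frame. Here is a minimal sketch of that kind of input handling. as_gutenberg_ids() is a hypothetical helper, not a gutenbergr function, and the package's internal code may differ.

```r
# Hypothetical helper: accept either a vector of IDs or a data frame with a
# gutenberg_id column (such as the output of gutenberg_works()).
as_gutenberg_ids <- function(x) {
  if (is.data.frame(x)) {
    if (!"gutenberg_id" %in% names(x)) {
      stop("data frame input needs a gutenberg_id column")
    }
    x <- x$gutenberg_id
  }
  as.integer(x)
}

works <- data.frame(
  gutenberg_id = c(768, 1260),
  title = c("Wuthering Heights", "Jane Eyre"),
  stringsAsFactors = FALSE
)
as_gutenberg_ids(works)       # IDs taken from the gutenberg_id column
as_gutenberg_ids(c(84, 768))  # plain vectors pass through as integers
```

Accepting both forms lets a filtered metadata table flow straight into a download step without extracting the column by hand.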
You could match the wikipedia column in gutenberg_authors to Wikipedia content with the WikipediR package or to pageview statistics with the wikipediatrend package.

(… format_reverse function for reversing "Last, First" names.)

See the data-raw directory for the scripts that generate these datasets. As of now, these were generated from the Project Gutenberg catalog on the date stored in the "date_updated" attribute of gutenberg_metadata, which you can display with format(attr(gutenberg_metadata, "date_updated"), '%d %B %Y').
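Reversing "Last, First" author names, mentioned above, can also be sketched with a single regular-expression substitution in base R. This is not the format_reverse function. reverse_name() is a hypothetical helper that assumes one comma separates the surname from the given names, and it leaves comma-free names such as "Aristotle" unchanged.

```r
# Hypothetical helper: turn "Last, First" into "First Last".
# Assumes a single comma; names without a comma are returned unchanged.
reverse_name <- function(x) {
  sub("^([^,]+),\\s*(.+)$", "\\2 \\1", x, perl = TRUE)
}

reverse_name(c("Bronte, Emily", "Shelley, Mary Wollstonecraft", "Aristotle"))
# "Emily Bronte" "Mary Wollstonecraft Shelley" "Aristotle"
```

Names with extra commas (for example, ones carrying dates or suffixes) would need more careful parsing.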
Yes! The package respects Project Gutenberg's rules for automated (robot) access and complies with them to the best of our ability. Namely:

… https://www.gutenberg.lib.md.us/8/84/84.zip.

Still, this package is not the right way to download the entire Project Gutenberg corpus (or all works in a particular language). For that, follow Project Gutenberg's recommendation to use wget or to set up a mirror. This package is recommended for downloading a single work, or the works of a particular author or on a particular topic.
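The single example URL above (for book 84) hints at how mirror paths are laid out. As an illustration only, the sketch below reproduces that URL from the ID. It assumes the pattern generalizes, with each digit of the ID except the last becoming a directory level. That is an inference from one URL, not a documented rule. book_zip_url() is a hypothetical helper, the default mirror is simply the host in the example, and nothing is downloaded.

```r
# Hypothetical helper: build a .zip URL for a book ID, ASSUMING the directory
# layout seen in the example URL (every digit except the last is a directory).
book_zip_url <- function(id, mirror = "https://www.gutenberg.lib.md.us") {
  stopifnot(length(id) == 1)
  id_chr <- sprintf("%d", as.integer(id))  # avoids "1e+05"-style formatting
  if (nchar(id_chr) < 2) {
    stop("single-digit IDs are not covered by this sketch")
  }
  dirs <- strsplit(substr(id_chr, 1, nchar(id_chr) - 1), "")[[1]]
  paste(mirror, paste(dirs, collapse = "/"), id_chr, paste0(id_chr, ".zip"),
        sep = "/")
}

book_zip_url(84)
# "https://www.gutenberg.lib.md.us/8/84/84.zip"
```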
Please note that the gutenbergr project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.