gutenberg_strip: Strip header and footer content from a Project Gutenberg book

gutenberg_strip R Documentation

Strip header and footer content from a Project Gutenberg book

Description

Strip header and footer content from a Project Gutenberg book. The header and footer boundaries are detected heuristically from formatting conventions, so the result may not be perfect. The function also does not strip tables of contents, prologues, or other front matter that appears at the start of a book.
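To illustrate the kind of formatting guess involved, here is a minimal base R sketch that keeps only the lines between the "*** START OF THE PROJECT GUTENBERG EBOOK ..." and "*** END OF THE PROJECT GUTENBERG EBOOK ..." markers that many Project Gutenberg texts contain. The helper name `strip_gutenberg_sketch` and its marker regular expressions are assumptions made for this example; it is not part of gutenbergr and is not how `gutenberg_strip()` is necessarily implemented. In practice, call `gutenberg_strip()` itself.

```r
# Hypothetical helper for illustration only (NOT part of gutenbergr).
# Assumes the text uses "*** START OF THE/THIS PROJECT GUTENBERG ..." and
# "*** END OF THE/THIS PROJECT GUTENBERG ..." marker lines.
strip_gutenberg_sketch <- function(text) {
  text[is.na(text)] <- ""

  start <- grep("^\\*\\*\\* ?START OF (THE|THIS) PROJECT GUTENBERG", text,
                ignore.case = TRUE)
  end <- grep("^\\*\\*\\* ?END OF (THE|THIS) PROJECT GUTENBERG", text,
              ignore.case = TRUE)

  # Body starts after the first START marker (or at line 1 if none is found)
  first <- if (length(start) > 0) start[1] + 1 else 1
  # Body ends before the first END marker that follows the body start
  end <- end[end >= first]
  last <- if (length(end) > 0) end[1] - 1 else length(text)
  if (first > last) return(character(0))

  body <- text[first:last]

  # Trim blank lines left over around the markers
  nonblank <- which(trimws(body) != "")
  if (length(nonblank) == 0) return(character(0))
  body[nonblank[1]:nonblank[length(nonblank)]]
}

# Usage on a small made-up text
book <- c("The Project Gutenberg eBook of Example",
          "",
          "*** START OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***",
          "",
          "Chapter 1",
          "It was a dark and stormy night.",
          "",
          "*** END OF THE PROJECT GUTENBERG EBOOK EXAMPLE ***",
          "License text...")

strip_gutenberg_sketch(book)
# returns c("Chapter 1", "It was a dark and stormy night.")
```

Because the sketch relies only on marker lines, it fails in the same ways the description warns about: texts with differently worded markers are returned largely unchanged, and front matter such as a table of contents is kept.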

Usage

gutenberg_strip(text)

Arguments

text

A character vector containing the lines of a book.

Value

A character vector with the Project Gutenberg header and footer removed.

Examples



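library(gutenbergr)
library(dplyr)

# Download the book without stripping, so the header and footer are kept
book <- gutenberg_works(title == "Pride and Prejudice") %>%
  gutenberg_download(strip = FALSE)

# The first and last lines contain Project Gutenberg boilerplate
head(book$text, 10)
tail(book$text, 10)

# Remove the header and footer
text_stripped <- gutenberg_strip(book$text)

head(text_stripped, 10)
tail(text_stripped, 10)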



gutenbergr documentation built on Nov. 12, 2023, 5:07 p.m.