book_sections: Gutenberg Project books dataset


Description

A mixed-up collection of words drawn from different sections of two books.

Usage

book_sections

Format

A tibble with 108,657 observations, each a word in a document. This data set is designed to show how latent Dirichlet allocation (LDA) can be used to separate a set of mixed documents into two distinct "topics" (or books).

word

Words from a given section within a book.

document

The book section ID that the word came from.
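
The one-word-per-row layout described above is usually turned into per-document word counts before a topic model such as LDA is fitted. The following is a minimal sketch of that counting step in base R. It uses a small made-up data frame, toy_sections, as a stand-in for book_sections; the words and section IDs shown are hypothetical and are not taken from the real data.

```r
# Toy stand-in with the same two columns as book_sections.
# All values here are invented for illustration only.
toy_sections <- data.frame(
  word = c("whale", "ship", "whale", "estate", "marriage", "estate"),
  document = c("A_1", "A_1", "A_1", "B_1", "B_1", "B_1"),
  stringsAsFactors = FALSE
)

# Cross-tabulate: one row per document, one column per distinct word,
# each cell holding the number of times the word occurs in the document.
word_counts <- xtabs(~ document + word, data = toy_sections)

print(word_counts)
```

A table of this shape (documents by words) is the kind of input that LDA implementations generally expect, although the exact class required depends on the package used to fit the model.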

Source

Data taken from two Project Gutenberg books.


mangoTraining documentation built on April 28, 2021, 9:07 a.m.