mobydick    R Documentation
This dataset contains the first 10 chapters of Herman Melville's novel Moby-Dick, tokenized and lemmatized. It is stored as a data frame with one row per token and several linguistic annotations, including part-of-speech tags, morphological features, and dependency relations.
data(mobydick)
A data frame with one row per token and 27 columns:
Character: Unique document identifier
Integer: Paragraph index within the document
Integer: Sentence index within the paragraph
Character: Original sentence text
Integer: Start position of the token in the sentence
Integer: End position of the token in the sentence
Integer: Unique term identifier
Integer: Token index in the sentence
Character: Original token (word)
Character: Lemmatized form of the token
Character: Universal POS tag
Character: Language-specific POS tag
Character: Morphological features
Integer: Index of the head token in the dependency tree
Character: Dependency relation label
Character: Enhanced dependency relations
Character: Additional information
Character: Folder containing the document
Character: The word used to separate the chapters in the original book
Character: Source file name
Logical: Whether the document is selected
Logical: Whether POS was selected
Character: Highlighted sentence
Logical: Whether the document was manually selected
Logical: Whether hapax legomena were removed
Logical: Whether single-character words were removed
Character: Lemmatized form without multi-word units
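To check the column layout described above on a loaded copy of the data, one option is to list the class of every column. The sketch below uses a hypothetical helper, `column_types()`, and a small toy data frame as a stand-in; the column names in the toy (`doc_id`, `token_id`, `token`, `lemma`, `upos`) are assumptions following udpipe naming conventions and are not confirmed by this page, so run `names(mobydick)` to see the actual names.

```r
# Hypothetical helper: return the (first) class of each column in a data frame,
# so it can be compared with the Format description.
column_types <- function(df) {
  vapply(df, function(col) class(col)[1], character(1))
}

# Toy stand-in for a few token-level rows. Column names are assumptions
# (udpipe-style), not confirmed names of the mobydick columns.
toy <- data.frame(
  doc_id   = c("doc1", "doc1", "doc1"),
  token_id = c(1L, 2L, 3L),
  token    = c("Call", "me", "Ishmael"),
  lemma    = c("call", "I", "Ishmael"),
  upos     = c("VERB", "PRON", "PROPN"),
  stringsAsFactors = FALSE
)

types <- column_types(toy)
print(types)

# On the real data (package loaded), the equivalent checks would be:
# data(mobydick); column_types(mobydick); ncol(mobydick)
```

Comparing `ncol(mobydick)` with the number of entries in the list above is a quick way to confirm the column count.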
Extracted and processed from the text of Moby-Dick by Herman Melville.
data(mobydick)
head(mobydick)
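Beyond `head()`, a typical next step with token-level data like this is counting lemma frequencies for one part of speech. The following is a minimal sketch using a hypothetical helper, `lemma_freq()`, in base R. It assumes the lemma and universal POS columns are named `lemma` and `upos` (udpipe conventions, not confirmed on this page), and it runs on a toy data frame so that it works without the package installed.

```r
# Hypothetical helper: top-n lemma frequencies for one universal POS tag.
# Assumes columns named `lemma` and `upos`; check names(mobydick) first.
lemma_freq <- function(df, pos = "NOUN", n = 10) {
  sel <- df$lemma[df$upos == pos]             # lemmas carrying the requested tag
  counts <- sort(table(sel), decreasing = TRUE)
  head(counts, n)                             # keep only the n most frequent
}

# Toy stand-in data (illustrative only, not taken from mobydick).
toy <- data.frame(
  lemma = c("whale", "sea", "whale", "ship", "sail", "whale"),
  upos  = c("NOUN", "NOUN", "NOUN", "NOUN", "VERB", "NOUN"),
  stringsAsFactors = FALSE
)

top <- lemma_freq(toy, pos = "NOUN", n = 2)
print(top)
```

With the package data loaded, the same call would be `lemma_freq(mobydick, "NOUN")`, provided the assumed column names match.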