R/mobydick.R

#' Lemmatized Text of Moby-Dick (Chapters 1-10)
#'
#' This dataset contains the lemmatized text of the first 10 chapters of Herman Melville's novel Moby-Dick.
#' The data are stored as a data frame in which each token carries several linguistic annotations.
#'
#' @format A data frame with one row per token and the 27 columns documented below:
#' \describe{
#'   \item{doc_id}{Character: Unique document identifier}
#'   \item{paragraph_id}{Integer: Paragraph index within the document}
#'   \item{sentence_id}{Integer: Sentence index within the paragraph}
#'   \item{sentence}{Character: Original sentence text}
#'   \item{start}{Integer: Start position of the token in the sentence}
#'   \item{end}{Integer: End position of the token in the sentence}
#'   \item{term_id}{Integer: Unique term identifier}
#'   \item{token_id}{Integer: Token index in the sentence}
#'   \item{token}{Character: Original token (word)}
#'   \item{lemma}{Character: Lemmatized form of the token}
#'   \item{upos}{Character: Universal POS tag}
#'   \item{xpos}{Character: Language-specific POS tag}
#'   \item{feats}{Character: Morphological features}
#'   \item{head_token_id}{Integer: Head token in dependency tree}
#'   \item{dep_rel}{Character: Dependency relation label}
#'   \item{deps}{Character: Enhanced dependency relations}
#'   \item{misc}{Character: Additional information}
#'   \item{folder}{Character: Folder containing the document}
#'   \item{split_word}{Character: The word used to separate the chapters in the original book}
#'   \item{filename}{Character: Source file name}
#'   \item{doc_selected}{Logical: Whether the document is selected}
#'   \item{POSSelected}{Logical: Whether POS was selected}
#'   \item{sentence_hl}{Character: Highlighted sentence}
#'   \item{docSelected}{Logical: Whether the document was manually selected}
#'   \item{noHapax}{Logical: Whether hapax legomena were removed}
#'   \item{noSingleChar}{Logical: Whether single-character words were removed}
#'   \item{lemma_original_nomultiwords}{Character: Lemmatized form without multi-word units}
#' }
#'
#' @usage data(mobydick)
#' @source Extracted and processed from the text of Moby-Dick by Herman Melville.
#' @examples
#' data(mobydick)
#' head(mobydick)
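#'
#' # A minimal sketch, not part of the original examples: list the ten most
#' # frequent noun lemmas using the documented `upos` and `lemma` columns.
#' # Assumption: `upos` holds Universal POS tags, so nouns are coded "NOUN".
#' # which() drops rows where `upos` is NA instead of returning NA rows.
#' nouns <- mobydick[which(mobydick$upos == "NOUN"), ]
#' head(sort(table(nouns$lemma), decreasing = TRUE), 10)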
"mobydick"

tall documentation built on April 16, 2025, 5:10 p.m.