MarkdownDocumentChunks: Markdown document chunks
In ragnar: Retrieval-Augmented Generation (RAG) Workflows

Description

MarkdownDocumentChunks stores information about candidate chunks in a Markdown document. It is a tibble with three required columns:

start, end: integers. Character positions (1-based, inclusive) in the source MarkdownDocument, so that substring(md, start, end), where md is that document, yields the chunk text. Ranges can overlap.
context: character. A general-purpose field for adding context to a chunk. ragnar_store_insert() combines this column with the chunk text when generating embeddings, and ragnar_retrieve() also returns it. Note that when chunks are deoverlapped (by ragnar_retrieve() or chunks_deoverlap()), only the context value of the first chunk is kept. By default, markdown_chunk() fills this column with all Markdown headings that are in scope at the chunk's start position.

Additional columns can be included.
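The column semantics above can be checked with a minimal base-R sketch. It assumes a plain character string behaves like a MarkdownDocument for positioning (same 1-based, inclusive positions), and merge_overlaps() is a hypothetical helper written only for illustration; ragnar's own deoverlapping in chunks_deoverlap() and ragnar_retrieve() may differ in details, for example in how touching (adjacent) ranges or ordering are handled.

```r
# Stand-in for a MarkdownDocument: a plain 15-character string (assumption:
# the document text uses the same 1-based, inclusive positions).
md <- "# A\n\nB\n\n## C\n\nD"

# Two candidate chunks whose ranges overlap on characters 6 to 8.
chunks <- data.frame(
  start = c(1L, 6L),
  end = c(8L, 15L),
  context = c("", "# A"),
  stringsAsFactors = FALSE
)
chunks$text <- substring(md, chunks$start, chunks$end)

# Hypothetical, simplified deoverlap (not ragnar's implementation): merge runs
# of overlapping ranges, keeping the context of the first chunk by start.
merge_overlaps <- function(chunks, md) {
  if (nrow(chunks) == 0L) return(chunks)
  chunks <- chunks[order(chunks$start), , drop = FALSE]
  out <- chunks[1, , drop = FALSE]
  for (i in seq_len(nrow(chunks))[-1]) {
    last <- nrow(out)
    if (chunks$start[i] <= out$end[last]) {
      # Overlap: extend the current range; its first context is left as-is.
      out$end[last] <- max(out$end[last], chunks$end[i])
    } else {
      out <- rbind(out, chunks[i, , drop = FALSE])
    }
  }
  # Re-slice the text so it matches the merged ranges.
  out$text <- substring(md, out$start, out$end)
  rownames(out) <- NULL
  out
}

merged <- merge_overlaps(chunks, md)
```

Here the single merged row spans characters 1 to 15 and keeps the empty context of the chunk starting at position 1, while "# A" from the second chunk is dropped, mirroring the rule described for the context column.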

The original document is available via the @document property.

For normal use, chunk a Markdown document with markdown_chunk(); the class constructor itself is exported only so advanced users can generate or tweak chunks by other means.
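Since the constructor only requires start, end, and context, chunks can come from any rule. As one illustration, the hypothetical window_chunks() helper below (not part of ragnar) uses base R alone to build fixed-width character windows that overlap by a chosen number of characters; the default sizes are arbitrary assumptions.

```r
# Hypothetical helper (not part of ragnar): fixed-width character windows of
# `size` characters, consecutive windows sharing `overlap` characters.
# Returns the start/end/context columns the constructor requires.
window_chunks <- function(text, size = 8L, overlap = 2L) {
  size <- as.integer(size)
  overlap <- as.integer(overlap)
  stopifnot(overlap >= 0L, size > overlap)
  n <- nchar(text)
  if (n == 0L) {
    return(data.frame(start = integer(), end = integer(),
                      context = character(), stringsAsFactors = FALSE))
  }
  step <- size - overlap
  start <- seq.int(1L, n, by = step)
  end <- pmin(start + size - 1L, n)
  # A trailing window can fall entirely inside the previous one; drop it.
  keep <- c(TRUE, end[-1] > end[-length(end)])
  data.frame(
    start = as.integer(start[keep]),
    end = as.integer(end[keep]),
    context = "",  # no heading context for purely positional windows
    stringsAsFactors = FALSE
  )
}
```

Assuming ragnar is loaded and doc is a MarkdownDocument, the result could then be wrapped with MarkdownDocumentChunks(window_chunks(doc), doc). Fixed-width windows ignore Markdown structure and can cut through headings or paragraphs, which is one reason markdown_chunk() is the documented choice for normal use.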

Arguments

`chunks`	A data frame containing `start`, `end`, and `context` columns, and optionally other columns.
`document`	A `MarkdownDocument`.

Value

An S7 object that inherits from MarkdownDocumentChunks, which is also a tibble.

Examples

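# A 15-character Markdown document with a level-1 and a level-2 heading
doc_text <- "# A\n\nB\n\n## C\n\nD"
doc <- MarkdownDocument(doc_text, origin = "some/where")
# Two non-overlapping chunks: characters 1-8 ("# A\n\nB\n\n") and 9-15 ("## C\n\nD").
# The second chunk carries "# A", the heading above it, as its context.
chunk_positions <- tibble::tibble(
  start = c(1L, 9L),
  end = c(8L, 15L),
  context = c("", "# A"),
  text = substring(doc, start, end) # optional extra column holding the chunk text
)
chunks <- MarkdownDocumentChunks(chunk_positions, doc)
# The source document is stored alongside the chunks
identical(chunks@document, doc)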

ragnar documentation built on Aug. 8, 2025, 7:07 p.m.