rag: Retrieval-augmented Generation (RAG)

View source: R/rag.R

rag R Documentation

Retrieval-augmented Generation (RAG)

Description

Performs retrieval-augmented generation using {llama-index}

Supports multiple local LLM backends via HuggingFace and llama-index.

Usage

rag(
  text = NULL,
  path = NULL,
  transformer = c("TinyLLAMA", "Gemma3-1B", "Gemma3-4B", "Qwen3-1.7B", "Ministral-3B"),
  prompt = "You are an expert at extracting themes across many texts",
  query,
  response_mode = c("accumulate", "compact", "no_text", "refine", "simple_summarize",
    "tree_summarize"),
  similarity_top_k = 5,
  retriever = c("vector", "bm25"),
  retriever_params = list(),
  output = c("text", "json", "table", "csv"),
  task = c("general", "emotion", "sentiment"),
  labels_set = NULL,
  max_labels = 5,
  global_analysis = FALSE,
  device = c("auto", "cpu", "cuda"),
  temperature = NULL,
  do_sample = NULL,
  max_new_tokens = NULL,
  top_p = NULL,
  keep_in_env = TRUE,
  envir = 1,
  progress = TRUE
)

Arguments

text

Character vector or list. Text in a vector or list data format. path will override input into text. Defaults to NULL

path

Character. Path to PDF files (.pdf) stored locally on your computer. Defaults to NULL

transformer

Character. Large language model to use for RAG. Available models include:

"TinyLLAMA"

Default. TinyLlama 1.1B Chat via HuggingFace. Fast and light local inference.

"Gemma3-1B / Gemma3-4B"

Google's Gemma 3 Instruct via HuggingFace: google/gemma-3-1b-it, google/gemma-3-4b-it.

"Qwen3-0.6B / Qwen3-1.7B"

Qwen 3 small Instruct models via HuggingFace: Qwen/Qwen3-0.6B-Instruct, Qwen/Qwen3-1.7B-Instruct.

"Ministral-3B"

Mistral's compact 3B Instruct via HuggingFace: ministral/Ministral-3b-instruct.
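
A specific backend can be selected via the transformer argument. A minimal sketch (assumes the transforEmotion package is loaded and that model weights are downloaded from HuggingFace on first use):

rag(
 text = text,
 query = "What themes are prevalent across the text?",
 transformer = "Gemma3-1B"
)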

prompt

Character (length = 1). System prompt to feed into the selected model. Defaults to "You are an expert at extracting themes across many texts"

query

Character. The query you'd like answered from the documents. Defaults to prompt if not provided

response_mode

Character (length = 1). Response synthesis mode used to generate the answer from retrieved texts. See the {llama-index} response mode documentation

Defaults to "tree_summarize"

similarity_top_k

Numeric (length = 1). Number of most representative texts to retrieve given the query. Larger values will provide a more comprehensive response but at the cost of computational efficiency; smaller values will provide a more focused response at the cost of comprehensiveness. Defaults to 5.

Suggested values will vary based on the number of texts, but some starting points are:

40-60

Comprehensive search across all texts

20-40

Exploratory, with a good trade-off between comprehensiveness and speed

5-15

Focused search that should give generally good results

These values depend on the number and quality of texts. Adjust as necessary
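
For example, an exploratory pass might widen the retrieval window. A sketch (assumes text holds the corpus; runtime grows with larger values):

rag(
 text = text,
 query = "What themes are prevalent across the text?",
 similarity_top_k = 30
)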

retriever

Character (length = 1). Retrieval backend: one of "vector" (default, semantic search using embeddings) or "bm25" (lexical BM25 search). BM25 uses llama-index's retriever when available and falls back to the Python rank_bm25 implementation otherwise. Scores are normalized to [0,1] for consistency.

retriever_params

List. Optional parameters passed to the selected retriever handler. Reserved keys include show_progress.
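
Lexical retrieval can be requested as follows. A sketch (show_progress is the documented reserved key; other keys depend on the selected backend):

rag(
 text = text,
 query = "What themes are prevalent across the text?",
 retriever = "bm25",
 retriever_params = list(show_progress = FALSE)
)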

output

Character (length = 1). Output format: one of "text", "json", "table", or "csv".

  • "text" (default): returns a free-text response with retrieved content.

  • Structured outputs ("json"/"table"/"csv") are supported ONLY for Gemma3-1B and Gemma3-4B. For other models, requests for structured outputs fall back to "text".

  • For Gemma3-1B/4B and task = "sentiment" or "emotion", returns per-document dominant label and confidence.

  • For Gemma3-1B/4B and task = "general", returns the prior schema with labels, confidences, intensity, and evidence_chunks.

task

Character (length = 1). Task hint for structured extraction: one of "general", "emotion", or "sentiment". When "emotion" or "sentiment", the prompt constrains labels to a set (see labels_set).

labels_set

Character vector. Allowed labels for classification when task != "general". If NULL, defaults to the Emo8 labels c("joy", "trust", "fear", "surprise", "sadness", "disgust", "anger", "anticipation") for task = "emotion" and c("positive", "neutral", "negative") for task = "sentiment"

max_labels

Integer (length = 1). Maximum number of labels to return in structured outputs; used to guide the model instruction when output != "text".
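
Putting output, task, and labels_set together, a constrained sentiment classification might look like this (a sketch; structured outputs are honored only for Gemma3-1B/4B, other models fall back to "text"):

rag(
 text = text,
 query = "Classify the sentiment of each document",
 transformer = "Gemma3-1B",
 output = "table",
 task = "sentiment",
 labels_set = c("positive", "neutral", "negative"),
 max_labels = 1
)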

global_analysis

Boolean (length = 1). Whether to perform the analysis across all documents globally (legacy behavior) or per document. When FALSE (default), each document is analyzed individually and the results are then aggregated. When TRUE, all documents are processed together in a single global analysis

device

Character. Whether to use CPU or GPU for inference. Defaults to "auto", which will use GPU over CPU (if a CUDA-capable GPU is set up). Set to "cpu" to force CPU inference

temperature

Numeric or NULL. Overrides the LLM sampling temperature when using local HF models. Recommended: 0.0–0.2 for structured/classification; 0.3–0.7 for summaries.

do_sample

Logical or NULL. If FALSE, forces greedy decoding for maximum determinism. Defaults are conservative; set explicitly for reproducibility.

max_new_tokens

Integer or NULL. Maximum new tokens to generate. Suggested: 64–128 for label decisions; 256–512 for summaries.

top_p

Numeric or NULL. Nucleus sampling parameter. Typical: 0.7–0.95. Use with do_sample=TRUE.
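
For reproducible structured extraction, the sampling controls can be pinned down together. A sketch (when these arguments are NULL, the model's conservative defaults apply):

rag(
 text = text,
 query = "Extract emotions",
 transformer = "Gemma3-1B",
 output = "json",
 task = "emotion",
 temperature = 0,
 do_sample = FALSE,
 max_new_tokens = 128
)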

keep_in_env

Boolean (length = 1). Whether the model should be kept in your global environment. Defaults to TRUE. By keeping the model in your environment, you can skip re-loading it every time you run this function. TRUE is recommended

envir

Numeric (length = 1). Environment in which the model is saved for repeated use. Defaults to the global environment

progress

Boolean (length = 1). Whether progress should be displayed. Defaults to TRUE

Value

For output = "text", returns an object of class "rag" with fields: $response (character), $content (data.frame), and $document_embeddings (matrix). For output = "json", returns a JSON character(1) string matching the enforced schema. For output = "table", returns a data.frame suitable for statistical analysis.

Data Privacy

All processing is done locally with the downloaded model, and your text is never sent to any remote server or third-party.

Author(s)

Alexander P. Christensen <alexpaulchristensen@gmail.com>

Examples

# Load data
data(neo_ipip_extraversion)

# Example text
text <- neo_ipip_extraversion$friendliness[1:5]

## Not run: 
rag(
 text = text,
 query = "What themes are prevalent across the text?",
 response_mode = "tree_summarize",
 similarity_top_k = 5
)

# Structured outputs
rag(text = text, query = "Extract emotions", output = "json")
rag(text = text, query = "Extract emotions", output = "table")
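
# Retrieval over local PDFs (hypothetical directory; path overrides text)
rag(
 path = "path/to/pdfs",
 query = "What themes are prevalent across the documents?"
)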

## End(Not run)


transforEmotion documentation built on Jan. 8, 2026, 5:06 p.m.