get_word_frequency: Get Word Frequency Information
In chatRater: Rating and Evaluating Texts Using Large Language Models


get_word_frequency

R Documentation

Get Word Frequency Information

Description

Uses an LLM to obtain frequency information for the word at a specified position in the stimulus. The user can supply a corpus; if none is supplied, corpus_source defaults to "llm" and the LLM generates or assumes a representative corpus.

Usage

get_word_frequency(
  stimulus,
  position = "first",
  corpus = "",
  corpus_source = ifelse(corpus != "", "provided", "llm"),
  model = "gpt-3.5-turbo",
  api_key = "",
  top_p = 1,
  temp = 0
)

Arguments

`stimulus`	A character string representing the language material.
`position`	A character string indicating which word or words to analyze: "first", "last", "each", or "total".
`corpus`	An optional character string representing the corpus to use for frequency analysis.
`corpus_source`	A character string, either "provided" or "llm". Defaults to "provided" if corpus is non-empty, otherwise "llm".
`model`	A character string specifying the LLM to use (default "gpt-3.5-turbo").
`api_key`	API key as a character string.
`top_p`	Numeric value for the nucleus-sampling (top-p) probability mass (default 1).
`temp`	Numeric value for the sampling temperature (default 0).
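
The default for corpus_source is computed from corpus, as the Usage signature shows. The following is a minimal sketch of that resolution only; resolve_corpus_source is a hypothetical helper written for illustration and is not part of chatRater.

```r
# Hypothetical helper (not part of chatRater) that mirrors the default
# expression for corpus_source shown in the Usage signature.
resolve_corpus_source <- function(corpus = "") {
  ifelse(corpus != "", "provided", "llm")
}

# No corpus supplied: the LLM is expected to supply or assume one.
source_without_corpus <- resolve_corpus_source()

# Non-empty corpus supplied: the provided text is used.
source_with_corpus <- resolve_corpus_source("A sample corpus text.")

print(source_without_corpus)
print(source_with_corpus)
```

Passing corpus_source explicitly, as the example at the end of this page does, overrides this default.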

Details

Default definition: "Word frequency is defined as the number of times a word appears in a corpus (often log-transformed)."
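
To make the definition concrete, here is a small base R sketch that counts a word in a toy corpus and log-transforms the count. The toy corpus, the tokenization rule, and the choice of log10(count + 1) are assumptions made for illustration; the function itself obtains its value from the LLM, and its exact transform is not stated here.

```r
# Toy corpus, assumed for illustration only.
toy_corpus <- "the cat sat on the mat and the dog sat too"

# Lowercase and split on anything that is not a letter.
tokens <- strsplit(tolower(toy_corpus), "[^a-z]+")[[1]]
tokens <- tokens[nzchar(tokens)]

# Raw frequency: number of times each word appears in the corpus.
counts <- table(tokens)
raw_count <- as.integer(counts[["the"]])

# One common log transform (an assumption): log10 of the count plus one.
log_freq <- log10(raw_count + 1)

cat("Raw count of 'the':", raw_count, "\n")
cat("Log-transformed frequency:", log_freq, "\n")
```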

Value

A numeric value representing the frequency, or a JSON string if position is "each".
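
When position is "each", the JSON string needs to be parsed before use. The sketch below assumes the jsonlite package is installed and that the JSON is a flat object mapping words to numbers; both the shape and the values in each_json are hypothetical, since the actual format depends on the LLM response.

```r
# Hypothetical payload: the real JSON shape and values are not documented
# here and may differ from this illustration.
each_json <- '{"The": 5.2, "quick": 3.1, "fox": 2.4}'

# jsonlite is an assumed dependency; skip parsing if it is unavailable.
if (requireNamespace("jsonlite", quietly = TRUE)) {
  freq_list <- jsonlite::fromJSON(each_json)  # named list of numbers
  freq_vec <- unlist(freq_list)               # named numeric vector
  print(freq_vec)
}
```

Because the LLM output is not guaranteed to be valid JSON, wrapping the parse in tryCatch may be worthwhile in practice.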

Examples

## Not run: 
  freq_first <- get_word_frequency("The quick brown fox jumps over the lazy dog",
                                   position = "first",
                                   corpus = "A sample corpus text with everyday language.",
                                   corpus_source = "provided",
                                   model = "gpt-3.5-turbo",
                                   api_key = "your_api_key")
  cat("Frequency (first word):", freq_first, "\n")

## End(Not run)

chatRater documentation built on April 4, 2025, 1:03 a.m.