View source: R/get_training_set.R
get_training_set | R Documentation |
Creates a training set from a list of similarity matrices and labels it using a zero-shot GPT prompt.
get_training_set(
sim,
num_bins = 50,
samples_per_bin = 10,
n = 500,
record_type = "entity",
instructions = NULL,
model = "gpt-3.5-turbo-instruct",
openai_api_key = Sys.getenv("OPENAI_API_KEY"),
parallel = TRUE
)
sim |
A matrix of similarity scores |
num_bins |
Number of bins to split similarity scores for stratified random sampling (defaults to 50) |
samples_per_bin |
Number of string pairs to sample from each bin (defaults to 5) |
n |
Sample size for the training dataset |
record_type |
A character describing what type of entity the rows and columns of |
instructions |
A string containing additional instructions to include in the LLM prompt during validation. |
model |
Which OpenAI model to prompt; defaults to 'gpt-3.5-turbo-instruct' |
openai_api_key |
Your OpenAI API key. By default, looks for a system environment variable called "OPENAI_API_KEY" (recommended option). Otherwise, it will prompt you to enter the API key as an argument. |
parallel |
TRUE to submit API requests in parallel. Setting to FALSE can reduce rate limit errors at the expense of longer runtime. |
A dataset with string pairs A
and B
, along with a match
column indicating whether they match.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.