knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(mergen)
mergen is a package which employs artificial intelligence to convert data analysis questions into executable code, explanations, and algorithms. The self-correction features allow the generated code to be optimized for performance and accuracy. mergen features a user-friendly chat interface, enabling users to interact with the AI agent and extract valuable insights from their data effortlessly.
This document introduces you to mergen's basic set of tools and shows you how to apply them to answer data analysis questions and generate relevant R code.
To interact with an AI agent and use it for subsequent tasks, mergen provides the setupAgent function, which sets up a framework for the agent. mergen allows you to set up an agent for the OpenAI API platform as well as for the Replicate API platform.
To set up an agent for the OpenAI API platform, call setupAgent with the argument name="openai".
Let's look at how setting up an agent works:
myAgent <- setupAgent(name="openai", type="chat", model="gpt-4", ai_api_key="your_key")
myAgent
The setupAgent function returns a list containing all the agent information, which can be used by subsequent functions. Note that ai_api_key should be your OpenAI API key, provided as a string.
setupAgent can also set up an agent for models hosted on Replicate. Let's look at how this works:
myAgent <- setupAgent(name="replicate", type=NULL, model="llama-2-70b-chat", ai_api_key="my_key")
myAgent
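Hard-coding an API key in a script makes it easy to share the key by accident. As a minimal sketch, the key could instead be read from an environment variable. The helper get_api_key and the variable name OPENAI_API_KEY are assumptions made for illustration and are not part of mergen.

```r
# Hypothetical helper (not part of mergen): read an API key from an
# environment variable and fail early with a clear message if it is missing.
get_api_key <- function(var_name) {
  key <- Sys.getenv(var_name, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable '", var_name, "' is not set.", call. = FALSE)
  }
  key
}

# Usage, assuming the key was stored under the (assumed) name OPENAI_API_KEY,
# for example in ~/.Renviron:
# myAgent <- setupAgent(name = "openai", type = "chat", model = "gpt-4",
#                       ai_api_key = get_api_key("OPENAI_API_KEY"))
```

The setupAgent call is left commented out so that the sketch runs without a key or network access.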
Once you have set up an agent, it is time to ask your AI model of choice some questions! For this, you can use either the sendPrompt function or the selfcorrect function. Which one to use depends on whether you want possible errors in the returned code to be corrected by sending another request to the model.
Sending a prompt with the sendPrompt function is straightforward. The function takes the arguments agent, prompt, return.type and context. By default, the context is set to rbionfoExp, which tells your model of choice to act as a bioinformatics expert and to return any code as R code in triple backticks. Your prompt must be given as a string, but can contain any question and additional information that you want to send. The return value is a string containing the model's answer.
answer <- sendPrompt(myAgent, "how do I perform PCA on data in a file called test.txt?", return.type = "text")
answer
answer <- "\n\nThe following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```\ndata <- read.table(\"test.txt\", header = TRUE, sep = \"\\t\")\n```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n```\nnormalized.data <- scale(data, center = TRUE, scale = TRUE)\n```\n\nFinally, the normalized data will be used to do a Principal Component Analysis (PCA):\n\n```\npca <- princomp(normalized.data)\n```"
print(answer)
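Because the prompt is an ordinary string, extra information about the data can be pasted into it before sending. The following is a minimal sketch of that idea; build_prompt, the column names and the tab-separated file layout are made up for illustration and are not part of mergen.

```r
# Hypothetical helper (not part of mergen): combine a question with a short
# description of the input file into a single prompt string.
build_prompt <- function(question, file_name, column_names) {
  paste0(
    question,
    "\nThe data is stored in a tab-separated file called '", file_name, "'.",
    "\nIt has the following columns: ",
    paste(column_names, collapse = ", "), "."
  )
}

prompt <- build_prompt(
  question     = "How do I perform PCA on my data?",
  file_name    = "test.txt",
  column_names = c("sample1", "sample2", "sample3")
)
cat(prompt)

# The resulting string could then be passed on, e.g.:
# answer <- sendPrompt(myAgent, prompt, return.type = "text")
```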
Sending a prompt with the selfcorrect function allows the generated code to be checked and corrected. If the code returned by the model is not executable, selfcorrect sends the prompt back to the agent together with a list of errors and warnings, so that the code can be fixed. The number of self-correction rounds can be set with the attempts = n argument. The return value is a list containing the initial answer of the agent and the final answer after n rounds of self-correction.
botResponses <- list(
  "\n\nThe following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```R\ndata <- read.table(\"test.txt\", header = TRUE, sep = \"\\t\")\n```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n```r\nnormalized.data <- scale(data, center = TRUE, scale = TRUE)\n```\n\nFinally, the normalized data will be used to do a Principal Component Analysis (PCA):\n\n```{R}\npca <- princomp(normalized.data)\n```",
  "\n\nThe second response.The following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```\ndata <- read.table(\"test.txt\", header = TRUE, sep = \"\\t\")\n```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n```\nnormalized.data <- scale(data, center = TRUE, scale = TRUE)\n```\n\nFinally, the normalized data will be used to do a Principal Component Analysis (PCA):\n\n```\npca <- princomp(normalized.data)\n```",
  "\n\nThe third response.The following R code will read the file called \"test.txt\", normalize the table and do PCA. First, the code will read the file into an R data frame: \n\n```r\nplot(1:10)```\n\nNext, the data will be normalized to the range of 0 to 1:\n\n"
)
answer <- list(
  init.response = botResponses[[1]],
  init.blocks = extractCode(clean_code_blocks(botResponses[[1]])),
  final.response = botResponses[[3]],
  final.blocks = extractCode(clean_code_blocks(botResponses[[3]])),
  code.works = TRUE,
  exec.result = "path/to/html/file",
  tried.attempts = 3
)
answer <- selfcorrect(myAgent, prompt="How do I perform PCA?",attempts=3)
print(answer)
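The idea behind self-correction can be sketched in a few lines of base R: run the returned code, and if it fails, send the prompt back together with the error message. This is an illustration only and not mergen's implementation. The function self_correct_sketch and the mock model mock_ask are hypothetical; the element names code.works and tried.attempts are loosely borrowed from the list shown above.

```r
# Illustrative sketch only: this is NOT mergen's implementation of selfcorrect().
# `ask` is a hypothetical stand-in for a model call: it takes a prompt string
# and returns R code as a string.
self_correct_sketch <- function(ask, prompt, attempts = 3) {
  code <- ask(prompt)
  for (i in seq_len(attempts)) {
    # Try to parse and run the code in a throwaway environment,
    # capturing the error message if anything goes wrong.
    err <- tryCatch({
      eval(parse(text = code), envir = new.env())
      NULL
    }, error = function(e) conditionMessage(e))

    if (is.null(err)) {
      return(list(code = code, code.works = TRUE, tried.attempts = i))
    }
    # Ask again only if another attempt is left, passing the error along.
    if (i < attempts) {
      code <- ask(paste0(prompt, "\nThe previous code failed with: ", err))
    }
  }
  list(code = code, code.works = FALSE, tried.attempts = attempts)
}

# A mock "model" that returns broken code first and working code afterwards.
mock_ask <- local({
  n_calls <- 0
  function(prompt) {
    n_calls <<- n_calls + 1
    if (n_calls == 1) "x <- undefined_object + 1" else "x <- 1 + 1"
  }
})

result <- self_correct_sketch(mock_ask, "Add one and one.", attempts = 3)
str(result)
```

With the mock model, the first attempt fails because undefined_object does not exist, and the second attempt succeeds, so the loop stops after two attempts.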
Once you have sent a prompt and received the answer, you can use mergen's extractCode function to extract code blocks from the returned text. Before doing so, however, the code blocks need to be cleaned up, because every agent returns its answer in a slightly different way. This can be done with the clean_code_blocks function. Below is an example of what clean_code_blocks does with the answer returned by our agent above:
code_cleaned <- clean_code_blocks(answer$final.response)
cat(code_cleaned)
As you can see above, clean_code_blocks ensures that all code is stripped of extra symbols such as {r}, R, r and {R}, so that extractCode can extract the code blocks properly. The extractCode function takes a string as input and also allows the user to set the delimiter that encloses the code blocks (the default is three backticks). Now let's have a look at what the extractCode function returns:
final_code <- extractCode(code_cleaned, delimiter = "```")
print(final_code)
As shown above, extractCode returns a list containing the actual code and the associated text. The code block can then be tested for execution using the executeCode function.
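To make the two steps concrete, here is a minimal base R sketch of stripping language tags from opening delimiters and then splitting a response into code and text. It is an illustration under simplifying assumptions (well-formed, non-nested blocks) and not mergen's implementation; strip_fence_tags and split_code_and_text are hypothetical names.

```r
# Illustrative sketch only: NOT mergen's implementation of
# clean_code_blocks() or extractCode().
fence <- strrep("`", 3)  # the three-backtick delimiter, built programmatically

# Strip language tags such as r, R, {r} or {R} that directly follow an
# opening delimiter, so that every block starts with a bare delimiter.
strip_fence_tags <- function(text, delimiter = fence) {
  pattern <- paste0(delimiter, "\\{?[rR]\\}?[ \t]*\n")
  gsub(pattern, paste0(delimiter, "\n"), text)
}

# Split the text on the delimiter: pieces at even positions lie between an
# opening and a closing delimiter (code), pieces at odd positions are text.
split_code_and_text <- function(text, delimiter = fence) {
  pieces <- strsplit(text, delimiter, fixed = TRUE)[[1]]
  is_code <- seq_along(pieces) %% 2 == 0
  list(
    code = paste(trimws(pieces[is_code]), collapse = "\n"),
    text = paste(trimws(pieces[!is_code]), collapse = "\n")
  )
}

# A small made-up response with two differently tagged code blocks.
response <- paste0(
  "Read the file:\n\n", fence, "{r}\nx <- 1:10\n", fence,
  "\n\nThen plot it:\n\n", fence, "R\nplot(x)\n", fence
)

parts <- split_code_and_text(strip_fence_tags(response))
cat(parts$code)
```

Splitting on the delimiter and taking every second piece only works when each opening delimiter has a matching closing one, which is why the tags are normalized first.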
mergen features functions that make it easy to run the code returned by an AI agent. Once code blocks are cleaned up and extracted, they can be executed using the executeCode function. Before doing that, however, it is advisable to run the extractInstallPkg function, which extracts package names and installs any missing packages needed for the code to run. Finally, the executeCode function can be used. Let's see what the executeCode function does:
executeCode(final_code$code)
As shown above, the code runs as it should! It is important to note that the executeCode function will not change the global environment. Any variables created while executing the code are deleted when the function completes.
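One common way to run code without touching the global environment is to evaluate it in a fresh environment that is discarded afterwards. The sketch below shows this technique in base R; it is an assumption about how such isolation can be achieved, not a description of how executeCode works internally, and run_isolated is a hypothetical name.

```r
# Illustrative sketch only: NOT mergen's implementation of executeCode().
# Evaluate code given as a string in a fresh environment whose parent is the
# global environment, so new variables do not leak into the caller's workspace.
run_isolated <- function(code) {
  env <- new.env(parent = globalenv())
  value <- eval(parse(text = code), envir = env)
  list(value = value, created = ls(env))
}

res <- run_isolated("tmp_total <- sum(1:10)\ntmp_total * 2")
res$value            # 110
res$created          # "tmp_total"
exists("tmp_total")  # FALSE, unless it was already defined in your session
```

Note that this only isolates ordinary assignments. Code that uses `<<-`, assign() with an explicit environment, or side effects such as writing files can still affect the session.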