The ollama service allows you to run open source LLMs locally, providing a command line interface and an API. By wrapping the latter, we can use it within our chat app.
You can run ollama on any platform as a docker container. The following command runs the CPU-only version:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
This command:

- `docker run -d`: starts the container in detached mode, so it keeps running in the background.
- `-v ollama:/root/.ollama`: mounts a volume named `ollama`, so downloaded models persist between runs.
- `-p 11434:11434`: publishes the API port, making it reachable at http://localhost:11434.
- `--name ollama`: names the container `ollama`.
- `ollama/ollama`: the image to run.

You can see more docker options in the official blog post.
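Once the container is up, you can check that the API is reachable from R. The snippet below is a minimal sketch, assuming the default port mapping shown above and the httr2 package (used here only for the check); the root endpoint returns a short status message.

```r
# Minimal check that the local ollama API is reachable
# (assumes the default port mapping above and that httr2 is installed).
library(httr2)

request("http://localhost:11434") |>
  req_perform() |>
  resp_body_string()
#> [1] "Ollama is running"
```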
Before using the service, you need to pull a model. Run the following code inside your container to pull llama2:
ollama pull llama2
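If you would rather not open a shell inside the container, a rough equivalent is to invoke docker from R. This is a sketch that assumes docker is on your PATH and that the container is named ollama, as in the run command above.

```r
# Sketch: pull llama2 inside the running "ollama" container by
# calling docker from R (assumes docker is available on your PATH).
system2("docker", args = c("exec", "ollama", "ollama", "pull", "llama2"))
```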
Check the ollama library to see more models. For more advanced install options, check the official documentation.
By default, the chat addin will use http://localhost:11434 to locate the ollama API. You can customize this by setting the OLLAMA_HOST environment variable with usethis::edit_r_environ().
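For example, if ollama is running on another machine or on a non-default port, you could add a line like the one below to your .Renviron. The host address here is purely illustrative; replace it with wherever your ollama instance is reachable.

```r
# In .Renviron (opened with usethis::edit_r_environ()), add a line such as:
# OLLAMA_HOST=http://192.168.1.50:11434    # illustrative host, adjust to yours
# Restart R, then confirm the value is visible to the session:
Sys.getenv("OLLAMA_HOST")
```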
Here is a short video showing you how to get started with ollama. It assumes that you have already installed docker. See the docker installation guide for more information.
[Video: getting started with ollama]