This vignette is intended as a primer for the not-beginner^[If you're no longer a beginner, but aren't sure if you're an intermediate or advanced R programmer, perhaps you are a "not-beginner"! For more thoughts on this topic see Meghan Harris's blog post How I Became A “Not-Beginner” in R] R user who has no (or minimal) experience with Docker.
One of the main ways to create AWS Lambda functions is with Docker images. Don't worry if you don't know what that is yet. What's important to know is that for lambdr to work you need to have a valid Docker image.
The sections that follow will explain what a Docker image is and how to make one for use with lambdr.
An image is a bit like a simple, switched-off virtual computer. Or, as Docker puts it, "a standardized package that includes all of the files, binaries, libraries, and configurations to run a container."
And what's a container? Well, "A container is simply an isolated process with all of the files it needs to run." That sounds kind of vague, and it is, so for the purposes of lambdr you can think of it as an instance of the virtual computer (image) that's switched on and can run your R code. It's independent of your own computing environment, but you can give it access to things like directories, so that there's a live link between the file system in the container and your local code.
The image doesn't come from nowhere. It's created from a Dockerfile, which is "a text-based document" that "provides instructions to the image builder on the commands to run, files to copy, startup command, and more."
It's also very useful to know the following terms:
This is what the flow looks like, from left to right: Dockerfile, then image, then container.
For your R process, the image needs to contain a Linux distribution, your code, R itself, the R packages your code needs, and any system dependencies for Linux that R and the R packages require.
This is pretty similar to how most of us work locally. We have an operating system - typically macOS, Windows, or Linux. We install R. We install R packages. And we install any system dependencies that we need along the way, e.g. imagemagick, or Postgres.
Below is an example of a Dockerfile that could be used with lambdr. The purpose is just to show a minimal example that explains the basic concepts. It is not intended as an example Dockerfile for use in production.
If you are confident that you understand Dockerfiles, images, and containers, and simply want a production-ready example to use as a reference, please see the article Placing an R Lambda Runtime in a Container.
Here is a full Dockerfile, followed by item-by-item explanations:
```dockerfile
FROM docker.io/rocker/r-ver:4.4

RUN Rscript -e "options(warn = 2); install.packages('pak')"

RUN Rscript -e "options( \
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
  ); \
  pak::pak( \
    c( \
      'httr2', \
      'lambdr' \
    ) \
  )"

# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R

ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
```
Dockerfiles contain instructions. In this example there are instructions like `FROM` and `RUN`.
The first instruction is:
```dockerfile
FROM docker.io/rocker/r-ver:4.4
```
To understand `FROM` you need to know that Docker images are built up of layers.
Each layer is added to the ones that preceded it. This even includes images
someone else has built and put online. That's great, because they can form the
base on which you continue to build - which is also why they're known as
base images.
Base images are hosted in registries. In our example, the registry is `docker.io` (Docker Hub). The image in question is provided by rocker, who make reliable Docker containers for R environments. The image has a name, `r-ver`, and a tag, `4.4`.
The instruction `FROM` simply tells the builder to download and use the base image as a starting point.
This particular base image has a version of Linux (Ubuntu 'Jammy'), R, and the system dependencies required for using R. Information about the image was available on the rocker website (at time of writing).
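To make the parts of the image reference concrete, here is a small shell sketch that splits `docker.io/rocker/r-ver:4.4` into its pieces. It is only an illustration: the variable names are made up for this example, it assumes a reference of exactly the form `host/provider/name:tag`, and Docker does this parsing for you when it reads `FROM`.

```shell
# The reference used in the FROM instruction.
image_ref="docker.io/rocker/r-ver:4.4"

# Hypothetical variable names, chosen for this illustration only.
tag="${image_ref##*:}"        # text after the last colon
path="${image_ref%:*}"        # everything before the tag
host="${path%%/*}"            # text before the first slash
name="${path##*/}"            # text after the last slash
provider="${path#*/}"         # drop the host ...
provider="${provider%%/*}"    # ... then keep the text before the next slash

echo "host:     $host"
echo "provider: $provider"
echo "name:     $name"
echo "tag:      $tag"
```

Changing the tag (for example to a different R version that rocker publishes) is how you would pick a different variant of the same image.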
The next instruction used is `RUN`. Every time `RUN` is used it makes a new image layer and executes whatever argument/s it has been given. Here, it is used to install R packages - first pak, which is itself an R package manager, and then `httr2` and `lambdr` by using `pak`:
```dockerfile
RUN Rscript -e "options(warn = 2); install.packages('pak')"

RUN Rscript -e "options( \
    warn = 2, \
    repos = c(CRAN = 'https://p3m.dev/cran/__linux__/jammy/2024-07-30') \
  ); \
  pak::pak( \
    c( \
      'httr2', \
      'lambdr' \
    ) \
  )"
```
This is probably not how you're used to executing R code. It looks like this for a few reasons.
1) Using `Rscript`, which is "A binary front-end to R, for use in scripting applications". It can be given R files or, like in this case, in-line scripts to execute. Basically, it's a non-interactive method of executing R code.
2) Stringified R code littered with `\`. The `\` escapes each newline, so the multi-line R code reaches `Rscript` as a single string.
A warning level and a 'special' CRAN repository are also supplied. These ensure that the image build will error out if there's a problem installing packages, and that the packages are the latest amd64 binary versions for Ubuntu 'Jammy' as of the date supplied. For more detail about why these are set, see Placing an R Lambda Runtime in a Container.
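If the backslashes still look odd, the following shell sketch may help. In a Dockerfile the builder joins the continued lines itself, but a shell treats a backslash at the end of a line inside double quotes the same way, so you can try it locally. The variable name `r_code` and the R snippet are made up for this illustration, and the commented-out last line assumes R is installed on your machine.

```shell
# Each backslash-newline pair inside the double quotes is removed,
# so the variable ends up holding the R code as one line of text.
r_code="options( \
  warn = 2 \
); \
print('hello from R')"

# Print what a program such as Rscript would receive: a single line.
printf '%s\n' "$r_code"

# Uncomment to actually run it, assuming Rscript is on your PATH:
# Rscript -e "$r_code"
```

This is also why each R statement in the Dockerfile is separated by `;`: once the lines are joined, there are no newlines left to separate them.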
For now, just know that `RUN` added two layers where R packages get installed.
In this section the `RUN` instruction is used again, alongside `COPY`:
```dockerfile
# Lambda setup
RUN mkdir /R
COPY R/ /R
RUN chmod 755 -R /R
```
A directory called `/R` is made in the image, then the local folder of R code is copied into the image folder. `COPY` uses the same syntax as `cp`, i.e. `source` `destination`.
Then, `chmod 755 -R` gets applied to the `/R` folder. In short, this just makes sure the files in the folder can be executed in the image.
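If you'd like to see what these three steps do without building an image, here is a rough local equivalent in the shell. Everything lives in a temporary directory: the `project` and `image` folders and the `work_dir` variable are hypothetical stand-ins for your local project and the image's file system. The Dockerfile writes `chmod 755 -R`; the sketch puts `-R` first, which does the same thing and is accepted by more versions of `chmod`.

```shell
# Hypothetical scratch folders: "project" stands in for your local project,
# "image" for the file system inside the Docker image.
work_dir="$(mktemp -d)"
mkdir -p "$work_dir/project/R" "$work_dir/image"
echo 'print("hello")' > "$work_dir/project/R/runtime.R"

# Equivalent of: RUN mkdir /R
mkdir "$work_dir/image/R"

# Equivalent of: COPY R/ /R  (source first, destination second)
cp -R "$work_dir/project/R/." "$work_dir/image/R"

# Equivalent of: RUN chmod 755 -R /R
# 7 = read, write and execute for the owner;
# 5 = read and execute for the group and for everyone else.
# -R applies the mode to the folder and everything inside it.
chmod -R 755 "$work_dir/image/R"

# List the copied files together with their permissions.
ls -l "$work_dir/image/R"
```

In the listing, the permissions column for `runtime.R` should start with `-rwxr-xr-x`, which is how `ls` displays mode 755.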
The final instructions are `ENTRYPOINT` and `CMD`. At this point, all the R code, packages, and dependencies have been installed. All that's left is to tell Lambda where to find the runtime interface client and `handler` function - both of which are new concepts explained below.
```dockerfile
ENTRYPOINT Rscript R/runtime.R
CMD ["handler"]
```
The `runtime.R` file contains two things necessary for Lambda to run:
1) A function called `handler()`
    - The orchestrating function that takes input, does stuff, and returns output
    - The equivalent to 'main' in other programming languages
2) The `lambdr` function `lambdr::start_lambda()`
    - This starts the R runtime interface client, which will then listen for any incoming events and pass return values back out of the Lambda function
An example `runtime.R` is given in the article Placing an R Lambda Runtime in a Container.
For this introduction all you need to know is that `ENTRYPOINT` needs to execute `Rscript` on `R/runtime.R` so that `lambdr::start_lambda()` gets called.
`CMD` simply takes the name of the handler function. You can call the handler whatever you like, but as there is only ever one handler per Lambda, by convention we just call it `handler`. The `CMD` gets passed as an argument to `lambdr::start_lambda()`.
You're probably wondering "How do I use the Dockerfile!?"
We won't get into that here.
Instead, the article Placing an R Lambda Runtime in a Container has sections about development versus deployment containers, and how to practically figure that stuff out for use with lambdr. We recommend reading the whole article.
Before moving on to the article, you may also want to look into the basics of Docker more generally. You can get a good feel for what using Docker looks like in practice by watching Docker Tutorial for Beginners by James Murphy. Or, if you want a hands-on tutorial there is the official Docker 101 Tutorial.