Credit to https://github.com/cw25/opencpu_service_ci_tutorial
R/: Your R code goes in this directory
docker/installer.R: Used to install any R dependencies your code might have. This will take the place of library() calls in your R code
docker/opencpu_config/Renviron: (optional) If your code relies on environment variables (via Sys.getenv()), you will need to set those variables here so that OpenCPU has access to them when it runs inside the Docker container
docker/opencpu_config/server.conf: (optional) Tells OpenCPU to preload your R dependencies for better performance
tests/testthat.R: The master test file that initiates tests for your package
tests/testthat/: Your individual tests go in this directory
.gitignore: Prevents your local junk (things like the .DS_Store files on your Mac) from being stored/tracked by Git
.travis.yml: Travis integration settings
DESCRIPTION: Metadata about the R package you are developing
Dockerfile: Instructions for Docker on how to containerize your application
LICENSE: Legal things that people rarely read
NAMESPACE: R package instructions for importing other libraries and exporting your own functions
README.md: Bad jokes and explanatory gymnastics

For this tutorial, we will build a very simple service that takes in a text string and returns the average number of characters per word. The stringr package provides some very useful functions that will allow us to do this quickly and easily. Here is the code that we will build our service around:
getMeanWordLength <- function(text) {
  words <- str_split(text, " ")
  word_lengths <- lapply(words, str_length)[[1]]
  return(mean(word_lengths))
}
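As a quick sanity check of the arithmetic, here is a minimal base R sketch of the same logic. The name meanWordLengthBase is hypothetical (it is not part of the tutorial's package), and it swaps stringr's str_split() and str_length() for base R's strsplit() and nchar() so that it runs without any extra packages.

```r
# Hypothetical base R stand-in for getMeanWordLength(), for illustration only.
meanWordLengthBase <- function(text) {
  # Split on single spaces; strsplit() returns a list with one element per input string
  words <- strsplit(text, " ", fixed = TRUE)[[1]]
  # Count the characters in each word and average them
  mean(nchar(words))
}

# Word lengths are 4, 2, 1, 6, 3, 7, so the mean is 23 / 6
result <- meanWordLengthBase("This is a direct API request")
print(round(result, 4))
```

The sample sentence is the one sent to the API with curl later in this tutorial, so the two results can be compared.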
Our R code should live in the R/ directory of our project, so we will place this code in R/getMeanWordLength.R. Now, we will need to complete the other files that the R packaging system expects: DESCRIPTION and NAMESPACE.
The DESCRIPTION file is pretty self-explanatory, but pay special attention to the Depends: and Suggests: lines. Our package depends on stringr, so we need to make that dependency explicit by including it in Depends:. Because we will also be using testthat for our unit tests, we need to name it in the Suggests: line. Here's what our DESCRIPTION file looks like:
Package: stringstats
Title: String Statistics API
Version: 1.0
Date: 2017-08-30
Authors@R: person("Christopher", "Walker", email = "cw25@me.com", role = c("aut", "cre"))
Author: Christopher Walker [aut, cre]
Maintainer: Christopher Walker <cw25@me.com>
Description: API for calculating simple statistics about text strings.
Depends: R, stringr
Suggests: testthat
License: MIT
Encoding: UTF-8
LazyData: true
We also need to update NAMESPACE to tell our package how to access the functions we need from the stringr package. Thankfully, it's a simple one-liner:
import(stringr)
We now have a bare-bones R package. Next, let's look at how we can write and run tests to ensure that our package does what we expect it to do.
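The test files listed at the top of this tutorial (tests/testthat.R and the tests/testthat/ directory) might look something like the sketch below. The file name test-getMeanWordLength.R and the specific expectations are assumptions chosen for illustration, and the function is repeated inline only so that the snippet runs on its own. It assumes the testthat and stringr packages are installed.

```r
# Sketch of tests/testthat/test-getMeanWordLength.R (hypothetical file name).
# Assumes the testthat and stringr packages are installed.
library(testthat)
library(stringr)

# Repeated here so the snippet is self-contained; in the package this lives in R/
getMeanWordLength <- function(text) {
  words <- str_split(text, " ")
  word_lengths <- lapply(words, str_length)[[1]]
  return(mean(word_lengths))
}

test_that("getMeanWordLength averages characters per word", {
  # "a" (1), "bb" (2), "ccc" (3) average to 2
  expect_equal(getMeanWordLength("a bb ccc"), 2)
  # A single word is its own average
  expect_equal(getMeanWordLength("hello"), 5)
})
```

By convention, the master file tests/testthat.R then only needs to load testthat and the package and call test_check("stringstats").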
Docker is going to do most of the heavy lifting for us where OpenCPU is concerned. All we need to worry about is how to configure the OpenCPU server. OpenCPU generally works very well out of the box, but I'll share two things that I've found useful. (Both are optional, so you can skip them if you like.)
OpenCPU can accept runtime configuration options from a server.conf file. We will store that file at docker/opencpu_config/server.conf in our project and later we will tell Docker to inject it into our service container. I like to use server.conf to tell OpenCPU about my R dependencies in advance so it will preload those libraries at server startup. The format is simple JSON:
{
  "preload": ["stringr"]
}
I've also run into cases where I want OpenCPU to have access to environment variables. docker/opencpu_config/Renviron stores those variables.
(Note: Our .gitignore file is set up to exclude this file, so if you are using it to store something sensitive like database access credentials, they won't wind up publicly accessible on GitHub.)
In this tutorial, we won't actually use Renviron, but if we did, it would look something like this:
MYSERVICE_VAR1=foo
MYSERVICE_VAR2=bar
Our R code would then be able to access those environment variables using calls to Sys.getenv().
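A minimal sketch of that lookup, assuming the variable names from the Renviron example above. Sys.setenv() is used here only to simulate what OpenCPU would load from Renviron, and the helper name requireEnv is hypothetical.

```r
# Simulate the values OpenCPU would read from Renviron (for illustration only)
Sys.setenv(MYSERVICE_VAR1 = "foo", MYSERVICE_VAR2 = "bar")

# Hypothetical helper: fetch a variable and fail loudly if it is not set
requireEnv <- function(name) {
  value <- Sys.getenv(name, unset = NA)
  if (is.na(value)) {
    stop("Environment variable not set: ", name)
  }
  value
}

var1 <- requireEnv("MYSERVICE_VAR1")
var2 <- requireEnv("MYSERVICE_VAR2")
print(c(var1, var2))
```

Failing early on a missing variable tends to be easier to debug than letting an empty string flow into, say, a database connection call.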
We will need to set up two Docker-related files in order to build a Docker image of our service. Once the image is built, we will spin up a container with our image and see OpenCPU in action. Let's start with our Dockerfile:
# Use the official OpenCPU Dockerfile as a base
FROM opencpu/base
# Put a copy of our R code into the container
WORKDIR /usr/local/src
COPY . /usr/local/src/app
# Move OpenCPU configuration files into place
COPY docker/opencpu_config/* /etc/opencpu/
# Run our custom install script to install R dependencies
RUN /usr/bin/R --vanilla -f app/docker/installer.R
# Install our code as an R package on the server
RUN tar czf /tmp/stringstats.tar.gz app/ \
    && /usr/bin/R CMD INSTALL /tmp/stringstats.tar.gz
Again, OpenCPU has done lots of the heavy lifting for us. The opencpu/base image takes care of the low-level setup and we only have to worry about our service. (If you're really curious to see the OpenCPU server's Dockerfile, you can take a look here.)
In our Dockerfile, you may have noticed that there is a command that runs a custom install script, docker/installer.R. We will use that script to install our R package dependencies. For this tutorial, we only need to install stringr from CRAN:
install.packages(c('stringr'), repos='http://cran.us.r-project.org', dependencies=TRUE)
If we wanted to install multiple CRAN packages, we would simply add them to our install.packages() call. We might also use devtools::install_github() to install R packages hosted on GitHub.
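For example, installer.R could be extended to install only the packages that are not already present. This is a sketch rather than the tutorial's actual script: the helper names missingPackages and installMissing are hypothetical, and the package names are just examples.

```r
# Hypothetical helper: return the packages from `pkgs` that are not yet installed
missingPackages <- function(pkgs) {
  installed <- rownames(installed.packages())
  setdiff(pkgs, installed)
}

# Hypothetical helper: install whatever is missing from CRAN
installMissing <- function(pkgs, repos = "http://cran.us.r-project.org") {
  to_install <- missingPackages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos, dependencies = TRUE)
  }
  invisible(to_install)
}

# Base packages ship with R, so nothing should be reported as missing here
print(missingPackages(c("stats", "utils")))
```

In docker/installer.R, the call would then be something like installMissing(c('stringr')), with more package names added to the vector as needed.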
$ docker build -t stringstats .
Sending build context to Docker daemon 243.7kB
Step 1/6 : FROM opencpu/base
---> 9f6c992d11d8
Step 2/6 : WORKDIR /usr/local/src
---> Using cache
---> 90ca706a641d
Step 3/6 : COPY . /usr/local/src/app
---> 6ab2e5381552
Removing intermediate container 454e6a09e6c3
Step 4/6 : COPY docker/opencpu_config/* /etc/opencpu/
---> 9ebff403e548
Removing intermediate container 1373debe2889
Step 5/6 : RUN /usr/bin/R --vanilla -f app/docker/installer.R
---> Running in 971798f47f33
...(lots of output as R installs stringr and its dependencies)...
---> 58c99ee9cc7b
Removing intermediate container 971798f47f33
Step 6/6 : RUN tar czf /tmp/stringstats.tar.gz app/ && /usr/bin/R CMD INSTALL /tmp/stringstats.tar.gz
---> Running in fde9754b5566
* installing to library '/usr/local/lib/R/site-library'
* installing *source* package 'stringstats' ...
** R
** preparing package for lazy loading
** help
No man pages found in package 'stringstats'
*** installing help indices
** building package indices
** testing if installed package can be loaded
* DONE (stringstats)
---> b8228a7adfc2
Removing intermediate container fde9754b5566
Successfully built b8228a7adfc2
Successfully tagged stringstats:latest
Now that we've built an image, let's launch a container: docker run -d -p 8004:8004 stringstats. OpenCPU provides a UI on port 8004, so we make sure to tell Docker to map that port to port 8004 on localhost. This will let us access the running container in our web browser for testing.
$ docker run -d -p 8004:8004 stringstats
448ee587c78a224749ab5d594c0a095eb13001e5e1b9441190a566fd136f615a
That long ID is your unique container ID, but I find it much easier to lean on docker ps to see my running containers, get their unique IDs and names, etc.
$ docker ps
CONTAINER ID   IMAGE         COMMAND                  CREATED         STATUS         PORTS                                     NAMES
448ee587c78a   stringstats   "/bin/sh -c 'apach..."   3 minutes ago   Up 3 minutes   80/tcp, 443/tcp, 0.0.0.0:8004->8004/tcp   pensive_colden
Here's the moment of truth... let's open a web browser and test our running API. Browse to http://localhost:8004/ocpu/test/ and you should see OpenCPU's test page:
First, let's make sure that OpenCPU has our stringstats package installed. In the "HTTP Request Options" form, try this endpoint (leave the Method set to GET): ../library/stringstats/R/getMeanWordLength
When we test the request, it should succeed with code HTTP 200 OK:
In production, consumers of our API obviously won't use the OpenCPU testing interface. If you want to directly hit the API itself, try browsing to http://localhost:8004/ocpu/library/stringstats/R/getMeanWordLength/print
We've been using GET requests, so instead of executing our code, OpenCPU is just showing us the underlying code for our endpoint. When we want to actually execute the code, we will use POST requests instead. Go back to the OpenCPU test interface and try a POST request.
Set the Method to POST. Our function takes an argument named text, so we set the "Param Name" to text also.
What the deuce is that output?! A bunch of weirdo file paths or URLs? That's not what we expected. Here are the URLs that popped up for me (they use temporary IDs, so yours will look just a bit different):
/ocpu/tmp/x074d9e56cf/R/getMeanWordLength
/ocpu/tmp/x074d9e56cf/R/.val
/ocpu/tmp/x074d9e56cf/stdout
/ocpu/tmp/x074d9e56cf/source
/ocpu/tmp/x074d9e56cf/console
/ocpu/tmp/x074d9e56cf/info
/ocpu/tmp/x074d9e56cf/files/DESCRIPTION
I won't go into detail on all of these, but the basic idea is that OpenCPU captures a number of different streams of information for every request. You can see the raw stdout output, the code block that was executed, the exact function call, etc. The .val URL is the one we would use to see the results of our API call, so I'll browse to http://localhost:8004/ocpu/tmp/x074d9e56cf/R/.val to view the output:
[1] 4.428571
It works! But wait, it redirected me! I wound up at http://localhost:8004/ocpu/tmp/x074d9e56cf/R/.val/print. To use this as a service, I'd want JSON output instead. Edit the URL and try http://localhost:8004/ocpu/tmp/x074d9e56cf/R/.val/json. You should see the same data presented as JSON.
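If a client does use this two-step flow, it has to pick the .val path out of the response body and append the output format. Here is a minimal base R sketch, using the paths from my session above as sample input. The helper name valUrl is hypothetical, and the base URL is assumed to be the local container.

```r
# Sample response body from the POST request, one path per line
paths <- c(
  "/ocpu/tmp/x074d9e56cf/R/getMeanWordLength",
  "/ocpu/tmp/x074d9e56cf/R/.val",
  "/ocpu/tmp/x074d9e56cf/stdout",
  "/ocpu/tmp/x074d9e56cf/source",
  "/ocpu/tmp/x074d9e56cf/console",
  "/ocpu/tmp/x074d9e56cf/info",
  "/ocpu/tmp/x074d9e56cf/files/DESCRIPTION"
)

# Hypothetical helper: build the URL for the returned value in a given format
valUrl <- function(paths, base = "http://localhost:8004", format = "json") {
  # Keep only the path that ends in /R/.val
  val_path <- grep("/R/\\.val$", paths, value = TRUE)
  stopifnot(length(val_path) == 1)
  paste0(base, val_path, "/", format)
}

print(valUrl(paths))
```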
This is all great for testing, but what about production? We don't want to hit the service twice for every request. Ideally, we would just send a single POST request to the endpoint we want, and get a JSON payload back without ever seeing those temporary IDs. In that case, we would send our POST request directly to: http://localhost:8004/ocpu/library/stringstats/R/getMeanWordLength/json
$ curl -k -H "Content-Type: application/json" -X POST -d '{"text": "This is a direct API request"}' http://localhost:8004/ocpu/library/stringstats/R/getMeanWordLength/json
[3.8333]
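The same single-request call can be made from R. The sketch below assumes the httr and jsonlite packages are installed on the client (neither is required by the tutorial itself) and that the container is listening on localhost:8004. The helper names ocpuUrl and callMeanWordLength are hypothetical.

```r
# Hypothetical helper: build the endpoint URL for a function in an installed package
ocpuUrl <- function(pkg, fn, format = "json", base = "http://localhost:8004") {
  sprintf("%s/ocpu/library/%s/R/%s/%s", base, pkg, fn, format)
}

# Hypothetical helper: POST the text as JSON and parse the JSON reply.
# Requires httr and jsonlite, plus a running stringstats container.
callMeanWordLength <- function(text) {
  resp <- httr::POST(
    ocpuUrl("stringstats", "getMeanWordLength"),
    body = list(text = text),
    encode = "json"
  )
  # Turn HTTP errors (4xx/5xx) into R errors
  httr::stop_for_status(resp)
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

print(ocpuUrl("stringstats", "getMeanWordLength"))
```

With the container running, callMeanWordLength("This is a direct API request") should return the same value as the curl call above.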
When working with OpenCPU, I highly recommend keeping a link to the OpenCPU API docs handy!
The last thing we need to do is set up our Travis CI integration. Once you've logged in to Travis, click on your avatar icon in the upper right to visit your account settings page. On that page, you'll see a list of your public GitHub repos, with handy instructions.
Just click the button-slider icon next to your repo's name and you should see a green check mark indicating that Travis has been enabled for your repo.
Now, click on the name of your repo to watch your builds. You won't see anything there yet because we also need to set up the .travis.yml file in our repo:
sudo: false
warnings_are_errors: false
language: r
cache: packages
Now that both sides are configured properly, the next time you commit and push changes to GitHub, Travis will automatically trigger a build. Here's what it looks like on the Travis side. The build status is colored yellow to indicate a build in progress.
After the build completes, the status turns red on failure or green on success.
Success! We now have a pipeline that takes us through development, testing, and CI. Actual deployment to production is left as an exercise for the reader ;-)
Many thanks to: