outsider allows users to install and run programs on their own computer
without the need to leave the R environment. If there is a command-line program
that is not available through outsider, you can create your own module!
If you can install the program on your own machine and have some experience with R packages and GitHub, you should be able to create one without much difficulty.
On this page, we will outline what an outsider module is and then provide
a simple walkthrough for creating the simplest of simple modules: a command line
program that prints whatever text the user provides it.
To follow this guide you will need the following:
At its heart, an outsider module is just an R package with a Dockerfile.
All modules have the following basic file/folder structure:
# Core files and folders of a module #
- DESCRIPTION
- R/
    - functions.R
    - import.R
- inst/
    - dockerfiles/
        - latest/
            - Dockerfile
    - om.yml
- README.md
- README.Rmd
- .travis.yml
The R package is determined by the DESCRIPTION file and the R/ folder.
The first of these describes the package details (package name, author,
dependencies, etc.) and the second contains the R code that makes up the
package.
The R/ folder can have any number of scripts with whatever names a developer
chooses (the skeleton defaults are functions.R and import.R).
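As a rough illustration of how the DESCRIPTION file carries the package details, the base R function `read.dcf()` can read its "Field: value" layout. The folder name and field values below are invented for this sketch and are not taken from a real module.

```r
# Write a hypothetical DESCRIPTION to a temporary folder (values are made up)
pkg_dir <- file.path(tempdir(), "om..example")
dir.create(pkg_dir, showWarnings = FALSE)
writeLines(c(
  "Package: om..example",
  "Title: Example outsider module",
  "Version: 0.0.1"
), file.path(pkg_dir, "DESCRIPTION"))

# read.dcf() parses the "Field: value" format used by DESCRIPTION files
desc <- read.dcf(file.path(pkg_dir, "DESCRIPTION"))
pkg_name <- unname(desc[1, "Package"])
pkg_version <- unname(desc[1, "Version"])
print(pkg_name)
print(pkg_version)
```

The same approach works on the DESCRIPTION of any generated module by pointing `file.path()` at the module folder.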
The outsider package and its modules depend on Docker to run. Docker is a
service that is able to host isolated software packages, termed "containers",
that act like virtual machines but, thanks to
OS-level virtualisation,
are in fact much more lightweight. By using a Docker container a user is able to
run specific code, often designed for different operating systems, on any
machine that has Docker installed. A Docker container is created by
running a Docker "image" where the image acts as a description of the code
necessary for the container to function (base operating system, required
programs, etc.). And these images are described by a Dockerfile.
|Docker term|Description|
| --------- |:----------|
|Container|Isolated environment where applications are hosted and launched|
|Image|A file that acts as the "blueprint" for containers|
|Dockerfile|Text-based file describing the steps (layers) that define an image|
In an outsider module, the dockerfiles/ folder in the module contains the
Dockerfile that describes how to install the external command-line program
that the developer wishes to run through R. dockerfiles/ can have multiple
versions of a Dockerfile but every module must have a latest/.
Why "latest"? Every Docker image is tagged with a version name/number. For
outsider modules all images must have a "latest" tag, which acts as the default tag. Additionally, a developer may add any number of additional tagged versions of their program (e.g. a legacy version of a program) to their module by creating new Dockerfiles in separate directories under dockerfiles/.
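To make the one-directory-per-tag layout concrete, here is a small base R sketch that builds a throwaway `dockerfiles/` folder and lists the tags it contains. The "legacy" tag and the temporary paths are assumptions made for illustration only.

```r
# Build a throwaway dockerfiles/ folder with two tagged versions.
# "legacy" is a hypothetical extra tag; only "latest" is mandatory.
dockerfiles_dir <- file.path(tempfile("module_"), "inst", "dockerfiles")
for (tag in c("latest", "legacy")) {
  dir.create(file.path(dockerfiles_dir, tag), recursive = TRUE)
  writeLines("FROM ubuntu:latest",
             file.path(dockerfiles_dir, tag, "Dockerfile"))
}

# Each sub-directory name corresponds to one image tag
tags <- sort(list.dirs(dockerfiles_dir, full.names = FALSE, recursive = FALSE))
print(tags)

# Every module must provide the default tag
stopifnot("latest" %in% tags)
```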
For more information on Docker, see:
To make modules discoverable on GitHub, all modules require an om.yml.
This file has two elements (program and details) encoded in the
YAML format. In addition, the module
has a README.md file that provides the text describing the module on the
module's GitHub (or other code-sharing site) homepage. .md format is like a
simplified HTML. It can be generated by hand or -- by default with
outsider.devtools -- it can be rendered from a .Rmd file, in this case
README.Rmd. The advantage of the .Rmd-approach is that R code chunks are
parsed and their output is then included in the .md version of the file. This
allows users to better understand the code as they can directly see the output.
Finally, a .travis.yml provides instructions detailing how the module should
be tested on Travis-CI. By default,
the outsider.devtools helper functions create a .travis.yml file that tests
by installing the module and then running the examples of the exported R
functions.
We will walk you through how to create your own outsider module that will
simply print (through echo) any text provided. This process comes in a
series of steps:
i. Generate core files and folders
ii. Create Docker image
iii. Document and build the R package
iv. Try the module
v. Upload to GitHub and Docker Hub
What's "Docker Hub"? Docker Hub is a service that hosts Docker images. It can simplify Dockerfile creation as image layers can be sourced from pre-existing images available via Docker Hub. Additionally, by sharing the images generated from your module's Dockerfile on Docker Hub, you will speed up the installation step for end-users.
As displayed above, we need to generate the core files and folders of a module.
This process can be easily performed using the module_skeleton() function.
This function takes a few details about the developer and the program and then
generates all the necessary core file structures.
The information required to generate the module is our GitHub and
Docker Hub usernames plus the name of the program we wish to provide as a
module (which is echo -- a UNIX command for printing). In these code snippets,
the usernames are those of the outsider maintainer. For these examples to
work for you, you will need to change "dombennett" to your own usernames.
(Note that your usernames may differ between GitHub and Docker Hub.)
library(outsider.devtools)
# the file location of where the module is saved is returned to "module_path"
module_path <- module_skeleton(repo_user = 'dombennett', program_name = 'echo',
                               docker_user = 'dombennett', flpth = tempdir(),
                               full_name = 'D.J. Bennett',
                               email = 'dominic.john.bennett@gmail.com',
                               service = 'github')
# folder name where module is stored
print(basename(module_path))
## [1] "om..echo"
The above code will create an outsider module with the module and directory
name om..echo at the file location module_path.
After running the above code, you should take a minute to inspect the generated
files. In particular you should look at the DESCRIPTION file, the Dockerfile
and functions.R.
File tree module_path (before build)
## ├── DESCRIPTION
## ├── R
## │   ├── functions.R
## │   └── import.R
## ├── README.Rmd
## ├── examples
## │   └── example.R
## └── inst
##     ├── dockerfiles
##     │   └── latest
##     │       └── Dockerfile
##     └── om.yml
Why om..echo? All modules must start with "om.." in order for them to be discovered on GitHub. This is not a requirement for the functioning of the module, it just allows outsider::module_search() to find them.
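The naming convention is easy to check programmatically. The two helper functions below are hypothetical (they are not part of outsider or outsider.devtools) and only illustrate the "om.." prefix rule with base R string functions.

```r
# Hypothetical helper: does a repository name follow the "om.." convention?
is_module_name <- function(repo_name) {
  startsWith(repo_name, "om..")
}

# Hypothetical helper: recover the program name from a module name
program_from_module <- function(repo_name) {
  sub("^om\\.\\.", "", repo_name)
}

repos <- c("om..echo", "echo", "om.echo")
print(is_module_name(repos))
print(program_from_module("om..echo"))
```

Only the first of the three names carries the full two-dot prefix, so only it would count as a module name under this check.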
At this stage, we would then edit the inst/dockerfiles/latest/Dockerfile and
the R/functions.R to work for our chosen external program. But because echo
-- the program we wish to port through outsider -- is so simple we don't
actually have to make any changes to these files. By default, the Dockerfile
is based on Ubuntu which ships with echo and our starting function in
functions.R is based around running echo: it parses the arguments, creates
an outsider object, and then launches the object.
# The echo function in om..echo
echo <- function(...) {
  # convert the ... into an argument list
  arglist <- arglist_get(...)
  # create an outsider object: describe the arguments and program
  otsdr <- outsider_init(pkgnm = 'om..echo', cmd = 'echo', arglist = arglist)
  # run the command
  run(otsdr)
}
What's ...? In function calls in R, ... indicates that any number of arguments can be provided to a function. The arglist_get function will take the ... and convert them into a character vector that can be parsed. outsider recommends module functions make use of this feature so that any number of arguments can be passed to external programs. Additionally, this has the advantage that the developer would then not need to document all the arguments of the external program. For many external programs there may be hundreds of arguments, all of which are likely to be already documented and viewable through commands like -h or --help. For programs with few arguments, or where the execution of the external program requires additional setting-up (input files, environment settings, etc.), or where argument definitions cannot be displayed with -h or --help, it is best to provide richer documentation at the R level. In these instances, the specific arguments to the external program can either be separate R arguments, function(arg1, arg2), or they can be a character vector along with other arguments, function(arglist=arglist_get(...), input=NULL, memory="1GB").
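For readers new to `...`, the sketch below shows one way a function can collect any number of arguments into a character vector using only base R. It is a simplified stand-in written for this guide and should not be read as the actual implementation of `arglist_get()`; the name `dots_to_args` is hypothetical.

```r
# Simplified stand-in for collecting `...` into a character vector.
# NOT the arglist_get() implementation, only an illustration of the idea.
dots_to_args <- function(...) {
  vapply(list(...), as.character, character(1), USE.NAMES = FALSE)
}

# Any number of arguments, of mixed types, end up as strings
args <- dots_to_args("-n", "hello world!", 42)
print(args)
print(length(args))
```

A vector like this can then be handed to a command-line program one element per argument, which is why the module function itself does not need to know the program's arguments in advance.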
With the skeleton set-up, we can now install the R package using
module_build(). At this stage we only want to build the package components,
not the Docker image, so we can set all build options to TRUE except for the
build_image option.
module_build(flpth = module_path, build_documents = TRUE, build_package = TRUE,
             build_image = FALSE, build_readme = TRUE)
## Updating om..echo documentation
## First time using roxygen2. Upgrading automatically...
## Updating roxygen version in /private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..echo/DESCRIPTION
## Loading om..echo
## Building om..echo readme
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::document() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Writing NAMESPACE
## Writing NAMESPACE
## Writing echo.Rd
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::install() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## checking for file ‘/private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..echo/DESCRIPTION’ ... ✓ checking for file ‘/private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..echo/DESCRIPTION’
## ─ preparing ‘om..echo’:
## checking DESCRIPTION meta-information ... ✓ checking DESCRIPTION meta-information
## ─ checking for LF line-endings in source and make files and shell scripts
## ─ checking for empty or unneeded directories
## ─ building ‘om..echo_0.0.1.tar.gz’
##
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
## /var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T//RtmpquGDgt/om..echo_0.0.1.tar.gz --install-tests
## - * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
## | * installing *source* package ‘om..echo’ ...
## - ** using staged installation
## | ** R
## - ** inst
## | ** byte-compile and prepare package for lazy loading
## - | ** help
## - *** installing help indices
## | ** building package indices
## - ** testing if installed package can be loaded from temporary location
## | ** testing if installed package can be loaded from final location
## - ** testing if installed package keeps a record of temporary installation path
## | * DONE (om..echo)
## - | ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::build_readme() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
What does build_documents = TRUE do? In addition to the core files indicated above, an R package also requires R documentation files that are stored in man/ -- these provide the ?[function] utility. The above function generates these files via roxygen comments and tags located in the R scripts that make up the package, i.e. all comments that begin #'. For more information, look up "object documentation in R".
What does build_readme = TRUE do? This builds the README.md file from the README.Rmd file. The two files will contain the same text and images, but for any R code snippets in the .Rmd, these are first run and their output is captured and placed in the .md file. Remember that the README.md acts as the homepage to the module on GitHub.
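As an illustration of what roxygen comments look like, here is a small documented function. The function `shout` is hypothetical and is not part of the skeleton; the tags used (`@title`, `@description`, `@param`, `@return`, `@examples`, `@export`) are standard roxygen2 tags.

```r
#' @title Print text in upper case
#' @description Hypothetical function used only to illustrate roxygen tags.
#' @param ... Text to print; any number of character arguments.
#' @return A single character string, returned invisibly.
#' @examples
#' shout("hello", "world")
#' @export
shout <- function(...) {
  # join the arguments with spaces and convert to upper case
  txt <- toupper(paste(..., sep = " "))
  cat(txt, "\n", sep = "")
  invisible(txt)
}

res <- shout("hello", "world")
```

When a package containing a function like this is documented, the `#'` lines are what get turned into the corresponding `.Rd` file under `man/`.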
File tree module_path (after build)
## ├── DESCRIPTION
## ├── NAMESPACE
## ├── R
## │   ├── functions.R
## │   └── import.R
## ├── README.Rmd
## ├── README.html
## ├── README.md
## ├── examples
## │   └── example.R
## ├── inst
## │   ├── dockerfiles
## │   │   └── latest
## │   │       └── Dockerfile
## │   └── om.yml
## └── man
##     └── echo.Rd
Dockerfile commands
Dockerfiles are a series of instructions for constructing a "containerised" machine called a Docker image (functionally it's a bit like a virtual machine, but more lightweight). Each command begins with a capitalised instruction followed by arguments. The most common instruction is "RUN", which executes command-line code within the Docker image system. For example, RUN echo "command!" would pass "command!" to the program "echo". All Dockerfiles begin with a FROM instruction, which pulls a Docker image on which to build your own image. For example, many images are built upon the Linux operating system Ubuntu, in which case the first line of the Dockerfile would be "FROM ubuntu:latest". This first line would then download the latest Ubuntu Docker image and all subsequent "RUN" instructions would run in the Ubuntu command-line. For far more detailed information on Dockerfiles, see the Docker docs.
In our om..echo we have a dockerfiles folder that contains a Dockerfile
describing the Docker image for our echo program. Our "latest" Dockerfile
contains the instructions to pull the Docker image of the latest Ubuntu release,
to create a folder called "working_dir" and then set this new folder as the
"WORKDIR".
# Example host distro
FROM ubuntu:latest
# Install program using RUN lines
# outsider *requires* working_dir
RUN mkdir /working_dir
WORKDIR /working_dir
What's the WORKDIR? The WORKDIR sets the working directory when a command is passed to the Docker image. All
outsider modules require this to be "working_dir" as it allows outsider functions to know where to transfer files to and from the container.
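Because the "working_dir" requirement is easy to overlook when editing a Dockerfile by hand, a quick check from R can help. The function `has_workdir` below is a hypothetical helper written for this guide, not an outsider.devtools function; it simply scans the Dockerfile lines for the expected WORKDIR instruction.

```r
# Instruction lines of the skeleton Dockerfile as shown in this guide
dockerfile <- c(
  "FROM ubuntu:latest",
  "RUN mkdir /working_dir",
  "WORKDIR /working_dir"
)

# Hypothetical check: does the Dockerfile set the required WORKDIR?
has_workdir <- function(lines) {
  pattern <- "^[[:space:]]*WORKDIR[[:space:]]+/working_dir[[:space:]]*$"
  any(grepl(pattern, lines))
}

print(has_workdir(dockerfile))
print(has_workdir(c("FROM ubuntu:latest", "WORKDIR /tmp")))
```

For a real module, the lines could be read with `readLines()` from `inst/dockerfiles/latest/Dockerfile` before running the check.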
Using the Dockerfile that was created with the skeleton, we can now build our
Docker image using module_build(). (In practice, we'd combine steps 2 and 3
by calling module_build() and setting all the build arguments to TRUE.) Docker
will build the image from the Dockerfile and store it along with all the other
images that are available on your machine; you do not need to worry about where
the Docker image is stored. Once an image is associated with the module, calling
the module's code creates a new container from the image, and commands are
passed to that container from R.
module_build(flpth = module_path, tag = 'latest', build_image = TRUE,
             build_documents = FALSE, build_package = FALSE,
             build_readme = FALSE)
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running docker_build()
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Command:
## docker build -t dombennett/om_echo:latest /Library/Frameworks/R.framework/Versions/3.6/Resources/library/om..echo/dockerfiles/latest
## .....................................................................................................................
What is a tag? A Docker tag is akin to the version number of the image. By default, if no tag is provided, Docker will use 'latest'.
Note: The functions try to be helpful by printing to screen the exact Docker commands of the tasks being performed. This is to give you a better idea of what is happening and to allow you to recreate the operations via a terminal.
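To recreate a build from a terminal, the printed command can be assembled by hand. The sketch below composes a `docker build -t <image>:<tag> <directory>` string in base R; the helper name `docker_build_cmd` is hypothetical, and the image name and relative path are taken from the examples in this guide rather than from any outsider function.

```r
# Hypothetical helper that assembles a `docker build` call as a string.
# It only builds the text of the command; it does not run Docker.
docker_build_cmd <- function(image, tag, dockerfile_dir) {
  paste("docker build -t", paste0(image, ":", tag), shQuote(dockerfile_dir))
}

cmd <- docker_build_cmd(image = "dombennett/om_echo", tag = "latest",
                        dockerfile_dir = "inst/dockerfiles/latest")
print(cmd)
```

The resulting string can be pasted into a terminal opened in the module folder, provided Docker is installed and running.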
With a Docker image and an installed R package, we are now ready to try out the module before we upload it online. The command below will look up the associated image for the module, create a new container, run the command and then shut down the container. (If no associated image can be found, it will attempt to pull one from Docker Hub.)
library(outsider)
## ----------------
## outsider v 0.1.0
## ----------------
## - Security notice: be sure of which modules you install
# the repo always refers to the future github repo
echo <- module_import('echo', repo = 'dombennett/om..echo')
echo('hello world!')
After we have played with the module and ensured it works as we would hope, we can upload it to our GitHub and Docker accounts so that others may download it.
To upload to GitHub we must first create a version of the repository online.
Visit your GitHub account and then create a new repository by clicking
Repositories > New. Be sure to give the online repository the same name as
your module, i.e. om..echo. (No initial steps are required to upload to
Docker Hub.)
# based on the module details, the function will determine which code-sharing
# service to upload to
module_upload(flpth = module_path, code_sharing = TRUE, dockerhub = TRUE)
Delete it all
Don't want om..echo on your computer? Delete it like so:
# to delete the Docker image and uninstall the R package
module_uninstall(repo = 'dombennett/om..echo')
## Removing 'om..echo'
## Removing package from '/Library/Frameworks/R.framework/Versions/3.6/Resources/library'
## (as 'lib' is unspecified)
# to delete the repo folder
unlink(x = module_path, recursive = TRUE, force = TRUE)