outsider allows users to install and run programs on their own computer without leaving the R environment. If a command-line program you need is not available through outsider, you can create your own module!

If you can install the program on your own machine and have some experience with R packages and GitHub, you should be able to create one readily.

On this page, we outline what an outsider module is and then walk through creating the simplest of simple modules: a command-line program that prints whatever text the user gives it.

To follow this guide you will need the following:


Basics

Module structure

At its heart, an outsider module is just an R package with a Dockerfile. All modules have the following basic file/folder structure:

# Core files and folders of a module #
- DESCRIPTION
- R/
    - functions.R
    - import.R
- inst/
    - dockerfiles/
        - latest/
            - Dockerfile
    - om.yml
- README.md
- README.Rmd
- .travis.yml
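The layout above can also be checked from R. The following is a minimal base-R sketch, not part of outsider or outsider.devtools: the helper name missing_module_files() is hypothetical, and the required paths are copied from the list above (using om.yml, the name used elsewhere on this page).

```r
# Hypothetical helper (not part of outsider or outsider.devtools):
# report which of the core module paths listed above are missing.
missing_module_files <- function(module_dir) {
  required <- c(
    "DESCRIPTION",
    "R",
    file.path("inst", "dockerfiles", "latest", "Dockerfile"),
    file.path("inst", "om.yml"),
    "README.Rmd",
    "README.md",
    ".travis.yml"
  )
  # file.exists() returns TRUE for existing files and directories alike
  required[!file.exists(file.path(module_dir, required))]
}

# Usage: build a partial module in a fresh temporary folder and check it
demo_dir <- tempfile(pattern = "om..demo")
dir.create(file.path(demo_dir, "inst", "dockerfiles", "latest"), recursive = TRUE)
dir.create(file.path(demo_dir, "R"))
invisible(file.create(
  file.path(demo_dir, "DESCRIPTION"),
  file.path(demo_dir, "inst", "dockerfiles", "latest", "Dockerfile")
))
missing_module_files(demo_dir)
```

A freshly generated skeleton may legitimately lack some of these paths until module_build() has run (README.md, for instance, is only rendered at build time).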

R files and folders

The R package is defined by the DESCRIPTION file and the R/ folder. The first describes the package details (package name, author, dependencies, etc.) and the second contains the R code that makes up the package. The R/ folder can hold any number of scripts with whatever names the developer chooses (by default, the skeleton creates functions.R and import.R).

Docker files and folders

The outsider package and its modules depend on Docker to run. Docker is a service that hosts isolated software environments, termed "containers", that act like virtual machines but, thanks to OS-level virtualisation, are much more lightweight. By using a Docker container, a user can run specific code, often designed for a different operating system, on any machine that has Docker installed. A container is created by running a Docker "image", where the image describes everything the container needs to function (base operating system, required programs, etc.). Images, in turn, are described by a Dockerfile.

|Docker term|Description|
| --------- |:----------|
|Container|Isolated environment where applications are hosted and launched|
|Image|A file that acts as the "blueprint" for containers|
|Dockerfile|Text-based file describing the steps (layers) that define an image|

In an outsider module, the inst/dockerfiles/ folder contains the Dockerfile describing how to install the external command-line program that the developer wishes to run through R. dockerfiles/ can hold multiple versions of a Dockerfile, but every module must have a latest/ folder.

Why "latest"? Every Docker image is tagged with a version name or number. For outsider modules, every image must have a "latest" tag, which acts as the default. A developer may also add any number of additionally tagged versions of their program (e.g. a legacy version) by creating new Dockerfiles in separate directories under dockerfiles/.
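Because each tag is simply a sub-folder of dockerfiles/, the tags a module offers can be read straight from the file system. This is a hedged base-R sketch: module_tags() is a hypothetical helper, and the "v1.0" tag below is an invented legacy version used only for illustration.

```r
# Hypothetical helper: list the Docker tags a module provides (the
# sub-folders of inst/dockerfiles/) and insist that "latest" is among them.
module_tags <- function(module_dir) {
  dockerfiles_dir <- file.path(module_dir, "inst", "dockerfiles")
  # recursive = FALSE returns only the immediate sub-folders, by name
  tags <- list.dirs(dockerfiles_dir, full.names = FALSE, recursive = FALSE)
  if (!"latest" %in% tags) {
    stop("every module needs inst/dockerfiles/latest/")
  }
  # keep only tag folders that actually contain a Dockerfile
  tags[file.exists(file.path(dockerfiles_dir, tags, "Dockerfile"))]
}

# Usage: a module with the default image plus a hypothetical legacy version
demo_dir <- tempfile(pattern = "om..demo")
for (tag in c("latest", "v1.0")) {
  tag_dir <- file.path(demo_dir, "inst", "dockerfiles", tag)
  dir.create(tag_dir, recursive = TRUE)
  writeLines("FROM ubuntu:latest", file.path(tag_dir, "Dockerfile"))
}
module_tags(demo_dir)
```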

For more information on Docker, see:

GitHub files and folders

To make modules discoverable on GitHub, every module requires an om.yml file. This file holds two elements (program and details) encoded in YAML. In addition, the module has a README.md file that provides the text describing the module on its GitHub (or other code-sharing site) homepage. Markdown (.md) is a lightweight markup format that renders as simple HTML. It can be written by hand or, by default with outsider.devtools, rendered from a .Rmd file, in this case README.Rmd. The advantage of the .Rmd approach is that R code chunks are run and their output is included in the .md version of the file, so users can see the output of the code directly and understand it better.
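For illustration, an om.yml with its two elements could be written from R as below. This is a sketch under assumptions: write_om_yml() is a hypothetical helper, the "details" text is a placeholder, and the exact content module_skeleton() writes may differ.

```r
# Sketch: write a minimal om.yml holding the two elements named above.
# The "details" text is a placeholder, not what module_skeleton() writes.
write_om_yml <- function(module_dir, program, details) {
  inst_dir <- file.path(module_dir, "inst")
  dir.create(inst_dir, recursive = TRUE, showWarnings = FALSE)
  yml_path <- file.path(inst_dir, "om.yml")
  # Escape embedded double quotes, then quote each value so that
  # characters such as ':' cannot break the YAML
  esc <- function(x) gsub('"', '\\"', x, fixed = TRUE)
  writeLines(c(sprintf('program: "%s"', esc(program)),
               sprintf('details: "%s"', esc(details))), yml_path)
  yml_path
}

yml <- write_om_yml(tempfile(pattern = "om..demo"), program = "echo",
                    details = "Print text to the console.")
readLines(yml)
```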

Finally, a .travis.yml file tells Travis CI how the module should be tested. By default, the outsider.devtools helper functions create a .travis.yml that tests the module by installing it and then running the examples of its exported R functions.


Walkthrough

Below, we walk through creating an outsider module that simply prints (through echo) any text it is given. The process consists of a series of steps:

i. Generate core files and folders
ii. Document and build the R package
iii. Create the Docker image
iv. Try the module
v. Upload to GitHub and Docker Hub

What's "Docker Hub"? Docker Hub is a service that hosts Docker images. It can simplify Dockerfile creation, as image layers can be sourced from pre-existing images available via Docker Hub. Additionally, by sharing the images built from your module's Dockerfiles on Docker Hub, you speed up installation for end-users, who can pull the ready-made image rather than build it themselves.

Generate the files and folders

First, we need to generate the core files and folders of a module listed above. The module_skeleton() function makes this easy: it takes a few details about the developer and the program and then generates all the necessary files and folders.

The information required is our GitHub and Docker Hub usernames plus the name of the program we wish to provide as a module (echo, a UNIX command for printing text). In these code snippets, the usernames are those of the outsider maintainer. For the examples to work for you, replace "dombennett" with your own usernames. (Note: your GitHub and Docker Hub usernames may differ.)

library(outsider.devtools)
# the file location of where the module is saved is returned to "module_path"
module_path <- module_skeleton(repo_user = 'dombennett', program_name = 'echo',
                               docker_user = 'dombennett', flpth = tempdir(),
                               full_name = 'D.J. Bennett',
                               email = 'dominic.john.bennett@gmail.com',
                               service = 'github')
# folder name where module is stored
print(basename(module_path))
## [1] "om..echo"

The above code creates an outsider module, with the package and directory name om..echo, at module_path. Take a minute to inspect the generated files, in particular the DESCRIPTION file, the Dockerfile and functions.R.
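To do that inspection from the R console, a small base-R helper can print the three files named above. The helper name show_key_files() is hypothetical and not part of outsider.devtools; the demo folder below only imitates a module.

```r
# Hypothetical helper: print the three files worth reading first
show_key_files <- function(module_dir) {
  key_files <- c("DESCRIPTION",
                 file.path("inst", "dockerfiles", "latest", "Dockerfile"),
                 file.path("R", "functions.R"))
  for (f in key_files) {
    cat("=====", f, "=====\n")
    path <- file.path(module_dir, f)
    if (file.exists(path)) {
      writeLines(readLines(path))  # one console line per file line
    } else {
      cat("(missing)\n")
    }
  }
  invisible(key_files)
}

# Demo on a folder containing only a DESCRIPTION file;
# with the walkthrough module you would call show_key_files(module_path)
demo_dir <- tempfile(pattern = "om..demo")
dir.create(demo_dir)
writeLines("Package: om..demo", file.path(demo_dir, "DESCRIPTION"))
show_key_files(demo_dir)
```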

File tree module_path (before build)

## ├── DESCRIPTION
## ├── R
## │   ├── functions.R
## │   └── import.R
## ├── README.Rmd
## ├── examples
## │   └── example.R
## └── inst
##     ├── dockerfiles
##     │   └── latest
##     │       └── Dockerfile
##     └── om.yml

Why om..echo? All module names must start with "om.." for the modules to be discoverable on GitHub. The prefix is not required for a module to function; it simply allows outsider::module_search() to find it.
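The naming rule itself is easy to express in base R. This sketch is not how module_search() is implemented; it only illustrates the prefix convention, and the repositories other than dombennett/om..echo are invented examples.

```r
# Sketch of the naming rule: a repository is a candidate module if its
# name (the part after "user/") starts with "om..".
is_module_repo <- function(repo) {
  startsWith(basename(repo), "om..")
}

# Only the first repository name comes from this page; the others are made up
repos <- c("dombennett/om..echo", "someone/echo-tools", "someone/om..mafft")
repos[is_module_repo(repos)]
```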

Normally, at this stage we would edit inst/dockerfiles/latest/Dockerfile and R/functions.R to work for our chosen external program. But because echo, the program we wish to port through outsider, is so simple, we don't need to change either file. By default, the Dockerfile is based on Ubuntu, which ships with echo, and the starting function in functions.R is built around running echo: it parses the arguments, creates an outsider object and then runs that object.

# The echo function in om..echo
echo <- function(...) {
  # convert the ... into an argument list
  arglist <- arglist_get(...)
  # create an outsider object: describe the arguments and program
  otsdr <- outsider_init(pkgnm = 'om..echo',
                         cmd = 'echo', arglist = arglist)
  # run the command
  run(otsdr)
}

What's ...? In an R function definition, ... indicates that any number of arguments can be passed to the function. The arglist_get() function takes the ... and converts them into a character vector that can be passed on to the external program. outsider recommends that module functions use this feature so that any number of arguments can be passed to external programs. It has the added advantage that the developer does not need to document every argument of the external program: many programs have hundreds of arguments, all of which are likely to be documented already and viewable through flags such as -h or --help.

For programs with few arguments, for programs whose execution requires additional set-up (input files, environment settings, etc.), or for programs whose arguments cannot be displayed with -h or --help, it is best to provide richer documentation at the R level. In these instances, the arguments to the external program can either be separate R arguments, function(arg1, arg2), or a character vector alongside other arguments, function(arglist = arglist_get(...), input = NULL, memory = "1GB").
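To make the idea concrete, here is a base-R stand-in for what collecting ... into a character vector looks like. collect_args() is hypothetical and simpler than the real arglist_get(), which may do more; the program name "myprog" in the two signature styles is likewise invented.

```r
# Hypothetical stand-in for arglist_get(): collect everything passed via ...
# into one character vector of command-line arguments.
collect_args <- function(...) {
  # unlist() flattens vectors passed as a single argument;
  # as.character() turns numbers into command-line strings
  as.character(unlist(list(...), use.names = FALSE))
}

collect_args("-n", "hello", "world")
collect_args("--threads", 4, c("a.txt", "b.txt"))

# The two richer styles described above ("myprog" is hypothetical)
build_command_explicit <- function(input, threads = 1) {
  c("myprog", "--input", input, "--threads", threads)
}
build_command_mixed <- function(..., input = NULL) {
  c("myprog", collect_args(...), if (!is.null(input)) c("--input", input))
}
```

The explicit style lets each argument carry its own R documentation; the mixed style keeps the pass-through flexibility of ... while still exposing the arguments that need R-side handling.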

Building the R package

With the skeleton in place, we can now document and install the R package using module_build(). At this stage we only want to build the package components, not the Docker image, so we set every build option to TRUE except build_image.

module_build(flpth = module_path, build_documents = TRUE, build_package = TRUE,
             build_image = FALSE, build_readme = TRUE)
## Updating om..echo documentation
## First time using roxygen2. Upgrading automatically...
## Updating roxygen version in /private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..echo/DESCRIPTION
## Loading om..echo
## Building om..echo readme
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::document() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Writing NAMESPACE
## Writing NAMESPACE
## Writing echo.Rd
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::install() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## ✓  checking for file ‘/private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..echo/DESCRIPTION’
## ─  preparing ‘om..echo’:
## ✓  checking DESCRIPTION meta-information
## ─  checking for LF line-endings in source and make files and shell scripts
## ─  checking for empty or unneeded directories
## ─  building ‘om..echo_0.0.1.tar.gz’
## 
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
##   /var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T//RtmpquGDgt/om..echo_0.0.1.tar.gz --install-tests 
## * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’
## * installing *source* package ‘om..echo’ ...
## ** using staged installation
## ** R
## ** inst
## ** byte-compile and prepare package for lazy loading
## ** help
## *** installing help indices
## ** building package indices
## ** testing if installed package can be loaded from temporary location
## ** testing if installed package can be loaded from final location
## ** testing if installed package keeps a record of temporary installation path
## * DONE (om..echo)
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running devtools::build_readme() ...
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

What does build_documents = TRUE do? In addition to the core files indicated above, an R package also needs R documentation files, stored in man/, which provide the ?[function] help lookup. module_build() generates these files from the roxygen comments and tags in the package's R scripts, i.e. all comments that begin with #'. For more information, look up "object documentation in R".
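As an illustration of such comments, here is roughly what roxygen tags on the echo function could look like. This is a hedged sketch, not necessarily the text module_skeleton() generates (the generated file may, for example, point to examples/example.R instead of an inline @examples section). When documented, @export adds echo to NAMESPACE and the remaining tags become man/echo.Rd.

```r
#' Echo text through Docker
#'
#' Run the UNIX program echo inside the om..echo Docker container.
#'
#' @param ... Arguments passed on to echo, e.g. the text to print.
#' @examples
#' \dontrun{
#' echo("hello world!")
#' }
#' @export
echo <- function(...) {
  # same body as the skeleton function shown earlier
  arglist <- arglist_get(...)
  otsdr <- outsider_init(pkgnm = 'om..echo', cmd = 'echo', arglist = arglist)
  run(otsdr)
}
```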

What does build_readme = TRUE do? This builds the README.md file from the README.Rmd file. The two files contain the same text and images, but any R code chunks in the .Rmd are first run, and their output is captured and placed in the .md file. Remember that README.md acts as the module's homepage on GitHub.
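The log above shows that the package half of module_build() runs three devtools steps. A minimal sketch of doing the same by hand, assuming devtools is installed; build_r_parts() is a hypothetical wrapper, and module_build() may perform extra steps not shown here.

```r
# Sketch: the R-package steps named in the module_build() log, run directly.
build_r_parts <- function(module_dir) {
  devtools::document(pkg = module_dir)       # roxygen comments -> man/ and NAMESPACE
  devtools::install(pkg = module_dir)        # build and install the package
  devtools::build_readme(path = module_dir)  # README.Rmd -> README.md
  invisible(module_dir)
}

# build_r_parts(module_path)
```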

File tree module_path (after build)

## ├── DESCRIPTION
## ├── NAMESPACE
## ├── R
## │   ├── functions.R
## │   └── import.R
## ├── README.Rmd
## ├── README.html
## ├── README.md
## ├── examples
## │   └── example.R
## ├── inst
## │   ├── dockerfiles
## │   │   └── latest
## │   │       └── Dockerfile
## │   └── om.yml
## └── man
##     └── echo.Rd

Creating/Building the Docker image

Dockerfile commands: a Dockerfile is a series of instructions for constructing a Docker image, from which lightweight "containerised" machines (containers) are launched. Each line holds an instruction, written in capitals by convention, followed by its arguments. The most common instruction is RUN, which executes command-line code inside the image being built. For example, RUN echo "command!" would pass "command!" to the program echo. Dockerfiles begin with a FROM instruction, which pulls an existing Docker image on which to build your own. For example, many images are built on the Linux distribution Ubuntu, in which case the first instruction would be FROM ubuntu:latest. This line downloads the latest Ubuntu Docker image, and all subsequent RUN instructions then run in an Ubuntu shell. For far more detail on Dockerfiles, see the Docker docs.
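To see how these instructions fit together for a real program, the sketch below assembles Dockerfile text from R for a tool installable with apt-get on Ubuntu. dockerfile_text() is a hypothetical helper and figlet is only an example program; the last two lines reproduce the working-directory set-up that outsider requires.

```r
# Hypothetical helper: assemble Dockerfile lines for a program that can be
# installed with apt-get on Ubuntu ("figlet" below is just an example).
dockerfile_text <- function(base = "ubuntu:latest", run_cmds = character()) {
  c(paste("FROM", base),
    # guard: paste() would turn an empty run_cmds into a bare "RUN "
    if (length(run_cmds) > 0) paste("RUN", run_cmds),
    # outsider requires this working directory
    "RUN mkdir /working_dir",
    "WORKDIR /working_dir")
}

docker_lines <- dockerfile_text(
  run_cmds = "apt-get update && apt-get install -y figlet")
writeLines(docker_lines)
# To save it into a module: writeLines(docker_lines, "<dockerfiles dir>/latest/Dockerfile")
```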

In om..echo, the dockerfiles folder contains a Dockerfile describing the Docker image for our echo program. This "latest" Dockerfile contains the instructions to pull the ubuntu:latest image, create a folder called "working_dir" and then set this new folder as the WORKDIR.

# Example host distro
FROM ubuntu:latest

# Install program using RUN lines

# outsider *requires* working_dir
RUN mkdir /working_dir
WORKDIR /working_dir

What's the WORKDIR? The WORKDIR instruction sets the working directory for the instructions that follow it and for commands run in containers created from the image. All outsider modules require this to be /working_dir, as it tells outsider functions where to transfer files to and from the container.
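Since every module depends on this setting, a quick check on a Dockerfile can catch mistakes before building. This is a hypothetical base-R sketch (uses_outsider_workdir() is not an outsider function); it looks only at the last WORKDIR line and ignores multi-stage builds and relative WORKDIR paths.

```r
# Hypothetical check: does a Dockerfile end up with the working directory
# outsider expects? Only the last WORKDIR instruction is considered.
uses_outsider_workdir <- function(dockerfile) {
  dlines <- trimws(readLines(dockerfile))
  # Dockerfile instructions are case-insensitive, so match ignoring case
  workdirs <- grep("^WORKDIR[[:space:]]+", dlines, value = TRUE, ignore.case = TRUE)
  if (length(workdirs) == 0) return(FALSE)
  last <- sub("^WORKDIR[[:space:]]+", "", workdirs[length(workdirs)], ignore.case = TRUE)
  sub("/+$", "", last) == "/working_dir"  # tolerate a trailing slash
}

tmp <- tempfile()
writeLines(c("FROM ubuntu:latest", "RUN mkdir /working_dir",
             "WORKDIR /working_dir"), tmp)
uses_outsider_workdir(tmp)
```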

Using the Dockerfile created by the skeleton, we can now build our Docker image with module_build(). (In practice, we would combine steps ii and iii by calling module_build() with all the build arguments set to TRUE.) Docker builds the image from the Dockerfile and stores it alongside all the other images available on your machine; you do not need to worry about where it is stored. Once an image is associated with the module, each call to the module's code creates a new container from that image, and commands are passed to the container from R.

module_build(flpth = module_path, tag = 'latest', build_image = TRUE,
             build_documents = FALSE, build_package = FALSE,
             build_readme = FALSE)
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Running docker_build()
## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Command:
## docker build -t dombennett/om_echo:latest /Library/Frameworks/R.framework/Versions/3.6/Resources/library/om..echo/dockerfiles/latest
## .....................................................................................................................

What is a tag? A Docker tag is akin to the version number of the image. By default, if no tag is provided, Docker will use 'latest'.
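The printed command can be reproduced by hand. In this sketch the naming rule is an assumption inferred from the log line above (package om..echo became image om_echo), not taken from the outsider source; it is consistent with Docker forbidding consecutive dots in repository names. image_name() and docker_build_args() are hypothetical helpers, and the Dockerfile path is a placeholder.

```r
# Sketch: rebuild the "docker build" command printed by module_build().
# Assumption: the image name is the package name with ".." replaced by "_".
image_name <- function(docker_user, pkgnm, tag = "latest") {
  repo <- gsub("..", "_", pkgnm, fixed = TRUE)
  paste0(docker_user, "/", repo, ":", tag)
}

docker_build_args <- function(docker_user, pkgnm, dockerfile_dir, tag = "latest") {
  c("build", "-t", image_name(docker_user, pkgnm, tag), dockerfile_dir)
}

# "/path/to/..." is a placeholder for the installed dockerfiles/latest folder
build_args <- docker_build_args("dombennett", "om..echo",
                                "/path/to/om..echo/dockerfiles/latest")
paste("docker", paste(build_args, collapse = " "))
# To actually run it (requires Docker): system2("docker", shQuote(build_args))
```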

Note: the functions print the exact Docker commands they run. This gives you a better idea of what is happening and allows you to recreate the operations in a terminal.

Try the module

With a Docker image and an installed R package, we are now ready to try out the module before uploading it. The commands below look up the image associated with the module, create a new container, run the command and then shut down the container. (If no associated image can be found, outsider will attempt to pull one from Docker Hub.)

library(outsider)
## ----------------
## outsider v 0.1.0
## ----------------
## - Security notice: be sure of which modules you install
# repo refers to the module's GitHub repository, even before it has been uploaded
echo <- module_import('echo', repo = 'dombennett/om..echo')
echo('hello world!')
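Inside the container, the module runs, in effect, echo hello world!. On a Unix-like host you can run the host's own echo from R as a rough reference for what the containerised call should print. host_echo() is a hypothetical helper, and this does not apply on Windows.

```r
# Sketch (Unix-like hosts only): run the host's echo with the same
# arguments and capture its standard output as a character vector.
host_echo <- function(...) {
  # shQuote() keeps each argument intact when passed through the shell
  system2("echo", args = shQuote(c(...)), stdout = TRUE)
}

host_echo("hello world!")
```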

Upload to GitHub and Docker Hub

Once we have played with the module and made sure it works as expected, we can upload it to our GitHub and Docker Hub accounts so that others may download it.

To upload to GitHub, we must first create the repository online. Visit your GitHub account and create a new repository by clicking Repositories > New. Be sure to give the online repository the same name as your module, i.e. om..echo. (No preparatory steps are required to upload to Docker Hub.)

# based on the module details, the function will determine which code-sharing
# service to upload to
module_upload(flpth = module_path, code_sharing = TRUE, dockerhub = TRUE)

Delete it all

Don't want om..echo on your computer? Delete it like so:

# to delete the Docker image and uninstall the R package
module_uninstall(repo = 'dombennett/om..echo')
## Removing 'om..echo'
## Removing package from '/Library/Frameworks/R.framework/Versions/3.6/Resources/library'
## (as 'lib' is unspecified)
# to delete the repo folder
unlink(x = module_path, recursive = TRUE, force = TRUE)
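To confirm the clean-up from R, you can check that the package is no longer installed and the module folder is gone. module_removed() is a hypothetical helper; it relies on system.file() returning "" for packages that are not installed. (Whether the Docker image is gone can be checked separately with docker images in a terminal.)

```r
# Hypothetical check: TRUE when the package is not installed and the
# module folder no longer exists.
module_removed <- function(pkgnm, module_dir) {
  # system.file(package = ) returns "" when the package is not installed
  system.file(package = pkgnm) == "" && !dir.exists(module_dir)
}

# module_removed("om..echo", module_path)
```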

Next-up: Intermediate



AntonelliLab/outsider.devtools documentation built on June 20, 2022, 4:36 a.m.