In intermediate
we demonstrated how a complex, command-line program with input and output files
(that we partly developed) can be cast as an outsider
module.
On this page, we will develop a module for a typical command-line program with lots of arguments.
RAxML
For this walkthrough we will create a module for "RAxML" a program
that can generate evolutionary trees using a maximum-likelihood. The exact
functioning (or use even!) of the program is not relevant, but it demonstrates
well the typical program which outsider
aims to bring into the R environment:
lots of arguments, complex algorithm developed in an alternative language.
To demonstrate the program's complexity, here is the command's syntax:
raxmlHPC[-SSE3|-AVX|-PTHREADS|-PTHREADS-SSE3|-PTHREADS-AVX|-HYBRID|-HYBRID-SSE3|HYBRID-AVX] -s sequenceFileName -n outputFileName -m substitutionModel [-a weightFileName] [-A secondaryStructureSubstModel] [-b bootstrapRandomNumberSeed] [-B wcCriterionThreshold] [-c numberOfCategories] [-C] [-d] [-D] [-e likelihoodEpsilon] [-E excludeFileName] [-f a|A|b|B|c|C|d|D|e|E|F|g|G|h|H|i|I|j|J|k|m|n|N|o|p|P|q|r|R|s|S|t|T|u|v|V|w|W|x|y] [-F] [-g groupingFileName] [-G placementThreshold] [-h] [-H] [-i initialRearrangementSetting] [-I autoFC|autoMR|autoMRE|autoMRE_IGN] [-j] [-J MR|MR_DROP|MRE|STRICT|STRICT_DROP|T_<PERCENT>] [-k] [-K] [-L MR|MRE|T_<PERCENT>] [-M] [-o outGroupName1[,outGroupName2[,...]]][-O] [-p parsimonyRandomSeed] [-P proteinModel] [-q multipleModelFileName] [-r binaryConstraintTree] [-R binaryModelParamFile] [-S secondaryStructureFile] [-t userStartingTree] [-T numberOfThreads] [-u] [-U] [-v] [-V] [-w outputDirectory] [-W slidingWindowSize] [-x rapidBootstrapRandomNumberSeed] [-X] [-y] [-Y quartetGroupingFileName|ancestralSequenceCandidatesFileName] [-z multipleTreesFile] [-#|-N numberOfRuns|autoFC|autoMR|autoMRE|autoMRE_IGN] [--mesquite][--silent][--no-seq-check][--no-bfgs] [--asc-corr=stamatakis|felsenstein|lewis] [--flag-check][--auto-prot=ml|bic|aic|aicc] [--epa-keep-placements=number][--epa-accumulated-threshold=threshold] [--epa-prob-threshold=threshold] [--JC69][--K80][--HKY85] [--bootstop-perms=number] [--quartets-without-replacement] [---without-replacement] [--print-identical-sequences]
How are we going to code a module that is able to account for all these arguments?
As before, let's just jump right in with the module skeleton.
library(outsider.devtools) flpth <- module_skeleton(repo_user = 'dombennett', program_name = 'raxml', docker_user = 'dombennett', flpth = tempdir(), service = 'github') # folder name where module is stored print(basename(flpth))
## [1] "om..raxml"
Next, the Dockerfile. This is a much more complicated installation process.
First we require the apt programs wget
, make
and gcc
to download the
program and compile it. Then we need to download and compile the program.
The majority of the Dockerfile contents can be found in the installation
instructions of the
RAxML GitHub site.
dockerfile_text <- " FROM ubuntu:latest RUN apt-get update && apt-get install -y \ wget make gcc # 8.2 for this demonstration RUN wget https://github.com/stamatak/standard-RAxML/archive/v8.2.12.tar.gz && \ tar zxvf v8.2.12.tar.gz && \ rm v8.2.12.tar.gz && \ mv standard-RAxML-8.2.12 raxml RUN cd /raxml && make -f Makefile.SSE3.PTHREADS.gcc && \ rm *.o && cp raxmlHPC-PTHREADS-SSE3 /usr/bin/. RUN mkdir /working_dir WORKDIR /working_dir " # write to latest Dockerfile cat(dockerfile_text, file = file.path(flpth, 'inst', 'dockerfiles', 'latest', 'Dockerfile'))
Dockerfile defined. What should our R function be?
#' @name raxml #' @title raxml #' @description Run raxml #' @param arglist Arguments to raxml provided as a character vector #' @param outdir Filepath to where all output files should be returned. #' @example examples/example.R #' @export raxml <- function(arglist = arglist_get(...), outdir = getwd()) { files_to_send <- filestosend_get(arglist) arglist <- arglist_parse(arglist) otsdr <- outsider_init(pkgnm = 'om..raxml', cmd = 'raxmlHPC-PTHREADS-SSE3', wd = outdir, files_to_send = files_to_send, arglist = arglist) run(otsdr) }
In this function, input and output files are intermixed in the arguments passed
to RAxML. To determine what is a file that should be sent to the container from
a normal argument, we have filestosend_get
. This function goes through all the
provided arguments to check for any files (it tests whether they are valid file
paths) and returns them as a character vector to files_to_send
. Then we have
arglist_parse
, this converts all given arguments to their basename so that
absolute filepaths are dropped. All input files will be passed to the containers
working_dir/
and therefore no parental file path are required. This function
has additional functions for dropping irrelevant arguments in the context of an
outsider
module (e.g. no need to specify an output directory at the
command-line, this would need to be done at the initiation of the outsider
object.)
What's with this
outdir
-thingy? The RAxML function returns a series of files. To ensure they are all returned in a convenient location, we have created an additional argument to theraxml()
function calledoutdir
. It can be good practice to break-up information that needs to be processed by the module function into things likearglist
andoutdir
. In some instances, you may be calling a program where additional information can be provided outside of remit of the main program being emulated. For example in our last example we hadinput_file
andoutput_file
. Other examples include, number of threads, memory allocation for the program, etc.
OK! So that's what the function looks like. Let's write it to file.
function_text <- " #' @name raxml #' @title raxml #' @description Run raxml #' @param arglist Arguments to raxml provided as a character vector #' @param outdir Filepath to where all output files should be returned. #' @example examples/example.R #' @export raxml <- function(arglist = arglist_get(...), outdir = getwd()) { files_to_send <- filestosend_get(arglist) arglist <- arglist_parse(arglist) otsdr <- outsider_init(pkgnm = 'om..raxml', cmd = 'raxmlHPC-PTHREADS-SSE3', wd = outdir, files_to_send = files_to_send, arglist = arglist) run(otsdr) } " # write to R/functions.R cat(function_text, file = file.path(flpth, 'R', 'functions.R'))
Let's check the setup.
Ok, let's build, test and upload the module! (The test should pass because by
default, -h
is used in the example and this is a valid argument for RAxML.)
module_build(flpth = flpth)
## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## Running devtools::document() ... ## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Updating om..raxml documentation
## First time using roxygen2. Upgrading automatically...
## Updating roxygen version in /private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml/DESCRIPTION
## Writing NAMESPACE
## Loading om..raxml
## Writing NAMESPACE ## Writing raxml.Rd ## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## Running devtools::install() ... ## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## checking for file ‘/private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml/DESCRIPTION’ ... ✓ checking for file ‘/private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml/DESCRIPTION’ ## ─ preparing ‘om..raxml’: ## checking DESCRIPTION meta-information ... ✓ checking DESCRIPTION meta-information ## ─ checking for LF line-endings in source and make files and shell scripts ## ─ checking for empty or unneeded directories ## ─ building ‘om..raxml_0.0.1.tar.gz’ ## ## Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \ ## /var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T//RtmpquGDgt/om..raxml_0.0.1.tar.gz --install-tests ## * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’ ## - * installing *source* package ‘om..raxml’ ... ## | ** using staged installation ## - ** R ## | ** inst ## - ** byte-compile and prepare package for lazy loading ## | - ** help ## | *** installing help indices ## - ** building package indices ## | ** testing if installed package can be loaded from temporary location ## - ** testing if installed package can be loaded from final location ## | ** testing if installed package keeps a record of temporary installation path ## - * DONE (om..raxml) ## | - ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## Running docker_build() ## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## Command: ## docker build -t dombennett/om_raxml:latest /Library/Frameworks/R.framework/Versions/3.6/Resources/library/om..raxml/dockerfiles/latest ## ..................................................................................................................... ## ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────── ## Running devtools::build_readme() ... ## ─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
## Building om..raxml readme
module_test(flpth = flpth)
## Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \ ## /private/var/folders/x9/m8kwpxps2v93xk52zqzm5lkh0000gp/T/RtmpquGDgt/om..raxml --install-tests ## * installing to library ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library’ ## - * installing *source* package ‘om..raxml’ ... ## | ** using staged installation ## - ** R ## | ** inst ## - ** byte-compile and prepare package for lazy loading ## | - ** help ## | *** installing help indices ## - ** building package indices ## | ** testing if installed package can be loaded from temporary location ## - ** testing if installed package can be loaded from final location ## | ** testing if installed package keeps a record of temporary installation path ## - * DONE (om..raxml) ## | -
## No Docker image found for 'om..raxml' -- attempting to pull/build image with tag 'latest'
## Removing package from '/Library/Frameworks/R.framework/Versions/3.6/Resources/library' ## (as 'lib' is unspecified)
## The gloating goat got away! The module is not working.... ## But keep on forming! You're doing correctly!
And then upload ....
module_upload(flpth = flpth, code_sharing = TRUE, dockerhub = TRUE)
OK. So you have a module that works and passes its tests. How do we inform the world that it works? The answer: Travis-CI
Travis is a free service that monitors your GitHub for updates and then runs a series of tests on a remote server to determine whether it passes its tests or not. The result of these tests can be added to a repo's README in form of an image badge: e.g. "Tests Passing".
To set-up travis for your repo, visit the website https://travis-ci.org/, sign
in with your GitHub account and then activate the service your repo of choice,
e.g. om..raxml
.
Then we need to provide the test instructions for Travis in the form of a YAML
file called .travis.yml
and upload this to our GitHub repo. The .travis.yml
can be created for outsider
modules so:
module_travis(flpth = flpth)
Then simply update the repo on GitHub!
After you have built a module that passes its tests and is available for download, there are a few extra things to consider:
om.yml
Add a more detailed descriptionREADME.Rmd
. To update the
README.md
for GitHub, you need to "knit" the .Rmd
file by running
devtools::build_readme()
-- this is the function module_build()
runs.outsider::module_search
can find outsider
module repos on GitHub if they have and om.yml
, have their name in the form
om..
and have the phrase outsider-module:
in their description.Delete it all
module_uninstall(repo = 'om..raxml') unlink(x = flpth, recursive = TRUE, force = TRUE)
# run an ubuntu container docker run -t -d --name test ubuntu:latest # iteractively run bash docker exec -it test bash # when finished, list all running containers and stop/rm docker ps -a docker stop test docker rm test
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.