This guide is a step-by-step manual for developing a DeGAUSS container using tools available in the `dht` R package.
Within an empty directory, use `dht::use_degauss_container()` to create all of the files needed for a DeGAUSS container. Note that the name of this initially empty directory will be used as the name of the geomarker in the documentation, code, and image repository; it is a good idea to use only lowercase letters and underscores (e.g., `census_block_group`) to comply with container naming conventions.
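The naming advice above can be checked before creating any files. The following is a minimal sketch in base R; `is_valid_geomarker_name()` is a hypothetical helper written for illustration and is not part of {dht}.

```r
# Hypothetical helper (not part of {dht}): returns TRUE when a directory name
# contains only lowercase letters and underscores.
is_valid_geomarker_name <- function(name) {
  grepl("^[a-z_]+$", name)
}

# check the name of the current working directory
is_valid_geomarker_name(basename(getwd()))

is_valid_geomarker_name("census_block_group") # TRUE
is_valid_geomarker_name("Census-Block-Group") # FALSE
```

Running the check on `basename(getwd())` before calling `dht::use_degauss_container()` catches a problematic directory name while it is still cheap to rename.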
Most of the added files will usually not need to be edited:
| | |
|--:|:--|
| `Makefile` | see `make` targets |
| `test/my_address_file_geocoded.csv` | test input data file |
| `LICENSE.md` | GPL license |
| `.github/workflows/build-deploy-pr.yaml` | GitHub Actions continuous integration for pull requests |
| `.github/workflows/build-deploy-release.yaml` | GitHub Actions continuous integration for releases |
| `.dockerignore` | edit to include files other than `entrypoint.R` in the image |
However, some files must be edited:
| | |
|--:|:--|
| `Dockerfile` | `degauss_description` environment variable |
| `entrypoint.R` | contains R code to be run in the container |
| `README.md` | software documentation and example instructions |
Fill out the "Using", "Geomarker Methods", and "Geomarker Data" sections in `README.md`, as applicable. Make sure that the version in the example call matches the version of the latest released container.
Within the `Dockerfile`, `ENV` instructions are used to define environment variables that capture metadata about the DeGAUSS container, including the name (`degauss_name`), version (`degauss_version`), and a short description (`degauss_description`). These environment variables will be available to R code running inside the container and can be used by `dht::greeting()` as well as other {dht} functions that read and write geomarker data. Outside of a DeGAUSS container, these are also used by `get_degauss_env_dockerfile()`, `get_degauss_env_online()`, and `get_degauss_core_lib_env()`.
When creating a new DeGAUSS container, all of these variables except `degauss_description` are defined automatically. Its value needs to be changed from

    ENV degauss_description="insert short description here that finishes the sentence 'This container returns ...'"

to something specific and short (ideally fewer than 50 characters) that finishes the sentence "This container returns ...", like

    ENV degauss_description="proximity and length of major roads"

When releasing a new version of the DeGAUSS container in the future, the environment variable `degauss_version` can be edited in the `Dockerfile`, and R code in the container can use it for greetings, writing output files, and other operations that depend on the current version (or name or description). This avoids having to change the version number manually in several different locations for each new release.
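As a rough illustration of how R code can pick up this metadata, the sketch below reads the three environment variables with base R's `Sys.getenv()`. The fallback values and the `make_output_name()` helper are assumptions made for this example; in practice the {dht} functions handle greetings and output file naming, and their exact naming scheme may differ.

```r
# Read the metadata defined by the ENV instructions in the Dockerfile.
# Outside of a container these variables are usually unset, so placeholder
# fallbacks are used here (the fallback values are assumptions).
degauss_name <- Sys.getenv("degauss_name", unset = "my_geomarker")
degauss_version <- Sys.getenv("degauss_version", unset = "0.1.0")
degauss_description <- Sys.getenv("degauss_description", unset = "example geomarker data")

# Hypothetical helper: build an output file name that embeds the container
# name and version, so neither has to be hard-coded in the R code.
make_output_name <- function(input_file, name = degauss_name, version = degauss_version) {
  stem <- tools::file_path_sans_ext(basename(input_file))
  paste0(stem, "_", name, "_", version, ".csv")
}

message("This container returns ", degauss_description, ".")

make_output_name("my_address_file_geocoded.csv", name = "roads", version = "0.1.0")
# "my_address_file_geocoded_roads_0.1.0.csv"
```

Because the name and version are read at runtime, bumping `degauss_version` in the `Dockerfile` is enough to change every value derived from it.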
Edit `entrypoint.R` by replacing the example R code with R code that completes the specific task to be performed by the container.
When ready to build the container, run `renv::init()` to initialize the `renv` framework and create `renv.lock`. Subsequent builds can update `renv.lock` by using `renv::snapshot()`.
R packages, especially spatial packages, often depend on external system dependencies. These will need to be installed using `RUN` instructions in the `Dockerfile`. For example, the {sf} package for R requires `gdal` and other programs, each of which requires a different install process depending on the operating system. Since these R packages will always be running inside of a Docker container running Ubuntu 20.04, we can use `remotes::system_requirements()` to get the specific required install instructions for {sf}:

    remotes::system_requirements("ubuntu-20.04", package = "sf")
    # "apt-get install -y libudunits2-dev" "apt-get install -y libssl-dev"
    # "apt-get install -y libgdal-dev"     "apt-get install -y gdal-bin"
    # "apt-get install -y libgeos-dev"     "apt-get install -y libproj-dev"

These system requirements could be translated to `RUN` instructions for the `Dockerfile` to make sure they are available before the R packages that require them are installed:

    RUN apt-get update \
        && apt-get install -yqq --no-install-recommends \
        libudunits2-dev \
        libssl-dev \
        libgdal-dev \
        gdal-bin \
        libgeos-dev \
        libproj-dev \
        && apt-get clean

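The translation from install commands to a `RUN` instruction can also be scripted. The following is a minimal sketch, assuming every command has the form `apt-get install -y <package>` as in the output shown above; `as_run_instruction()` is a hypothetical helper and is not part of {remotes} or {dht}.

```r
# Hypothetical helper: turn commands of the form "apt-get install -y <package>"
# into a single Dockerfile RUN instruction.
as_run_instruction <- function(install_commands) {
  # keep only the package name at the end of each command, dropping duplicates
  pkgs <- unique(sub("^apt-get install -y ", "", install_commands))
  paste0(
    "RUN apt-get update \\\n",
    "    && apt-get install -yqq --no-install-recommends \\\n",
    paste0("    ", pkgs, " \\\n", collapse = ""),
    "    && apt-get clean"
  )
}

# example input, written out by hand so the sketch runs without {remotes}
reqs <- c("apt-get install -y libgdal-dev", "apt-get install -y gdal-bin")
cat(as_run_instruction(reqs), "\n")
```

In a real workflow, `reqs` could be the character vector returned by `remotes::system_requirements()`. It is worth reviewing the generated instruction by hand, since commands that do not match the assumed form are passed through unchanged.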
By default, the container will copy in `entrypoint.R` and `renv.lock` for use at runtime and ignore anything else in the working directory, which speeds up builds and keeps containers smaller. If the container requires any other files (e.g., `.rds` data files), edit `Dockerfile` and `.dockerignore` so that the files are copied to the container and not ignored by Docker. For example, if we want to use `geomarker_data.rds`, we would make the following changes in `Dockerfile`:

    COPY entrypoint.R .
    # copy .rds file from host to container when building
    COPY geomarker_data.rds .

and in `.dockerignore`:

    # ignore everything
    **
    # except what we need
    !/renv.lock
    !/entrypoint.R
    # make sure the .rds file is not ignored
    !/geomarker_data.rds

A `test` directory is added with an example geocoded address file (`test/my_address_file_geocoded.csv`) and is useful for interactive development and automated testing (see below).

`make` for interactive development

The `Makefile` defines several `make` targets that are useful when developing and testing locally:

- `make build` will build the current DeGAUSS image and name it
- `make test` will run the container on the included example geocoded CSV file
- `make shell` will run a DeGAUSS command, but start an interactive shell inside the container for debugging
- `make clean` is equivalent to `docker system prune -f`, which cleans up any stopped containers or dangling image layers