This notebook shows how to use the rampdata
package in an application.
The package adds several features to a main function, once we
initialize it to use those features.
As you read this, please consider first not the syntax but whether this fits the flow of how you work. We can make the syntax easier and more automatic later, once we are sure it doesn't get in the way of how you work.
We put code into packages, and one or more functions in that package
are top-level functions. A researcher or a workflow tool will call
those functions directly. These are main functions, and the rampdata
package helps write them so that they run better on the cluster.
A main function can be called interactively at an R command line,
from the shell as R -e "function()", or from the shell as a script.
In the last case, you write a script containing a single line, such as
gisdemog::create_uganda_cover(), save that line into a file
uganda_cover.R, and run it from the command line with
Rscript uganda_cover.R --shapefile uganda.shp.
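Putting that together, the whole script is one line. This sketch calls the create_uganda_cover_main() wrapper introduced below, since that is the variant that reads flags such as --shapefile from the command line; the plain create_uganda_cover() call above would need its arguments supplied some other way.

# uganda_cover.R -- the entire script.
# Run from the shell:
#   Rscript uganda_cover.R --shapefile uganda.shp
gisdemog::create_uganda_cover_main()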
A main function usually separates sanity-checking tasks
and input/output tasks from the work of the function. We try to
write the main function so that it can be called as a script,
meaning called from R --no-save -e "package::runit()",
or so that it can be called as a function from the R prompt.
We can write the function and wrap it to be called as a script
in these two steps, where the second one processes command-line
arguments.
create_uganda_cover <- function(shapefile, landscape) {
  inputs <- read_input_files(shapefile, landscape)
  outputs <- calculate_uganda_cover(inputs$shapefile, inputs$landscape)
  write_outputs(outputs$cover)
}

create_uganda_cover_main <- function() {
  do.call(create_uganda_cover, cover_parse_args(commandArgs()))
}
The first function is ready to call from the R command line. The second function is ready to call from the Bash command line on the cluster.
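For example (the shapefile and landscape file names are placeholders, not files that ship with any package):

# Interactively, at the R prompt, pass the arguments directly.
create_uganda_cover("uganda.shp", "landscape.tif")

# On the cluster, the one-line script shown earlier calls the wrapper
# instead, and the wrapper reads its arguments from commandArgs():
#   Rscript uganda_cover.R --shapefile uganda.shp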
Sometimes we write scripts. They aren't different from main functions in any way that's important for the rampdata additions.
args <- cover_parse_args(commandArgs(trailingOnly = TRUE))
inputs <- read_input_files(args$shapefile, args$landscape)
outputs <- calculate_uganda_cover(inputs$shapefile, inputs$landscape)
write_outputs(outputs$cover)
Most of the new work in the main function will be done
while parsing arguments. The one important piece to add
is a tryCatch
around the work of the main function. The goal
is to clear the slate for provenance and ensure we save it at
the end.
create_uganda_cover <- function(shapefile, landscape) {
  rampdata::clear.provenance()
  tryCatch({
    inputs <- read_input_files(shapefile, landscape)
    outputs <- calculate_uganda_cover(inputs$shapefile, inputs$landscape)
    write_outputs(outputs$cover)
  }, finally = {
    rampdata::write.meta.data(
      rampdata::ramp_prov_path("package", "create_uganda_cover"))
  })
}

create_uganda_cover_main <- function() {
  do.call(create_uganda_cover, cover_parse_args(commandArgs()))
}
The provenance recording isn't nested. Clearing it will delete everything this package remembers.
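A minimal sketch of what that means, assuming clear.provenance() resets a single global record rather than a per-function stack (outer_step and inner_step are made-up names):

outer_step <- function() {
  rampdata::clear.provenance()
  prov.input.file("pop2000.csv", "admin1pop")  # recorded here
  inner_step()
  # inner_step() cleared the provenance again, so the pop2000.csv record
  # above is gone from any metadata written after this point.
}

inner_step <- function() {
  rampdata::clear.provenance()
}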
When we parse arguments for the main function, we need to initialize:
Logging - Tell it whether we are on the cluster or on a local machine. When on the cluster, warnings, errors, and fatal messages will be saved to a separate file. We can also change the logging level for the whole application or for particular packages or R scripts.
Workflow versions - This is a file with lists of versions and the file paths they affect.
The location of the data directory is initialized as needed from a configuration file in the home directory.
.cover.doc <- 'Create Uganda Coverage

Usage:
  create_uganda_coverage <workflow> [-v] [-q] [--local]

Options:
  -v       Set logging to debug level.
  -q       Set logging to warn level.
  --local  Working on a local machine, not the cluster.
'

cover_parse_args <- function(args = NULL) {
  if (is.null(args)) {
    args <- commandArgs(TRUE)
  }
  parsed <- docopt::docopt(.cover.doc, args = args, version = "1.0")
  calling_function <- deparse(sys.call(sys.parent(n = 1)))
  set_logging_from_args(
    utils::packageName(), calling_function, parsed$v, parsed$q, parsed$local
  )
  initialize_workflow(parsed$workflow)
  # Return the parsed arguments so callers can do.call() the main function.
  parsed
}
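To try the parsing interactively, you can pass a character vector instead of relying on commandArgs(). This assumes cover_parse_args() returns the parsed list, as in the version above; the workflow name is a placeholder.

# Parse a made-up command line at the R prompt.
parsed <- cover_parse_args(c("201910", "-v", "--local"))
parsed$workflow  # "201910"
parsed$local     # TRUE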
Each time we read or write a file, we record its provenance. There is an incantation for reading. We can shorten these lines of code once we determine they do what we want.
pop_rp <- workflow_path("admin1pop")
file_rp <- add_path(pop_rp, file = "pop2000.csv")
dt <- data.table::fread(as.path(file_rp))
prov.input.file(as.path(file_rp), "admin1pop")
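One way those lines could shrink is a small wrapper like this sketch; read_workflow_csv() is not a rampdata function, only an illustration of the shortened form:

# Hypothetical helper: read a CSV for a workflow role and record it as a
# provenance input in one call.
read_workflow_csv <- function(role, filename) {
  file_rp <- add_path(workflow_path(role), file = filename)
  prov.input.file(as.path(file_rp), role)
  data.table::fread(as.path(file_rp))
}

dt <- read_workflow_csv("admin1pop", "pop2000.csv")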
When we write, we add a step to save the current provenance
of how the file was made. If we write a file called output.csv,
then we should write its provenance beside it as output.toml.
pop_rp <- workflow_path("admin1pop")
file_rp <- add_path(pop_rp, file = "pop2000.csv")
data.table::fwrite(dt, as.path(file_rp))
prov.output.file(as.path(file_rp), "admin1pop")
write.meta.data(
  as.path(add_path(pop_rp, file = "pop2000.toml")))
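The write side could be shortened the same way; write_workflow_csv() is again only an illustration, not part of rampdata:

# Hypothetical helper: write a CSV for a workflow role, record it as a
# provenance output, and save the provenance beside it as a .toml file.
write_workflow_csv <- function(dt, role, filename) {
  file_rp <- add_path(workflow_path(role), file = filename)
  data.table::fwrite(dt, as.path(file_rp))
  prov.output.file(as.path(file_rp), role)
  toml_name <- sub("\\.csv$", ".toml", filename)
  write.meta.data(as.path(add_path(workflow_path(role), file = toml_name)))
}

write_workflow_csv(dt, "admin1pop", "pop2000.csv")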