knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval = TRUE
)
library(updraft)
updraft is an R package designed to simplify building and executing "workflows": sequences of modular chunks of code with interdependencies, configured to run on a set of inputs and produce a set of outputs. This lets you decompose the code implementing a complex function into focused, modular components ("modules") that you can then execute without managing the dependencies between them yourself. Additionally, this package enables you to:
Below is an introduction to the concepts underlying updraft, along with some examples of how it can be used.
The Module is the basic building block of updraft. A Workflow is merely a set of Modules to be executed. The smaller and more single-purpose these Modules are, the better. There are two existing implementations, both aligning to the interface in ModuleInterface.
The PackageFunctionModule object is for when you merely want to execute a function that exists in some package. For example,
pasteInputs <- PackageFunctionModule$new(
  name = "pasteInputs"
  , fun = "paste0"
  , package = "base"
)
To execute this simple module outside of a workflow, we can do so as follows:
pasteInputs$startExecution(list("hello, ", "world!"))

# to see the output
pasteInputs$getOutput()
The module encapsulates the assigned function and allows you to access its output value, along with metadata related to the calculation.
More typically, you'll want to do more complicated operations than vanilla package function calls. This is accomplished with the CustomFunctionModule class.
For example, let's say you want to apply a custom function to inputs. You can easily define the function and instantiate a CustomFunctionModule using it, as shown below:
myFunc <- function(a, b, c) {
  return(a + 2*(floor(b/c)))
}

myModule <- CustomFunctionModule$new(
  name = "myFunc"
  , fun = myFunc
)
As with the PackageFunctionModule, we can simply execute this module, but we can also access information about its state of execution, which is necessary to use modules as building blocks for complex workflows. Below are some examples of those methods.
# If NULL, module passes base validations
myModule$errorCheck()

# Returns the module name
myModule$getName()

# Allows you to check whether module has its output available
myModule$hasCompleted()

# Can see which inputs will be used and are required
myModule$getInputs()

# Start execution and get results
myModule$startExecution(list(a = 1, b = 2, c = 0.3))
myModule$getOutput()
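As a quick sanity check, the function wrapped by the module can also be called directly in plain R, without updraft. With the inputs used above, b/c is roughly 6.67, which floors to 6, so the result should be 1 + 2*6 = 13. How getOutput() packages that value is not shown in this document, so the sketch below only covers the underlying function.

```r
# Same function as above, called directly (no updraft involved)
myFunc <- function(a, b, c) {
  return(a + 2 * (floor(b / c)))
}

# floor(2 / 0.3) is 6, so this evaluates to 1 + 2 * 6
result <- myFunc(a = 1, b = 2, c = 0.3)
print(result)  # 13
```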
In order to use these modules, we need some way to designate how the outputs and inputs should be connected. That way, each module will wait until it has all the information it needs in order to execute its function and create its output. This is done using components aligning to the ConnectionInterface class.
The DirectedConnection is the most straightforward implementation of a connection. Let's say we have two modules:
headModule = CustomFunctionModule$new(
  name = "headModule"
  , fun = function(x) {
    5*x
  }
)

tailModule = CustomFunctionModule$new(
  name = "tailModule"
  , fun = function(a = 1, b = 10) {
    a + b
  }
)
Let's say we want to run headModule and then pass its output directly to tailModule. We need some way to specify which argument of tailModule the output from headModule should be mapped to. This is done with a DirectedConnection:
conn <- DirectedConnection$new(
  name = "conn"
  , headModule = headModule
  , tailModule = tailModule
  , inputArgument = "a"
)
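To make the mapping concrete, here is a plain R sketch of what routing the head output into the "a" argument amounts to. It does not use updraft at all; headFun and tailFun are stand-ins for the two module functions, and the input value 2 is chosen purely for illustration.

```r
# Stand-ins for the functions wrapped by headModule and tailModule
headFun <- function(x) 5 * x
tailFun <- function(a = 1, b = 10) a + b

# Run the head function, then pass its output as the argument named "a"
headOut  <- headFun(2)
tailArgs <- setNames(list(headOut), "a")

# b is not supplied, so tailFun falls back to its default b = 10
result <- do.call(tailFun, tailArgs)
print(result)  # 20
```

Because only "a" is wired, the other argument of the tail function keeps its default value in this sketch.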
If we construct our Modules such that the input argument names are consistent between modules (i.e. an output with the name "a" should be used for the input with the name "a"), we can simplify the process of creating connections. This is done by using the Autowire function.
NOTE: For this to work, the associated function object of a head module must explicitly return a named list.
startModule = CustomFunctionModule$new(
  name = "startModule"
  , fun = function(x) {
    return(list(a = 5*x, b = x + 1))
  }
)

aModule = CustomFunctionModule$new(
  name = "aModule"
  , fun = function(a) {
    return(a*10)
  }
)

bModule = CustomFunctionModule$new(
  name = "bModule"
  , fun = function(b) {
    return(b*10)
  }
)
To run the autowire function:
connections <- Autowire(
  headModules = startModule
  , tailModules = c(aModule, bModule)
)

sapply(connections, function(conn) conn$getInputArgument())
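The name-matching idea behind autowiring can be illustrated in base R. The sketch below is not updraft's implementation: matchByName is a hypothetical helper that compares the names in a head function's returned list with the argument names of each tail function, which is also why the head function has to return a named list.

```r
# Hypothetical helper (not part of updraft): for each tail function, find the
# head outputs whose names match one of its argument names.
matchByName <- function(headOutput, tailFunctions) {
  lapply(tailFunctions, function(fun) {
    intersect(names(formals(fun)), names(headOutput))
  })
}

# What the startModule function returns for x = 1
headOutput <- list(a = 5 * 1, b = 1 + 1)

# Stand-ins for the functions wrapped by aModule and bModule
tailFunctions <- list(
  aModule = function(a) a * 10,
  bModule = function(b) b * 10
)

wiring <- matchByName(headOutput, tailFunctions)
str(wiring)
```

Here wiring$aModule is "a" and wiring$bModule is "b", mirroring one connection per tail module.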
Now that we've introduced Modules and Connections, we have everything needed to construct a workflow!
Currently, all workflows in updraft are DAGWorkflow objects (DAG stands for directed acyclic graph). A DAGWorkflow is just a collection of Connections and Modules (as introduced above), configured so as to decompose a complex computation into a sequence of simpler ones. We'll go into the advantages of this later, but to begin we'll show a concrete example of how to construct one.
To begin, we can start with an empty workflow.
workflow <- DAGWorkflow$new( name="myWorkflow" )
Let's add the modules and connections from our previous example to see how they look in a workflow:
#### Modules ####
startModule = CustomFunctionModule$new(
  name = "startModule"
  , fun = function(x) {
    return(list(a = 5*x, b = x + 1))
  }
)

aModule = CustomFunctionModule$new(
  name = "aModule"
  , fun = function(a) {
    return(a*10)
  }
)

bModule = CustomFunctionModule$new(
  name = "bModule"
  , fun = function(b) {
    return(b*10)
  }
)

#### Connections ####
connections <- Autowire(
  headModules = startModule
  , tailModules = c(aModule, bModule)
)

#### Add to workflow ####
workflow$addModules(c(startModule, aModule, bModule))
workflow$addConnections(connections)

#### Visualize it ####
workflow$visualize()
You'll see that the DAG first computes the head module, then routes the outputs to the corresponding tail modules.
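The "compute the head, then route outputs to the tails" behavior can be sketched in base R. This is only an illustration of the general idea of running a module once everything feeding it has finished; it is not how updraft is implemented, and runDag, funs, and edges are hypothetical names. It assumes head functions return named lists, as required for autowiring.

```r
# Plain functions standing in for the three modules above
funs <- list(
  startModule = function(x) list(a = 5 * x, b = x + 1),
  aModule     = function(a) a * 10,
  bModule     = function(b) b * 10
)

# One row per connection: head -> tail, and the tail argument it feeds
edges <- data.frame(
  head = c("startModule", "startModule"),
  tail = c("aModule", "bModule"),
  arg  = c("a", "b"),
  stringsAsFactors = FALSE
)

runDag <- function(funs, edges, args) {
  outputs <- list()
  pending <- names(funs)
  while (length(pending) > 0) {
    # A module is ready once every module feeding it has produced output
    ready <- Filter(function(m) {
      all(edges$head[edges$tail == m] %in% names(outputs))
    }, pending)
    if (length(ready) == 0) stop("no module is ready: the graph has a cycle")
    for (m in ready) {
      incoming <- edges[edges$tail == m, ]
      if (nrow(incoming) == 0) {
        # Modules with no incoming connections read the workflow arguments
        inputs <- args
      } else {
        # Pull the named element of each head's output for this tail
        inputs <- lapply(seq_len(nrow(incoming)), function(i) {
          outputs[[incoming$head[i]]][[incoming$arg[i]]]
        })
        names(inputs) <- incoming$arg
      }
      outputs[[m]] <- do.call(funs[[m]], inputs)
    }
    pending <- setdiff(pending, ready)
  }
  outputs
}

res <- runDag(funs, edges, args = list(x = 1))
print(res$aModule)  # 50
print(res$bModule)  # 20
```

With x = 1 the head produces a = 5 and b = 2, so the two tails evaluate to 50 and 20 in this sketch.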
To execute a workflow, all we have to do is run:
Execute(workflow, argsContainer = list(x = 1))
By default, this will not make use of parallelism. To do that, you can run:
Execute(
  workflow
  , argsContainer = list(x = 1)
  , mode = PARALLEL_MODE
)
Note that this may not speed up computation if there are limited opportunities for parallelization in your workflow (i.e. most of the modules will need to be executed sequentially).
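One rough way to gauge how much parallelism a workflow offers is to group modules by their depth in the graph: modules at the same depth do not depend on each other and could in principle run at the same time. The base R sketch below does this for the example workflow; it does not use updraft, and the edges table and stages variable are hypothetical names for illustration.

```r
modules <- c("startModule", "aModule", "bModule")
edges <- data.frame(
  head = c("startModule", "startModule"),
  tail = c("aModule", "bModule"),
  stringsAsFactors = FALSE
)

# Start every module at depth 0, then repeatedly push each tail at least one
# level below its head. For an acyclic graph, length(modules) passes suffice.
depth <- setNames(rep(0L, length(modules)), modules)
for (pass in seq_along(modules)) {
  for (i in seq_len(nrow(edges))) {
    depth[[edges$tail[i]]] <- max(depth[[edges$tail[i]]],
                                  depth[[edges$head[i]]] + 1L)
  }
}

# Modules sharing a depth form one stage of potentially concurrent work
stages <- split(names(depth), depth)
print(stages)
```

Here the first stage holds only startModule and the second holds aModule and bModule, so at most two modules could overlap. A workflow whose stages each contain a single module is a pure chain and would gain nothing from parallel execution.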