```r
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
library(mlr3torch)
```
This vignette contains technical details about the inner workings of representing neural networks as `mlr3pipelines::Graph`s. If you are not familiar with `mlr3pipelines`, start by reading the related sections of the mlr3 book first.
## `torch` Primer

We start by sampling an input tensor: a batch of 2 observations with 3 features:
```r
input = torch_randn(2, 3)
input
```
An `nn_module` is constructed from an `nn_module_generator`. `nn_linear` is one of the simpler generators:
```r
module_1 = nn_linear(in_features = 3, out_features = 4, bias = TRUE)
```
Applying this module gives a batch of 2 with 4 units:
```r
output = module_1(input)
output
```
A neural network with one (4-unit) hidden layer and three outputs needs the following ingredients:
```r
activation = nn_sigmoid()
module_2 = nn_linear(4, 3, bias = TRUE)
softmax = nn_softmax(2)
```
We can pipe a tensor through the layers as follows.
```r
output = module_1(input)
output = activation(output)
output = module_2(output)
output = softmax(output)
output
```
We will now show how such a neural network can be represented in `mlr3torch`.
In `mlr3torch`, `nn_module`s are wrapped in a `PipeOpModule`. This has the advantage that the network structure can be represented as an `mlr3pipelines::Graph` object, where it is made explicit (it can be plotted, extended, or manipulated), compared to e.g. writing a function that pipes input through a series of modules.
A `PipeOpModule` can be used to wrap a module directly, but it is usually constructed by a `PipeOpTorch` (see below). It typically has a single input and a single output, although multiple inputs are possible (the module is then called with multiple arguments), and multiple outputs are possible when the module function returns a list. In these cases, the input and output channels must be declared explicitly during construction.
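As a minimal sketch (not part of the original example), a `PipeOpModule` with two inputs could be declared like this; the `inputs` argument used here is assumed to be available as documented for `PipeOpModule`:

```r
# Hedged sketch: a module taking two tensors, wrapped with explicitly declared
# input channels. The `inputs` argument is an assumption; see ?PipeOpModule.
nn_add = nn_module("nn_add",
  forward = function(a, b) a + b
)
po_add = po("module", id = "add", module = nn_add(), inputs = c("a", "b"))

# $train() is then called with a named list holding one tensor per input channel:
po_add$train(list(a = torch_randn(2, 3), b = torch_randn(2, 3)))[[1]]
```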
We now recreate the above network using `PipeOpModule`s.
We can wrap the linear `module_1` layer like this:
```r
library(mlr3torch)
po_module_1 = po("module_1", module = module_1)
```
Note that po("module_1")
is equivalent to po("module", id = "module_1")
.
This mechanism is convenient to avoid ID clashes in graphs that contain the same PipeOp
multiple times.
We can use the generated `PipeOp` in the familiar way:
```r
output = po_module_1$train(list(input))[[1]]
output
```
Note that we only use `$train()`, since torch modules do not have anything that maps to the `$state` (it is filled with an empty list).
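We can verify this by inspecting the state after training:

```r
# The $state of a PipeOpModule carries no information (an empty list):
po_module_1$state
```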
The single hidden layer neural network can be constructed as a `Graph`, which can then do the training all at once.
```r
po_activation = po("module", id = "activation", module = activation)
po_module_2 = po("module_2", module = module_2)
po_softmax = po("module", id = "softmax", module = softmax)

module_graph = po_module_1 %>>%
  po_activation %>>%
  po_module_2 %>>%
  po_softmax

module_graph$plot(html = TRUE)
```
We can now use the graph's `$train()` method to pipe a tensor through the whole `Graph`.
```r
output = module_graph$train(input)[[1]]
output
```
While this object makes it easy to perform a forward pass, it does not inherit from `nn_module`, which would be useful for various reasons. Instead of having a class that inherits both from `nn_module` and `Graph` (which does not work in R6, since multiple inheritance is not available), there is a class that inherits from `nn_module` and contains a `Graph` member slot through composition. This class is `nn_graph`.
It is constructed with a `Graph`, as well as information about the shape(s) of the `torch_tensor`(s) it expects as inputs. Shape info is communicated as an integer-valued `numeric` vector; dimensions that are arbitrary, e.g. the batch size, are given as `NA`. Our network expects an input of shape `c(NA, 3)`, since the first layer was created as `nn_linear(in_features = 3, ...)`. If the `Graph` has multiple outputs, it is also possible to select a subset of outputs to use, or to change the output order, by giving the `output_map` argument.
```r
# the name of the single input is:
module_graph$input

graph_module = nn_graph(
  module_graph,
  shapes_in = list(module_1.input = c(NA, 3))
)
```
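Our graph has only the single output channel `softmax.output`, so `output_map` is not needed here. As a hedged sketch, selecting or reordering the outputs of a multi-output graph could look like this (the channel names to use are whatever `module_graph$output` reports):

```r
# Not run: pick (or reorder) the outputs of a multi-output graph via output_map.
nn_graph(
  module_graph,
  shapes_in = list(module_1.input = c(NA, 3)),
  output_map = "softmax.output"
)
```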
This module gives us the convenience of torch `nn_module` objects, e.g.:
```r
graph_module$children
```
And it can be used to transform tensors just as any other `torch::nn_module`:
```r
graph_module(input)
```
## `PipeOpTorch` and `ModelDescriptor`
A `PipeOpModule` represents an `nn_module` that is fixed for a specific tensor shape and has no hyperparameters. When constructing a neural network using these operators, one has to take care that the output shape of each operation matches the input shape of the following operation.
A complete `Graph` of matching `PipeOpModule`s can be constructed using operators that mostly inherit from `PipeOpTorch`, making use of the `ModelDescriptor` class. The `ModelDescriptor` class contains a `Graph` of (mostly) `PipeOpModule`s and some other information. A `PipeOpTorch` transforms a `ModelDescriptor` and adds more `PipeOpModule`s to its `Graph`.
`ModelDescriptor`s always build up a `Graph` for a specific `Task`. The easiest way to initialize a proper `ModelDescriptor` is to use the appropriate `PipeOpTorchIngress` for a given data type. Below we use `PipeOpTorchIngressNumeric`, which is used for numeric data.
task = tsk("iris")$select(colnames(iris)[1:3]) po_torch_in = po("torch_ingress_num") md = po_torch_in$train(list(task))[[1]] md
The `ModelDescriptor` is an S3 object that contains a `Graph`, information about how to generate data (`$ingress` and `$task`), some further tags about how to build a model that are unrelated to the architecture (`$optimizer`, `$loss` and `$callbacks`), as well as all further information necessary to extend that graph along a given output (`$pointer` and `$pointer_shape`).
```r
unclass(md)
```
The `$pointer` identifies the output of the `$graph` that `PipeOpTorch` will extend. Piping this `ModelDescriptor` through `PipeOpTorchLinear`, for example, adds a `PipeOpModule` wrapping a `torch::nn_linear`.
```r
po_torch_linear = po("nn_linear", out_features = 4)
md = po_torch_linear$train(list(md))[[1]]

md$graph
```
The `$pointer` is now updated to identify the output of that `PipeOpModule`, and the `$pointer_shape` shows that the shape has changed to 4 units (it was 3 for the input before).
```r
md$pointer
md$pointer_shape
```
The `model_descriptor_to_module()` function converts this to an `nn_graph`, which is a functional `torch::nn_module`.
```r
small_module = model_descriptor_to_module(md, list(md$pointer))
small_module(input)
```
## `ModelDescriptor` to get Data

The `ModelDescriptor` does not only represent the `Graph` from which an `nn_module` is created, but also the way in which the `Task` is processed to get input batches.
A `torch::dataset` can be created by calling `task_dataset()`; both the `task` and the `feature_ingress_tokens` arguments can be retrieved from the `ModelDescriptor`.
The `target_batchgetter` needs to be created separately (if necessary), since it depends on the ultimate machine learning model, which we have not looked at so far.
```r
td = task_dataset(
  task = md$task,
  feature_ingress_tokens = md$ingress
)
td
```
Use the `$.getbatch()` method to get a batch that can be given to the `nn_module`. Note that it has an `$x` and a `$y` slot; the latter is not used here, but accounts for possible target batches. The `$x` slot is itself a `list`, since it should be able to handle NNs with multiple inputs (see below).
```r
batch = td$.getbatch(1:3)
batch

small_module(batch$x[[1]])
```
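As a hedged sketch (not part of the original example), a target batchgetter for classification could be supplied like this; the exact signature expected for the batchgetter function is an assumption here, so consult `?task_dataset`:

```r
# Assumed signature: the batchgetter receives a data.table holding the target
# column and returns a torch_tensor of class indices.
target_batchgetter_classif = function(data) {
  torch_tensor(as.integer(data[[1]]), dtype = torch_long())
}

td_y = task_dataset(
  task = md$task,
  feature_ingress_tokens = md$ingress,
  target_batchgetter = target_batchgetter_classif
)
td_y$.getbatch(1:3)$y
```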
The sequential NN from above can easily be implemented as follows:
```r
graph_generator = po("torch_ingress_num") %>>%
  po("nn_linear", out_features = 4, id = "linear1") %>>%
  po("nn_sigmoid") %>>%
  po("nn_linear", out_features = 3, id = "linear2") %>>%
  po("nn_softmax", dim = 2)
```
Note how the second `nn_linear` does not need to be informed about the output dimension of the first `nn_linear`, since the `ModelDescriptor` that is passed along the `Graph` edges knows this information (in the `$pointer_shape` slot).
```r
md_sequential = graph_generator$train(task)[[1]]

graph_module = model_descriptor_to_module(md_sequential, list(md_sequential$pointer))
graph_module(input)
```
One of the main features of `mlr3pipelines` is its ability to easily represent computational `Graph`s. The `ModelDescriptor` / `PipeOpTorch` setup is built to make full use of this functionality. It is possible to have multiple inputs into a NN by using multiple `PipeOpTorchIngress` inputs, it is possible to have parallel and alternative path branching, and it is possible to have multiple outputs.
Consider the following (a bit nonsensical) network that operates differently on the `"Petal"` than on the `"Sepal"` features of `tsk("iris")`. We manually split the task here; further down it is shown that the fully integrated `mlr3pipelines` pipeline can do this automatically.
iris_petal = tsk("iris")$select(c("Petal.Length", "Petal.Width")) iris_sepal = tsk("iris")$select(c("Sepal.Length", "Sepal.Width"))
```r
graph_sepal = po("torch_ingress_num", id = "sepal.in") %>>%
  po("nn_linear", out_features = 4, id = "linear1")

graph_petal = po("torch_ingress_num", id = "petal.in") %>>%
  po("nn_linear", out_features = 3, id = "linear2") %>>%
  po("nn_tanh") %>>%
  po("nn_linear", out_features = 5, id = "linear3")

graph_common = ppl("branch", graphs = list(
    sigmoid = po("nn_sigmoid"),
    relu = po("nn_relu")
  )) %>>%
  gunion(list(
    po("nn_linear", out_features = 1, id = "lin_out"),
    po("nn_linear", out_features = 3, id = "cat_out") %>>%
      po("nn_softmax", dim = 2)
  ))

graph_iris = gunion(list(graph_sepal, graph_petal)) %>>%
  po("nn_merge_cat") %>>%
  graph_common

graph_iris$plot(html = TRUE)
```
We can use this to create a neural network for the `iris` tasks we created above. We set the `$keep_results` debug flag here so we can inspect what is happening:
```r
graph_iris$param_set$values$branch.selection = "relu"
graph_iris$keep_results = TRUE

iris_mds = graph_iris$train(
  input = list(sepal.in.input = iris_sepal, petal.in.input = iris_petal),
  single_input = FALSE
)

iris_mds
```
We make multiple observations here:
1. We can observe how the `ModelDescriptor` grows as it is passed along the edges of `graph_iris`. Note that the `$graph` slot of that `ModelDescriptor` is often updated by reference, so by the time we inspect intermediate results, they may contain the complete graph. However, see how the `$ingress`, `$pointer` and `$pointer_shape` of the `ModelDescriptor`s that take the `sepal.in` path differ from the ones that take the `petal.in` path:
```r
graph_iris$pipeops$linear1$.result[[1]]$ingress
graph_iris$pipeops$linear1$.result[[1]]$pointer
graph_iris$pipeops$linear1$.result[[1]]$pointer_shape

graph_iris$pipeops$linear3$.result[[1]]$ingress
graph_iris$pipeops$linear3$.result[[1]]$pointer
graph_iris$pipeops$linear3$.result[[1]]$pointer_shape
```
po("nn_merge_cat")
unites the two ModelDescriptor
s and contains the common ingress. The pointer_shape
now reflects the output of the "cat"-operation: the 2nd dimension is added up:
```r
graph_iris$pipeops$nn_merge_cat$.result[[1]]$ingress
graph_iris$pipeops$nn_merge_cat$.result[[1]]$pointer_shape
```
3. Multiple `ModelDescriptor`s were created, since `graph_iris` has multiple outputs. This makes it possible to create a neural network with multiple outputs. We need to unite the outputs of `graph_iris` using `model_descriptor_union()` before we can pass it to `model_descriptor_to_module()`. We need to collect all output pointers separately. The parameter `list_output` must be set to `TRUE` since the module has multiple outputs.
```r
iris_mds_union = model_descriptor_union(iris_mds[[1]], iris_mds[[2]])

output_pointers = list(iris_mds[[1]]$pointer, iris_mds[[2]]$pointer)
output_pointers

iris_module = model_descriptor_to_module(iris_mds_union, output_pointers, list_output = TRUE)
```
4. The `PipeOpBranch` disappears in the resulting `Graph` of `PipeOpModule`s in the `iris_module`. This is because only the `PipeOpTorch`s in `graph_iris` add anything to the `ModelDescriptor`s. The branch is interpreted when `graph_iris` runs, and only the `nn_relu` path is followed. The `iris_module` therefore contains a `Graph` that does "relu" activation:
```r
iris_module$graph$plot(html = TRUE)
```
5. The `ModelDescriptor`'s `$task` slot contains a `Task` with all features that are used to create the input data for all NN inputs. It can be given to `task_dataset()`, along with the `$ingress`, to create a torch `dataset` that creates all batches. As above, any output of `graph_iris` can be used:
```r
iris_mds_union$task  # contains all features

iris_td = task_dataset(
  task = iris_mds_union$task,
  feature_ingress_tokens = iris_mds_union$ingress
)

batch = iris_td$.getbatch(1:2)
batch
```
6. The resulting module has multiple inputs and multiple outputs. We call it with the first two rows of iris, but set the debug `$keep_results` flag so we can inspect what is happening in the `nn_module`'s `$graph`:
```r
iris_module$graph$keep_results = TRUE

iris_module(
  sepal.in.input = batch$x$sepal.in.input,
  petal.in.input = batch$x$petal.in.input
)
```
The first linear layer that takes the "Sepal" input (`"linear1"`) creates a 2x4 tensor (batch size 2, 4 units), while the `"linear3"` layer has a 2x5 output:
```r
iris_module$graph$pipeops$linear1$.result
iris_module$graph$pipeops$linear3$.result
```
We observe that `po("nn_merge_cat")` concatenates these, as expected:
```r
iris_module$graph$pipeops$nn_merge_cat$.result
```
We have now seen how NN `Graph`s of `PipeOpModule`s are created and turned into `nn_module`s. Using `PipeOpTorch` even creates `ModelDescriptor` objects that contain additional information about how batch tensors are extracted from `Task`s.
For a complete `Learner`, it is still necessary to define the loss function used for optimization, the optimizer, and optionally some callbacks. We have already covered their class representations (`TorchLoss`, `TorchOptimizer`, `TorchCallback`) in the Get Started vignette.
Here we use adam as the optimizer, cross-entropy as the loss function, and the history callback.
adam = t_opt("adam", lr = 0.02) adam xe = t_loss("cross_entropy") xe history = t_clbk("history") history
## `LearnerTorchModel`

`LearnerTorchModel` represents a supervised model (regression or classification) using `torch` NNs. It needs an `nn_module`, as well as a list of `TorchIngressToken` that define how batches are created from a `Task`.
`TorchIngressToken`s hard-code the column names of a `Task` that are used for data input; a `Learner` created like this therefore only works for the specific `Task` it was created for. (Generally, the full `mlr3pipelines` UI should be used if this is a problem; see below.)
The following uses the sequential NN from above:
```r
lr_sequential = lrn("classif.torch_model",
  task_type = "classif",
  network = model_descriptor_to_module(md_sequential, list(md_sequential$pointer)),
  ingress_tokens = md_sequential$ingress,
  optimizer = adam,
  callbacks = history,
  loss = xe
)

lr_sequential
```
Before training the model, we set some more hyperparameters.
```r
lr_sequential$param_set$set_values(
  batch_size = 50,
  epochs = 100,
  measures_train = msrs(c("classif.logloss", "classif.ce"))
)

# This is required to evaluate the logloss during training
lr_sequential$predict_type = "prob"

lr_sequential$train(md_sequential$task)
```
The following calls the `$predict_newdata()` function to plot the response surface along the `Sepal.Width = mean(Sepal.Width)` plane, along with the ground-truth values:
```r
library(data.table)
library(ggplot2)

newdata = cbind(data.table(Sepal.Width = mean(iris$Sepal.Width)), CJ(
  Sepal.Length = seq(min(iris$Sepal.Length), max(iris$Sepal.Length), length.out = 30),
  Petal.Length = seq(min(iris$Petal.Length), max(iris$Petal.Length), length.out = 30)
))

predictions = lr_sequential$predict_newdata(newdata)

plot_predictions = function(predictions) {
  ggplot(cbind(newdata, Species = predictions$response),
      aes(x = Sepal.Length, y = Petal.Length, fill = Species)) +
    geom_tile(alpha = .3) +
    geom_point(data = iris, aes(x = Sepal.Length, y = Petal.Length, fill = Species),
      color = "black", pch = 21, size = 3) +
    theme_bw()
}

plot_predictions(predictions)
```
The model shown above is constructed using the `ModelDescriptor` that is generated from a `Graph` of `PipeOpTorch` operators. The `ModelDescriptor` furthermore contains the `Task` to which it pertains. This makes it possible to use it to create an NN model that gets trained right away, using `PipeOpTorchModelClassif`. The only missing prerequisite now is to add the desired `TorchOptimizer` and `TorchLoss` information to the `ModelDescriptor`.
### `ModelDescriptor`

Remember that the `ModelDescriptor` has the `$optimizer`, `$loss` and `$callbacks` slots that are necessary to build a complete `Learner` from an NN. They can be set by the corresponding `PipeOpTorch` operators.
po("torch_optimizer")
is used to set the $optimizer
slot of a ModelDescriptor
; it takes the desired TorchOptimizer
object on construction and exports its ParamSet
.
po_adam = po("torch_optimizer", optimizer = adam) # hyperparameters are made available and can be changed: po_adam$param_set$values md_sequential = po_adam$train(list(md_sequential))[[1]] md_sequential$optimizer
This works analogously for the loss-function.
po_xe = po("torch_loss", loss = xe) md_sequential = po_xe$train(list(md_sequential))[[1]] md_sequential$loss
And also for callbacks:
po_history = po("torch_callbacks", callbacks = t_clbk("history")) md_sequential = po_history$train(list(md_sequential))[[1]] md_sequential$callbacks
### `LearnerTorchModel`

The `ModelDescriptor` can now be given to a `po("torch_model_classif")`.
po_model = po("torch_model_classif", batch_size = 50, epochs = 50) po_model$train(list(md_sequential))
po("torch_model_classif")
behaves similarly to a PipeOpLearner
: It returns NULL
during training, and the prediction on $predict()
.
po("torch_model_classif")
behaves similarly to a PipeOpLearner
: It returns NULL
during training, and the prediction on $predict()
.
```r
newtask = TaskClassif$new("newdata",
  cbind(newdata, Species = factor(NA, levels = levels(iris$Species))),
  target = "Species")

predictions = po_model$predict(list(newtask))[[1]]
plot_predictions(predictions)
```
Remember that `md_sequential` was created using a `Graph` that the initial `Task` was piped through. If we combine such a `Graph` with `PipeOpTorchModelClassif`, we get a `Graph` that behaves like any other `Graph` that ends with a `PipeOpLearner`, and it can therefore be wrapped as a `GraphLearner`.
The following uses one more hidden layer than before:
```r
graph_sequential_full = po("torch_ingress_num") %>>%
  po("nn_linear", out_features = 4, id = "linear1") %>>%
  po("nn_sigmoid") %>>%
  po("nn_linear", out_features = 3, id = "linear2") %>>%
  po("nn_softmax", dim = 2, id = "softmax") %>>%
  po("nn_linear", out_features = 3, id = "linear3") %>>%
  po("nn_softmax", dim = 2, id = "softmax2") %>>%
  po("torch_optimizer", optimizer = adam) %>>%
  po("torch_loss", loss = xe) %>>%
  po("torch_callbacks", callbacks = history) %>>%
  po("torch_model_classif", batch_size = 50, epochs = 100)

lr_sequential_full = as_learner(graph_sequential_full)

lr_sequential_full$train(task)
```
Compare the resulting `Graph`:
```r
graph_sequential_full$plot(html = TRUE)
```
with the `Graph` of the trained model:
```r
model = lr_sequential_full$graph_model$state$torch_model_classif$model
model$network$graph$plot(html = TRUE)
```
Predictions, as before (we can use `$predict_newdata()` again):
```r
predictions = lr_sequential_full$predict_newdata(newdata)
plot_predictions(predictions)
```
We are not just limited to `PipeOpTorch` in these kinds of `Graph`s, and we are also not limited to having only a single `PipeOpTorchIngress`. The following pipeline, for example, removes all but the `Petal.Length` column from the `Task` and fits a model:
gr = po("select", selector = selector_name("Petal.Length")) %>>% po("torch_ingress_num") %>>% po("nn_linear", out_features = 5, id = "linear1") %>>% po("nn_relu") %>>% po("nn_linear", out_features = 3, id = "linear2") %>>% po("nn_softmax", dim = 2) %>>% po("torch_optimizer", optimizer = adam) %>>% po("torch_loss", loss = xe) %>>% po("torch_model_classif", batch_size = 50, epochs = 50) gr$plot(html = TRUE)
```r
lr = as_learner(gr)
lr$train(task)

predictions = lr$predict_newdata(newdata)
plot_predictions(predictions)
```
How about using `Petal.Length` and `Sepal.Length` separately at first?
```r
gr = gunion(list(
    po("select", selector = selector_name("Petal.Length"), id = "sel1") %>>%
      po("torch_ingress_num", id = "ingress.petal") %>>%
      po("nn_linear", out_features = 3, id = "linear1"),
    po("select", selector = selector_name("Sepal.Length"), id = "sel2") %>>%
      po("torch_ingress_num", id = "ingress.sepal") %>>%
      po("nn_linear", out_features = 3, id = "linear2")
  )) %>>%
  po("nn_merge_cat") %>>%
  po("nn_relu", id = "act1") %>>%
  po("nn_linear", out_features = 3, id = "linear3") %>>%
  po("nn_softmax", dim = 2, id = "act3") %>>%
  po("torch_optimizer", optimizer = adam, lr = 0.1) %>>%
  po("torch_loss", loss = xe) %>>%
  po("torch_model_classif", batch_size = 50, epochs = 50)

gr$plot(html = TRUE)
```
```r
lr = as_learner(gr)
lr$train(task)

predictions = lr$predict_newdata(newdata)
plot_predictions(predictions)
```
All these examples have hopefully demonstrated the possibilities that come with the representation of neural network layers as `PipeOp`s. Even though this vignette was quite technical, we hope to have given you an in-depth understanding of the underlying mechanisms.