```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
In this vignette, we show how to build neural network architectures as mlr3pipelines::Graphs.
We will create a simple CNN for the tiny-imagenet task, which is a subset of the well-known ImageNet benchmark.
```r
library(mlr3torch)
imagenet = tsk("tiny_imagenet")
imagenet
```
The central ingredients for creating such graphs are PipeOpTorch operators.
To mark the entry point of the neural network, we use a PipeOpTorchIngress, for which three different flavors exist:

- po("torch_ingress_num") for numeric data
- po("torch_ingress_categ") for categorical columns
- po("torch_ingress_ltnsr") for lazy_tensors

Because the imagenet task contains only one feature of type lazy_tensor, we go for the last option:
architecture = po("torch_ingress_ltnsr")
We now define a relatively simple convolutional neural network.
Note that in the code below, po("nn_relu_1") is equivalent to po("nn_relu", id = "nn_relu_1").
This is needed because mlr3pipelines::Graphs require that each PipeOp has a unique ID.
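A quick way to convince ourselves of this equivalence is to compare the IDs of the two constructions (this check is only for illustration and not needed for the architecture):

```r
# The numeric suffix in the key only determines the ID of the constructed PipeOp
po("nn_relu_1")$id
po("nn_relu", id = "nn_relu_1")$id
```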
Further, note that we don't have to specify the input dimensions of the convolutional layers, as they are inferred from the task during `$train()`.
This means that our Learner can be applied to tasks with different image sizes, each time building up the correct network structure.
architecture = architecture %>>% po("nn_conv2d_1", out_channels = 64, kernel_size = 11, stride = 4, padding = 2) %>>% po("nn_relu_1", inplace = TRUE) %>>% po("nn_max_pool2d_1", kernel_size = 3, stride = 2) %>>% po("nn_conv2d_2", out_channels = 192, kernel_size = 5, padding = 2) %>>% po("nn_relu_2", inplace = TRUE) %>>% po("nn_max_pool2d_2", kernel_size = 3, stride = 2)
We can now continue with the classification part of the network, a dense network in which the following layer segment is repeated twice:
dense_layer = po("nn_dropout") %>>% po("nn_linear", out_features = 4096) %>>% po("nn_relu_6")
To repeat a segment of a network multiple times, we can use po("nn_block"); here, we repeat the dense layer twice.
Then we add the output head of the network, for which we don't have to specify the number of classes, as it can also be inferred from the task:
classifier = po("nn_block", dense_layer, n_blocks = 2L) %>>% po("nn_head")
Next, we can combine the convolutional part with the dense head:
architecture = architecture %>>% po("nn_flatten") %>>% classifier
Below, we display the network:
```r
architecture$plot(html = TRUE)
```
To turn this network architecture into an mlr3::Learner, all that is left to do is to configure the loss, optimizer, callbacks, and training arguments.
We use the standard cross-entropy loss, SGD as the optimizer, and checkpoint our model every 20 epochs:
```r
checkpoint = tempfile()

architecture = architecture %>>%
  po("torch_loss", t_loss("cross_entropy")) %>>%
  po("torch_optimizer", t_opt("sgd", lr = 0.01)) %>>%
  po("torch_callbacks", t_clbk("checkpoint", freq = 20, path = checkpoint)) %>>%
  po("torch_model_classif", batch_size = 32, epochs = 100L, device = "cuda")

cnn = as_learner(architecture)
cnn$id = "cnn"
```
The created Learner now exposes all configuration options of the individual PipeOps in its $param_set, of which we show only a subset for readability:
```r
as.data.table(cnn$param_set)[c(32, 34, 42), 1:4]
```
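Instead of hard-coded row indices, one can also filter the table by the prefix of a specific PipeOp; the selection below is just an illustrative alternative to the row indices above:

```r
# All parameters that belong to the optimizer PipeOp
as.data.table(cnn$param_set)[startsWith(id, "torch_optimizer."), 1:4]
```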
We can still change these values, or even tune them if we wanted to! Below, we increase the number of blocks and the latent dimension of the dense part, and change the learning rate of the SGD optimizer.
```r
cnn$param_set$set_values(
  nn_block.n_blocks = 4L,
  nn_block.nn_linear.out_features = 4096 * 2,
  torch_optimizer.lr = 0.2
)
```
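As mentioned above, these parameters could also be tuned. Below is a minimal sketch of how this might look with mlr3tuning; the search space, resampling, and budget are made up for illustration and are not part of this vignette, and we work on a clone so that the learner trained below stays untouched:

```r
# Sketch only (not run here): tune two of the exposed parameters with random search
library(mlr3tuning)

cnn_tune = cnn$clone(deep = TRUE)
cnn_tune$param_set$set_values(
  torch_optimizer.lr = to_tune(1e-4, 1e-1, logscale = TRUE),
  nn_block.n_blocks = to_tune(1L, 4L)
)

instance = tune(
  tuner = tnr("random_search"),
  task = imagenet,
  learner = cnn_tune,
  resampling = rsmp("holdout"),
  measures = msr("classif.acc"),
  term_evals = 10
)
```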
To keep this vignette fast to build, we reduce the training budget and shrink the task before training:

```r
# Keep the vignette lightweight: train for a single epoch on the CPU
cnn$param_set$set_values(
  torch_model_classif.epochs = 1L,
  torch_model_classif.device = "cpu"
)
# and use only a single observation of the task
imagenet$filter(1L)
```
Finally, we train the learner on the task:
```r
# otherwise downloading might show in the code snippet below
imagenet$data()
```
```r
cnn$train(imagenet)
```
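For completeness, a prediction step (not part of the original walkthrough) could look like this: the trained learner can be used like any other mlr3 learner to predict on a task and score the result.

```r
# Predict on the (here, filtered) task and evaluate the accuracy
prediction = cnn$predict(imagenet)
prediction$score(msr("classif.acc"))
```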