library(ggplot2)
theme_set(theme_bw())
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

This vignette introduces the lazy_tensor class, a vector type that lazily represents torch tensors of arbitrary dimensions. Among other things, it allows mlr3torch to work with images, which we illustrate using the predefined MNIST task. This task has a single feature "image" of class "lazy_tensor"; the images show the digits 0, ..., 9, and the goal is to classify them correctly.

library(mlr3torch)

mnist = tsk("mnist")
mnist

The name "lazy_tensor" stems from the fact, that the tensors are not necessarily stored in memory, as this is often impossible when working with large image datasets.
Therefore, we can easily access the data without any expensive data-loading. We see that the data contains one column label, which is the target variable, and an image which is the input feature.

mnist$head()

If we want to obtain the actual tensors representing the images, we can do so by calling materialize(), which returns a list of torch_tensors, not necessarily all of the same shape. Here, we only show a slice of the tensor for readability.

lt = mnist$data(cols = "image")[[1L]]
materialize(lt[1])[[1]][1, 12:16, 12:16]

If all elements have the same shape, as is the case here, we can also obtain a single torch_tensor by specifying rbind = TRUE.
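
For example, we can stack the first two images into a single tensor and inspect its shape, whose first dimension is then the batch dimension (a small additional check):

materialize(lt[1:2], rbind = TRUE)$shape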

In order to train a Learner on a Task containing lazy_tensor columns, it must support the lazy_tensor feature type. This is the case for the multi-layer perceptron, which works with numeric features as well as with lazy_tensors.

mlp = lrn("classif.mlp",
  neurons = c(100, 100),
  epochs = 10, batch_size = 32
)
mlp
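
We can confirm that the learner supports the lazy_tensor feature type by inspecting its $feature_types field:

mlp$feature_types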

However, because lazy_tensors also have a specific shape, we must ensure that the shape of the lazy_tensor matches the expected input shape of the learner. The multi-layer perceptron expects a 2d tensor whose first dimension is the batch dimension, but we have seen above that this is not the case for MNIST, where each element has shape (1, 28, 28). Therefore, we need to flatten the lazy_tensor, which we do here using po("trafo_reshape"):

reshaper = po("trafo_reshape", shape = c(-1, 28 * 28))
mnist_flat = reshaper$train(list(mnist))[[1L]]
mnist_flat$head()

Note that this does not actually reshape all the tensors in memory; again, this only happens once materialize() is called.
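
As a quick, optional check, materializing the first two flattened images yields a 2d tensor with 28 * 28 = 784 columns:

materialize(mnist_flat$data(1:2, cols = "image")[[1L]], rbind = TRUE)$shape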

We can now proceed to train the multi-layer perceptron on the flattened MNIST task:

mlp = lrn("classif.mlp",
  neurons = c(100, 100),
  epochs = 10, batch_size = 32
)
mlp$train(mnist_flat)
# ensure that code above is working
mlp$param_set$values$epochs = 1L
mlp$train(mnist_flat$clone()$filter(1:10))

Creating a Lazy Tensor

Every lazy_tensor is built on top of a torch::dataset, so we assume here that you are familiar with this class. For more information on how to create torch::datasets, we recommend reading the torch package documentation. The only additional restriction we impose on the dataset is that it must have a .getitem or .getbatch method that returns a list of named tensors.

As an example, we will create a lazy_tensor of length 1000 whose elements are drawn from a uniform distribution over $[-1, 1]$. While the data is stored in memory in this example, this is not necessary, and the $.getitem() method could e.g. load images from disk.

mydata = dataset(
  initialize = function() {
    self$x = runif(1000, -1, 1)
  },
  .getbatch = function(i) list(x = torch_tensor(self$x[i])$unsqueeze(2)),
  .length = function() 1000
)()
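
While our toy dataset keeps all values in memory, the $.getitem() method mentioned above could just as well read files from disk. Below is a purely hypothetical sketch of such a dataset (the paths argument and the use of the png package are assumptions for illustration, not part of this example):

# Hypothetical: lazily read PNG images from disk.
# `paths` is assumed to be a character vector of paths to PNG files.
disk_images = dataset(
  initialize = function(paths) {
    self$paths = paths
  },
  .getitem = function(i) {
    arr = png::readPNG(self$paths[i])  # H x W (x C) array with values in [0, 1]
    list(x = torch_tensor(arr))        # return a named list of tensors
  },
  .length = function() length(self$paths)
)

We now continue with the in-memory dataset mydata defined above.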

In order to create a lazy_tensor from mydata, we have to annotate the returned shapes of the dataset by passing a named list to dataset_shapes. The first dimension must be NA, as it is the batch dimension. We can also set a shape to NULL to indicate that it is unknown, i.e. it varies between elements.

lt = as_lazy_tensor(mydata, dataset_shapes = list(x = c(NA, 1)))
lt[1:5]

Note that in this case, because we implemented the .getbatch method, we could even have omitted the dataset_shapes argument, as the shapes can be auto-inferred.
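
Relying on this auto-inference, the following call should therefore produce an equivalent lazy tensor (shown here as an optional alternative):

lt2 = as_lazy_tensor(mydata)
lt2[1:5]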

We can convert this vector to a torch_tensor just like before:

materialize(lt[1:5], rbind = TRUE)

Because we added no preprocessing, this is the same as calling the $.getbatch() method on mydata and selecting the element x.

torch_equal(
  materialize(lt[1], rbind = TRUE),
  mydata$.getbatch(1)$x
)

We continue by creating an example task from lt, where the relationship between the x and y variables is polynomial. Note that the target variable, both for classification and regression, cannot be a lazy_tensor, but must be a factor or numeric, respectively.

library(data.table)
x = mydata$x
y = 0.2 + 0.1 * x - 0.1 * x^2 - 0.3 * x^3 + 0.5 * x^4 + 0.5 * x^7 + 0.6 * x^11 + rnorm(length(mydata)) * 0.1
dt = data.table(y = y, x = lt)
task_poly = as_task_regr(dt, target = "y", id = "poly")
task_poly

Below, we plot the data:

library(ggplot2)
ggplot(data = data.frame(x = x, y = y)) +
  geom_point(aes(x = x, y = y), alpha = 0.5)

In the next section, we will create a custom PipeOp to fit a polynomial regression model.

Custom Preprocessing

In order to create a custom preprocessing operator for a lazy tensor, we have to create a new PipeOp class. To make this as convenient as possible, mlr3torch offers the pipeop_preproc_torch() function, which we recommend using for this purpose. Its most important arguments are the id of the resulting PipeOp, the preprocessing function fn that is applied to the tensors, and shapes_out, which specifies how the output shapes are computed (below, we use "infer").

Below, we create a PipeOp that transforms a vector $x$ into a matrix $(x^{d_1}, \ldots, x^{d_n})$, where the exponents $d_1, \ldots, d_n$ are given by the degrees parameter of the PipeOp.

PipeOpPreprocTorchPoly = pipeop_preproc_torch("poly",
  fn = function(x, degrees) {
    torch_cat(lapply(degrees, function(d) torch_pow(x, d)), dim = 2L)
  },
  shapes_out = "infer"
)

We can now create a new instance of this PipeOp by calling $new() and set the parameter degrees to the degrees that were used when simulating the data above. Further, we set the parameter stages, which is always available, to "both", meaning that the preprocessing is applied during both training and prediction. For data augmentation, this would typically be set to "train".

po_poly = PipeOpPreprocTorchPoly$new()

po_poly$param_set$set_values(
  degrees = c(0, 1, 2, 3, 4, 7, 11),
  stages = "both"
)
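
Before building the full learner, we can apply the PipeOp to a clone of the task to inspect the transformed feature column; the clone is only a precaution so that task_poly itself stays untouched:

po_poly$train(list(task_poly$clone()))[[1L]]$head()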

To create our polynomial regression learner, we combine the polynomial preprocessor with a lrn("regr.mlp") with no hidden layers (i.e. a linear model) and train the learner on the task.

lrn_poly = as_learner(
  po_poly %>>% lrn("regr.mlp",
    batch_size = 256, epochs = 100, neurons = integer(0)
  )
)

lrn_poly$train(task_poly)
pred = lrn_poly$predict(task_poly)
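
We can also quantify the fit on the training data using a standard regression measure such as the mean squared error:

pred$score(msr("regr.mse"))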

Below, we visualize the predictions and see that the model captured the non-linear relationship reasonably well:

dt = melt(data.table(
  truth = pred$truth,
  response = pred$response,
  x = x),
  id.vars = "x", measure.vars = c("truth", "response")
)
dt$variable = factor(dt$variable, levels = c("truth", "response"))

ggplot(data = dt) +
  geom_point(aes(x = x, y = value, color = variable))

In the next section, we briefly cover the implementation details of the lazy_tensor. Understanding these internals is not necessary for working with the data type, so feel free to skip this part.

Digging Into Internals

Internally, the lazy_tensor vector uses the DataDescriptor class to represent the (possibly preprocessed) data. It is very similar to the ModelDescriptor class that is used to build up neural networks using PipeOpTorch objects. The DataDescriptor stores a torch::dataset, an mlr3pipelines::Graph and some metadata.

desc = DataDescriptor$new(
  dataset = mydata,
  dataset_shapes = list(x = c(NA, 1))
)

By default, the preprocessing graph contains only a single PipeOpNOP that does nothing.

desc
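
We can also list the PipeOps of this graph directly (a small illustrative check):

desc$graph$ids()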

The printed output of the data descriptor informs us about the shapes of the underlying dataset, the input map (i.e. which elements of the dataset's output are fed into the preprocessing graph), the pointer (i.e. which output channel of the graph the data refers to), and the shape of that output.

A lazy tensor can be constructed from an integer vector and a DataDescriptor. The integer vector specifies which elements of the DataDescriptor the entries of the lazy_tensor refer to. Below, the first two elements of the lazy_tensor vector represent the same element of the DataDescriptor, while the third represents a different one. Note that all indices refer to the same DataDescriptor.

lt = lazy_tensor(desc, ids = c(1, 1, 2))
materialize(lt, rbind = TRUE)

Internally, the lazy tensor is represented as a list of lists, each element containing an id and a DataDescriptor. Currently, there can only be a single DataDescriptor in a lazy_tensor vector.

unclass(lt[[1]])

What happens during materialize(lt[1]) is the following:

# get index and data descriptor
desc = lt[[1]][[2]]
id = lt[[1]][[1]]

# retrieve the batch <id> from the dataset
dataset_output = desc$dataset$.getbatch(id)

# the batch is reorganized according to the input map and named after
# the graph's input channels
graph_input = dataset_output[desc$input_map]
names(graph_input) = desc$graph$input$name

# the reorganized batch is fed into the preprocessing graph
graph_output = desc$graph$train(graph_input, single_input = FALSE)

# the output pointed to by the pointer is returned
tensor = graph_output[[paste0(desc$pointer, collapse = ".")]]
tensor
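
As an optional sanity check, this manual walk-through produces the same result as calling materialize() directly:

torch_equal(tensor, materialize(lt[1], rbind = TRUE))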

Preprocessing a lazy_tensor vector adds new PipeOps to the preprocessing graph and updates the meta-information, such as the pointer and the output shape. To show this, we create a simple example task using the lt vector as a feature.

taskin = as_task_regr(data.table(x = lt, y = 1:3), target = "y")

taskout = po_poly$train(list(taskin))[[1L]]

lt_out = taskout$data(cols = "x")$x

descout = lt_out[[1]][[2]]

descout

descout$graph

We see that the $graph now contains a new PipeOp with id "poly.x" and that the output pointer points to poly.x. Also, the shape of the tensor is now c(NA, 7) instead of c(NA, 1) as before, which we can verify by calling materialize() again:

materialize(lt_out[1:2], rbind = TRUE)

