mlr_pipeops_preproc_torch | R Documentation |
This PipeOp
can be used to preprocess (one or more) lazy_tensor
columns contained in an mlr3::Task
.
The preprocessing function is specified as construction argument fn
and additional arguments to this
function can be defined through the PipeOp
's parameter set.
The preprocessing is done per column, i.e. the number of lazy tensor output columns is equal
to the number of lazy tensor input columns.
To create custom preprocessing PipeOp
s you can use pipeop_preproc_torch
.
In addition to specifying the construction arguments, you can overwrite the private .shapes_out()
method.
If you don't overwrite it, the output shapes are assumed to be unknown (NULL
).
.shapes_out(shapes_in, param_vals, task)
(list()
, list(),
Taskor
NULL) ->
list()\cr This private method calculates the output shapes of the lazy tensor columns that are created from applying the preprocessing function with the provided parameter values (
param_vals). The
taskis very rarely needed, but if it is it should be checked that it is not
NULL'.
This private method only has the responsibility to calculate the output shapes for one input column, i.e. the
input shapes_in
can be assumed to have exactly one shape vector for which it must calculate the output shapes
and return it as a list()
of length 1.
It can also be assumed that the shape is not NULL
(i.e. unknown).
Also, the first dimension can be NA
, i.e. is unknown (as for the batch dimension).
See PipeOpTaskPreproc
.
In addition to state elements from PipeOpTaskPreprocSimple
,
the state also contains the $param_vals
that were set during training.
In addition to the parameters inherited from PipeOpTaskPreproc
as well as those specified during construction
as the argument param_set
there are the following parameters:
stages
:: character(1)
The stages during which to apply the preprocessing.
Can be one of "train"
, "predict"
or "both"
.
The initial value of this parameter is set to "train"
when the PipeOp
's id starts with "augment_"
and
to "both"
otherwise.
Note that the preprocessing that is applied during $predict()
uses the parameters that were set during
$train()
and not those that are set when performing the prediction.
During $train()
/ $predict()
, a PipeOpModule
with one input and one output channel is created.
The pipeop applies the function fn
to the input tensor while additionally
passing the parameter values (minus stages
and affect_columns
) to fn
.
The preprocessing graph of the lazy tensor columns is shallowly cloned and the PipeOpModule
is added.
This is done to avoid modifying user input and means that identical PipeOpModule
s can be part of different
preprocessing graphs. This is only possible, because the created PipeOpModule
is stateless.
At a later point in the graph, preprocessing graphs will be merged if possible to avoid unnecessary computation.
This is best illustrated by example:
One lazy tensor column's preprocessing graph is A -> B
.
Then, two branches are created B -> C
and B -> D
, creating two preprocessing graphs
A -> B -> C
and A -> B -> D
. When loading the data, we want to run the preprocessing only once, i.e. we don't
want to run the A -> B
part twice. For this reason, task_dataset()
will try to merge graphs and cache
results from graphs. However, only graphs using the same dataset can currently be merged.
Also, the shapes created during $train()
and $predict()
might differ.
To avoid the creation of graphs where the predict shapes are incompatible with the train shapes,
the hypothetical predict shapes are already calculated during $train()
(this is why the parameters that are set
during train are also used during predict) and the PipeOpTorchModel
will check the train and predict shapes for
compatibility before starting the training.
Otherwise, this mechanism is very similar to the ModelDescriptor
construct.
mlr3pipelines::PipeOp
-> mlr3pipelines::PipeOpTaskPreproc
-> PipeOpTaskPreprocTorch
fn
The preprocessing function.
rowwise
Whether the preprocessing is applied rowwise.
new()
Creates a new instance of this R6
class.
PipeOpTaskPreprocTorch$new( fn, id = "preproc_torch", param_vals = list(), param_set = ps(), packages = character(0), rowwise = FALSE, stages_init = NULL, tags = NULL )
fn
(function
or character(2)
)
The preprocessing function. Must not modify its input in-place.
If it is a character(2)
, the first element should be the namespace and the second element the name.
When the preprocessing function is applied to the tensor, the tensor will be passed by position as the first argument.
If the param_set
is inferred (left as NULL
) it is assumed that the first argument is the torch_tensor
.
id
(character(1)
)
The id for of the new object.
param_vals
(named list()
)
Parameter values to be set after construction.
param_set
(ParamSet
)
In case the function fn
takes additional parameter besides a torch_tensor
they can be
specfied as parameters. None of the parameters can have the "predict"
tag.
All tags should include "train"
.
packages
(character()
)
The packages the preprocessing function depends on.
rowwise
(logical(1)
)
Whether the preprocessing function is applied rowwise (and then concatenated by row) or directly to the whole
tensor. In the first case there is no batch dimension.
stages_init
(character(1)
)
Initial value for the stages
parameter.
tags
(character()
)
Tags for the pipeop.
shapes_out()
Calculates the output shapes that would result in applying the preprocessing to one or more lazy tensor columns with the provided shape. Names are ignored and only order matters. It uses the parameter values that are currently set.
PipeOpTaskPreprocTorch$shapes_out(shapes_in, stage = NULL, task = NULL)
shapes_in
(list()
of (integer()
or NULL
))
The input input shapes of the lazy tensors.
NULL
indicates that the shape is unknown.
First dimension must be NA
(if it is not NULL
).
stage
(character(1)
)
The stage: either "train"
or "predict"
.
task
(Task
or NULL
)
The task, which is very rarely needed.
list()
of (integer()
or NULL
)
clone()
The objects of this class are cloneable with this method.
PipeOpTaskPreprocTorch$clone(deep = FALSE)
deep
Whether to make a deep clone.
# Creating a simple task
d = data.table(
x1 = as_lazy_tensor(rnorm(10)),
x2 = as_lazy_tensor(rnorm(10)),
x3 = as_lazy_tensor(as.double(1:10)),
y = rnorm(10)
)
taskin = as_task_regr(d, target = "y")
# Creating a simple preprocessing pipeop
po_simple = po("preproc_torch",
# get rid of environment baggage
fn = mlr3misc::crate(function(x, a) x + a),
param_set = paradox::ps(a = paradox::p_int(tags = c("train", "required")))
)
po_simple$param_set$set_values(
a = 100,
affect_columns = selector_name(c("x1", "x2")),
stages = "both" # use during train and predict
)
taskout_train = po_simple$train(list(taskin))[[1L]]
materialize(taskout_train$data(cols = c("x1", "x2")), rbind = TRUE)
taskout_predict_noaug = po_simple$predict(list(taskin))[[1L]]
materialize(taskout_predict_noaug$data(cols = c("x1", "x2")), rbind = TRUE)
po_simple$param_set$set_values(
stages = "train"
)
# transformation is not applied
taskout_predict_aug = po_simple$predict(list(taskin))[[1L]]
materialize(taskout_predict_aug$data(cols = c("x1", "x2")), rbind = TRUE)
# Creating a more complex preprocessing PipeOp
PipeOpPreprocTorchPoly = R6::R6Class("PipeOpPreprocTorchPoly",
inherit = PipeOpTaskPreprocTorch,
public = list(
initialize = function(id = "preproc_poly", param_vals = list()) {
param_set = paradox::ps(
n_degree = paradox::p_int(lower = 1L, tags = c("train", "required"))
)
param_set$set_values(
n_degree = 1L
)
fn = mlr3misc::crate(function(x, n_degree) {
torch::torch_cat(
lapply(seq_len(n_degree), function(d) torch::torch_pow(x, d)),
dim = 2L
)
})
super$initialize(
fn = fn,
id = id,
packages = character(0),
param_vals = param_vals,
param_set = param_set,
stages_init = "both"
)
}
),
private = list(
.shapes_out = function(shapes_in, param_vals, task) {
# shapes_in is a list of length 1 containing the shapes
checkmate::assert_true(length(shapes_in[[1L]]) == 2L)
if (shapes_in[[1L]][2L] != 1L) {
stop("Input shape must be (NA, 1)")
}
list(c(NA, param_vals$n_degree))
}
)
)
po_poly = PipeOpPreprocTorchPoly$new(
param_vals = list(n_degree = 3L, affect_columns = selector_name("x3"))
)
po_poly$shapes_out(list(c(NA, 1L)), stage = "train")
taskout = po_poly$train(list(taskin))[[1L]]
materialize(taskout$data(cols = "x3"), rbind = TRUE)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.