Graph: Graph Base Class

Description

A Graph is a representation of a machine learning pipeline graph. It can be trained, and subsequently used for prediction.

A Graph is most useful when used together with Learner objects encapsulated as PipeOpLearner. In this case, the Graph produces Prediction data during its $predict() phase and can be used as a Learner itself (using the GraphLearner wrapper). However, the Graph can also be used without Learner objects to simply perform preprocessing of data. In principle, it does not even need to handle data at all and can be used for general processes with a dependency structure (although the PipeOps for this would need to be written).
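The following is a minimal sketch of the Learner use case described above, not a canonical recipe. It assumes that the mlr3 and mlr3pipelines packages are installed and that the "classif.rpart" learner (from mlr3, backed by the rpart package) is available; the choice of learner and task is purely illustrative.

```r
library("mlr3")
library("mlr3pipelines")

# Build a two-step Graph: scale the features, then fit a decision tree.
# The learner is wrapped explicitly in a PipeOpLearner.
graph = PipeOpScale$new() %>>% PipeOpLearner$new(lrn("classif.rpart"))

# Wrap the Graph so that it can be used like any other Learner.
glrn = GraphLearner$new(graph)

task = tsk("iris")
glrn$train(task)
prediction = glrn$predict(task)

# The result is a regular mlr3 Prediction object.
class(prediction)[1]
```

Because the GraphLearner is itself a Learner, it can be passed to other mlr3 functionality that expects a Learner.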

Format

R6Class.

Construction

Graph$new()

Internals

A Graph is made up of a list of PipeOps and a data.table of edges. For both training and prediction, the Graph topologically sorts the PipeOps and executes their respective $train() or $predict() functions in that order, passing each PipeOp's results along the edges as input to other PipeOps.
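To illustrate the two building blocks and the topological ordering, the sketch below deliberately adds the PipeOps in the "wrong" order and then connects them. The PipeOp choice is arbitrary; the default ids "pca" and "scale" are assumed.

```r
library("mlr3")
library("mlr3pipelines")

# Add PCA first and scaling second, then connect scale -> pca.
g = Graph$new()$
  add_pipeop(PipeOpPCA$new())$
  add_pipeop(PipeOpScale$new())$
  add_edge("scale", "pca")

names(g$pipeops)  # the list of PipeOps, in the order they were added
g$edges           # one row describing the connection scale -> pca

insertion_order = g$ids()
topological_order = g$ids(sorted = TRUE)
insertion_order
topological_order
```

Training and prediction follow the topological order, so "scale" runs before "pca" regardless of the order in which the PipeOps were added.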

Fields

  • pipeops :: named list of PipeOp
    Contains all PipeOps in the Graph, named by the PipeOps' $ids.

  • edges :: data.table with columns src_id (character), src_channel (character), dst_id (character), dst_channel (character)
    Table of connections between the PipeOps. src_id and dst_id are $ids of PipeOps that must be present in the $pipeops list. src_channel and dst_channel must be $output and $input channel names, respectively, of the corresponding PipeOps.

  • is_trained :: logical(1)
    Whether the Graph (i.e. all of its PipeOps) is trained and can therefore be used for prediction.

  • lhs :: character
    Ids of the 'left-hand-side' PipeOps that have some unconnected input channels and therefore act as Graph input layer.

  • rhs :: character
    Ids of the 'right-hand-side' PipeOps that have some unconnected output channels and therefore act as Graph output layer.

  • input :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
    Input channels of the Graph. For each channel, lists the name, input type during training, input type during prediction, $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.

  • output :: data.table with columns name (character), train (character), predict (character), op.id (character), channel.name (character)
    Output channels of the Graph. For each channel, lists the name, output type during training, output type during prediction, $id of the PipeOp the channel pertains to, and channel name as the PipeOp knows it.

  • packages :: character
    Set of all required packages for the various methods in the Graph, a set union of all required packages of all contained PipeOp objects.

  • state :: named list
    Get / set the $state of each PipeOp in the Graph.

  • param_set :: ParamSet
    Parameters and parameter constraints. Parameter values are in $param_set$values. These are the union of the $param_sets of all PipeOps in the Graph. Parameter names as seen by the Graph have the naming scheme <PipeOp$id>.<PipeOp original parameter name>. Changing $param_set$values also propagates the changes directly to the contained PipeOps and is an alternative to changing a PipeOp's $param_set$values directly.

  • hash :: character(1)
    Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes (and therefore their $param_set$values) and a hash of $edges.

  • phash :: character(1)
    Stores a checksum calculated on the Graph configuration, which includes all PipeOp hashes except their $param_set$values, and a hash of $edges.

  • keep_results :: logical(1)
    Whether to store intermediate results in each PipeOp's $.result slot, mostly for debugging purposes. Default FALSE.

  • man :: character(1)
    Identifying string of the help page that shows with help().
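The sketch below exercises a few of the fields listed above: the prefixed parameter names in $param_set, the difference between $hash and $phash, and the $is_trained and $state fields. It assumes the default PipeOp ids "scale" and "pca"; the parameter "center" of PipeOpScale is used only as an example.

```r
library("mlr3")
library("mlr3pipelines")

g = PipeOpScale$new() %>>% PipeOpPCA$new()

# Graph-level parameter names are prefixed with the PipeOp id.
param_ids = g$param_set$ids()
param_ids

hash_before = g$hash
phash_before = g$phash

# Setting a value on the Graph propagates to the contained PipeOp.
g$param_set$values$scale.center = FALSE
center_in_pipeop = g$pipeops$scale$param_set$values$center

# $hash takes parameter values into account, $phash does not.
hash_changed = !identical(g$hash, hash_before)
phash_unchanged = identical(g$phash, phash_before)

trained_before = g$is_trained
g$train(tsk("iris"))
trained_after = g$is_trained

# After training, $state holds one entry per PipeOp, named by id.
state_names = names(g$state)
```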

Methods

  • ids(sorted = FALSE)
    (logical(1)) -> character
    Get IDs of all PipeOps. This is in order that PipeOps were added if sorted is FALSE, and topologically sorted if sorted is TRUE.

  • add_pipeop(op, clone = TRUE)
    (PipeOp | Learner | Filter | ..., logical(1)) -> self
    Mutates Graph by adding a PipeOp to the Graph. This does not add any edges, so the new PipeOp will not be connected within the Graph at first.
    Instead of supplying a PipeOp directly, an object that can naturally be converted to a PipeOp can also be supplied, e.g. a Learner or a Filter; see as_pipeop(). The argument given as op is cloned if clone is TRUE (default); to access a Graph's PipeOps by-reference, use $pipeops.
    Note that $add_pipeop() is a relatively low-level operation; it is recommended to build graphs using %>>%.

  • add_edge(src_id, dst_id, src_channel = NULL, dst_channel = NULL)
    (character(1), character(1), character(1) | numeric(1) | NULL, character(1) | numeric(1) | NULL) -> self
    Add an edge from PipeOp src_id, and its channel src_channel (identified by its name or number as listed in the PipeOp's $output), to PipeOp dst_id's channel dst_channel (identified by its name or number as listed in the PipeOp's $input). If the source or destination PipeOp has only one output / input channel, so that src_channel / dst_channel is unambiguous, the respective argument can be omitted (i.e. left as NULL).

  • chain(gs, clone = TRUE)
    (list of Graphs, logical(1)) -> self
    Takes a list of Graphs or PipeOps (or objects that can be automatically converted into Graphs or PipeOps, see as_graph() and as_pipeop()) as inputs and joins them in a serial Graph coming after self, as if connecting them using %>>%.

  • plot(html = FALSE)
    (logical(1)) -> NULL
    Plot the Graph, using either the igraph package (for html = FALSE, default) or the visNetwork package (for html = TRUE), the latter producing an htmlWidget. The htmlWidget can be rescaled using visOptions.

  • print(dot = FALSE, dotname = "dot", fontsize = 24L)
    (logical(1), character(1), integer(1)) -> NULL
    Print a representation of the Graph on the console. If dot is FALSE, output is a table with one row for each contained PipeOp and columns ID ($id of PipeOp), State (short representation of $state of PipeOp), sccssors (PipeOps that take their input directly from the PipeOp on this line), and prdcssors (the PipeOps that produce the data that is read as input by the PipeOp on this line). If dot is TRUE, print a DOT representation of the Graph on the console. The DOT output can be named via the argument dotname and the fontsize can also be specified.

  • set_names(old, new)
    (character, character) -> self
    Rename PipeOps: change the ID of each PipeOp identified by old to the corresponding item in new. This should be used instead of changing a PipeOp's $id value directly!

  • update_ids(prefix = "", postfix = "")
    (character, character) -> self
    Add a prefix and / or postfix to the existing ids of all PipeOps. Both prefix and postfix default to "", i.e. no changes.

  • train(input, single_input = TRUE)
    (any, logical(1)) -> named list
    Train the Graph by traversing the Graph's edges and calling each PipeOp's $train method in turn. Return a named list of outputs for each unconnected PipeOp out-channel, named according to the Graph's $output name column. During training, the $state member of each PipeOp will be set and the $is_trained slot of the Graph (and each individual PipeOp) will consequently be set to TRUE.
    If single_input is TRUE, the input value will be sent to each unconnected PipeOp's input channel (as listed in the Graph's $input). Typically, input should be a Task, although this is dependent on the PipeOps in the Graph. If single_input is FALSE, then input should be a list with the same length as the Graph's $input table has rows; each list item will be sent to a corresponding input channel of the Graph. If input is a named list, names must correspond to input channel names ($input$name) and inputs will be sent to the channels by name; otherwise they will be sent to the channels in the order in which they are listed in $input.

  • predict(input, single_input = TRUE)
    (any, logical(1)) -> list of any
    Predict with the Graph by calling each PipeOp's $predict method in turn. Input and output, as well as the function of the single_input argument, are analogous to $train().

  • help(help_type)
    (character(1)) -> help file
    Displays the help file of the concrete PipeOp instance. help_type is one of "text", "html", "pdf" and behaves as the help_type argument of R's help().
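As a hedged illustration of several of the methods above, the sketch below builds a Graph with two unconnected input channels, renames a PipeOp with $set_names(), and trains with single_input = FALSE using a named list. The use of PipeOpFeatureUnion and the new id "pca_raw" are choices made for this example only; the channel names assume the default PipeOp ids.

```r
library("mlr3")
library("mlr3pipelines")

# Two parallel preprocessing steps feeding into a feature union.
# src_channel is given explicitly once and omitted once; both work
# because each source PipeOp has a single output channel.
g = Graph$new()$
  add_pipeop(PipeOpScale$new())$
  add_pipeop(PipeOpPCA$new())$
  add_pipeop(PipeOpFeatureUnion$new())$
  add_edge("scale", "featureunion", src_channel = "output")$
  add_edge("pca", "featureunion")

# Rename a PipeOp through the Graph so that $edges stays consistent.
g$set_names("pca", "pca_raw")
ids_after_rename = g$ids()

# The Graph now has two input channels, named <id>.<channel name>.
input_names = g$input$name

# Send one input per channel, matched by name.
task = tsk("iris")
out = g$train(
  list(scale.input = task, pca_raw.input = task),
  single_input = FALSE
)
output_names = names(out)
n_features = length(out[[1]]$feature_names)

# With single_input = TRUE (the default), the same task is sent to
# every unconnected input channel.
pred = g$predict(task)

# $update_ids() adds a prefix to every id, shown here on a second Graph.
g2 = (PipeOpScale$new() %>>% PipeOpPCA$new())$update_ids(prefix = "x_")
prefixed_ids = g2$ids()
```

Passing a named list avoids having to rely on the row order of $input when the Graph has several input channels.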

See Also

Other mlr3pipelines backend related: PipeOp, PipeOpTargetTrafo, PipeOpTaskPreproc, PipeOpTaskPreprocSimple, mlr_graphs, mlr_pipeops, mlr_pipeops_updatetarget

Examples

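library("mlr3")
library("mlr3pipelines")  # provides Graph, PipeOpScale and PipeOpPCA

# Build a two-step Graph by hand: scale the features, then apply PCA
g = Graph$new()$
  add_pipeop(PipeOpScale$new(id = "scale"))$
  add_pipeop(PipeOpPCA$new(id = "pca"))$
  add_edge("scale", "pca")
g$input   # unconnected input channel of "scale"
g$output  # unconnected output channel of "pca"

# Training returns a named list with one entry per Graph output channel
task = tsk("iris")
trained = g$train(task)
trained[[1]]$data()

# Predict on the first ten rows using the state stored during training
task$filter(1:10)
predicted = g$predict(task)
predicted[[1]]$data()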

mlr3pipelines documentation built on Sept. 30, 2024, 9:37 a.m.