In goldingn/greta: Simple and Scalable Statistical Modelling in R

knitr::opts_chunk$set(echo = TRUE,
                      eval = greta:::check_tf_version("message"),
                      cache = TRUE,
                      comment = NA,
                      progress = FALSE)
set.seed(1)
library(greta)

This page provides technical implementation details for potential contributors or the curious. You don't need to read this to be able to use greta.

greta arrays, nodes and tensors

There are three layers to how greta defines a model: users manipulate greta arrays, these define nodes, and nodes then define Tensors.

greta arrays

greta arrays are the user-facing representation of the model. greta arrays extend R arrays and have the classes greta_array and array.

x <- ones(3, 3)
class(x)

nodes

The main difference between greta arrays and R arrays is that greta array has a node attribute; an R6 object inheriting from the R6 class 'node', as well as one of the three node types: 'data', 'operation' or 'variable'.

x_node <- attr(x, "node")
class(x_node)

There is a fourth node type: 'distribution', but these are never directly associated with greta arrays.

These R6 node objects are where the magic happens. When created, each node points to its 'parent' nodes - the nodes for the greta arrays that were used to create this one.

# data nodes have no parents
length(x_node$parents)

# operation nodes have one or more parents
z <- x * 3
z_node <- attr(z, "node")
length(z_node$parents)

Each node also has a list of its children, the nodes that have been created from this one.

When model() is called, that inheritance information is used to construct the directed acyclic graph (DAG) that defines the model. The inheritance also preserves intermediate nodes, such as those creates in multi-part operations, but not assigned as greta arrays.

Nodes also have a value member, which is an array for data nodes or an 'unknowns' object for other node types. The unknowns class is a thin wrapper around arrays, which prints the question marks. Generic functions for working on arrays (e.g. dim, length, print) use these node values to return something familiar to the user.

x_node$value()
# R calls this a matrix because it's 2d, but it's an array
class(x_node$value())
z_node$value()
class(z_node$value())

Tensors

In addition to remembering their shape and size and where they are in the DAG, each node has methods to define a corresponding TensorFlow Tensor object in a specified environment. That doesn't happen until the user runs model(), which creates a 'dag_class' object to store the relevant nodes, the environment for the tensors, and methods to talk to the TensorFlow graph.

The node tf() method takes the DAG as an argument, and defines a tensor representing itself in the tensorflow environment, with a name determined by the dag object.

x_node$tf

Because R6 objects are pass-by-reference (rather than pass-by-value), the dag accumulates all of the defined tensors, rather than being re-written each time. Similarly, because nodes are R6 objects and know which are their parents, they can make sure those parents are defined as tensors before they are. The define_tf() member function ensures that that happens, enabling operation nodes to use the tensors for their parents when defining their own tensor.

x_node$define_tf

variables and free states

Hamiltonian Monte Carlo (HMC) requires all of the parameters to be transformed to a continuous scale for sampling. Variable nodes are therefore converted to tensors by first defining a 'free' (unconstrained) version of themselves as a tensor, and then applying a transformation function to convert them back to the scale of interest.

a <- variable(lower = 0)
a_node <- attr(a, "node")
class(a_node)
a_node$tf_from_free

distributions

distribution nodes are node objects just like the others, but they are not directly associated with greta arrays. Instead, greta arrays may have a distribution node in their distribution slot to indicate that their values are assumed to follow that distribution. The distribution node will also be listed as a child node, and likewise the 'target node' will be listed as a child of the distribution. Distribution nodes also have child nodes (data, variables or operations) representing their parameters.

b <- normal(0, 1)
b_node <- attr(b, "node")
class(b_node)
class(b_node$distribution)
# b is the target of its own distribution
class(b_node$distribution$target)

When they define themselves as tensors, distribution nodes define the log density of their target node/tensor given the tensors representing their parameters.

b_node$distribution$tf

If the distribution was truncated, the log density is normalised using the cumulative distribution function.

Joint density

Those log-densities for these distributions are summed on the TensorFlow graph to create a Tensor for the joint log-density of the model. TensorFlow's automatic gradient capabilities are then used to define a Tensor for the gradient of the log-density with respect to each parameter in the model.

The dag R6 object contained within the model then exposes methods to send parameters to the TensorFlow graph and return the joint density and gradient.

model <- model(b)
model$dag$send_parameters
model$dag$log_density
model$dag$gradients

These methods are used to check the validity of initial values, sampling is now done using samplers from tensorflow probability, which require a function mapping from the overall free state to the joint log density. That's created with the generate_log_prob_function method: