shrinkGPR: Gaussian Process Regression with Shrinkage and Normalizing...

View source: R/shrinkGPR.R

shrinkGPRR Documentation

Gaussian Process Regression with Shrinkage and Normalizing Flows

Description

Fits a Gaussian process regression (GPR) model to an N \times 1 response vector Y. The joint distribution is multivariate normal Y \sim \mathcal{N}(0,\, K + \sigma^2 I), where K is the GP kernel matrix with triple-gamma shrinkage priors on the inverse length-scales. Covariates whose length-scales are shrunk to zero have no influence on the covariance structure. An optional linear mean x_\mu^\top \beta can be added via formula_mean. The joint posterior is approximated by normalizing flows trained to maximize the ELBO.

Usage

shrinkGPR(
  formula,
  data,
  a = 0.5,
  c = 0.5,
  formula_mean,
  a_mean = 0.5,
  c_mean = 0.5,
  sigma2_rate = 10,
  kernel_func = kernel_se,
  n_layers = 10,
  n_latent = 10,
  flow_func = sylvester,
  flow_args,
  n_epochs = 1000,
  auto_stop = TRUE,
  cont_model,
  device,
  display_progress = TRUE,
  optim_control
)

Arguments

formula

object of class "formula": a symbolic representation of the model for the covariance equation, as in lm. The response variable and covariates are specified here.

data

optional data frame containing the response variable and the covariates. If not found in data, the variables are taken from environment(formula). No NAs are allowed in the response variable or covariates.

a

positive real number controlling the behavior at the origin of the shrinkage prior for the covariance structure. The default is 0.5.

c

positive real number controlling the tail behavior of the shrinkage prior for the covariance structure. The default is 0.5.

formula_mean

optional formula for the linear mean equation. If provided, the covariates for the mean structure are specified separately from the covariance structure. A response variable is not required in this formula.

a_mean

positive real number controlling the behavior at the origin of the shrinkage for the mean structure. The default is 0.5.

c_mean

positive real number controlling the tail behavior of the shrinkage prior for the mean structure. The default is 0.5.

sigma2_rate

positive real number controlling the prior rate parameter for the residual variance. The default is 10.

kernel_func

function specifying the covariance kernel. The default is kernel_se, a squared exponential kernel. For guidance on how to provide a custom kernel function, see Details.

n_layers

positive integer specifying the number of flow layers in the normalizing flow. The default is 10.

n_latent

positive integer specifying the dimensionality of the latent space for the normalizing flow. The default is 10.

flow_func

function specifying the normalizing flow transformation. The default is sylvester. For guidance on how to provide a custom flow function, see Details.

flow_args

optional named list containing arguments for the flow function. If not provided, default arguments are used. For guidance on how to provide a custom flow function, see Details.

n_epochs

positive integer specifying the number of training epochs. The default is 1000.

auto_stop

logical value indicating whether to enable early stopping based on convergence. The default is TRUE.

cont_model

optional object returned from a previous shrinkGPR call, enabling continuation of training from the saved state.

device

optional device to run the model on, e.g., torch_device("cuda") for GPU or torch_device("cpu") for CPU. Defaults to GPU if available; otherwise, CPU.

display_progress

logical value indicating whether to display progress bars and messages during training. The default is TRUE.

optim_control

optional named list containing optimizer parameters. If not provided, default settings are used.

Details

Model Specification

Given N observations with d-dimensional covariates x_i, the model is

y_i = f(x_i) + \varepsilon_i, \quad \varepsilon_i \overset{iid}{\sim} \mathcal{N}(0, \sigma^2),

f \sim \mathcal{GP}(\mu(\cdot),\; k(\cdot, \cdot\,;\, \theta, \tau)).

The default squared exponential kernel is

k(x, x';\, \theta, \tau) = \frac{1}{\tau} \exp\!\left(-\frac{1}{2} \sum_{j=1}^d \theta_j (x_j - x'_j)^2\right),

where \theta_j \ge 0 are inverse squared length-scales and \tau > 0 is the output scale. Users can specify custom kernels by following the guidelines below, or use one of the other provided kernel functions in kernel_functions. If formula_mean is provided, the GP mean becomes x_{\mu,i}^\top \beta instead of zero.

Priors

The triple-gamma (TG) prior (Cadonna et al. 2020) induces sparsity on inactive length-scales:

\theta_j \mid \tau \sim \mathrm{TG}(a, c, \tau), \quad j = 1, \ldots, d,

\tau \sim F(2c, 2a),

\sigma^2 \sim \mathrm{Exp}(\sigma^2_\mathrm{rate}).

With a mean structure, the regression coefficients receive a normal-gamma-gamma (NGG) prior:

\beta_k \mid \tau_\mu \sim \mathrm{NGG}(a_\mu, c_\mu, \tau_\mu), \quad \tau_\mu \sim F(2c_\mu, 2a_\mu).

Parameters a and c jointly control the spike at zero (a) and the tail decay (c) of the prior.

Inference

The posterior is approximated by a normalizing flow q_\phi trained to maximize the ELBO. The default flow is a Sylvester flow (van den Berg et al. 2018), but users can specify custom flows by following the guidelines below. auto_stop triggers early stopping when the ELBO shows no significant improvement over the last 100 iterations.

Custom Kernel Functions

Users can define custom kernel functions by passing them to the kernel_func argument. A valid kernel function must follow the same structure as kernel_se. The function should:

  1. Accept the following arguments:

    • thetas: A torch_tensor of dimensions n_latent x d, representing latent length-scale parameters.

    • tau: A torch_tensor of length n_latent, representing latent scaling factors.

    • x: A torch_tensor of dimensions N x d, containing the input data points.

    • x_star: Either NULL or a torch_tensor of dimensions N_new x d. If NULL, the kernel is computed for x against itself. Otherwise, it computes the kernel between x and x_star.

  2. Return:

    • If x_star = NULL: a torch_tensor of dimensions n_latent x N x N.

    • If x_star is provided: a torch_tensor of dimensions n_latent x N_new x N.

  3. Requirements:

    • The kernel must produce a valid positive semi-definite covariance matrix.

    • Use torch tensor operations to ensure GPU/CPU compatibility.

See kernel_functions for documented examples.

Custom Flow Functions

Users can define custom flow functions by implementing an nn_module in torch. The module must have a forward method that accepts a tensor z of shape n_latent x D and returns a list with:

  • zk: the transformed samples, shape n_latent x D.

  • log_diag_j: the log-absolute-determinant of the Jacobian, shape n_latent.

See sylvester for a documented example.

Value

A list object of class shrinkGPR, containing:

model

The best-performing trained model.

loss

The best loss value (ELBO) achieved during training.

loss_stor

A numeric vector storing the ELBO values at each training iteration.

last_model

The model state at the final iteration.

optimizer

The optimizer object used during training.

model_internals

Internal objects required for predictions and further training, such as model matrices and formulas.

Author(s)

Peter Knaus peter.knaus@wu.ac.at

Examples


if (torch::torch_is_installed()) {
  # Simulate data
  set.seed(123)
  torch::torch_manual_seed(123)
  n <- 100
  x <- matrix(runif(n * 2), n, 2)
  y <- sin(2 * pi * x[, 1]) + rnorm(n, sd = 0.1)
  data <- data.frame(y = y, x1 = x[, 1], x2 = x[, 2])

  # Fit GPR model
  res <- shrinkGPR(y ~ x1 + x2, data = data)

  # Check convergence
  plot(res$loss_stor, type = "l", main = "Loss Over Iterations")

  # Check posterior
  samps <- gen_posterior_samples(res, nsamp = 1000)
  boxplot(samps$thetas) # Second theta is pulled towards zero

  # Predict
  x1_new <- seq(from = 0, to = 1, length.out = 100)
  x2_new <- runif(100)
  y_new <- predict(res, newdata = data.frame(x1 = x1_new, x2 = x2_new), nsamp = 2000)

  # Plot
  quants <- apply(y_new, 2, quantile, c(0.025, 0.5, 0.975))
  plot(x1_new, quants[2, ], type = "l", ylim = c(-1.5, 1.5),
        xlab = "x1", ylab = "y", lwd = 2)
  polygon(c(x1_new, rev(x1_new)), c(quants[1, ], rev(quants[3, ])),
        col = adjustcolor("skyblue", alpha.f = 0.5), border = NA)
  points(x[,1], y)
  curve(sin(2 * pi * x), add = TRUE, col = "forestgreen", lwd = 2, lty = 2)



  # Add mean equation
  res2 <- shrinkGPR(y ~ x1 + x2, formula_mean = ~ x1, data = data)
  }


shrinkGPR documentation built on March 30, 2026, 5:06 p.m.