| shrinkMVTPR | R Documentation |
Fits a multivariate Student-t process regression (MVTPR) model to an N \times M response matrix Y. The joint
distribution is matrix-variate Student-t, Y \sim \mathcal{MT}(\nu,\, 0,\, K + \sigma^2 I,\, \Omega), where K is
the GP kernel matrix with triple-gamma shrinkage priors on the inverse squared length-scales, \Omega is the M \times M
output covariance, and \nu is the degrees of freedom parameter. Compared to shrinkMVGPR, the heavier tails
provide greater robustness to outliers. The joint posterior is approximated by normalizing flows trained to maximize the ELBO.
shrinkMVTPR(
formula,
data,
a = 0.5,
c = 0.5,
eta = 4,
a_Om = 0.5,
c_Om = 0.5,
sigma2_rate = 10,
nu_alpha = 0.5,
nu_beta = 2,
kernel_func = kernel_se,
n_layers = 10,
n_latent = 10,
flow_func = sylvester,
flow_args,
n_epochs = 1000,
auto_stop = TRUE,
cont_model,
device,
display_progress = TRUE,
optim_control
)
formula |
object of class "formula": a symbolic representation of the model for the covariance equation, as in |
data |
optional data frame containing the response variable and the covariates. If not found in |
a |
positive real number controlling the behavior at the origin of the triple-gamma shrinkage prior on the inverse squared length-scales \theta_j of the kernel. The default is 0.5. |
c |
positive real number controlling the tail behavior of the triple-gamma shrinkage prior on the inverse squared length-scales \theta_j of the kernel. The default is 0.5. |
eta |
positive real number controlling the concentration of the LKJ prior on the correlation matrix of the output covariance. Higher values push the prior towards the identity matrix. The default is 4. |
a_Om |
positive real number controlling the behavior at the origin of the shrinkage prior for the output covariance scale parameters. The default is 0.5. |
c_Om |
positive real number controlling the tail behavior of the shrinkage prior for the output covariance scale parameters. The default is 0.5. |
sigma2_rate |
positive real number specifying the rate parameter of the exponential prior on the residual variance \sigma^2. The default is 10. |
nu_alpha |
positive real number controlling the shape parameter of the shifted gamma prior for the degrees of freedom of the matrix-t process. The default is 0.5. |
nu_beta |
positive real number controlling the rate parameter of the shifted gamma prior for the degrees of freedom of the matrix-t process. The default is 2. |
kernel_func |
function specifying the covariance kernel. The default is |
n_layers |
positive integer specifying the number of flow layers in the normalizing flow. The default is 10. |
n_latent |
positive integer specifying the dimensionality of the latent space for the normalizing flow. The default is 10. |
flow_func |
function specifying the normalizing flow transformation. The default is |
flow_args |
optional named list containing arguments for the flow function. If not provided, default arguments are used. For guidance on how to provide a custom flow function, see Details. |
n_epochs |
positive integer specifying the number of training epochs. The default is 1000. |
auto_stop |
logical value indicating whether to enable early stopping based on convergence. The default is |
cont_model |
optional object returned from a previous |
device |
optional device to run the model on, e.g., |
display_progress |
logical value indicating whether to display progress bars and messages during training. The default is |
optim_control |
optional named list containing optimizer parameters. If not provided, default settings are used. |
Model Specification
Given N observations with d-dimensional covariates and M response variables, the response matrix
Y \in \mathbb{R}^{N \times M} follows a matrix-variate Student-t distribution:
Y \sim \mathcal{MT}_{N,M}(\nu,\; 0,\; K(\theta, \tau) + \sigma^2 I_N,\; \Omega),
which is equivalent to
\mathrm{vec}(Y) \sim t_{NM}\!\left(\nu,\; \mathbf{0},\; \Omega \otimes (K + \sigma^2 I_N)\right).
Here K_{ij} = k(x_i, x_j;\, \theta, \tau) is the kernel matrix and \Omega is the M \times M
between-response covariance. The output covariance is parameterized as \Omega = S D S, where
D is a correlation matrix and S = \mathrm{diag}(s_1, \ldots, s_M) contains the marginal standard deviations.
The product of the diagonal elements of S is constrained to equal 1 to ensure identifiability.
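The parameterization \Omega = S D S and the Kronecker structure of the vectorized form can be checked numerically. The following base R sketch uses made-up values for D, S and the row covariance (none of them come from a fitted model) and verifies the identity \mathrm{vec}(A Z B^\top) = (B \otimes A)\,\mathrm{vec}(Z), which is what links the matrix-variate and vectorized forms.

```r
# Illustrative values only; none of these come from a fitted model.
M <- 3
N <- 4

# Correlation matrix D (unit diagonal, positive definite).
D <- matrix(c( 1.0, 0.3, -0.2,
               0.3, 1.0,  0.5,
              -0.2, 0.5,  1.0), nrow = M, byrow = TRUE)

# Marginal scales, rescaled so that their product equals 1.
s_raw <- c(0.5, 2.0, 4.0)
s <- s_raw / prod(s_raw)^(1 / M)
S <- diag(s)

# Output covariance Omega = S D S.
Omega <- S %*% D %*% S

# Stand-in for the row covariance K + sigma^2 I_N.
Sigma <- 0.5^abs(outer(1:N, 1:N, "-")) + 0.1 * diag(N)

# Lower-triangular factors: A %*% t(A) = Sigma and B %*% t(B) = Omega.
A <- t(chol(Sigma))
B <- t(chol(Omega))

# Check vec(A Z B') = (B %x% A) vec(Z) for a random N x M matrix Z.
set.seed(1)
Z <- matrix(rnorm(N * M), N, M)
lhs <- as.vector(A %*% Z %*% t(B))
rhs <- as.vector((B %x% A) %*% as.vector(Z))
max(abs(lhs - rhs))  # zero up to floating-point error
```

Because the product of the scales is 1, \det(\Omega) = \det(D) in this construction, so the overall scale of the covariance is carried by the kernel rather than by \Omega.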
The default squared exponential kernel is
k(x, x';\, \theta, \tau) = \frac{1}{\tau} \exp\!\left(-\frac{1}{2} \sum_{j=1}^d \theta_j (x_j - x'_j)^2\right),
where \theta_j \ge 0 are inverse squared length-scales and \tau > 0 is the output scale.
Users can specify custom kernels by following the guidelines below, or use one of the other provided kernel functions in
kernel_functions.
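To make the role of \theta_j and \tau concrete, the squared exponential kernel above can be written in a few lines of base R. This is an illustrative sketch, not the package implementation (which operates on torch tensors); the function name se_kernel and its argument order are assumptions made for this example.

```r
# Base R sketch of the squared exponential kernel (not the package code).
se_kernel <- function(x, theta, tau, x_star = NULL) {
  x1 <- if (is.null(x_star)) x else x_star
  # Scale each column by sqrt(theta_j) so that squared Euclidean distances
  # equal sum_j theta_j * (x_j - x'_j)^2.
  xs  <- sweep(x,  2, sqrt(theta), "*")
  x1s <- sweep(x1, 2, sqrt(theta), "*")
  sq_dist <- outer(rowSums(x1s^2), rowSums(xs^2), "+") - 2 * x1s %*% t(xs)
  # Guard against tiny negative values caused by rounding.
  exp(-0.5 * pmax(sq_dist, 0)) / tau
}

set.seed(42)
x <- matrix(runif(5 * 2), nrow = 5)
theta <- c(2, 0)  # theta_2 = 0 removes the second covariate from the kernel
tau <- 0.5
K <- se_kernel(x, theta, tau)
diag(K)           # every diagonal entry equals 1 / tau
```

Setting \theta_j = 0 makes the kernel ignore covariate j entirely, which is why shrinking \theta_j towards zero acts as a form of variable selection.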
Priors
\theta_j \mid \tau \sim \mathrm{TG}(a, c, \tau), \quad j = 1, \ldots, d,
\tau \sim F(2c, 2a),
\sigma^2 \sim \mathrm{Exp}(\sigma^2_\mathrm{rate}),
D \sim \mathrm{LKJ}(\eta),
s_m \mid \tau_\Omega \sim \mathrm{TG}(a_\Omega, c_\Omega, \tau_\Omega), \quad m = 1, \ldots, M,
\tau_\Omega \sim F(2c_\Omega, 2a_\Omega),
\nu - 2 \sim \mathrm{Gamma}(\nu_\alpha, \nu_\beta).
The shift by 2 ensures \nu > 2 so that the process covariance is finite.
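The effect of the shift can be illustrated by drawing from the prior in base R. The sketch below uses the default values of nu_alpha and nu_beta and assumes the usual multivariate t result that the covariance is the scale matrix multiplied by \nu / (\nu - 2).

```r
set.seed(1)
nu_alpha <- 0.5  # shape (function default)
nu_beta  <- 2    # rate (function default)

# nu - 2 ~ Gamma(shape, rate), hence nu = 2 + a gamma draw.
nu <- 2 + rgamma(10000, shape = nu_alpha, rate = nu_beta)
range(nu)
mean(nu)         # theoretical prior mean: 2 + nu_alpha / nu_beta = 2.25

# Factor by which the covariance exceeds the scale matrix;
# it is finite only because nu > 2.
inflation <- nu / (nu - 2)
summary(inflation)
```

With the defaults, most prior mass sits close to \nu = 2, that is, on very heavy tails; a larger nu_alpha or a smaller nu_beta moves the prior towards Gaussian-like behavior.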
Inference
The posterior is approximated by a normalizing flow q_\phi trained to maximize the ELBO.
auto_stop triggers early stopping when the ELBO shows no significant improvement over the last 100 iterations.
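The exact stopping rule is not spelled out here. As a rough illustration of what a windowed convergence check can look like, the following sketch compares the mean loss over the last 100 iterations with the mean over the preceding 100. The helper has_converged, its tolerance and the comparison rule are all hypothetical and are not the criterion used by the package; it also assumes a loss where lower values are better.

```r
# Hypothetical helper, not part of the package: flags convergence when the
# mean loss over the last `window` iterations improves on the preceding
# window by less than `tol` in relative terms.
has_converged <- function(loss, window = 100, tol = 1e-3) {
  n <- length(loss)
  if (n < 2 * window) return(FALSE)
  recent   <- mean(loss[(n - window + 1):n])
  previous <- mean(loss[(n - 2 * window + 1):(n - window)])
  (previous - recent) < tol * abs(previous)
}

# Toy loss curve that decays and then flattens out.
loss <- 100 * exp(-(1:1000) / 50) + 10
has_converged(loss[1:150])  # too few iterations to compare two windows
has_converged(loss[1:250])  # loss is still decreasing markedly
has_converged(loss)         # loss has flattened
```

The same kind of check can be applied by eye to the loss_stor element of a fitted object when auto_stop = FALSE.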
Custom Kernel Functions
Users can define custom kernel functions by passing them to the kernel_func argument.
A valid kernel function must follow the same structure as kernel_se. The function must:
Accept arguments thetas (n_latent x d), tau (length n_latent),
x (N x d), and optionally x_star (N_new x d).
Return a torch_tensor of dimensions n_latent x N x N (if x_star = NULL)
or n_latent x N_new x N (if x_star is provided).
Produce a valid positive semi-definite covariance matrix using torch tensor operations.
See kernel_functions for documented examples.
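As a sketch of the interface described above, the following defines a custom kernel built as a product of one-dimensional exponential kernels, which is positive semi-definite for non-negative thetas. The name kernel_abs_exp is hypothetical and the function is not part of the package; it only follows the stated argument and return shapes.

```r
# Hypothetical custom kernel (not part of the package):
# k(x, x') = (1 / tau) * exp(-sum_j theta_j * |x_j - x'_j|).
kernel_abs_exp <- function(thetas, tau, x, x_star = NULL) {
  x1 <- if (is.null(x_star)) x else x_star
  # Pairwise absolute differences: (N1 x 1 x d) - (1 x N x d) -> N1 x N x d
  diffs <- torch::torch_abs(x1$unsqueeze(2) - x$unsqueeze(1))
  # Weight by thetas: (n_latent x 1 x 1 x d) * (1 x N1 x N x d)
  w <- thetas$unsqueeze(2)$unsqueeze(2)
  dist <- torch::torch_sum(w * diffs$unsqueeze(1), dim = 4)
  # Divide by tau, reshaped to n_latent x 1 x 1 for broadcasting.
  torch::torch_exp(-dist) / tau$view(c(-1, 1, 1))
}

if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  torch::torch_manual_seed(1)
  thetas <- torch::torch_rand(3, 2)     # n_latent = 3, d = 2
  tau    <- torch::torch_rand(3) + 0.5  # length n_latent
  x      <- torch::torch_randn(5, 2)    # N = 5
  x_star <- torch::torch_randn(4, 2)    # N_new = 4
  print(kernel_abs_exp(thetas, tau, x)$shape)          # n_latent x N x N
  print(kernel_abs_exp(thetas, tau, x, x_star)$shape)  # n_latent x N_new x N
}
```

This version materializes an n_latent x N x N x d tensor, which is simple to read but memory-hungry for large N; it would then be passed as kernel_func = kernel_abs_exp.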
Custom Flow Functions
Users can define custom flow functions by implementing an nn_module in torch.
The module must have a forward method that accepts a tensor z of shape n_latent x D
and returns a list with:
zk: the transformed samples, shape n_latent x D.
log_diag_j: log-absolute-determinant of the Jacobian, shape n_latent.
See sylvester for a documented example.
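A minimal module that satisfies this forward contract might look as follows. It is an elementwise affine map, far less expressive than a Sylvester flow, and is meant only to show the required input and output shapes. The module name affine_flow and its constructor argument d are assumptions; how the package instantiates a custom flow and which constructor arguments it passes via flow_args is not covered here.

```r
if (requireNamespace("torch", quietly = TRUE) && torch::torch_is_installed()) {
  library(torch)

  # Hypothetical minimal flow (not part of the package):
  # zk = z * exp(log_scale) + shift, with learnable log_scale and shift.
  affine_flow <- nn_module(
    classname = "affine_flow",
    # `d` (the dimension D of each sample) is an assumed constructor argument.
    initialize = function(d) {
      self$log_scale <- nn_parameter(torch_zeros(d))
      self$shift <- nn_parameter(torch_zeros(d))
    },
    forward = function(z) {
      zk <- z * torch_exp(self$log_scale) + self$shift
      # The Jacobian is diagonal, so log|det J| = sum(log_scale) for every row.
      log_diag_j <- torch_sum(torch_zeros_like(z) + self$log_scale, dim = 2)
      list(zk = zk, log_diag_j = log_diag_j)
    }
  )

  flow <- affine_flow(d = 4)
  out <- flow(torch_randn(10, 4))
  print(out$zk$shape)          # n_latent x D
  print(out$log_diag_j$shape)  # n_latent
}
```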
A list object of classes shrinkMVGPR and shrinkMVTPR, containing:
model |
The best-performing trained model. |
loss |
The best loss value (ELBO) achieved during training. |
loss_stor |
A numeric vector storing the ELBO values at each training iteration. |
last_model |
The model state at the final iteration. |
optimizer |
The optimizer object used during training. |
model_internals |
Internal objects required for predictions and further training, such as model matrices and formulas. |
Peter Knaus peter.knaus@wu.ac.at
if (torch::torch_is_installed()) {
  # Simulate multivariate data
  torch::torch_manual_seed(123)
  sim <- simMVGPR(N = 100, M = 2, d = 2)

  # Fit MVTPR model
  res <- shrinkMVTPR(cbind(y.1, y.2) ~ x.1 + x.2, data = sim$data)

  # Check convergence
  plot(res$loss_stor, type = "l", main = "Loss Over Iterations")

  # Check posterior of the inverse length-scale parameters (thetas)
  samps <- gen_posterior_samples(res, nsamp = 1000)
  boxplot(samps$thetas)

  # Predict at new covariate values
  newdata <- data.frame(x.1 = runif(10), x.2 = runif(10))
  y_new <- predict(res, newdata = newdata, nsamp = 500)
  # y_new is an array of shape nsamp x N_new x M
}