set_prediction_data: Set prediction data for a 'GPModel'
In gpboost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

set_prediction_data

R Documentation

Set prediction data for a `GPModel`

Description

Set the data required for making predictions with a `GPModel`.

Usage

set_prediction_data(gp_model, vecchia_pred_type = NULL,
  num_neighbors_pred = NULL, cg_delta_conv_pred = NULL,
  nsim_var_pred = NULL, rank_pred_approx_matrix_lanczos = NULL,
  group_data_pred = NULL, group_rand_coef_data_pred = NULL,
  gp_coords_pred = NULL, gp_rand_coef_data_pred = NULL,
  cluster_ids_pred = NULL, X_pred = NULL)

Arguments

`gp_model`	A `GPModel`
`vecchia_pred_type`	A `string` specifying the type of Vecchia approximation used for making predictions. Default value if vecchia_pred_type = NULL: "order_obs_first_cond_obs_only". Available options:
  "order_obs_first_cond_obs_only": Vecchia approximation for the observable process. The observed training data is ordered first, and the neighbors are selected only among observed training data points.
  "order_obs_first_cond_all": Vecchia approximation for the observable process. The observed training data is ordered first, and the neighbors are selected among all points (training + prediction).
  "latent_order_obs_first_cond_obs_only": Vecchia approximation for the latent process. The observed data is ordered first, and the neighbors are selected only among observed points.
  "latent_order_obs_first_cond_all": Vecchia approximation for the latent process. The observed data is ordered first, and the neighbors are selected among all points.
  "order_pred_first": Vecchia approximation for the observable process. The prediction data is ordered first for making predictions. This option is only available for Gaussian likelihoods.
`num_neighbors_pred`	An `integer` specifying the number of neighbors for the Vecchia approximation for making predictions. Default value if NULL: num_neighbors_pred = 2 * num_neighbors
`cg_delta_conv_pred`	A `numeric` specifying the tolerance level for the L2 norm of the residuals, used for checking convergence of conjugate gradient algorithms when they are used for prediction. Default value if NULL: 1e-3
`nsim_var_pred`	An `integer` specifying the number of samples when simulation is used for calculating predictive variances. Internal default values if NULL:
  500 for grouped random effects
  1000 for gp_approx = "vecchia" and gp_approx = "full_scale_tapering"
  100 for gp_approx = "full_scale_vecchia"
`rank_pred_approx_matrix_lanczos`	An `integer` specifying the rank of the matrix for approximating predictive covariances obtained using the Lanczos algorithm. Default value if NULL: 1000
`group_data_pred`	A `vector` or `matrix` with elements being group levels for which predictions are made (if there are grouped random effects in the `GPModel`)
`group_rand_coef_data_pred`	A `vector` or `matrix` with covariate data for grouped random coefficients (if there are some in the `GPModel`)
`gp_coords_pred`	A `matrix` with prediction coordinates (=features) for Gaussian process (if there is a GP in the `GPModel`)
`gp_rand_coef_data_pred`	A `vector` or `matrix` with covariate data for Gaussian process random coefficients (if there are some in the `GPModel`)
`cluster_ids_pred`	A `vector` with elements indicating the realizations of random effects / Gaussian processes for which predictions are made (set to NULL if you have not specified this when creating the `GPModel`)
`X_pred`	A `matrix` with prediction covariate data for the fixed effects linear regression term (if there is one in the `GPModel`)
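
As a rough sketch of how the Vecchia-related arguments fit together, the snippet below sets prediction coordinates for a Gaussian process model that uses a Vecchia approximation. It assumes the `coords` and `y` objects from the package's `GPBoost_data` example data. The specific values (an exponential covariance function, `num_neighbors = 20`, `num_neighbors_pred = 40`) are illustrative choices and not recommendations made on this page.

```r
library(gpboost)

# Example data shipped with gpboost (assumed to provide y and coords)
data(GPBoost_data, package = "gpboost")
set.seed(1)
train_ind <- sample.int(length(y), size = 250)

# GP model with a Vecchia approximation (illustrative settings)
gp_model <- GPModel(gp_coords = coords[train_ind, ],
                    cov_function = "exponential",
                    gp_approx = "vecchia", num_neighbors = 20,
                    likelihood = "gaussian")

# Prediction coordinates plus Vecchia options used only for prediction:
# neighbors are selected among training and prediction points, and
# twice as many neighbors are used as for training
set_prediction_data(gp_model,
                    gp_coords_pred = coords[-train_ind, ],
                    vecchia_pred_type = "order_obs_first_cond_all",
                    num_neighbors_pred = 40)
```

Arguments left at NULL, such as `cg_delta_conv_pred` or `nsim_var_pred`, fall back to the defaults listed above.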

Author(s)

Fabio Sigrist

Examples


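# Load the example data shipped with gpboost (provides y, group_data, ...)
data(GPBoost_data, package = "gpboost")
set.seed(1)
# Use 250 randomly chosen observations for training
train_ind <- sample.int(length(y), size = 250)
gp_model <- GPModel(group_data = group_data[train_ind, 1], likelihood = "gaussian")
# Register the group levels of the held-out observations as prediction data
set_prediction_data(gp_model, group_data_pred = group_data[-train_ind, 1])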
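
The example above only stores the prediction data in the model. One situation where this seems to matter is validation during boosting: the sketch below assumes that `gpb.train` uses the stored prediction data to include random effect predictions when evaluating the validation set (`use_gp_model_for_validation = TRUE`). It also assumes the `X` object from `GPBoost_data`. The tuning parameter values are arbitrary illustrations.

```r
library(gpboost)

data(GPBoost_data, package = "gpboost")
set.seed(1)
train_ind <- sample.int(length(y), size = 250)

# Training and validation datasets for the tree-boosting part
dtrain <- gpb.Dataset(data = X[train_ind, ], label = y[train_ind])
dvalid <- gpb.Dataset.create.valid(dtrain, data = X[-train_ind, ],
                                   label = y[-train_ind])

# Random effects model; the prediction data must match the validation rows
gp_model <- GPModel(group_data = group_data[train_ind, 1],
                    likelihood = "gaussian")
set_prediction_data(gp_model, group_data_pred = group_data[-train_ind, 1])

# Boosting with early stopping on the validation set
bst <- gpb.train(params = list(learning_rate = 0.05, max_depth = 6,
                               min_data_in_leaf = 5),
                 data = dtrain, gp_model = gp_model, nrounds = 100,
                 valids = list(test = dvalid),
                 early_stopping_rounds = 10,
                 use_gp_model_for_validation = TRUE, verbose = 0)
print(bst$best_iter)
```

The rows passed as `group_data_pred` should correspond, in the same order, to the rows of the validation dataset.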

gpboost documentation built on June 8, 2025, 1:23 p.m.