predict.gpb.Booster: Prediction function for 'gpb.Booster' objects
In gpboost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

predict.gpb.Booster

R Documentation

Prediction function for `gpb.Booster` objects

Description

Prediction function for gpb.Booster objects

Usage

## S3 method for class 'gpb.Booster'
predict(object, data, start_iteration = NULL,
  num_iteration = NULL, pred_latent = FALSE, predleaf = FALSE,
  predcontrib = FALSE, header = FALSE, reshape = FALSE,
  group_data_pred = NULL, group_rand_coef_data_pred = NULL,
  gp_coords_pred = NULL, gp_rand_coef_data_pred = NULL,
  cluster_ids_pred = NULL, predict_cov_mat = FALSE, predict_var = FALSE,
  cov_pars = NULL, ignore_gp_model = FALSE, rawscore = NULL,
  vecchia_pred_type = NULL, num_neighbors_pred = NULL, ...)

Arguments

`object`	Object of class `gpb.Booster`
`data`	A `matrix` object, a `dgCMatrix` object, or a character string giving a filename
`start_iteration`	int or NULL, optional (default = NULL). Start index of the iteration to predict from. If NULL or <= 0, prediction starts from the first iteration.
`num_iteration`	int or NULL, optional (default = NULL). Limits the number of iterations used for prediction. If NULL and a best iteration exists and start_iteration is NULL or <= 0, the best iteration is used; otherwise, all iterations from start_iteration onward are used. If <= 0, all iterations from start_iteration onward are used (no limit).
`pred_latent`	If TRUE, the latent variables, i.e., both the fixed effects (tree ensemble) and the random effects (`gp_model`), are predicted. Otherwise, the response variable (label) is predicted. The values returned by this function depend on how 'pred_latent' is set; see the 'Value' section for details. If there is no `gp_model`, this argument corresponds to 'raw_score' in LightGBM.
`predleaf`	Whether to predict leaf indices instead of values.
`predcontrib`	Whether to return per-feature contributions for each record.
`header`	Only used when predicting from a text file. Set to TRUE if the text file has a header.
`reshape`	Whether to reshape the vector of predictions into a matrix when there are several prediction outputs per case.
`group_data_pred`	A `vector` or `matrix` with elements being group levels for which predictions are made (if there are grouped random effects in the `GPModel`)
`group_rand_coef_data_pred`	A `vector` or `matrix` with covariate data for grouped random coefficients (if there are some in the `GPModel`)
`gp_coords_pred`	A `matrix` with prediction coordinates (=features) for Gaussian process (if there is a GP in the `GPModel`)
`gp_rand_coef_data_pred`	A `vector` or `matrix` with covariate data for Gaussian process random coefficients (if there are some in the `GPModel`)
`cluster_ids_pred`	A `vector` with elements indicating the realizations of random effects / Gaussian processes for which predictions are made (set to NULL if you have not specified this when creating the `GPModel`)
`predict_cov_mat`	A `boolean`. If TRUE, the (posterior) predictive covariance is calculated in addition to the (posterior) predictive mean
`predict_var`	A `boolean`. If TRUE, the (posterior) predictive variances are calculated
`cov_pars`	A `vector` containing covariance parameters which are used if the `gp_model` has not been trained or if predictions should be made for other parameters than the trained ones
`ignore_gp_model`	A `boolean`. If TRUE, predictions are only made for the tree ensemble part and the `gp_model` is ignored
`rawscore`	This is discontinued. Use the renamed equivalent argument `pred_latent` instead
`vecchia_pred_type`	A `string` specifying the type of Vecchia approximation used for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this
`num_neighbors_pred`	An `integer` specifying the number of neighbors for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this
`...`	Additional named arguments passed to the `predict()` method of the `gpb.Booster` object passed to `object`.
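
The rule described for `start_iteration` and `num_iteration` can be easier to follow as code. The sketch below is only an illustration of the documented behavior in base R: `resolve_iteration_range`, `total_iterations`, and `best_iteration` are hypothetical names that do not exist in gpboost, and the zero-based start index and the capping at the remaining iterations are assumptions.

```r
# Hypothetical helper (not part of gpboost) mirroring the documented rule.
# Assumption: start_iteration is a zero-based index.
resolve_iteration_range <- function(start_iteration = NULL, num_iteration = NULL,
                                    total_iterations, best_iteration = NULL) {
  # NULL or <= 0 means "start from the first iteration"
  start_from_first <- is.null(start_iteration) || start_iteration <= 0
  start <- if (start_from_first) 0L else as.integer(start_iteration)
  remaining <- as.integer(total_iterations) - start

  if (is.null(num_iteration)) {
    if (!is.null(best_iteration) && start_from_first) {
      n <- as.integer(best_iteration)  # use the best iteration
    } else {
      n <- remaining                   # all iterations from start
    }
  } else if (num_iteration <= 0) {
    n <- remaining                     # no limit
  } else {
    # Assumption: a limit larger than what is left is capped
    n <- min(as.integer(num_iteration), remaining)
  }
  list(start = start, n = n)
}

r_default <- resolve_iteration_range(total_iterations = 16, best_iteration = 10)
r_offset  <- resolve_iteration_range(start_iteration = 4, total_iterations = 16,
                                     best_iteration = 10)
r_nolimit <- resolve_iteration_range(num_iteration = -1, total_iterations = 16,
                                     best_iteration = 10)
str(list(default = r_default, offset = r_offset, nolimit = r_nolimit))
```

Note that a best iteration is only used when both arguments are left at their defaults (or `start_iteration` is <= 0); setting a positive `start_iteration` switches to using all remaining iterations.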

Value

Either a list of vectors or a single vector / matrix, depending on whether there is a gp_model.

If there is a gp_model, the result is a list with the following entries.
1. If pred_latent is FALSE (the default), the list contains the following 2 entries:
  - result$response_mean: the predictive means of the response variable (label), taking into account both the fixed effects (tree ensemble) and the random effects (gp_model).
  - result$response_var: the predictive covariances or variances of the response variable (only if 'predict_var' or 'predict_cov_mat' is TRUE).
2. If pred_latent is TRUE, the list contains the following 3 entries:
  - result$fixed_effect: the predictions from the tree ensemble.
  - result$random_effect_mean: the predictive means of the gp_model.
  - result$random_effect_cov: the predictive covariances or variances of the gp_model (only if 'predict_var' or 'predict_cov_mat' is TRUE).
If there is no gp_model, or if predcontrib or ignore_gp_model is TRUE, the result contains predictions from the tree booster only.
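
As an illustration of how the returned entries might be used, the sketch below builds normal-based 95% prediction intervals from `response_mean` and `response_var`. It assumes a Gaussian likelihood and that `predict_var = TRUE` was set, so that `response_var` holds variances. The list `pred_resp` is a hand-made stand-in with made-up numbers, and `prediction_interval` is a hypothetical helper, not a gpboost function.

```r
# Stand-in for the list returned by predict() with pred_latent = FALSE and
# predict_var = TRUE (the numbers are made up for illustration)
pred_resp <- list(
  response_mean = c(0.5, 1.2, -0.3),
  response_var  = c(0.04, 0.09, 0.16)
)

# Hypothetical helper: normal-based interval around the predictive mean
prediction_interval <- function(pred, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)       # two-sided normal quantile
  pred_sd <- sqrt(pred$response_var)    # variances -> standard deviations
  data.frame(
    mean  = pred$response_mean,
    lower = pred$response_mean - z * pred_sd,
    upper = pred$response_mean + z * pred_sd
  )
}

intervals <- prediction_interval(pred_resp)
print(intervals)
```

For non-Gaussian likelihoods the predictive distribution of the response is generally not normal, so intervals of this form would only be a rough approximation.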

Author(s)

Fabio Sigrist, authors of the LightGBM R package

Examples


# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples


library(gpboost)
data(GPBoost_data, package = "gpboost")

#--------------------Combine tree-boosting and grouped random effects model----------------
# Create random effects model
gp_model <- GPModel(group_data = group_data[,1], likelihood = "gaussian")
# The default optimizer for covariance parameters (hyperparameters) is 
# Nesterov-accelerated gradient descent.
# This can be changed to, e.g., Nelder-Mead as follows:
# re_params <- list(optimizer_cov = "nelder_mead")
# gp_model$set_optim_params(params=re_params)
# Use trace = TRUE to monitor convergence:
# re_params <- list(trace = TRUE)
# gp_model$set_optim_params(params=re_params)

# Train model
bst <- gpboost(data = X, label = y, gp_model = gp_model, nrounds = 16,
               learning_rate = 0.05, max_depth = 6, min_data_in_leaf = 5,
               verbose = 0)
# Estimated random effects model
summary(gp_model)

# Make predictions
# Predict latent variables
pred <- predict(bst, data = X_test, group_data_pred = group_data_test[,1],
                predict_var = TRUE, pred_latent = TRUE)
pred$random_effect_mean # Predicted latent random effects mean
pred$random_effect_cov # Predicted random effects variances
pred$fixed_effect # Predicted fixed effects from tree ensemble
# Predict response variable
pred_resp <- predict(bst, data = X_test, group_data_pred = group_data_test[,1],
                     predict_var = TRUE, pred_latent = FALSE)
pred_resp$response_mean # Predicted response mean
# For Gaussian data: pred$random_effect_mean + pred$fixed_effect = pred_resp$response_mean
pred$random_effect_mean + pred$fixed_effect - pred_resp$response_mean

#--------------------Combine tree-boosting and Gaussian process model----------------
# Create Gaussian process model
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential",
                    likelihood = "gaussian")
# Train model
bst <- gpboost(data = X, label = y, gp_model = gp_model, nrounds = 8,
               learning_rate = 0.1, max_depth = 6, min_data_in_leaf = 5,
               verbose = 0)
# Estimated random effects model
summary(gp_model)
# Make predictions
pred <- predict(bst, data = X_test, gp_coords_pred = coords_test,
                predict_var = TRUE, pred_latent = TRUE)
pred$random_effect_mean # Predicted latent random effects mean
pred$random_effect_cov # Predicted random effects variances
pred$fixed_effect # Predicted fixed effects from tree ensemble
# Predict response variable
pred_resp <- predict(bst, data = X_test, gp_coords_pred = coords_test,
                     predict_var = TRUE, pred_latent = FALSE)
pred_resp$response_mean # Predicted response mean
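
The arguments `ignore_gp_model` and `predict_cov_mat` are not exercised above. The following is a minimal sketch of how they might be used, assuming the same `GPBoost_data` objects (`X`, `y`, `group_data`, `X_test`, `group_data_test`) and the grouped random effects setup from the first example; the names `pred_tree` and `pred_cov` are introduced here for illustration only.

```r
library(gpboost)
data(GPBoost_data, package = "gpboost")

# Same grouped random effects setup as in the first example
gp_model <- GPModel(group_data = group_data[, 1], likelihood = "gaussian")
bst <- gpboost(data = X, label = y, gp_model = gp_model, nrounds = 16,
               learning_rate = 0.05, max_depth = 6, min_data_in_leaf = 5,
               verbose = 0)

# Predictions from the tree ensemble only; the gp_model is ignored,
# so no group data is passed
pred_tree <- predict(bst, data = X_test, ignore_gp_model = TRUE)
head(pred_tree)

# Predictive covariance of the random effects instead of only the variances
pred_cov <- predict(bst, data = X_test,
                    group_data_pred = group_data_test[, 1],
                    predict_cov_mat = TRUE, pred_latent = TRUE)
dim(pred_cov$random_effect_cov)
```

Requesting the covariance with `predict_cov_mat = TRUE` involves one entry per pair of prediction points, so for large prediction sets `predict_var = TRUE` is likely the cheaper choice when only variances are needed.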

gpboost documentation built on June 8, 2025, 1:23 p.m.