predict.gpb.Booster: Predict method for GPBoost model

Description Usage Arguments Value Author(s) Examples

View source: R/gpb.Booster.R

Description

Predicted values based on class gpb.Booster

Usage

1
2
3
4
5
6
7
8
9
## S3 method for class 'gpb.Booster'
predict(object, data, start_iteration = NULL,
  num_iteration = NULL, rawscore = FALSE, predleaf = FALSE,
  predcontrib = FALSE, header = FALSE, reshape = FALSE,
  group_data_pred = NULL, group_rand_coef_data_pred = NULL,
  gp_coords_pred = NULL, gp_rand_coef_data_pred = NULL,
  cluster_ids_pred = NULL, vecchia_pred_type = NULL,
  num_neighbors_pred = -1, predict_cov_mat = FALSE, predict_var = FALSE,
  ignore_gp_model = FALSE, ...)

Arguments

object

Object of class gpb.Booster

data

a matrix object, a dgCMatrix object or a character representing a filename

start_iteration

int or None, optional (default=None) Start index of the iteration to predict. If None or <= 0, starts from the first iteration.

num_iteration

int or None, optional (default=None) Limit number of iterations in the prediction. If None, if the best iteration exists and start_iteration is None or <= 0, the best iteration is used; otherwise, all iterations from start_iteration are used. If <= 0, all iterations from start_iteration are used (no limits).

rawscore

whether the prediction should be returned in the for of original untransformed sum of predictions from boosting iterations' results. E.g., setting rawscore=TRUE for logistic regression would result in predictions for log-odds instead of probabilities.

predleaf

whether predict leaf index instead.

predcontrib

return per-feature contributions for each record.

header

only used for prediction for text file. True if text file has header

reshape

whether to reshape the vector of predictions to a matrix form when there are several prediction outputs per case.

group_data_pred

A vector or matrix with labels of group levels for which predictions are made (if there are grouped random effects in the GPModel)

group_rand_coef_data_pred

A vector or matrix with covariate data for grouped random coefficients (if there are some in the GPModel)

gp_coords_pred

A matrix with prediction coordinates (features) for Gaussian process (if there is a GP in the GPModel)

gp_rand_coef_data_pred

A vector or matrix with covariate data for Gaussian process random coefficients (if there are some in the GPModel)

cluster_ids_pred

A vector with IDs / labels indicating the realizations of random effects / Gaussian processes for which predictions are made (set to NULL if you have not specified this when creating the GPModel)

vecchia_pred_type

A string specifying the type of Vecchia approximation used for making predictions. "order_obs_first_cond_obs_only" = observed data is ordered first and the neighbors are only observed points, "order_obs_first_cond_all" = observed data is ordered first and the neighbors are selected among all points (observed + predicted), "order_pred_first" = predicted data is ordered first for making predictions, "latent_order_obs_first_cond_obs_only" = Vecchia approximation for the latent process and observed data is ordered first and neighbors are only observed points, "latent_order_obs_first_cond_all" = Vecchia approximation for the latent process and observed data is ordered first and neighbors are selected among all points

num_neighbors_pred

an integer specifying the number of neighbors for the Vecchia approximation for making predictions

predict_cov_mat

A boolean. If TRUE, the (posterior / conditional) predictive covariance is calculated in addition to the (posterior / conditional) predictive mean

predict_var

A boolean. If TRUE, the (posterior / conditional) predictive variances are calculated

ignore_gp_model

A boolean. If TRUE, predictions are only made for the tree ensemble part and the gp_model is ignored

...

Additional named arguments passed to the predict() method of the gpb.Booster object passed to object.

Value

For regression or binary classification, it returns a vector of length nrows(data). For multiclass classification, either a num_class * nrows(data) vector or a (nrows(data), num_class) dimension matrix is returned, depending on the reshape value.

When predleaf = TRUE, the output is a matrix object with the number of columns corresponding to the number of trees.

Author(s)

Authors of the LightGBM R package, Fabio Sigrist

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples

library(gpboost)
data(GPBoost_data, package = "gpboost")

#--------------------Combine tree-boosting and grouped random effects model----------------
# Create random effects model
gp_model <- GPModel(group_data = group_data[,1], likelihood = "gaussian")
# The default optimizer for covariance parameters for Gaussian data is Fisher scoring.
# For non-Gaussian data, gradient descent is used.
# Optimizer properties can be changed as follows:
# re_params <- list(optimizer_cov = "gradient_descent", use_nesterov_acc = TRUE)
# gp_model$set_optim_params(params=re_params)
# Use trace = TRUE to monitor convergence:
# re_params <- list(trace = TRUE)
# gp_model$set_optim_params(params=re_params)

# Train model
bst <- gpboost(data = X,
               label = y,
               gp_model = gp_model,
               nrounds = 16,
               learning_rate = 0.05,
               max_depth = 6,
               min_data_in_leaf = 5,
               objective = "regression_l2",
               verbose = 0)
# Estimated random effects model
summary(gp_model)

# Make predictions
pred <- predict(bst, data = X_test, group_data_pred = group_data_test[,1],
                predict_var= TRUE)
pred$random_effect_mean # Predicted mean
pred$random_effect_cov # Predicted variances
pred$fixed_effect # Predicted fixed effect from tree ensemble
# Sum them up to otbain a single prediction
pred$random_effect_mean + pred$fixed_effect


#--------------------Combine tree-boosting and Gaussian process model----------------
# Create Gaussian process model
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential",
                    likelihood = "gaussian")
# Train model
bst <- gpboost(data = X,
               label = y,
               gp_model = gp_model,
               nrounds = 8,
               learning_rate = 0.1,
               max_depth = 6,
               min_data_in_leaf = 5,
               objective = "regression_l2",
               verbose = 0)
# Estimated random effects model
summary(gp_model)
# Make predictions
pred <- predict(bst, data = X_test, gp_coords_pred = coords_test,
                predict_cov_mat =TRUE)
pred$random_effect_mean # Predicted (posterior) mean of GP
pred$random_effect_cov # Predicted (posterior) covariance matrix of GP
pred$fixed_effect # Predicted fixed effect from tree ensemble
# Sum them up to otbain a single prediction
pred$random_effect_mean + pred$fixed_effect

gpboost documentation built on July 14, 2021, 9:06 a.m.