fitGPModel: Fits a 'GPModel'
In gpboost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

fitGPModel

R Documentation

Fits a `GPModel`

Description

Estimates the parameters of a GPModel by maximizing the marginal likelihood

Usage

fitGPModel(likelihood = "gaussian", group_data = NULL,
  group_rand_coef_data = NULL, ind_effect_group_rand_coef = NULL,
  drop_intercept_group_rand_effect = NULL, gp_coords = NULL,
  gp_rand_coef_data = NULL, cov_function = "matern", cov_fct_shape = 1.5,
  gp_approx = "none", num_parallel_threads = NULL,
  cov_fct_taper_range = 1, cov_fct_taper_shape = 1, num_neighbors = NULL,
  vecchia_ordering = "random", ind_points_selection = "kmeans++",
  num_ind_points = NULL, cover_tree_radius = 1,
  matrix_inversion_method = "cholesky", seed = 0L, cluster_ids = NULL,
  free_raw_data = FALSE, y, X = NULL, params = list(),
  vecchia_approx = NULL, vecchia_pred_type = NULL,
  num_neighbors_pred = NULL, offset = NULL, fixed_effects = NULL,
  likelihood_additional_param = NULL)

Arguments

`likelihood`	A `string` specifying the likelihood function (distribution) of the response variable. Available options: "gaussian" "bernoulli_probit": binary data with Bernoulli likelihood and a probit link function "bernoulli_logit": binary data with Bernoulli likelihood and a logit link function "gamma": gamma distribution with a with log link function "poisson": Poisson distribution with a with log link function "negative_binomial": negative binomial distribution with a with log link function "t": t-distribution (e.g., for robust regression) "t_fix_df": t-distribution with the degrees-of-freedom (df) held fixed and not estimated. The df can be set via the `likelihood_additional_param` parameter "gaussian_heteroscedastic": Gaussian likelihood where both the mean and the variance are related to fixed and random effects. This is currently only implemented for GPs with a 'vecchia' approximation Note: other likelihoods could be implemented upon request
`group_data`	A `vector` or `matrix` whose columns are categorical grouping variables. The elements being group levels defining grouped random effects. The elements of 'group_data' can be integer, double, or character. The number of columns corresponds to the number of grouped (intercept) random effects
`group_rand_coef_data`	A `vector` or `matrix` with numeric covariate data for grouped random coefficients
`ind_effect_group_rand_coef`	A `vector` with integer indices that indicate the corresponding categorical grouping variable (=columns) in 'group_data' for every covariate in 'group_rand_coef_data'. Counting starts at 1. The length of this index vector must equal the number of covariates in 'group_rand_coef_data'. For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data' have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data', and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient corresponding to the second grouping variable (=second column) in 'group_data'
`drop_intercept_group_rand_effect`	A `vector` of type `logical` (boolean). Indicates whether intercept random effects are dropped (only for random coefficients). If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect number k is dropped / not included. Only random effects with random slopes can be dropped.
`gp_coords`	A `matrix` with numeric coordinates (= inputs / features) for defining Gaussian processes
`gp_rand_coef_data`	A `vector` or `matrix` with numeric covariate data for Gaussian process random coefficients
`cov_function`	A `string` specifying the covariance function for the Gaussian process. Available options: "matern": Matern covariance function with the smoothness specified by the `cov_fct_shape` parameter (using the parametrization of Rasmussen and Williams, 2006) "matern_estimate_shape": same as "matern" but the smoothness parameter is also estimated "matern_space_time": Spatio-temporal Matern covariance function with different range parameters for space and time. Note that the first column in `gp_coords` must correspond to the time dimension "matern_ard": anisotropic Matern covariance function with Automatic Relevance Determination (ARD), i.e., with a different range parameter for every coordinate dimension / column of `gp_coords` "matern_ard_estimate_shape": same as "matern_ard" but the smoothness parameter is also estimated "exponential": Exponential covariance function (using the parametrization of Diggle and Ribeiro, 2007) "gaussian": Gaussian, aka squared exponential, covariance function (using the parametrization of Diggle and Ribeiro, 2007) "gaussian_ard": anisotropic Gaussian, aka squared exponential, covariance function with Automatic Relevance Determination (ARD), i.e., with a different range parameter for every coordinate dimension / column of `gp_coords` "powered_exponential": powered exponential covariance function with the exponent specified by the `cov_fct_shape` parameter (using the parametrization of Diggle and Ribeiro, 2007) "wendland": Compactly supported Wendland covariance function (using the parametrization of Bevilacqua et al., 2019, AOS)
`cov_fct_shape`	A `numeric` specifying the shape parameter of the covariance function (e.g., smoothness parameter for Matern and Wendland covariance) This parameter is irrelevant for some covariance functions such as the exponential or Gaussian
`gp_approx`	A `string` specifying the large data approximation for Gaussian processes. Available options: "none": No approximation "vecchia": Vecchia approximation; see Sigrist (2022, JMLR) for more details "full_scale_vecchia": Vecchia-inducing points full-scale (VIF) approximation; see Gyger, Furrer, and Sigrist (2025) for more details "tapering": The covariance function is multiplied by a compactly supported Wendland correlation function "fitc": Fully Independent Training Conditional approximation aka modified predictive process approximation; see Gyger, Furrer, and Sigrist (2024) for more details "full_scale_tapering": Full-scale approximation combining an inducing point / predictive process approximation with tapering on the residual process; see Gyger, Furrer, and Sigrist (2024) for more details "vecchia_latent": similar as "vecchia" but a Vecchia approximation is applied to the latent Gaussian process for likelihood == "gaussian". For likelihood != "gaussian", "vecchia" and "vecchia_latent" are equivalent
`num_parallel_threads`	An `integer` specifying the number of parallel threads for OMP. If num_parallel_threads = NULL, all available threads are used
`cov_fct_taper_range`	A `numeric` specifying the range parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)
`cov_fct_taper_shape`	A `numeric` specifying the shape (=smoothness) parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)
`num_neighbors`	An `integer` specifying the number of neighbors for the Vecchia and VIF approximations. Internal default values if NULL: 20 for gp_approx = "vecchia" 30 for gp_approx = "full_scale_vecchia" Note: for prediction, the number of neighbors can be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data' function. By default, num_neighbors_pred = 2 * num_neighbors. Further, the type of Vecchia approximation used for making predictions is set through the 'vecchia_pred_type' parameter in the 'set_prediction_data' function
`vecchia_ordering`	A `string` specifying the ordering used in the Vecchia approximation. Available options: "none": the default ordering in the data is used "random": a random ordering "time": ordering accorrding to time (only for space-time models) "time_random_space": ordering according to time and randomly for all spatial points with the same time points (only for space-time models)
`ind_points_selection`	A `string` specifying the method for choosing inducing points Available options: "kmeans++: the k-means++ algorithm "cover_tree": the cover tree algorithm "random": random selection from data points
`num_ind_points`	An `integer` specifying the number of inducing points / knots for FITC, full_scale_tapering, and VIF approximations. Internal default values if NULL: 500 for gp_approx = "FITC" and gp_approx = "full_scale_tapering" 200 for gp_approx = "full_scale_vecchia"
`cover_tree_radius`	A `numeric` specifying the radius (= "spatial resolution") for the cover tree algorithm
`matrix_inversion_method`	A `string` specifying the method used for inverting covariance matrices. Available options: "cholesky": Cholesky factorization "iterative": iterative methods. A combination of the conjugate gradient, the Lanczos algorithm, and other methods. This is currently only supported for the following cases: grouped random effects with more than one level likelihood != "gaussian" and gp_approx == "vecchia" (non-Gaussian likelihoods with a Vecchia-Laplace approximation) likelihood != "gaussian" and gp_approx == "full_scale_vecchia" (non-Gaussian likelihoods with a VIFapproximation) likelihood == "gaussian" and gp_approx == "full_scale_tapering" (Gaussian likelihood with a full-scale tapering approximation)
`seed`	An `integer` specifying the seed used for model creation (e.g., random ordering in Vecchia approximation)
`cluster_ids`	A `vector` with elements indicating independent realizations of random effects / Gaussian processes (same values = same process realization). The elements of 'cluster_ids' can be integer, double, or character.
`free_raw_data`	A `boolean`. If TRUE, the data (groups, coordinates, covariate data for random coefficients) is freed in R after initialization
`y`	A `vector` with response variable data
`X`	A `matrix` with numeric covariate data for the fixed effects linear regression term (if there is one)
`params`	A `list` with parameters for the estimation / optimization optimizer_cov: `string` (default = "lbfgs"). Optimizer used for estimating covariance parameters. Options: "gradient_descent", "lbfgs", "fisher_scoring", "newton", "nelder_mead". If there are additional auxiliary parameters for non-Gaussian likelihoods, 'optimizer_cov' is also used for those optimizer_coef: `string` (default = "wls" for Gaussian likelihoods and "lbfgs" for other likelihoods). Optimizer used for estimating linear regression coefficients, if there are any (for the GPBoost algorithm there are usually none). Options: "gradient_descent", "lbfgs", "wls", "nelder_mead". Gradient descent steps are done simultaneously with gradient descent steps for the covariance parameters. "wls" refers to doing coordinate descent for the regression coefficients using weighted least squares. If 'optimizer_cov' is set to "nelder_mead" or "lbfgs", 'optimizer_coef' is automatically also set to the same value. maxit: `integer` (default = 1000). Maximal number of iterations for optimization algorithm delta_rel_conv: `numeric` (default = 1E-6 except for "nelder_mead" for which the default is 1E-8). Convergence tolerance. The algorithm stops if the relative change in either the (approximate) log-likelihood or the parameters is below this value. If < 0, internal default values are used convergence_criterion: `string` (default = "relative_change_in_log_likelihood"). The convergence criterion used for terminating the optimization algorithm. Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters" init_coef: `vector` with `numeric` elements (default = NULL). Initial values for the regression coefficients (if there are any, can be NULL) init_cov_pars: `vector` with `numeric` elements (default = NULL). Initial values for covariance parameters of Gaussian process and random effects (can be NULL). The order it the same as the order of the parameters in the summary function: first is the error variance (only for "gaussian" likelihood), next follow the variances of the grouped random effects (if there are any, in the order provided in 'group_data'), and then follow the marginal variance and the range of the Gaussian process. If there are multiple Gaussian processes, then the variances and ranges follow alternatingly. If 'init_cov_pars = NULL', an internal choice is used that depends on the likelihood and the random effects type and covariance function. If you select the option 'trace = TRUE' in the 'params' argument, you will see the first initial covariance parameters in iteration 0. lr_coef: `numeric` (default = 0.1). Learning rate for fixed effect regression coefficients if gradient descent is used lr_cov: `numeric` (default = 0.1 for "gradient_descent" and 1. otherwise). Initial learning rate for covariance parameters if a gradient-based optimization method is used If lr_cov < 0, internal default values are used (0.1 for "gradient_descent" and 1. otherwise) If there are additional auxiliary parameters for non-Gaussian likelihoods, 'lr_cov' is also used for those For "lbfgs", this is divided by the norm of the gradient in the first iteration use_nesterov_acc: `boolean` (default = TRUE). If TRUE Nesterov acceleration is used. This is used only for gradient descent acc_rate_coef: `numeric` (default = 0.5). Acceleration rate for regression coefficients (if there are any) for Nesterov acceleration acc_rate_cov: `numeric` (default = 0.5). Acceleration rate for covariance parameters for Nesterov acceleration momentum_offset: `integer` (Default = 2). Number of iterations for which no momentum is applied in the beginning. trace: `boolean` (default = FALSE). If TRUE, information on the progress of the parameter optimization is printed std_dev: `boolean` (default = TRUE). If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters (= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods) init_aux_pars: `vector` with `numeric` elements (default = NULL). Initial values for additional parameters for non-Gaussian likelihoods (e.g., shape parameter of a gamma or negative_binomial likelihood) estimate_aux_pars: `boolean` (default = TRUE). If TRUE, additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of a gamma or negative_binomial likelihood) cg_max_num_it: `integer` (default = 1000). Maximal number of iterations for conjugate gradient algorithms cg_max_num_it_tridiag: `integer` (default = 1000). Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization cg_delta_conv: `numeric` (default = 1E-2). Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for parameter estimation num_rand_vec_trace: `integer` (default = 50). Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix reuse_rand_vec_trace: `boolean` (default = TRUE). If true, random vectors (e.g., Rademacher) for stochastic approximations of the trace of a matrix are sampled only once at the beginning of the parameter estimation and reused in later trace approximations. Otherwise they are sampled every time a trace is calculated seed_rand_vec_trace: `integer` (default = 1). Seed number to generate random vectors (e.g., Rademacher) cg_preconditioner_type (`string`): Type of preconditioner used for conjugate gradient algorithms. Options for grouped random effects: "ssor" (= default): SSOR preconditioner "incomplete_cholesky": zero fill-in incomplete Cholesky factorization Options for likelihood != "gaussian" and gp_approx == "vecchia" or likelihood == "gaussian" and gp_approx == "vecchia_latent": "vadu" (= default): (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1 "fitc": FITC / modified predictive process preconditioner for inverting (B^-1 * D * B^-T + W^-1) "pivoted_cholesky": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma "incomplete_cholesky": zero fill-in incomplete (reverse) Cholesky factorization of (B^T * D^-1 * B + W) using the sparsity pattern of B^T * D^-1 * B approx= Sigma^-1 Options for likelihood != "gaussian" and gp_approx == "full_scale_vecchia": "fitc" ( = default): FITC / modified predictive process preconditioner "vifdu": VIF with diagonal update preconditioner Options for likelihood == "gaussian" and gp_approx == "full_scale_tapering": "fitc" (= default): modified predictive process preconditioner "none": no preconditioner fitc_piv_chol_preconditioner_rank (`integer` ): Rank of the FITC and pivoted Cholesky decomposition preconditioners for iterative methods for Vecchia and VIF approximations (for full_scale_tapering, the same inducing points as in the approximation as used). Internal default values if NULL or < 0: 200 for the FITC preconditioner 50 for the pivoted Cholesky decomposition preconditioner
`vecchia_approx`	Discontinued. Use the argument `gp_approx` instead
`vecchia_pred_type`	A `string` specifying the type of Vecchia approximation used for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this
`num_neighbors_pred`	an `integer` specifying the number of neighbors for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this
`offset`	A `numeric` `vector` with additional fixed effects contributions that are added to the linear predictor (= offset). The length of this vector needs to equal the number of training data points.
`fixed_effects`	This is discontinued. Use the renamed equivalent argument `offset` instead
`likelihood_additional_param`	A `numeric` specifying an additional parameter for the `likelihood` which cannot be estimated for this `likelihood` (e.g., degrees of freedom for `likelihood = "t_fix_df"`). This is not to be confused with any auxiliary parameters that can be estimated and accessed through the function `get_aux_pars` after estimation. Note that this `likelihood_additional_param` parameter is irrelevant for many likelihoods. If `likelihood_additional_param = NULL`, the following internal default values are used: df = 2 for likelihood = "t_fix_df"

Value

A fitted GPModel

Author(s)

Fabio Sigrist

Examples

# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples


data(GPBoost_data, package = "gpboost")
# Add intercept column
X1 <- cbind(rep(1,dim(X)[1]),X)
X_test1 <- cbind(rep(1,dim(X_test)[1]),X_test)

#--------------------Grouped random effects model: single-level random effect----------------
gp_model <- fitGPModel(group_data = group_data[,1], y = y, X = X1,
                       likelihood="gaussian", params = list(std_dev = TRUE))
summary(gp_model)
# Make predictions
pred <- predict(gp_model, group_data_pred = group_data_test[,1], 
                X_pred = X_test1, predict_var = TRUE)
pred$mu # Predicted mean
pred$var # Predicted variances
# Also predict covariance matrix
pred <- predict(gp_model, group_data_pred = group_data_test[,1], 
                X_pred = X_test1, predict_cov_mat = TRUE)
pred$mu # Predicted mean
pred$cov # Predicted covariance

#--------------------Two crossed random effects and a random slope----------------
gp_model <- fitGPModel(group_data = group_data, likelihood="gaussian",
                       group_rand_coef_data = X[,2],
                       ind_effect_group_rand_coef = 1,
                       y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)

#--------------------Gaussian process model----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
                       likelihood="gaussian", y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)
# Make predictions
pred <- predict(gp_model, gp_coords_pred = coords_test, 
                X_pred = X_test1, predict_cov_mat = TRUE)
pred$mu # Predicted (posterior) mean of GP
pred$cov # Predicted (posterior) covariance matrix of GP

#--------------------Gaussian process model with Vecchia approximation----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
                       gp_approx = "vecchia", num_neighbors = 20,
                       likelihood="gaussian", y = y)
summary(gp_model)

#--------------------Gaussian process model with random coefficients----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
                       gp_rand_coef_data = X[,2], y=y,
                       likelihood = "gaussian", params = list(std_dev = TRUE))
summary(gp_model)

#--------------------Combine Gaussian process with grouped random effects----------------
gp_model <- fitGPModel(group_data = group_data,
                       gp_coords = coords, cov_function = "matern", cov_fct_shape = 1.5,
                       likelihood = "gaussian", y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)

gpboost documentation built on June 8, 2025, 1:23 p.m.