fitGPModel: Fits a 'GPModel'

View source: R/GPModel.R

fitGPModelR Documentation

Fits a GPModel

Description

Estimates the parameters of a GPModel using maximum likelihood estimation

Usage

fitGPModel(likelihood = "gaussian", group_data = NULL,
  group_rand_coef_data = NULL, ind_effect_group_rand_coef = NULL,
  drop_intercept_group_rand_effect = NULL, gp_coords = NULL,
  gp_rand_coef_data = NULL, cov_function = "exponential",
  cov_fct_shape = 0.5, gp_approx = "none", cov_fct_taper_range = 1,
  cov_fct_taper_shape = 0, num_neighbors = 20L,
  vecchia_ordering = "random", num_ind_points = 500L,
  matrix_inversion_method = "cholesky", seed = 0L, cluster_ids = NULL,
  free_raw_data = FALSE, y, X = NULL, params = list(),
  vecchia_approx = NULL, vecchia_pred_type = NULL,
  num_neighbors_pred = NULL)

Arguments

likelihood

A string specifying the likelihood function (distribution) of the response variable. Available options:

  • "gaussian"

  • "bernoulli_probit": binary data with Bernoulli likelihood and a probit link function

  • "bernoulli_logit": binary data with Bernoulli likelihood and a logit link function

  • "gamma"

  • "poisson"

group_data

A vector or matrix whose columns are categorical grouping variables. The elements being group levels defining grouped random effects. The elements of 'group_data' can be integer, double, or character. The number of columns corresponds to the number of grouped (intercept) random effects

group_rand_coef_data

A vector or matrix with numeric covariate data for grouped random coefficients

ind_effect_group_rand_coef

A vector with integer indices that indicate the corresponding categorical grouping variable (=columns) in 'group_data' for every covariate in 'group_rand_coef_data'. Counting starts at 1. The length of this index vector must equal the number of covariates in 'group_rand_coef_data'. For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data' have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data', and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient corresponding to the second grouping variable (=second column) in 'group_data'

drop_intercept_group_rand_effect

A vector of type logical (boolean). Indicates whether intercept random effects are dropped (only for random coefficients). If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect number k is dropped / not included. Only random effects with random slopes can be dropped.

gp_coords

A matrix with numeric coordinates (= inputs / features) for defining Gaussian processes

gp_rand_coef_data

A vector or matrix with numeric covariate data for Gaussian process random coefficients

cov_function

A string specifying the covariance function for the Gaussian process. Available options: "exponential", "gaussian", "matern", "powered_exponential", "wendland"

  • For "exponential", "gaussian", and "powered_exponential", we use parametrization of Diggle and Ribeiro (2007)

  • For "matern", we use the parametrization of Rasmussen and Williams (2006)

  • For "wendland", we use the parametrization of Bevilacqua et al. (2019, AOS)

cov_fct_shape

A numeric specifying the shape parameter of the covariance function (=smoothness parameter for Matern covariance) This parameter is irrelevant for some covariance functions such as the exponential or Gaussian

gp_approx

A string specifying the large data approximation for Gaussian processes. Available options:

  • "none": No approximation

  • "vecchia": A Vecchia approximation; see Sigrist (2022, JMLR for more details)

  • "tapering": The covariance function is multiplied by a compactly supported Wendland correlation function

cov_fct_taper_range

A numeric specifying the range parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

cov_fct_taper_shape

A numeric specifying the shape (=smoothness) parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

num_neighbors

An integer specifying the number of neighbors for the Vecchia approximation. Note: for prediction, the number of neighbors can be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data' function. By default, num_neighbors_pred = 2 * num_neighbors. Further, the type of Vecchia approximation used for making predictions is set through the 'vecchia_pred_type' parameter in the 'set_prediction_data' function

vecchia_ordering

A string specifying the ordering used in the Vecchia approximation. Available options:

  • "none": the default ordering in the data is used

  • "random": a random ordering

num_ind_points

An integer specifying the number of inducing points / knots for, e.g., a predictive process approximation

matrix_inversion_method

A string specifying the method used for inverting covariance matrices. Available options:

  • "cholesky": Cholesky factorization

  • "iterative": iterative methods. Only supported for non-Gaussian likelihoods with a Vecchia-Laplace approximation. This a combination of conjugate gradient, Lanczos algorithm, and other methods

seed

An integer specifying the seed used for model creation (e.g., random ordering in Vecchia approximation)

cluster_ids

A vector with elements indicating independent realizations of random effects / Gaussian processes (same values = same process realization). The elements of 'cluster_ids' can be integer, double, or character.

free_raw_data

A boolean. If TRUE, the data (groups, coordinates, covariate data for random coefficients) is freed in R after initialization

y

A vector with response variable data

X

A matrix with numeric covariate data for the fixed effects linear regression term (if there is one)

params

A list with parameters for the estimation / optimization

  • optimizer_cov: string (default = "gradient_descent"). Optimizer used for estimating covariance parameters. Options: "gradient_descent", "fisher_scoring", "nelder_mead", "bfgs", "adam". If there are additional auxiliary parameters for non-Gaussian likelihoods, 'optimizer_cov' is also used for those

  • optimizer_coef: string (default = "wls" for Gaussian likelihoods and "gradient_descent" for other likelihoods). Optimizer used for estimating linear regression coefficients, if there are any (for the GPBoost algorithm there are usually none). Options: "gradient_descent", "wls", "nelder_mead", "bfgs", "adam". Gradient descent steps are done simultaneously with gradient descent steps for the covariance parameters. "wls" refers to doing coordinate descent for the regression coefficients using weighted least squares. If 'optimizer_cov' is set to "nelder_mead", "bfgs", or "adam", 'optimizer_coef' is automatically also set to the same value.

  • maxit: integer (default = 1000). Maximal number of iterations for optimization algorithm

  • delta_rel_conv: numeric (default = 1E-6 except for "nelder_mead" for which the default is 1E-8). Convergence tolerance. The algorithm stops if the relative change in either the (approximate) log-likelihood or the parameters is below this value. For "bfgs" and "adam", the L2 norm of the gradient is used instead of the relative change in the log-likelihood. If < 0, internal default values are used

  • convergence_criterion: string (default = "relative_change_in_log_likelihood"). The convergence criterion used for terminating the optimization algorithm. Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters"

  • init_coef: vector with numeric elements (default = NULL). Initial values for the regression coefficients (if there are any, can be NULL)

  • init_cov_pars: vector with numeric elements (default = NULL). Initial values for covariance parameters of Gaussian process and random effects (can be NULL)

  • lr_coef: numeric (default = 0.1). Learning rate for fixed effect regression coefficients if gradient descent is used

  • lr_cov: numeric (default = 0.1 for "gradient_descent" and 1. for "fisher_scoring"). Initial learning rate for covariance parameters. If lr_cov < 0, internal default values are used. If there are additional auxiliary parameters for non-Gaussian likelihoods, 'lr_cov' is also used for those

  • use_nesterov_acc: boolean (default = TRUE). If TRUE Nesterov acceleration is used. This is used only for gradient descent

  • acc_rate_coef: numeric (default = 0.5). Acceleration rate for regression coefficients (if there are any) for Nesterov acceleration

  • acc_rate_cov: numeric (default = 0.5). Acceleration rate for covariance parameters for Nesterov acceleration

  • momentum_offset: integer (Default = 2). Number of iterations for which no momentum is applied in the beginning.

  • trace: boolean (default = FALSE). If TRUE, information on the progress of the parameter optimization is printed

  • std_dev: boolean (default = TRUE). If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters (= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods)

  • init_aux_pars: vector with numeric elements (default = NULL). Initial values for additional parameters for non-Gaussian likelihoods (e.g., shape parameter of gamma likelihood)

  • estimate_aux_pars: boolean (default = TRUE). If TRUE, additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of gamma likelihood)

  • cg_max_num_it: integer (default = 1000). Maximal number of iterations for conjugate gradient algorithms

  • cg_max_num_it_tridiag: integer (default = 1000). Maximal number of iterations for conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization

  • cg_delta_conv: numeric (default = 1E-2). Tolerance level for L2 norm of residuals for checking convergence in conjugate gradient algorithm when being used for parameter estimation

  • num_rand_vec_trace: integer (default = 50). Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix

  • reuse_rand_vec_trace: boolean (default = TRUE). If true, random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix are sampled only once at the beginning of Newton's method for finding the mode in the Laplace approximation and are then reused in later trace approximations. Otherwise they are sampled every time a trace is calculated

  • seed_rand_vec_trace: integer (default = 1). Seed number to generate random vectors (e.g., Rademacher)

  • piv_chol_rank: integer (default = 50). Rank of the pivoted Cholesky decomposition used as preconditioner in conjugate gradient algorithms

  • cg_preconditioner_type: string (default = "Sigma_inv_plus_BtWB" for non-Gaussian likelihoods and a Vecchia-Laplace approximation). Type of preconditioner used for conjugate gradient algorithms. Options for non-Gaussian likelihoods and a Vecchia-Laplace approximation:

    • "piv_chol_on_Sigma": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma

    • "Sigma_inv_plus_BtWB": (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1

vecchia_approx

Discontinued. Use the argument gp_approx instead

vecchia_pred_type

A string specifying the type of Vecchia approximation used for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this

num_neighbors_pred

an integer specifying the number of neighbors for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this

Value

A fitted GPModel

Author(s)

Fabio Sigrist

Examples

# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples


data(GPBoost_data, package = "gpboost")
# Add intercept column
X1 <- cbind(rep(1,dim(X)[1]),X)
X_test1 <- cbind(rep(1,dim(X_test)[1]),X_test)

#--------------------Grouped random effects model: single-level random effect----------------
gp_model <- fitGPModel(group_data = group_data[,1], y = y, X = X1,
                       likelihood="gaussian", params = list(std_dev = TRUE))
summary(gp_model)
# Make predictions
pred <- predict(gp_model, group_data_pred = group_data_test[,1], 
                X_pred = X_test1, predict_var = TRUE)
pred$mu # Predicted mean
pred$var # Predicted variances
# Also predict covariance matrix
pred <- predict(gp_model, group_data_pred = group_data_test[,1], 
                X_pred = X_test1, predict_cov_mat = TRUE)
pred$mu # Predicted mean
pred$cov # Predicted covariance

#--------------------Two crossed random effects and a random slope----------------
gp_model <- fitGPModel(group_data = group_data, likelihood="gaussian",
                       group_rand_coef_data = X[,2],
                       ind_effect_group_rand_coef = 1,
                       y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)

#--------------------Gaussian process model----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "exponential",
                       likelihood="gaussian", y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)
# Make predictions
pred <- predict(gp_model, gp_coords_pred = coords_test, 
                X_pred = X_test1, predict_cov_mat = TRUE)
pred$mu # Predicted (posterior) mean of GP
pred$cov # Predicted (posterior) covariance matrix of GP

#--------------------Gaussian process model with Vecchia approximation----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "exponential",
                       gp_approx = "vecchia", num_neighbors = 20,
                       likelihood="gaussian", y = y)
summary(gp_model)

#--------------------Gaussian process model with random coefficients----------------
gp_model <- fitGPModel(gp_coords = coords, cov_function = "exponential",
                       gp_rand_coef_data = X[,2], y=y,
                       likelihood = "gaussian", params = list(std_dev = TRUE))
summary(gp_model)

#--------------------Combine Gaussian process with grouped random effects----------------
gp_model <- fitGPModel(group_data = group_data,
                       gp_coords = coords, cov_function = "exponential",
                       likelihood = "gaussian", y = y, X = X1, params = list(std_dev = TRUE))
summary(gp_model)



gpboost documentation built on Oct. 24, 2023, 9:09 a.m.