GPModel: Create a 'GPModel' object

View source: R/GPModel.R

GPModelR Documentation

Create a GPModel object

Description

Create a GPModel which contains a Gaussian process and / or mixed effects model with grouped random effects

Usage

GPModel(likelihood = "gaussian", group_data = NULL,
  group_rand_coef_data = NULL, ind_effect_group_rand_coef = NULL,
  drop_intercept_group_rand_effect = NULL, gp_coords = NULL,
  gp_rand_coef_data = NULL, cov_function = "exponential",
  cov_fct_shape = 0.5, gp_approx = "none", cov_fct_taper_range = 1,
  cov_fct_taper_shape = 0, num_neighbors = 20L,
  vecchia_ordering = "random", num_ind_points = 500L,
  matrix_inversion_method = "cholesky", seed = 0L, cluster_ids = NULL,
  free_raw_data = FALSE, vecchia_approx = NULL, vecchia_pred_type = NULL,
  num_neighbors_pred = NULL)

Arguments

likelihood

A string specifying the likelihood function (distribution) of the response variable. Available options:

  • "gaussian"

  • "bernoulli_probit": binary data with Bernoulli likelihood and a probit link function

  • "bernoulli_logit": binary data with Bernoulli likelihood and a logit link function

  • "gamma"

  • "poisson"

group_data

A vector or matrix whose columns are categorical grouping variables. The elements being group levels defining grouped random effects. The elements of 'group_data' can be integer, double, or character. The number of columns corresponds to the number of grouped (intercept) random effects

group_rand_coef_data

A vector or matrix with numeric covariate data for grouped random coefficients

ind_effect_group_rand_coef

A vector with integer indices that indicate the corresponding categorical grouping variable (=columns) in 'group_data' for every covariate in 'group_rand_coef_data'. Counting starts at 1. The length of this index vector must equal the number of covariates in 'group_rand_coef_data'. For instance, c(1,1,2) means that the first two covariates (=first two columns) in 'group_rand_coef_data' have random coefficients corresponding to the first categorical grouping variable (=first column) in 'group_data', and the third covariate (=third column) in 'group_rand_coef_data' has a random coefficient corresponding to the second grouping variable (=second column) in 'group_data'

drop_intercept_group_rand_effect

A vector of type logical (boolean). Indicates whether intercept random effects are dropped (only for random coefficients). If drop_intercept_group_rand_effect[k] is TRUE, the intercept random effect number k is dropped / not included. Only random effects with random slopes can be dropped.

gp_coords

A matrix with numeric coordinates (= inputs / features) for defining Gaussian processes

gp_rand_coef_data

A vector or matrix with numeric covariate data for Gaussian process random coefficients

cov_function

A string specifying the covariance function for the Gaussian process. Available options: "exponential", "gaussian", "matern", "powered_exponential", "wendland"

  • For "exponential", "gaussian", and "powered_exponential", we use parametrization of Diggle and Ribeiro (2007)

  • For "matern", we use the parametrization of Rasmussen and Williams (2006)

  • For "wendland", we use the parametrization of Bevilacqua et al. (2019, AOS)

cov_fct_shape

A numeric specifying the shape parameter of the covariance function (=smoothness parameter for Matern covariance) This parameter is irrelevant for some covariance functions such as the exponential or Gaussian

gp_approx

A string specifying the large data approximation for Gaussian processes. Available options:

  • "none": No approximation

  • "vecchia": A Vecchia approximation; see Sigrist (2022, JMLR for more details)

  • "tapering": The covariance function is multiplied by a compactly supported Wendland correlation function

cov_fct_taper_range

A numeric specifying the range parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

cov_fct_taper_shape

A numeric specifying the shape (=smoothness) parameter of the Wendland covariance function and Wendland correlation taper function. We follow the notation of Bevilacqua et al. (2019, AOS)

num_neighbors

An integer specifying the number of neighbors for the Vecchia approximation. Note: for prediction, the number of neighbors can be set through the 'num_neighbors_pred' parameter in the 'set_prediction_data' function. By default, num_neighbors_pred = 2 * num_neighbors. Further, the type of Vecchia approximation used for making predictions is set through the 'vecchia_pred_type' parameter in the 'set_prediction_data' function

vecchia_ordering

A string specifying the ordering used in the Vecchia approximation. Available options:

  • "none": the default ordering in the data is used

  • "random": a random ordering

num_ind_points

An integer specifying the number of inducing points / knots for, e.g., a predictive process approximation

matrix_inversion_method

A string specifying the method used for inverting covariance matrices. Available options:

  • "cholesky": Cholesky factorization

  • "iterative": iterative methods. Only supported for non-Gaussian likelihoods with a Vecchia-Laplace approximation. This a combination of conjugate gradient, Lanczos algorithm, and other methods

seed

An integer specifying the seed used for model creation (e.g., random ordering in Vecchia approximation)

cluster_ids

A vector with elements indicating independent realizations of random effects / Gaussian processes (same values = same process realization). The elements of 'cluster_ids' can be integer, double, or character.

free_raw_data

A boolean. If TRUE, the data (groups, coordinates, covariate data for random coefficients) is freed in R after initialization

vecchia_approx

Discontinued. Use the argument gp_approx instead

vecchia_pred_type

A string specifying the type of Vecchia approximation used for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this

num_neighbors_pred

an integer specifying the number of neighbors for making predictions. This is discontinued here. Use the function 'set_prediction_data' to specify this

Value

A GPModel containing ontains a Gaussian process and / or mixed effects model with grouped random effects

Author(s)

Fabio Sigrist

Examples

# See https://github.com/fabsig/GPBoost/tree/master/R-package for more examples

data(GPBoost_data, package = "gpboost")

#--------------------Grouped random effects model: single-level random effect----------------
gp_model <- GPModel(group_data = group_data[,1], likelihood="gaussian")

#--------------------Gaussian process model----------------
gp_model <- GPModel(gp_coords = coords, cov_function = "exponential",
                    likelihood="gaussian")

#--------------------Combine Gaussian process with grouped random effects----------------
gp_model <- GPModel(group_data = group_data,
                    gp_coords = coords, cov_function = "exponential",
                    likelihood="gaussian")

gpboost documentation built on Oct. 24, 2023, 9:09 a.m.