fit: Generic 'fit' method for a 'GPModel'
In gpboost: Combining Tree-Boosting with Gaussian Process and Mixed Effects Models

fit	R Documentation

Description

Generic 'fit' method for a GPModel

Usage

fit(gp_model, y, X, params, offset = NULL, fixed_effects = NULL)

Arguments

`gp_model`	A `GPModel`
`y`	A `vector` with response variable data
`X`	A `matrix` with numeric covariate data for the fixed effects linear regression term (if there is one)
`params`	A `list` with parameters for the estimation / optimization:
  optimizer_cov: `string` (default = "lbfgs"). Optimizer used for estimating covariance parameters. Options: "gradient_descent", "lbfgs", "fisher_scoring", "newton", "nelder_mead". If there are additional auxiliary parameters for non-Gaussian likelihoods, 'optimizer_cov' is also used for those.
  optimizer_coef: `string` (default = "wls" for Gaussian likelihoods and "lbfgs" for other likelihoods). Optimizer used for estimating linear regression coefficients, if there are any (for the GPBoost algorithm there are usually none). Options: "gradient_descent", "lbfgs", "wls", "nelder_mead". Gradient descent steps are done simultaneously with gradient descent steps for the covariance parameters. "wls" refers to doing coordinate descent for the regression coefficients using weighted least squares. If 'optimizer_cov' is set to "nelder_mead" or "lbfgs", 'optimizer_coef' is automatically set to the same value.
  maxit: `integer` (default = 1000). Maximal number of iterations for the optimization algorithm.
  delta_rel_conv: `numeric` (default = 1E-6, except for "nelder_mead", for which the default is 1E-8). Convergence tolerance. The algorithm stops if the relative change in either the (approximate) log-likelihood or the parameters is below this value. If < 0, internal default values are used.
  convergence_criterion: `string` (default = "relative_change_in_log_likelihood"). The convergence criterion used for terminating the optimization algorithm. Options: "relative_change_in_log_likelihood" or "relative_change_in_parameters".
  init_coef: `vector` with `numeric` elements (default = NULL). Initial values for the regression coefficients (if there are any, can be NULL).
  init_cov_pars: `vector` with `numeric` elements (default = NULL). Initial values for covariance parameters of Gaussian process and random effects (can be NULL). The order is the same as the order of the parameters in the summary function: first is the error variance (only for "gaussian" likelihood), next follow the variances of the grouped random effects (if there are any, in the order provided in 'group_data'), and then follow the marginal variance and the range of the Gaussian process. If there are multiple Gaussian processes, then the variances and ranges follow alternatingly. If 'init_cov_pars = NULL', an internal choice is used that depends on the likelihood and the random effects type and covariance function. If you select the option 'trace = TRUE' in the 'params' argument, you will see the initial covariance parameters in iteration 0.
  lr_coef: `numeric` (default = 0.1). Learning rate for fixed effect regression coefficients if gradient descent is used.
  lr_cov: `numeric` (default = 0.1 for "gradient_descent" and 1. otherwise). Initial learning rate for covariance parameters if a gradient-based optimization method is used. If lr_cov < 0, internal default values are used (0.1 for "gradient_descent" and 1. otherwise). If there are additional auxiliary parameters for non-Gaussian likelihoods, 'lr_cov' is also used for those. For "lbfgs", this is divided by the norm of the gradient in the first iteration.
  use_nesterov_acc: `boolean` (default = TRUE). If TRUE, Nesterov acceleration is used. This is used only for gradient descent.
  acc_rate_coef: `numeric` (default = 0.5). Acceleration rate for regression coefficients (if there are any) for Nesterov acceleration.
  acc_rate_cov: `numeric` (default = 0.5). Acceleration rate for covariance parameters for Nesterov acceleration.
  momentum_offset: `integer` (default = 2). Number of iterations for which no momentum is applied in the beginning.
  trace: `boolean` (default = FALSE). If TRUE, information on the progress of the parameter optimization is printed.
  std_dev: `boolean` (default = TRUE). If TRUE, approximate standard deviations are calculated for the covariance and linear regression parameters (= square root of diagonal of the inverse Fisher information for Gaussian likelihoods and square root of diagonal of a numerically approximated inverse Hessian for non-Gaussian likelihoods).
  init_aux_pars: `vector` with `numeric` elements (default = NULL). Initial values for additional parameters for non-Gaussian likelihoods (e.g., shape parameter of a gamma or negative_binomial likelihood).
  estimate_aux_pars: `boolean` (default = TRUE). If TRUE, additional parameters for non-Gaussian likelihoods are also estimated (e.g., shape parameter of a gamma or negative_binomial likelihood).
  cg_max_num_it: `integer` (default = 1000). Maximal number of iterations for conjugate gradient algorithms.
  cg_max_num_it_tridiag: `integer` (default = 1000). Maximal number of iterations for the conjugate gradient algorithm when being run as Lanczos algorithm for tridiagonalization.
  cg_delta_conv: `numeric` (default = 1E-2). Tolerance level for the L2 norm of residuals for checking convergence in the conjugate gradient algorithm when being used for parameter estimation.
  num_rand_vec_trace: `integer` (default = 50). Number of random vectors (e.g., Rademacher) for stochastic approximation of the trace of a matrix.
  reuse_rand_vec_trace: `boolean` (default = TRUE). If TRUE, random vectors (e.g., Rademacher) for stochastic approximations of the trace of a matrix are sampled only once at the beginning of the parameter estimation and reused in later trace approximations. Otherwise they are sampled every time a trace is calculated.
  seed_rand_vec_trace: `integer` (default = 1). Seed number to generate random vectors (e.g., Rademacher).
  cg_preconditioner_type: `string`. Type of preconditioner used for conjugate gradient algorithms.
    Options for grouped random effects:
      "ssor" (= default): SSOR preconditioner
      "incomplete_cholesky": zero fill-in incomplete Cholesky factorization
    Options for likelihood != "gaussian" and gp_approx == "vecchia" or likelihood == "gaussian" and gp_approx == "vecchia_latent":
      "vadu" (= default): (B^T * (D^-1 + W) * B) as preconditioner for inverting (B^T * D^-1 * B + W), where B^T * D^-1 * B approx= Sigma^-1
      "fitc": FITC / modified predictive process preconditioner for inverting (B^-1 * D * B^-T + W^-1)
      "pivoted_cholesky": (Lk * Lk^T + W^-1) as preconditioner for inverting (B^-1 * D * B^-T + W^-1), where Lk is a low-rank pivoted Cholesky approximation for Sigma and B^-1 * D * B^-T approx= Sigma
      "incomplete_cholesky": zero fill-in incomplete (reverse) Cholesky factorization of (B^T * D^-1 * B + W) using the sparsity pattern of B^T * D^-1 * B approx= Sigma^-1
    Options for likelihood != "gaussian" and gp_approx == "full_scale_vecchia":
      "fitc" (= default): FITC / modified predictive process preconditioner
      "vifdu": VIF with diagonal update preconditioner
    Options for likelihood == "gaussian" and gp_approx == "full_scale_tapering":
      "fitc" (= default): modified predictive process preconditioner
      "none": no preconditioner
  fitc_piv_chol_preconditioner_rank: `integer`. Rank of the FITC and pivoted Cholesky decomposition preconditioners for iterative methods for Vecchia and VIF approximations (for full_scale_tapering, the same inducing points as in the approximation are used). Internal default values if NULL or < 0: 200 for the FITC preconditioner, 50 for the pivoted Cholesky decomposition preconditioner.
`offset`	A `numeric` `vector` with additional fixed effects contributions that are added to the linear predictor (= offset). The length of this vector needs to equal the number of training data points.
`fixed_effects`	Discontinued. Use the equivalent, renamed argument `offset` instead.
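
Examples

The following is a minimal sketch of a typical call, not an official example from the package. It assumes the gpboost package is installed and uses simulated data; the variable names (`n`, `m`, `group`, `b`, `X`, `y`) and the chosen values in `params` are illustrative assumptions. The model has a Gaussian likelihood and a single grouped random effect, so `init_cov_pars` holds two values in the order described above: the error variance first, then the variance of the grouped random effect.

```r
library(gpboost)

# Simulated data (illustrative only): m groups, n observations in total
set.seed(1)
n <- 500
m <- 50
group <- rep(1:m, each = n / m)   # group labels for the grouped random effect
b <- rnorm(m)                     # simulated random effects, one per group
X <- cbind(1, runif(n))           # intercept column and one numeric covariate
y <- as.vector(X %*% c(1, 2) + b[group] + rnorm(n, sd = 0.5))

# Define the model: Gaussian likelihood with one grouped random effect
gp_model <- GPModel(group_data = group, likelihood = "gaussian")

# Estimate covariance parameters and regression coefficients
fit(gp_model, y = y, X = X,
    params = list(optimizer_cov = "lbfgs",   # optimizer for covariance parameters
                  maxit = 100,               # cap on the number of iterations
                  init_cov_pars = c(1, 1),   # error variance, group variance
                  trace = FALSE,             # no progress output
                  std_dev = TRUE))           # approximate standard deviations

# Inspect estimated parameters
summary(gp_model)
```

Setting `trace = TRUE` instead prints the progress of the optimization, including the initial covariance parameters in iteration 0.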