fitGP: Fit a GP model
In tanyalrogers/GPEDM: Gaussian Process regression for Empirical Dynamic Modeling

fitGP

R Documentation

Fit a GP model

Description

Fits a GP model with separable length scales and automatic relevance determination (ARD). Also fits a hierarchical version of the GP model if distinct populations are indicated using pop. There are several ways to specify the training data:

A. supply data frame data, plus column names or indices for y, x, pop, and time.
B. supply vectors for y, pop, and time, and a matrix or vector for x.

For each of the above 2 options, there are 3 options for specifying the predictors.

supplying y and x (omitting E and tau) will use the columns of x as predictors. This allows for the most customization.
supplying y, E, and tau (omitting x) will use E lags of y (with spacing tau) as predictors. This is equivalent to option 3 with x=y.
supplying y, x, E, and tau will use E lags of each column of x (with spacing tau) as predictors. Do not use this option if x already contains lags, in that case use option 1.

Arguments pop and time are optional in all of the above cases. If pop and time are supplied, they should NOT contain missing values. This function will also, optionally, produce predictions. See Details for more information.

Usage

fitGP(
  data = NULL,
  y,
  x = NULL,
  pop = NULL,
  time = NULL,
  E = NULL,
  tau = NULL,
  scaling = c("global", "local", "none"),
  initpars = NULL,
  modeprior = 1,
  fixedpars = NULL,
  rhofixed = NULL,
  rhomatrix = NULL,
  augdata = NULL,
  predictmethod = NULL,
  newdata = NULL,
  xnew = NULL,
  popnew = NULL,
  timenew = NULL,
  ynew = NULL,
  returnGPgrad = FALSE,
  exclradius = 0,
  linprior = c("none", "local", "global")
)

Arguments

`data`	A data frame, or matrix with named columns.
`y`	The response variable (required). If `data` is supplied, a column name (character) or index (numeric). If `data` is not supplied, a numeric vector.
`x`	Predictor variables. If `data` is supplied, column names (character vector) or indices (numeric vector). If `data` is not supplied, a numeric matrix (or vector, if there is only one predictor variable). If `x` is not supplied, values for `E` and `tau` must be provided to construct it internally.
`pop`	Identifies separate populations (optional, if not supplied, defaults to 1 population). Population values can be either numeric, character, or factor. If `data` is supplied, a column name (character) or index (numeric). If `data` is not supplied, a vector (numeric, character, or factor).
`time`	A time index (optional, if not supplied, defaults to a numeric index). Important: The time index is not used for model fitting (timesteps are assumed to be evenly spaced) but supplying `time` will be add these values to the output table, which may be useful for later plotting purposes. If `data` is supplied, a column name (character) or index (numeric). If `data` is not supplied, a numeric vector.
`E`	Embedding dimension. If supplied, will be used to constuct lags of `x` (or lags of `y` if `x` is not supplied).
`tau`	Time delay. If supplied, will be used to constuct lags of `x` (or lags of `y` if `x` is not supplied).
`scaling`	How the variables should be scaled (see Details). Scaling can be `"global"` (default), `"local"` (only applicable if more than 1 pop), or `"none"`. Scaling is applied to `y` and each `x`. For a mix of scalings, do it manually beforehand and set scaling to `"none"`. All outputs are backtransformed to original scale.
`initpars`	Starting values for hyperparameters (see Details) in the order `c(phi,ve,sigma2,rho)`, in constrained (non-transformed) space. Should be a numeric vector with length `ncol(x)+3`. Defaults used if not supplied: `c(rep(0.1,ncol(x)), 0.001, 1-0.001, 0.5)`
`modeprior`	This value is used by the phi prior and sets the expected number of modes over the unit interval. Defaults to 1.
`fixedpars`	Fixes values of the hyperparameters phi, ve, and sigma2 (if desired). Should be a numeric vector with length `ncol(x)+2` in the order `c(phi,ve,sigma2)` in constrained (non-transformed) space. Hyperparameters to estimate should be input as NA. The value of rho is fixed separately using argument `rhofixed`.
`rhofixed`	Fixes the rho parameter, if desired (see Details).
`rhomatrix`	Symmetrical square matrix of fixed pairwise rho values to use, with 1's on the diagonal. The rows and columns should be named with the population identifier. The output of `getrhomatrix` can be used here. All populations in the dataset must appear in `rhomatrix`. Populations in `rhomatrix` that are not in the dataset are allowed and will not be used.
`augdata`	A data frame with augmentation data (see Details).
`predictmethod`	Using the training data, `loo` does leave-one-out prediction, `lto` does leave-timepoint-out prediction, and `sequential` does sequential (leave-future-out) prediction.
`newdata`	Data frame containing the same columns supplied in the original model.
`xnew`	New predictor matrix or vector. Not required if `newdata` is supplied.
`popnew`	New population vector. Not required if `newdata` is supplied.
`timenew`	New time vector. Not required if `newdata` is supplied.
`ynew`	New response vector. Optional, unless `E` and `tau` were supplied in lieu of `x`. Not required if `newdata` is supplied.
`returnGPgrad`	Return the gradient (derivative) of the GP model at each time point with respect to each input. This is only computed for out-of-sample predictions using `newdata`, `loo`, or `lto`. If you want the in-sample gradient, pass the original dataset as `newdata`. Defaults to FALSE.
`exclradius`	For `predictmethod="loo"` and `predictmethod="lto"`, the number of points on either side of the focal point to leave out of the training data. Defaults to 0.
`linprior`	Fit GP model to the residuals of a linear relationship between `y` and the first variable of `x` (prior to scaling) and backtransforms as appropriate. If `y` is growth rate, acts like a Ricker prior. Defaults to `"none"`. Option `"local"` fits a separate regression for each population, `"global"` fits a single regression.

Details

The data are assumed to be in time order. This is particularly important if E and tau are supplied or predictmethod="sequential" is used.

Missing values:

Missing values for y and x are allowed. Any rows containing missing values will be excluded from the model fitting procedure (reinserted as NAs in the output). If pop and time are supplied, they should NOT contain missing values.

Hyperparameters:

The model uses a squared exponential covariance function. See the "Extended introduction" vignette for mathematical details.

There is one inverse length scale phi estimated for each predictor variable (input dimension). Values near 0 indicate that the predictor has little influence on the response variable, and it was likely dropped by ARD. Higher values of phi indicate greater nonlinearity.

The estimated variance terms are ve (process variance) and sigma2 (pointwise prior variance).

Hyperparameter rho is the dynamic correlation in the hierarchical GP model, indicating similarity of dynamics across populations. If there is only 1 population (e.g. if pop is not supplied), rho is irrelevant (not used by the model) and will revert to the mode of its prior (~0.5). The value of rho can be fixed to any value from 0.0001 to 0.9999 using rhofixed, otherwise it is estimated from the data. Alternatively, a matrix of fixed pairwise rho values can be supplied using rhomatrix. In this case, the single rho parameter will also not be used and will revert to the mode of the prior (~0.5). A pairwise rho matrix can be generated using getrhomatrix, or you can create a custom one (e.g. based on geographical distance).

Scaling:

The model priors assume the response variable and all predictor variables have approximately zero mean and unit variance, therefore it is important to scale the data properly when fitting these models. For convenience, this function will scale the input data automatically by default, and backtransform output to the original scale. Automatic scaling can be "global" (default), or "local". The latter will scale variables within (as opposed to across) populations, so is obviously only applicable if there is more than 1 population. You can also scale the data yourself beforehand in whatever manner you wish and set scaling="none". In this case, you will obviously have to do the backtransformation yourself.

Predictions:

There are several options for producing predictions:

Use predictmethod="loo" for leave-one-out prediction using the training data.
Use predictmethod="lto" for leave-timepoint-out prediction using the training data. This will leave out values with the same time index across multiple populations, rather than each individual datapoint. If there is only one population, "lto" will be equivalent to "loo".
Use predictmethod="sequential" for sequential (leave-future-out) prediction using the training data.
If data frame data was supplied, supply data frame newdata containing same column names. Column for y is optional, unless E and tau were supplied in lieu of x.
If vectors/matrices were supplied for y, x, etc, equivalent vector/matrices xnew, popnew (if pop was supplied), and timenew (optional). ynew is optional, unless E and tau were supplied in lieu of x.

After fitting, predictions can also be made using predict.GP. Predictions can plotted using plot.GPpred. Get conditional responses using getconditionals.

It should be noted that "loo" is not a "true" leave-one-out, for although each prediction is made by removing one of the training points, the hyperparameters are fit using all of the training data. The same goes for "sequential".

For the out-of-sample prediction methods "loo", "lto", and newdata, the partial derivatives of the fitted GP function at each time point with respect to each input can be obtained by setting returnGPgrad = T. If you want the in-sample gradient, pass the original (training) data as back in as newdata. It automatic scaling was used, the gradient will also be backtransformed to the original units.

Fit statistics:

The R2 and other fit statistics are always computed for y in the units in which it was supplied to the function. The R2 is specifically computed as:

R^2=1-SS_{err}/SS_{y}

which is equivalent to

R^2=1-MSE/Var_{y}

As a result, the returned R-squared may be negative, particularly for out-of-sample predictions, which are not guaranteed to pass through the mean. A negative R2 indicates that the model predictions are worse than just using the mean (MSE is larger than the variance).

If there are multiple populations, there will be a total R2, which is the R2 across populations, as well as within-population R2 values. For the total R2, the denominator is the variance across all datapoints, ignoring population. Note that if the populations have very different local means, the total R2 can be potentially misleading because the across-population variance will be much larger than the within population variance, increasing R2 even if MSE is constant. Put another way, simply getting the local means right can explain a lot of the across-population variance even if there is little prediction accuracy within a population. Very different local variances can cause similar issues. In this case, we would recommend looking at the within-population R2 values, or at one of the alternative R2 values provided in the fit stats: R2centered (total R2 with local means removed) or R2scaled (total R2 with local centering and scaling), which may be more accurate measures of performance than total R2 if the populations have very different local means and/or variances.

Value

A list (class GP and GPpred) with the following elements:

`pars`	Posterior mode of hyperparameters (named vector).
`parst`	Posterior mode of hyperparameters (named vector), transformed to real line (used for optimization).
`grad`	Likelihood gradients of hyperparameters at posterior mode. Can be used to assess convergence.
`covm`	List containing various covariance matrices and the inverse covariance matrix.
`iter`	Number of iterations until convergence was reached. Currently fixed at a max of 200.
`inputs`	Inputs and scaled inputs. Note that if you use `E` and `tau`, the names of the predictors in the input data frame will be stored under `x_names`, and the names of the lagged predictors corresponding to the inverse length scales will be stored under `x_names2`, provided these names exist.
`scaling`	Scaling information.
`linprior`	If `linprior` is not `"none"`, linear regression information. `linprior_mod` is an `lm` object.
`insampresults`	Data frame with in-sample predictions. `predfsd` is the standard deviation of the GP function, `predsd` includes process error.
`insampfitstats`	Fit statistics for in-sample predictions. Includes R2, rmse, ln_post (log posterior likelihood evaluated at the mode), lnL_LOO (generalized cross-validation approximate leave-one-out negative log likelihood), and df (estimated degrees of freedom, equal to the trace of the smoother matrix). lnL_LOO does not include the prior, so is not directly comparable to ln_post. For diagnostics, also includes likelihood components ln_prior, SS, logdet. If there are multiple populations, also includes R2centered and R2scaled.
`insampfitstatspop`	If >1 population, fit statistics (R2 and rmse) for in-sample predictions by population.
`outsampresults`	Data frame with out-of-sample predictions (if requested). `predfsd` is the standard deviation of the GP function, `predsd` includes process error.
`outsampfitstats`	Fit statistics for out-of-sample predictions (if requested). Only computed if using `"loo"` or `"sequential"`, if `y` is found in `newdata`, or if `ynew` supplied (i.e. if the observed values are known).
`outsampfitstatspop`	If >1 population, fit statistics for out-of-sample predictions (if requested) by population.
`GPgrad`	If `returnGPgrad=T`, a data frame with the partial derivatives of the function with respect to each input.
`call`	Function call. Allows use of `update`.

References

Munch, S. B., Poynor, V., and Arriaza, J. L. 2017. Circumventing structural uncertainty: a Bayesian perspective on nonlinear forecasting for ecology. Ecological Complexity, 32: 134.

Examples

yrand=rnorm(20)
testgp=fitGP(y=yrand,E=2,tau=1,predictmethod = "loo")

tanyalrogers/GPEDM documentation built on June 1, 2025, 8:10 p.m.

tanyalrogers/GPEDM index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

tanyalrogers/GPEDM
Gaussian Process regression for Empirical Dynamic Modeling

fitGP: Fit a GP model
In tanyalrogers/GPEDM: Gaussian Process regression for Empirical Dynamic Modeling

Fit a GP model

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to fitGP in tanyalrogers/GPEDM...

R Package Documentation

Browse R Packages

We want your feedback!

tanyalrogers/GPEDM Gaussian Process regression for Empirical Dynamic Modeling

fitGP: Fit a GP model In tanyalrogers/GPEDM: Gaussian Process regression for Empirical Dynamic Modeling

Fit a GP model

Description

Usage

Arguments

Details

Value

References

See Also

Examples

Related to fitGP in tanyalrogers/GPEDM...

R Package Documentation

Browse R Packages

We want your feedback!

tanyalrogers/GPEDM
Gaussian Process regression for Empirical Dynamic Modeling

fitGP: Fit a GP model
In tanyalrogers/GPEDM: Gaussian Process regression for Empirical Dynamic Modeling