block_gp: (generalized) Block forecasting using Gaussian Processes
In ha0ye/rEDM: Applications of Empirical Dynamic Modeling from Time Series

Description Usage Arguments Details Value Examples

block_gp uses multiple time series given as input to generate an attractor reconstruction, and then applies Gaussian process regression to approximate the dynamics and make forecasts. This method is the generalized version of tde_gp, which constructs the block from lags of a time series to pass into this function.

block_gp(block, lib = c(1, NROW(block)), pred = lib, tp = 1,
  phi = 0, v_e = 0, eta = 0, fit_params = TRUE, columns = NULL,
  target_column = 1, stats_only = TRUE,
  save_covariance_matrix = FALSE, first_column_time = FALSE,
  silent = FALSE, ...)

`block`	either a vector to be used as the time series, or a data.frame or matrix where each column is a time series
`lib`	a 2-column matrix (or 2-element vector) where each row specifies the first and last rows of the time series to use for attractor reconstruction
`pred`	(same format as lib), but specifying the sections of the time series to forecast.
`tp`	the prediction horizon (how far ahead to forecast)
`phi`	length-scale parameter. see 'Details'
`v_e`	noise-variance parameter. see 'Details'
`eta`	signal-variance parameter. see 'Details'
`fit_params`	specify whether to use MLE to estimate params over the lib
`columns`	either a vector with the columns to use (indices or names), or a list of such columns
`target_column`	the index (or name) of the column to forecast
`stats_only`	specify whether to output just the forecast statistics or the raw predictions for each run
`save_covariance_matrix`	specifies whether to include the full covariance matrix with the output (and forces the full output as if stats_only were set to FALSE)
`first_column_time`	indicates whether the first column of the given block is a time column (and therefore excluded when indexing)
`silent`	prevents warning messages from being printed to the R console
`...`	other parameters. see 'Details'

The default parameters are set so that passing a vector as the only argument will use that vector to predict itself one time step ahead. If a matrix or data.frame is given as the only argument, the first column will be predicted (one time step ahead), using the remaining columns as the embedding. Rownames will be converted to numeric if possible to be used as the time index, otherwise 1:NROW will be used instead. The default lib and pred are to perform maximum likelihood estimation of the phi, v_e, and eta parameters over the whole time series, and return just the forecast statistics.

If phi, v_e, and eta parameters are given, all combinations of their values will be tried. If fit_params is also set to TRUE, these values will be the initial values for subsequent optimization of likelihood.

The basic model is:

y = f(x) + noise

in which the function f(x) is modeled using a Gaussian process prior:

f ~ GP(0, C)

with mean = 0, and covariance function, C, which is given by the squared-exponential kernel:

C_{ij} = eta * exp(-phi^2 * ||x_i - x_j||^2)

y is a realization from process f with normally-distributed i.i.d. process noise,

noise ~ N(0, v_e)

such that the covariance of observations y_i and y_j is

K_{ij} = C_{ij} + v_e * δ_{ij}

where δ_{ij} is the kronecker delta (i.e. it is 1 if i = j and 0 otherwise)

From the model definition, the variance in y, after marginalizing over f, is given by eta + v_e. Thus to simplify specification of priors for the hyperparameters eta and v_e, the outputs y are normalized to zero mean and unit variance prior to fitting. This allows us to set (0, 1) bounds on eta and v_e which facilitates parameter estimation. We set Beta(2, 2) priors for both eta and v_e to partition prior uncertainty equally across structural and process uncertainty.

For a scalar input, the length-scale parameter phi controls the expected number of zero crossings on the unit interval as

E(crossings) = √(2)/π φ \approx 0.45 φ

Thus to facilitate interpretation and prior specification, the distances in C are scaled by the max distance so that a model with φ = 2 would have roughly one zero crossing over the range of the data. We assign phi a half-Normal prior with variance π / 2 so that the prior mean phi is 1, which tends to avoid overfitting. To fit the GP we estimate eta, v_e, and phi by maximizing the posterior after marginalizing over f(x). This is given by the multivariate normal likelihood

logL = -1/2 log(|K_d|)^(-1/2) y_d^T [K_d]^(-1) y_d

where K_d is the matrix obtained by evaluating the covariance function at all pairs of inputs and y_d is the column vector of outputs. Predictions for new values of x are obtained by setting eta, v_e, and phi to the Maximum a Posteriori (MAP) estimates and using the GP conditional on the observed data. Specifically, given x_d and y_d, the mean and variance for y evaluated at a new value of x are

E(y) = C(x, x_d) [K_d]^(-1) y_d

V(y) = eta + v_e - C(x, x_d)[K_d]^(-1) C(x_d, x)

where the vector C(x, x_d) is obtained by evaluating C at x and each of the observed inputs while holding eta, phi, and v_e at the MAP estimates.

If stats_only, then a data.frame with components for the parameters and forecast statistics:

`embedding`	embedding
`tp`	prediction horizon
`phi`	length-scale parameter
`v_e`	noise-variance parameter
`eta`	signal-variance parameter
`fit_params`	whether params were fitted or not
`num_pred`	number of predictions
`rho`	correlation coefficient between observations and predictions
`mae`	mean absolute error
`rmse`	root mean square error
`perc`	percent correct sign
`p_val`	p-value that rho is significantly greater than 0 using Fisher's z-transformation
`model_output`	data.frame with columns for the time index, observations, mean-value for predictions, and independent variance for predictions (if `stats_only == FALSE` or `save_covariance_matrix == TRUE`)
`covariance_matrix`	the full covariance matrix for predictions (if `save_covariance_matrix == TRUE`)

1
2
3

data("two_species_model")
block <- two_species_model[1:200,]
block_gp(block, columns = c("x", "y"), first_column_time = TRUE)

ha0ye/rEDM documentation built on March 30, 2021, 11:21 p.m.

ha0ye/rEDM index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

ha0ye/rEDM
Applications of Empirical Dynamic Modeling from Time Series

block_gp: (generalized) Block forecasting using Gaussian Processes
In ha0ye/rEDM: Applications of Empirical Dynamic Modeling from Time Series

Description

Usage

Arguments

Details

Value

Examples

Related to block_gp in ha0ye/rEDM...

R Package Documentation

Browse R Packages

We want your feedback!

ha0ye/rEDM Applications of Empirical Dynamic Modeling from Time Series

block_gp: (generalized) Block forecasting using Gaussian Processes In ha0ye/rEDM: Applications of Empirical Dynamic Modeling from Time Series

Description

Usage

Arguments

Details

Value

Examples

Related to block_gp in ha0ye/rEDM...

R Package Documentation

Browse R Packages

We want your feedback!

ha0ye/rEDM
Applications of Empirical Dynamic Modeling from Time Series

block_gp: (generalized) Block forecasting using Gaussian Processes
In ha0ye/rEDM: Applications of Empirical Dynamic Modeling from Time Series