block_gp: (generalized) Block forecasting using Gaussian Processes

Description Usage Arguments Details Value Examples

View source: R/block_GP.R

Description

block_gp uses multiple time series given as input to generate an attractor reconstruction, and then applies Gaussian process regression to approximate the dynamics and make forecasts. This method is the generalized version of tde_gp, which constructs the block from lags of a time series to pass into this function.

Usage

1
2
3
4
5
block_gp(block, lib = c(1, NROW(block)), pred = lib, tp = 1,
  phi = 0, v_e = 0, eta = 0, fit_params = TRUE, columns = NULL,
  target_column = 1, stats_only = TRUE,
  save_covariance_matrix = FALSE, first_column_time = FALSE,
  silent = FALSE, ...)

Arguments

block

either a vector to be used as the time series, or a data.frame or matrix where each column is a time series

lib

a 2-column matrix (or 2-element vector) where each row specifies the first and last *rows* of the time series to use for attractor reconstruction

pred

(same format as lib), but specifying the sections of the time series to forecast.

tp

the prediction horizon (how far ahead to forecast)

phi

length-scale parameter. see 'Details'

v_e

noise-variance parameter. see 'Details'

eta

signal-variance parameter. see 'Details'

fit_params

specify whether to use MLE to estimate params over the lib

columns

either a vector with the columns to use (indices or names), or a list of such columns

target_column

the index (or name) of the column to forecast

stats_only

specify whether to output just the forecast statistics or the raw predictions for each run

save_covariance_matrix

specifies whether to include the full covariance matrix with the output (and forces the full output as if stats_only were set to FALSE)

first_column_time

indicates whether the first column of the given block is a time column (and therefore excluded when indexing)

silent

prevents warning messages from being printed to the R console

...

other parameters. see 'Details'

Details

The default parameters are set so that passing a vector as the only argument will use that vector to predict itself one time step ahead. If a matrix or data.frame is given as the only argument, the first column will be predicted (one time step ahead), using the remaining columns as the embedding. Rownames will be converted to numeric if possible to be used as the time index, otherwise 1:NROW will be used instead. The default lib and pred are to perform maximum likelihood estimation of the phi, v_e, and eta parameters over the whole time series, and return just the forecast statistics.

If phi, v_e, and eta parameters are given, all combinations of their values will be tried. If fit_params is also set to TRUE, these values will be the initial values for subsequent optimization of likelihood.

The basic model is:

y = f(x) + noise

in which the function f(x) is modeled using a Gaussian process prior:

f ~ GP(0, C)

with mean = 0, and covariance function, C, which is given by the squared-exponential kernel:

C_{ij} = eta * exp(-phi^2 * ||x_i - x_j||^2)

y is a realization from process f with normally-distributed i.i.d. process noise,

noise ~ N(0, v_e)

such that the covariance of observations y_i and y_j is

K_{ij} = C_{ij} + v_e * δ_{ij}

where δ_{ij} is the kronecker delta (i.e. it is 1 if i = j and 0 otherwise)

From the model definition, the variance in y, after marginalizing over f, is given by eta + v_e. Thus to simplify specification of priors for the hyperparameters eta and v_e, the outputs y are normalized to zero mean and unit variance prior to fitting. This allows us to set (0, 1) bounds on eta and v_e which facilitates parameter estimation. We set Beta(2, 2) priors for both eta and v_e to partition prior uncertainty equally across structural and process uncertainty.

For a scalar input, the length-scale parameter phi controls the expected number of zero crossings on the unit interval as

E(crossings) = √(2)/π φ \approx 0.45 φ

Thus to facilitate interpretation and prior specification, the distances in C are scaled by the max distance so that a model with φ = 2 would have roughly one zero crossing over the range of the data. We assign phi a half-Normal prior with variance π / 2 so that the prior mean phi is 1, which tends to avoid overfitting. To fit the GP we estimate eta, v_e, and phi by maximizing the posterior after marginalizing over f(x). This is given by the multivariate normal likelihood

logL = -1/2 log(|K_d|)^(-1/2) y_d^T [K_d]^(-1) y_d

where K_d is the matrix obtained by evaluating the covariance function at all pairs of inputs and y_d is the column vector of outputs. Predictions for new values of x are obtained by setting eta, v_e, and phi to the Maximum a Posteriori (MAP) estimates and using the GP conditional on the observed data. Specifically, given x_d and y_d, the mean and variance for y evaluated at a new value of x are

E(y) = C(x, x_d) [K_d]^(-1) y_d

V(y) = eta + v_e - C(x, x_d)[K_d]^(-1) C(x_d, x)

where the vector C(x, x_d) is obtained by evaluating C at x and each of the observed inputs while holding eta, phi, and v_e at the MAP estimates.

Value

If stats_only, then a data.frame with components for the parameters and forecast statistics:

embedding embedding
tp prediction horizon
phi length-scale parameter
v_e noise-variance parameter
eta signal-variance parameter
fit_params whether params were fitted or not
num_pred number of predictions
rho correlation coefficient between observations and predictions
mae mean absolute error
rmse root mean square error
perc percent correct sign
p_val p-value that rho is significantly greater than 0 using Fisher's z-transformation
model_output data.frame with columns for the time index, observations, mean-value for predictions, and independent variance for predictions (if stats_only == FALSE or save_covariance_matrix == TRUE)
covariance_matrix the full covariance matrix for predictions (if save_covariance_matrix == TRUE)

Examples

1
2
3
data("two_species_model")
block <- two_species_model[1:200,]
block_gp(block, columns = c("x", "y"), first_column_time = TRUE)

ha0ye/rEDM documentation built on March 30, 2021, 11:21 p.m.