SMap: SMap forecasting


SMap    R Documentation

SMap forecasting

Description

SMap performs time series forecasting based on localised (or global) nearest neighbor projection in the time series phase space, as described in Sugihara (1994).

Usage

SMap(pathIn = "./", dataFile = "", dataFrame = NULL, 
  lib = "", pred = "", E = 0, Tp = 1, knn = 0, tau = -1, 
  theta = 0, exclusionRadius = 0, columns = "", target = "", 
  embedded = FALSE, verbose = FALSE,
  validLib = vector(), ignoreNan = TRUE,
  generateSteps = 0, parameterList = FALSE,
  showPlot = FALSE, noTime = FALSE)  

Arguments

pathIn

path to dataFile.

dataFile

.csv format data file name. The first column must be a time index or time values unless noTime is TRUE. The first row must be column names.

dataFrame

input data.frame. The first column must be a time index or time values unless noTime is TRUE. The columns must be named.

lib

string or vector with start and stop indices of input data rows used to create the library from observations. Multiple row index pairs can be specified, with each pair defining the first and last rows of a time series observation segment used to create the library.

pred

string with start and stop indices of input data rows used for predictions. A single contiguous range is supported.

E

embedding dimension.

Tp

prediction horizon (number of time column rows).

knn

number of nearest neighbors. If knn=0, knn is set to the library size.

tau

lag of time delay embedding specified as number of time column rows.

theta

neighbor localisation exponent.

exclusionRadius

excludes vectors from the search space of nearest neighbors if their relative time index is within exclusionRadius.

columns

string of whitespace separated column name(s) in the input data used to create the library.

target

column name in the input data used for prediction.

embedded

logical specifying if the input data are embedded.

verbose

logical to produce additional console reporting.

validLib

logical vector the same length as the number of data rows. Any data row marked FALSE in this vector is excluded from the library.

ignoreNan

logical to internally redefine library to avoid nan.

generateSteps

number of predictive feedback generative steps.

parameterList

logical to add list of invoked parameters.

showPlot

logical to plot results.

noTime

logical to allow input data with no time column.
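The row selection arguments above (lib, pred, validLib) can be pictured with a small base R sketch. The helper expand_ranges is hypothetical and not part of rEDM; it only illustrates how a string of start and stop pairs maps to data rows, assuming both ends of each pair are inclusive. The 200-row data size is likewise an assumption made for the illustration.

```r
# Hypothetical helper (not an rEDM function): expand a string of
# start/stop index pairs, as accepted by lib, into explicit row indices.
expand_ranges <- function(spec) {
  idx <- as.integer(strsplit(trimws(spec), "\\s+")[[1]])
  stopifnot(length(idx) %% 2 == 0)            # indices must come in pairs
  pairs <- matrix(idx, ncol = 2, byrow = TRUE)
  unlist(lapply(seq_len(nrow(pairs)),
                function(i) pairs[i, 1]:pairs[i, 2]))
}

# Two library segments: rows 1-100 and rows 151-200
lib_rows <- expand_ranges("1 100 151 200")

# A validLib vector for an assumed 200-row data set that masks rows 40-60
valid <- rep(TRUE, 200)
valid[40:60] <- FALSE
```

A vector such as valid would be passed as validLib = valid, and a multi-segment library as lib = "1 100 151 200".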

Details

If embedded is FALSE, the data column(s) are embedded to dimension E with time lag tau. This embedding forms an n-columns * E-dimensional phase space for the SMap projection. If embedded is TRUE, the data are assumed to contain an E-dimensional embedding with E equal to the number of columns. See the Note below for proper use of multivariate data (number of columns > 1).
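The time-delay embedding described above can be illustrated with a short base R sketch. The helper delay_embed is hypothetical and is not the rEDM implementation; it assumes a single column and a negative tau (lags into the past), as with the default tau = -1. The column labels are chosen only for readability.

```r
# Hypothetical helper (not an rEDM function): time-delay embed a single
# series x to dimension E with lag tau < 0.
delay_embed <- function(x, E, tau = -1) {
  stopifnot(tau < 0, E >= 1, length(x) > (E - 1) * abs(tau))
  lag <- abs(tau)
  n <- length(x)
  cols <- lapply(0:(E - 1), function(k) {
    shift <- k * lag
    # Rows without enough history are filled with NA
    c(rep(NA, shift), x[seq_len(n - shift)])
  })
  m <- do.call(cbind, cols)
  colnames(m) <- paste0("x(t-", (0:(E - 1)) * lag, ")")
  m
}

emb <- delay_embed(1:5, E = 3)
print(emb)
```

The first (E - 1) * |tau| rows contain NA because the lagged values do not exist yet. Rows of this kind are what ignoreNan = TRUE removes from the library.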

If ignoreNan is TRUE, the library (lib) is internally redefined to exclude nan embedding vectors. If ignoreNan is FALSE, no library adjustment is made. Alternatively, lib can be specified explicitly to exclude nan library vectors.

Predictions are made using leave-one-out cross-validation, i.e. the observation row being predicted is excluded from the regression used to predict it.

In contrast to Simplex, SMap uses all available neighbors and weights them with an exponential decay in phase space distance with exponent theta. With theta = 0 all neighbors are weighted equally, which corresponds to a global linear autoregressive model. As theta increases, neighbors closer to the observation are weighted more heavily, so the model becomes increasingly local.
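The weighting scheme can be sketched in base R for a single prediction. This is a minimal illustration under stated assumptions, not the rEDM implementation: the function name smap_predict_one is hypothetical, the weights follow the form exp(-theta * d / mean(d)) from Sugihara (1994), and the weighted regression is solved with a QR least-squares fit rather than the SVD used to produce singularValues.

```r
# Hypothetical helper (not an rEDM function): one S-map style prediction.
# lib_X : matrix of library state vectors (one row per vector)
# lib_y : target value associated with each library vector
# x_star: state vector at which to predict
smap_predict_one <- function(lib_X, lib_y, x_star, theta) {
  # Euclidean distance from the query state to every library state
  d <- sqrt(rowSums(sweep(lib_X, 2, x_star)^2))
  # Exponential localisation weights; theta = 0 gives equal weights
  w <- exp(-theta * d / mean(d))
  # Weighted linear regression with an intercept column
  A <- cbind(1, lib_X) * w
  b <- lib_y * w
  coefs <- qr.solve(A, b)          # least-squares solution
  sum(coefs * c(1, x_star))        # intercept + linear terms
}

# Toy library generated from an exactly linear map, so the result
# does not depend on theta
lib_X <- cbind(c(0, 1, 2, 3, 4), c(1, 0, 2, 1, 3))
lib_y <- 2 + 3 * lib_X[, 1] - lib_X[, 2]
p_global <- smap_predict_one(lib_X, lib_y, c(1.5, 0.5), theta = 0)
p_local  <- smap_predict_one(lib_X, lib_y, c(1.5, 0.5), theta = 2)
```

For nonlinear dynamics the two predictions would generally differ, and comparing forecast skill across theta values is the usual way to assess state dependence.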

Value

A named list with three data.frames: predictions, coefficients and singularValues. The predictions data.frame has time or index values in the first column, followed by the columns Observations and Predictions.

coefficients data.frame has time or index values in the first column. Columns 2 through E+2 (E+1 columns) are the SMap coefficients.

singularValues data.frame has time or index values in the first column. Columns 2 through E+2 (E+1 columns) are the SVD singular values. The first value corresponds to the SVD bias (intercept) term.

If parameterList = TRUE a named list "parameters" is added.
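A short sketch of inspecting the returned list, using only the element and column names documented above. The call is guarded so that the block does nothing when rEDM is not installed. The variable names out, pred_df and rho are illustrative.

```r
# Guarded so the block is a no-op when rEDM is not installed
if (requireNamespace("rEDM", quietly = TRUE)) {
  library(rEDM)
  data(circle)
  out <- SMap(dataFrame = circle, lib = "1 100", pred = "110 190",
              theta = 4, E = 2, embedded = TRUE,
              columns = "x y", target = "x")

  print(names(out))              # names of the returned list elements

  # Forecast skill as the correlation of observations and predictions
  pred_df <- out$predictions
  rho <- cor(pred_df$Observations, pred_df$Predictions,
             use = "complete.obs")
  print(rho)

  # First rows of the S-map coefficients and singular values
  print(head(out$coefficients))
  print(head(out$singularValues))
}
```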

Note

SMap should be called with columns explicitly corresponding to dimensions E. In the univariate case (number of columns = 1) with default embedded = FALSE, the time series will be time-delay embedded to dimension E, and the SMap coefficients correspond to each dimension.

If a multivariate data set is used (number of columns > 1) it must use embedded = TRUE with E equal to the number of columns. This prevents the function from internally time-delay embedding the multiple columns to dimension E. If the internal time-delay embedding is performed, then state-space columns will not correspond to the intended dimensions in the matrix inversion, coefficient assignment, and prediction. In the multivariate case, the user should first prepare the embedding (using Embed for time-delay embedding), then pass this embedding to SMap with appropriately specified columns, E, and embedded = TRUE.
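The multivariate workflow can be sketched as follows. To keep the block self-contained, the embedding is built by hand in base R instead of with Embed, and the data are simulated from a coupled logistic map chosen only for illustration. The column names x_t0, x_t1 and y_t0 and the lib and pred ranges are assumptions. The SMap call is guarded so that it is skipped when rEDM is not installed.

```r
# Simulated two-variable series (illustrative data, not from rEDM)
n <- 200
x <- numeric(n); y <- numeric(n)
x[1] <- 0.2; y[1] <- 0.4
for (t in 1:(n - 1)) {
  x[t + 1] <- x[t] * (3.8 - 3.8 * x[t] - 0.02 * y[t])
  y[t + 1] <- y[t] * (3.5 - 3.5 * y[t] - 0.1 * x[t])
}

# Hand-built 3-dimensional embedding: x(t), x(t-1), y(t).
# The first row is dropped because x(t-1) is undefined there.
emb <- data.frame(
  Time = 2:n,
  x_t0 = x[2:n],
  x_t1 = x[1:(n - 1)],
  y_t0 = y[2:n]
)

if (requireNamespace("rEDM", quietly = TRUE)) {
  library(rEDM)
  # embedded = TRUE and E equal to the number of columns, so SMap
  # does not apply a further time-delay embedding
  fit <- SMap(dataFrame = emb, lib = "1 100", pred = "101 190",
              theta = 2, E = 3, embedded = TRUE,
              columns = "x_t0 x_t1 y_t0", target = "x_t0")
  print(head(fit$coefficients))
}
```

With this layout each coefficient column maps to one intended state-space dimension, which is the correspondence the Note describes.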

References

Sugihara G. 1994. Nonlinear forecasting for the classification of natural time series. Philosophical Transactions: Physical Sciences and Engineering, 348 (1688):477-495.

Examples

library(rEDM)
data(circle)
# Multivariate case: columns x and y already form the 2-dimensional embedding
L = SMap( dataFrame = circle, lib="1 100", pred="110 190", theta = 4,
          E = 2, embedded = TRUE, columns = "x y", target = "x" )

rEDM documentation built on Nov. 10, 2023, 5:08 p.m.