knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 6,
  fig.height = 6
)
options(rmarkdown.html_vignette.check_title = FALSE)
library(pbsEDM)

This vignette shows how to use the simplex algorithm in pbsEDM to analyse a simple time series and forecast the next value, and shows resulting graphics that help understand the algorithm (as described in the manuscript). The tibble NY_lags_example is saved in the package and contains several columns, though we are only going to use the original time series of simulated population numbers, $N_t$, which has column name N_t. See ?NY_lags_example for full details.

Plot the data in different ways

We want to plot the data in different ways, for which it is easiest to first run the simplex algorithm (to create a pbsEDM object for which code automatically makes the figures).

Use only the N_t column from NY_lags_example and apply the simplex algorithm using lags of 0 and 1 ($E = 2$) with first differencing:

E_2_res <- pbsEDM(NY_lags_example,
                  lags = list(N_t = c(0:1)),
                  first_difference = TRUE)

This creates a pbsEDM object which is a list of results (see ?pbsEDM for full details):

summary(E_2_res)

For now we'll just use that to plot the data in various useful ways:

plot(E_2_res,
     portrait = FALSE)

The figure shows, before using the results of the simplex algorithm, the data in various ways: time series of $N_t$ and $Y_t$, plus phase plots of $N_t$ vs $N_{t-1}$, $Y_t$ vs $Y_{t-1}$ (as used for $E=2$) and $Y_t$ vs $Y_{t-1}$ vs $Y_{t-2}$ (as used for $E=3$). The last 3 values of $N_t$ are in red, with the final one as a star. In the $Y_t$ time series panel, the values of $Y_t$ are also plotted in a vertical line (to the left of $t=0$) which is the one-dimensional equivalent of a phase plot for $E=1$.

An animated movie (with simplex results) is shown later below; to create the movie without the results use this code in an Rmarkdown document:

{r, animation.hook = 'gifski', interval = 1.5, fig.width = 9, fig.height = 5.36}
## or for portrait use fig.width = 5.36, fig.height = 8
for(iiii in 1:length(NY_lags_example$N_t)){
  plot(E_2_res,
       last.time.to.plot = iiii,
       portrait = FALSE)
}

A lot of information is saved in the E_2_res list. See ?pbsEDM::pbsEDM for full details, but, for example, useful summary statistics are given by

E_2_res$results

giving the embedding dimension $E$ (which is 2 here because we used lags of 0 and 1), forecast skill (correlation coefficient $\rho$) and root-mean-square error between the observations and the predictions.

Make predictions using different values of E

Now use the simplex algorithm for different values of embedding dimension $E$, where, for example $E=2$ uses first-difference values $Y_t$ and $Y_{t-1}$:

E_results <- pbsEDM_Evec(NY_lags_example$N_t)

The result is a list containing a pbsEDM object (as above) for each value of $E$, the default being values $E = 2, 3, ..., 11$:

summary(E_results)

There is a plotting function for the results, showing the data (as above) plus, in the final panel, the predicted value of each data point and the $\rho$ for each different value of $E$:

plot_pbsEDM_Evec(E_results,
                 portrait = FALSE)

A movie of that is shown here: pbsEDM_movie_1.gif

Brief technical aside for full reproducibility. That is a saved version to avoid R package building issues. To make the movie afresh, install the gifski package, delete the first line in the next chunk of code, uncomment the commented line and save the movie produced here directly from the html as pbsEDM_movie_1.gif; see the text above the movie on this page if you want more details of this.

# ```r
for(iiii in 1:length(NY_lags_example$N_t)){
  plot_pbsEDM_Evec(E_results,
       last.time.to.plot = iiii,
       portrait = FALSE)
}

Now see how the forecast skill (correlation coefficient $\rho$) varies with $E$:

plot_rho_Evec(E_results)

Each E_results component corresponds to a value of $E$, for example

E_2_res$results
E_results[[1]]$results
E_results[[2]]$results
E_results[[3]]$results

The conclusion from the figure would be that $E = 3$ gives the highest value of $\rho$, and so is best used for forecasting $\hat{N}_{101}$. This is given by:

E_results[[2]]$results           # [[2]] refers to E=3
E_results[[2]]$Y_forecast[100]   # the forecast for Y_{100}
E_results[[2]]$N_forecast[101]   # the forecast for N_{101}

In this case we obtain a negative forecast of $\hat{N}_{101}$ (which is explored as Aspect 3).

Movie to demonstrate the simplex algorithm for $E=2$

Understanding of the simplex algorithm can be enhanced with a specific animated example. With $t^* = 39$ we have Figure S.2 from the manuscript:

# ```r
plot_explain_edm_movie(E_2_res,
                       tstar = 39)

pbsEDM_movie_tstar_39.gif.

Such a movie is useful for checking any single prediction. For example, with $t^* = 15$ (see the technical aside above to make a new movie):

# ```r
plot_explain_edm_movie(E_2_res,
                       tstar = 15)

pbsEDM_movie_tstar_15.gif.

And changing to $t^* = 75$ shows the removal of an invalid candidate nearest neighbour from the library (right next to the blue solid circle) that would clearly have been used as a nearest neighbour if not removed:

# ```r
plot_explain_edm_movie(E_2_res,
                       tstar = 75)

pbsEDM_movie_tstar_75.gif.

For the prediction of $Y_{100}$ we set $t^* = 99$ to give:

# ```r
plot_explain_edm_movie(E_2_res,
                       tstar = 99)

pbsEDM_movie_tstar_99.gif.

In this case there are no candidate nearest neighbours to cross out, because $t^ = T-1$ and all defined points are candidates. This can be seen in matrix (6) in the manuscript by setting $E=2$, $T=100$ and $t^ = 99$ (besides ${\bf x}{t^*}$, only rows with a $\times$ get crossed out and since these are undefined they do not show up on the above plot anyway). Also there is no known true value of $Y{100}$ in the final plot.

In the R console you can create a pdf version of any such movie with, for example:

plot_explain_edm_movie_save(E_2_res,
                            pdf.filename = "movie_tstar_75.pdf",
                            tstar = 75)

Further details of the calculations

The intermediate calculations of the simplex algorithm are saved in the output for perusal (see ?pbsEDM). For example, to see the nearest neighbours for each valid $t^*$:

E_2_res$neighbour_index


luke-a-rogers/pbsedm documentation built on June 3, 2024, 5:20 a.m.