knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 6 ) options(rmarkdown.html_vignette.check_title = FALSE)
library(pbsEDM)
This vignette shows how to use the simplex algorithm in pbsEDM
to analyse a
simple time series and forecast the next value, and shows resulting graphics that help understand the
algorithm (as described in the manuscript).
The tibble NY_lags_example
is saved in the package and contains several columns,
though we are only going to use the original time series of simulated population numbers,
$N_t$, which has column name N_t
. See ?NY_lags_example
for full details.
We want to plot the data in different ways, for which it is easiest to first run
the simplex algorithm (to create a pbsEDM
object for which code automatically
makes the figures).
Use only the N_t
column from NY_lags_example
and apply the simplex algorithm
using lags of 0 and 1 ($E = 2$) with first differencing:
E_2_res <- pbsEDM(NY_lags_example, lags = list(N_t = c(0:1)), first_difference = TRUE)
This creates a pbsEDM
object which is a list of results (see ?pbsEDM
for full details):
summary(E_2_res)
For now we'll just use that to plot the data in various useful ways:
plot(E_2_res, portrait = FALSE)
The figure shows, before using the results of the simplex algorithm, the data in various ways: time series of $N_t$ and $Y_t$, plus phase plots of $N_t$ vs $N_{t-1}$, $Y_t$ vs $Y_{t-1}$ (as used for $E=2$) and $Y_t$ vs $Y_{t-1}$ vs $Y_{t-2}$ (as used for $E=3$). The last 3 values of $N_t$ are in red, with the final one as a star. In the $Y_t$ time series panel, the values of $Y_t$ are also plotted in a vertical line (to the left of $t=0$) which is the one-dimensional equivalent of a phase plot for $E=1$.
An animated movie (with simplex results) is shown later below; to create the movie without the results use this code in an Rmarkdown document:
{r, animation.hook = 'gifski', interval = 1.5, fig.width = 9, fig.height = 5.36} ## or for portrait use fig.width = 5.36, fig.height = 8 for(iiii in 1:length(NY_lags_example$N_t)){ plot(E_2_res, last.time.to.plot = iiii, portrait = FALSE) }
A lot of information is saved in the E_2_res
list. See ?pbsEDM::pbsEDM
for full details, but, for example, useful summary statistics are given by
E_2_res$results
giving the embedding dimension $E$ (which is 2 here because we used lags of 0 and 1), forecast skill (correlation coefficient $\rho$) and root-mean-square error between the observations and the predictions.
Now use the simplex algorithm for different values of embedding dimension $E$, where, for example $E=2$ uses first-difference values $Y_t$ and $Y_{t-1}$:
E_results <- pbsEDM_Evec(NY_lags_example$N_t)
The result is a list containing a pbsEDM
object (as above) for each value of $E$, the
default being values $E = 2, 3, ..., 11$:
summary(E_results)
There is a plotting function for the results, showing the data (as above) plus, in the final panel, the predicted value of each data point and the $\rho$ for each different value of $E$:
plot_pbsEDM_Evec(E_results, portrait = FALSE)
A movie of that is shown here:
Brief technical aside for full reproducibility. That is a saved version to avoid R package building issues.
To make the movie afresh, install the gifski
package, delete the first line in
the next chunk of code,
uncomment the commented line and save the movie produced here directly from the html as
pbsEDM_movie_1.gif
; see the text above the movie on
this page
if you want more details of this.
# ```r for(iiii in 1:length(NY_lags_example$N_t)){ plot_pbsEDM_Evec(E_results, last.time.to.plot = iiii, portrait = FALSE) }
Now see how the forecast skill (correlation coefficient $\rho$) varies with $E$:
plot_rho_Evec(E_results)
Each E_results
component corresponds to a value of $E$, for example
E_2_res$results E_results[[1]]$results E_results[[2]]$results E_results[[3]]$results
The conclusion from the figure would be that $E = 3$ gives the highest value of $\rho$, and so is best used for forecasting $\hat{N}_{101}$. This is given by:
E_results[[2]]$results # [[2]] refers to E=3 E_results[[2]]$Y_forecast[100] # the forecast for Y_{100} E_results[[2]]$N_forecast[101] # the forecast for N_{101}
In this case we obtain a negative forecast of $\hat{N}_{101}$ (which is explored as Aspect 3).
Understanding of the simplex algorithm can be enhanced with a specific animated example. With $t^* = 39$ we have Figure S.2 from the manuscript:
# ```r plot_explain_edm_movie(E_2_res, tstar = 39)
.
Such a movie is useful for checking any single prediction. For example, with $t^* = 15$ (see the technical aside above to make a new movie):
# ```r plot_explain_edm_movie(E_2_res, tstar = 15)
.
And changing to $t^* = 75$ shows the removal of an invalid candidate nearest neighbour from the library (right next to the blue solid circle) that would clearly have been used as a nearest neighbour if not removed:
# ```r plot_explain_edm_movie(E_2_res, tstar = 75)
.
For the prediction of $Y_{100}$ we set $t^* = 99$ to give:
# ```r plot_explain_edm_movie(E_2_res, tstar = 99)
.
In this case there are no candidate nearest neighbours to cross out, because $t^ = T-1$ and all defined points are candidates. This can be seen in matrix (6) in the manuscript by setting $E=2$, $T=100$ and $t^ = 99$ (besides ${\bf x}{t^*}$, only rows with a $\times$ get crossed out and since these are undefined they do not show up on the above plot anyway). Also there is no known true value of $Y{100}$ in the final plot.
In the R console you can create a pdf version of any such movie with, for example:
plot_explain_edm_movie_save(E_2_res, pdf.filename = "movie_tstar_75.pdf", tstar = 75)
The intermediate calculations of the simplex algorithm are saved in the output
for perusal (see ?pbsEDM
). For example, to see the nearest neighbours for each valid $t^*$:
E_2_res$neighbour_index
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.