knitr::opts_chunk$set( collapse = TRUE, comment = "", fig.width = 5, fig.height = 5 ) options(rmarkdown.html_vignette.check_title = FALSE)
library(pbsEDM)
In the manuscript, the library contains the candidate points that can be used to determine
the nearest neighbours of the focal point from which we are making a
prediction. Aspect 2 concerns which points can be considered
as being in the library, and which should be excluded. Here we show how our
suggested approach differs slightly from that in rEDM
, giving minor changes in
results. We also provide options to test alternative assumptions about which
candidate neighours to exclude.
We use the simulated population time series from the manuscript, with
first-differenced values denoted $Y_t$ at $t = 1, 2, 3, ..., 99, 100$. We
extract them from the saved tibble NY_lags_example_3
:
input <- dplyr::select(NY_lags_example_3, t, Y_t) %>% dplyr::rename(Time = t) input
Note that NY_lags_example_3
already contains results. At the end of this
vignette we verify that results using rEDM
or pbsEDM
have not changed (for
example due to package updates) from those saved in NY_lags_example_3
.
We will do the simplex calculations for embedding dimension $E = 2$ using
rEDM
and pbsEDM
(and also compare with some earlier independent code).
First, use the rEDM
package:
# library(rEDM) packageVersion("rEDM") # We used '1.14.0' was published 7th January 2023 rEDM_res <- rEDM::Simplex(dataFrame = input, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE) # lib and pred overlap, so leave-one-out # cross-validation is used. rEDM_res <- rbind(c(1, input$Y_t[1], NA, NA), rEDM_res) # Add Time = 1 values, to give t = 1:100 rEDM_pred <- rEDM_res$Predictions # Just the predicted values
See the analyse_simple_time_series
vignette
for general details on using the code. Here do the simplex calculation using pbsEDM
:
pbsEDM_res_full <- pbsEDM::pbsEDM(input, lags = list(Y_t = c(0:1))) # Gives full calculations pbsEDM_pred <- pbsEDM_res_full$X_forecast[-length(pbsEDM_res_full$X_forecast)] # Just the predictions, excluding # t=101, so for t = 1:100
rEDM
results with those from pbsEDM
Now to plot the pbsEDM
predictions again those from rEDM
, and highlight
in red those that differ:
plot(rEDM_pred, pbsEDM_pred, xlab = "rEDM predictions", ylab = "pbsEDM predictions") abline(a=0, b=1, col="grey") # Work out the ones more than epsilon away epsilon = 0.0001 # Small value for how different the predictions can be different <- dplyr::tibble(t = input$Time, rEDM_pred, pbsEDM_pred) %>% dplyr::filter(abs(rEDM_pred - pbsEDM_pred) > epsilon) points(different$rEDM_pred, different$pbsEDM_pred, col = "red", pch = 20)
The time indices correspond to the predicted times, so represent $t^+1$ values, where $t^$ is the focal time from which we make predictions:
different
So there are two time indices for which results differ between pbsEDM
and
rEDM
, corresponding to $t^* =$ r different$t[1] - 1
and r different$t[2] - 1
.
As defined in the manuscript, vector $\bf{\tilde{x}}t$ has components defined by the lagged values of scalars $Y_t$. For $E=2$ we have $$\bf{\tilde{x}}_t = [Y_t, ~Y{t-1}].$$ For focal time $t^$, we know $Y_{t^}$ and are trying to predict $Y_{t^*+1}$.
In the manuscript we presented exclusion condition (d), namely that the library of allowable nearest neighbours cannot include any $\bf{x}t$ that includes $Y{t^+1}$, since we are trying to predict $Y_{t^+1}$.
Thus, the library does not include $$\bf{x}{t^+1} = [Y_{t^+1}, ~Y{t^}]$$ or $$\bf{x}_{t^+2} = [Y_{t^+2}, ~Y_{t^+1}]$$ since these both contain $Y_{t^*+1}$.
For $t^*=94$, we have the aforementioned discrepancy between calculations:
t_star <- 94 rEDM_pred[t_star + 1] pbsEDM_pred[t_star + 1]
With pbsEDM
, the nearest neighbours are saved as output, and for $t^*=94$ have
indices and weights:
pbsEDM_res_full$neighbour_index[t_star,] pbsEDM_res_full$neighbour_weight[t_star,]
giving the above prediction for $Y_{95}$ of
pbsEDM_pred[t_star+1]
We can verify this calculation directly from equation (13) in the manuscript using the just-calculated weights and neighbour indices:
sum(pbsEDM_res_full$neighbour_weight[t_star,] * input[pbsEDM_res_full$neighbour_index[t_star,] + 1, "Y_t"]) / sum(pbsEDM_res_full$neighbour_weight[t_star,])
For rEDM
the full details of nearest neighbours do not appear to be an
available output. However, here we show that $\bf{x}{96}$ was used by rEDM
as a nearest
neighbour of $\bf{x}{94}$. We first
simply change the value of
$Y_{95}$ so that rEDM::Simplex()
will not pick
$$\bf{x}{95} = [Y{95},~Y_{94}]$$
or
$$\bf{x}{96} = [Y{96}, ~Y_{95}]$$
as a nearest neighbour of $\bf{x}_{94}$, and see if we get the same result as
for pbsEDM
.
input[t_star+1, ] input_change <- input input_change[t_star+1, "Y_t"] = 7 # Move it far out of the way so it will not be a # nearest neighbour # Do Simplex calculation with same options as earlier rEDM_change_res <- rEDM::Simplex(dataFrame = input_change, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE) rEDM_change_res <- rbind(c(1, input_change$Y_t[1], NA, NA), rEDM_change_res) # Add Time = 1 values, to give t = 1:100 rEDM_change_pred <- rEDM_change_res$Predictions # Just the predicted values rEDM_change_pred[t_star+1]
That is the same value as obtained
earlier using pbsEDM
.
We now do the same calculation but changing $Y_{96}$ so that $$\bf{x}{96} = [Y{96}, ~Y_{95}]$$ cannot be a nearest neighbour of $\bf{x}{94}$ (but $\bf{x}{95}$ still can be):
input_change <- input input_change[t_star+2, "Y_t"] = 7 # Move it far out of the way so it will not be a # nearest neighbour # Do Simplex calculation with same options as earlier rEDM_change_res <- rEDM::Simplex(dataFrame = input_change, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE) rEDM_change_res <- rbind(c(1, input_change$Y_t[1], NA, NA), rEDM_change_res) # Add Time = 1 values, to give t = 1:100 rEDM_change_pred <- rEDM_change_res$Predictions # Just the predicted values rEDM_change_pred[t_star+1]
Since this again agrees with the
pbsEDM
calculation, we conclude
that
$$\bf{x}{96} = [Y{96}, ~Y_{95}]$$
was
being included in the library of
candidate nearest neighbours by
rEDM
, since by moving $Y_{95}$ or
$Y_{96}$ far away it is no longer a
nearest neighbour.
As a further check, we test our new exclusion_radius
option in
pbsEDM::pbsEDM()
with a value of 0,
which should match the rEDM
default results.
pbsEDM_res_excl_0_full <- pbsEDM::pbsEDM(input, lags = list(Y_t = c(0:1)), exclusion_radius = 0) pbsEDM_pred_excl_0 <- pbsEDM_res_excl_0_full$X_forecast[-length(pbsEDM_res_excl_0_full$X_forecast)] plot(rEDM_pred, pbsEDM_pred_excl_0, xlab = "rEDM predictions", ylab = "pbsEDM predictions") abline(a=0, b=1, col="grey") testthat::expect_equal(rEDM_pred, pbsEDM_pred_excl_0, tolerance = epsilon) # No error means they match
This shows that we can reproduce the rEDM
results using pbsEDM
with exclusion_radius = 0
.
We now repeat the above calculations for $t^*=75$. Skipping some of the explanatory steps, we have:
t_star <- 75 rEDM_pred[t_star + 1] pbsEDM_pred[t_star + 1]
as noted above (below the figure).
Using pbsEDM
and $t^*=75$, the indices and weights of the nearest neighbours are:
pbsEDM_res_full$neighbour_index[t_star,] pbsEDM_res_full$neighbour_weight[t_star,]
Change the value of $Y_{76}$ so that rEDM::Simplex()
will not pick
$\bf{x}{76}$ or $\bf{x}{77}$ as a nearest neighbour for $\bf{x}_{75}$:
input[t_star+1, ] input_change <- input input_change[t_star+1, "Y_t"] = 7 # Move it far out of the way so it will not be a # nearest neighbour # Do Simplex calculation with same options as earlier rEDM_change_res <- rEDM::Simplex(dataFrame = input_change, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE) rEDM_change_res <- rbind(c(1, input_change$Y_t[1], NA, NA), rEDM_change_res) # Add Time = 1 values, to give t = 1:100 rEDM_change_pred <- rEDM_change_res$Predictions # Just the predicted values rEDM_change_pred[t_star+1]
We indeed get the same result as
obtained earlier using pbsEDM
.
Now instead change $Y_{77}$ so that $\bf{x}{77}$ cannot be a nearest neighbour (but $\bf{x}{76}$ still can be):
input[t_star+2, ] input_change <- input input_change[t_star+2, "Y_t"] = 7 # Move it far out of the way so it will not be a # nearest neighbour # Do Simplex calculation with same options as earlier rEDM_change_res <- rEDM::Simplex(dataFrame = input_change, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE) rEDM_change_res <- rbind(c(1, input_change$Y_t[1], NA, NA), rEDM_change_res) # Add Time = 1 values, to give t = 1:100 rEDM_change_pred <- rEDM_change_res$Predictions # Just the predicted values rEDM_change_pred[t_star+1]
This does not match the pbsEDM
result. Thus,
since only replacing $Y_{t^+1} = Y_{76}$ gives the matching result to
pbsEDM
, it must be the case that
rEDM
was considering
$$\bf{x}_{t^+1} =
[Y_{t^+1},~Y_{t^}]$$
i.e.
$$\bf{x}{76} = [Y{76}, ~Y_{75}]$$ to be a nearest neighbour of $\bf{x}{75}$ (and not $\bf{x}{77}$).
Note that rEDM
prediction of r rEDM_change_pred[t_star+1]
is now
very large. This is because rEDM
is still finding $\bf{x}{76}$ to be a
nearest neighbour. In the next time
step this moves to $\bf{x}{77} =
[Y_{77}, ~Y_{76}]$, but we have just
made $Y_{77}$ very large and this
value gets used in the calculation of
$\hat{Y}_{76}$ (equation 13 in the
manuscript), which is therefore very large.
Thus, we have demonstrated two examples for which rEDM
has used $Y_{t^ + 1}$
as a nearest neighbours (through
$\bf{x}_{t^+2}$ for $t^=94$ and
$\bf{x}_{t^+1}$ for $t^*=75$).
Since the results differ for pbsEDM
and rEDM
, we expect
a slight difference in the overall Pearson correlation coefficient:
rEDM_rho <- rEDM::ComputeError(rEDM_res$Observations, rEDM_res$Predictions)$rho rEDM_rho pbsEDM_rho <- pbsEDM_res_full$results$X_rho pbsEDM_rho
So the correlation coefficient is actually slight higher for the pbsEDM
results compared to the rEDM
results. Thus, for this example there is an
improved fit from accommodating our suggested neighbour exclusions.
exclusionRadius
settings for rEDM
We now try changing the exclusionRadius
input value in rEDM::Simplex
, which (from ?rEDM::Simplex
)
"excludes vectors
from the search space of nearest neighbors if their relative time
index is within exclusionRadius". This excludes temporal neighbours both
forward and backwards in time, whereas we suggest (our equation (6)) to just
remove the $E$ forward neighbours.
So, re-using the above code and trying exclusionRadius
=1 (the
default is 0):
rEDM_res_excl_1 <- rEDM::Simplex(dataFrame = input, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE, exclusionRadius = 1) rEDM_res_excl_1 <- rbind(c(1, input$Y_t[1], NA, NA), rEDM_res_excl_1) # Add Time = 1 values, to give t = 1:100 rEDM_pred_excl_1 <- rEDM_res_excl_1$Predictions # Just the predicted values different_excl <- dplyr::tibble(t = input$Time, pbsEDM_pred, rEDM_pred, rEDM_pred_excl_1) # See what has changed from original rEDM results dplyr::filter(different_excl, abs(rEDM_pred - rEDM_pred_excl_1) > epsilon)
The $t$ values shown are for the predictions at $t = t^ + 1$. So the rEDM
results have changed for $t^ = 75$ (which is good) and for 76 (which was not
expected), but not for the desired $t^* = 94$.
Now check what is different to pbsEDM
results:
dplyr::filter(different_excl, abs(pbsEDM_pred - rEDM_pred_excl_1) > epsilon)
So $t^ = 75$, which we just found had changed with the exclusionRadius
=1, is
not shown here meaning that it now matches the pbsEDM
result.
But $t^ = 94$ is still different to the pbsEDM
results, so the
correction is not global, and $t^* = 76$ is now different.
So we now try setting exclusionRadius
=2:
rEDM_res_excl_2 <- rEDM::Simplex(dataFrame = input, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE, exclusionRadius = 2) rEDM_res_excl_2 <- rbind(c(1, input$Y_t[1], NA, NA), rEDM_res_excl_2) # Add Time = 1 values, to give t = 1:100 rEDM_pred_excl_2 <- rEDM_res_excl_2$Predictions # Just the predicted values different_excl <- dplyr::mutate(different_excl, rEDM_pred_excl_2) # See what has changed from original rEDM results dplyr::filter(different_excl, abs(rEDM_pred - rEDM_pred_excl_2) > epsilon)
So with exclusionRadius
=2, the rEDM
results have changed from the original
rEDM
predictions for $t^* = 75, 76, 94$ and $96$. Checking what is now different to the
pbsEDM
results:
dplyr::filter(different_excl, abs(pbsEDM_pred - rEDM_pred_excl_2) > epsilon)
So $t^ = 76$ and $96$ are different to the pbsEDM
results. So while setting
exclusionRadius
equal to 2
seems to have led to results agreeing for $t^ = 75$ and $94$, predictions now
differ for $t^* = 76$ and $96$.
Repeating the above with exclusionRadius
=3 leads to five differences with the pbsEDM
results (results not shown).
So, as expected, changing exclusionRadius
in the call to rEDM::Simplex()
does not give exactly the same results as our suggested default.
exclusion_radius
settingWe also try exclusion_radius = 7
in pbsEDM::pbsEDM()
, just picking 7 as a
new example, as a further check of our code, comparing with rEDM
calculations:
pbsEDM_res_excl_7_full <- pbsEDM::pbsEDM(input, lags = list(Y_t = c(0:1)), exclusion_radius = 7) pbsEDM_pred_excl_7 <- pbsEDM_res_excl_7_full$X_forecast[-length(pbsEDM_res_excl_7_full$X_forecast)] rEDM_res_7 <- rEDM::Simplex(dataFrame = input, columns = "Y_t", target = "Y_t", lib = "1 99", pred = "1 99", E = 2, verbose = TRUE, exclusionRadius = 7) rEDM_res_7 <- rbind(c(1, input$Y_t[1], NA, NA), rEDM_res_7) # Add Time = 1 values, to give t = 1:100 rEDM_pred_7 <- rEDM_res_7$Predictions # Just the predicted values testthat::expect_equal(rEDM_pred_7, pbsEDM_pred_excl_7, tolerance = epsilon) # No error means they match
Thus, the calculations agree (and we checke they also do for all values of
exclusion_radius
up to 10).
Some of the above differences were originally figured out using other
independent code (written by Andrew
Edwards) to help understand EDM. The results are saved as
NY_lags_example_3$my.pred
. The predictions agree with those calculated here
using pbsEDM
:
testthat::expect_equal(NY_lags_example_3$my.pred, pbsEDM_pred)
The original code also allowed independent verification of the above results (for example
by explicitly allowing $Y_{t^* + 1}$ to be used as a nearest neighbour). The
original code is available on request, but is not functionalised or
generalisable for $E>2$, and is now superceded by the functions in pbsEDM
. Note
that writing of the pbsEDM
code was led by Luke Rogers, using functions that were
independently written from the original code. Hence the original code also served as an
independent check of some of the pbsEDM
code.
Here we run some tests to check that if any of the above results from pbsEDM
or rEDM
have changed from when this vignette was written (some results are
saved in NY_lags_example_3
). The tests will give errors if they fail (and any
such failures should be investigated and this vignette updated).
Check the default rEDM
predictions have not changed.
testthat::expect_equal(NY_lags_example_3$rEDM.pred, rEDM_res$Predictions) # If no error then they match
Also check that the results with different values of exclusionsRadius
have not changed:
testthat::expect_equal(check_rEDM_excl_1, rEDM_pred_excl_1) testthat::expect_equal(check_rEDM_excl_2, rEDM_pred_excl_2)
The test in the Independent code
section has already verified that the pbsEDM
results have not changed (since they match the my.pred
results from the original
independent code).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.