# Keywords {-}

causal inference, empirical dynamics, forecasting, nonlinear dynamics, time series

# Running Title {-}

rEDM: Empirical Dynamic Modeling

\newpage

```r
knitr::opts_chunk$set(fig.width = 4.5, fig.height = 3)
```

# Introduction

In contrast to other scientific fields, ecology lacks governing equations. Moreover, ecosystems are inherently complex, with behavior arising from nonlinear interactions among components and across scales. Thus, it is challenging to model ecological time series with the conventional approach of fitting parametric equations---these equations can be impractical when the exact mechanisms are unknown or when data are too short, noisy, or incomplete to effectively parameterize them. Empirical approaches, which infer patterns and associations from the data (instead of assuming hypothesized functional forms), offer a flexible alternative.

Empirical dynamic modeling (EDM) is an emerging framework for modeling time series empirically, by reconstructing the behavior of the underlying dynamic systems from time series observations; it is based on the mathematical theory of attractor reconstruction [@Takens_1981; @Sauer_1991; @Casdagli_1991; @Deyle_2011]. Because EDM operates with minimal assumptions, it is well-suited for modeling ecological time series. The rEDM package is an implementation of EDM in R that builds on software developed in the Sugihara Lab [@Ye_2018]. Here, we review the theoretical background for EDM and the functionality of the rEDM package.

# Empirical Dynamic Modeling

## Time Series as Observations of a Dynamic System

We review the basic concepts underlying EDM, but see @Sauer_1991 and @Chang_2017 for more detailed treatments. Briefly, time series are sequential observations of a dynamic system, which consists of a state space and dynamical rules for how the state changes over time. At any point in time, the state is a point in the state space (Figure \@ref(fig:time-series-projection)), whose axes are the fundamental state variables. The dynamical rules describe how the system state changes (i.e. how the point moves in the state space), and are often (but not always) represented as governing equations.

For example, an experimental chemostat might be described with state variables for plankton abundance and the environmental factors of temperature, nutrients, and light. The dynamical rules can be represented as equations for growth and nutrient uptake, and the experimental settings for temperature, light, and nutrient input (e.g. constant or following a diel cycle).

Time series can be generated by projecting from the state space to the corresponding coordinate axis. For example, in Figure \@ref(fig:time-series-projection), states of the canonical Lorenz Attractor [-@Lorenz_1963] are projected to the $x$-axis, creating a time series of $x$. Time series frequently correspond to individual state variables; more generally, a time series can be any function of one or more state variables, such as a time series of total plant cover constructed from the sum of state variables for the cover of each species.

(ref:time-series-projection-caption) Projecting the system state of the Lorenz Attractor to the $x$-axis produces a time series of $x$.

```r
knitr::include_graphics("figure_1.pdf")
```

The conventional approach to modeling a dynamic system is to simulate the governing equations, and fit them to observations to determine parameter values. Uncertainty about the form of the equations is typically addressed through model selection, or is incorporated into an error term. However, this requires that the state variables are known, that they are observed sufficiently, and that the governing equations are known or can be reasonably approximated from theory. These assumptions can be problematic when dealing with time series observed from complex, nonlinear systems.

## Attractor Reconstruction / Takens' Theorem

The EDM framework resolves these issues using the mathematical theory of attractor reconstruction [@Takens_1981; @Sauer_1991; @Casdagli_1991; @Deyle_2011]. The essential idea is to use an alternative representation of the system state, composed of lags of a single time series. For example, in the canonical Lorenz Attractor, the system state at time $t$ is conventionally represented as $\left(x_t, y_t, z_t \right)$, where $x$, $y$, and $z$ are the three state variables that make up the system (Figure \@ref(fig:time-series-projection)). However, we can use an alternative representation consisting only of lags of $x$: $\mathbf{x}_t = \left( x_t, x_{t-\tau}, \dots, x_{t-(E-1)\tau} \right)$, where $E$ is the embedding dimension, or the number of lags.

(ref:attractor-reconstruction-caption) Reconstruction of the Lorenz Attractor using 3 lags of $x$: $\mathbf{x}_t = \left( x_t, x_{t-\tau}, \dots, x_{t-(E-1)\tau} \right)$. The reconstruction maps smoothly to the original Lorenz Attractor (Figure \@ref(fig:time-series-projection)).

```r
knitr::include_graphics("figure_2.pdf")
```

This approach preserves the mathematical properties of the original system, provided that a sufficient number of lags are used [@Takens_1981]. More specifically, points in the reconstructed space, $\mathbf{x}_t$, map smoothly to points in the original state space. This relationship makes it possible to model the system using just a single time series, even without direct observations of all three variables $x$, $y$, and $z$.

For example, in the Lorenz Attractor, we would ordinarily need to model the behavior of $x$ as a function of all of the state variables: $x_{t+1} = F\left(x_t, y_t, z_t\right)$. However, we can instead use the reconstructed state space: $x_{t+1} = G\left(\mathbf{x}_t\right) = G\left(x_t, x_{t-\tau}, \dots, x_{t-(E-1)\tau} \right)$. Figure \@ref(fig:attractor-reconstruction) demonstrates a reconstruction that uses 3 lags of $x$, which shows a visual correspondence with the original Lorenz Attractor.
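
To make the lag construction concrete, the following base-R sketch (using a placeholder sine-wave series, not package data) builds the reconstruction matrix whose rows are $\mathbf{x}_t$; the rEDM functions introduced below perform this construction internally:

```r
# Illustrative lag-coordinate reconstruction with E = 3 and tau = 1;
# x_demo is a placeholder series used only for this sketch
x_demo <- sin(seq(0, 20, by = 0.1))
E_demo <- 3
recon <- embed(x_demo, E_demo)  # each row is c(x_t, x_{t-1}, x_{t-2})
colnames(recon) <- c("x_t", "x_t-1", "x_t-2")
head(recon)
```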

In practice, the application of this technique also requires selecting an appropriate time lag ($\tau$), embedding dimension ($E$), and methods for inferring $G$. We demonstrate how the rEDM software package can be used to accomplish these tasks, in the contexts of forecasting [@Sugihara_1990], testing for nonlinear behavior [@Sugihara_1994], and testing whether two time series variables belong to the same system and might be causally related [@Sugihara_2012].

# The rEDM package

## Selecting the embedding dimension using simplex projection

As mentioned previously, attractor reconstruction requires a sufficient number of lags [@Takens_1981]. If the number of lags is too few, then the reconstructed space ($\textbf{x}_t$) will not map smoothly to the original state space. Intuitively, we need enough lags to capture the influence of all of the state variables [see also @Whitney_1936]. Practically, the number of lags, or embedding dimension, is determined empirically as a property of the data; we use forecast skill as the metric for selecting an optimal embedding dimension.

### Example

In this example, time series are generated from a simulation of the tent map, a discrete-time dynamic system where a sequence, $x_t$, on the interval $[0, 1]$ is iterated according to:

\begin{equation}
x_{t+1} =
\begin{cases}
2 x_t, & x_t < \frac{1}{2} \\
2 \left(1 - x_t\right), & x_t \ge \frac{1}{2}
\end{cases}
\end{equation}

The "first differences" ($x_{t+1}-x_{t}$) of the tentmap time series are included in rEDM:

```r
library(rEDM)

data(tentmap_del)
str(tentmap_del)
```

To select the portions of the time series used for fitting and testing the model, we define the `lib` and `pred` variables:

```r
lib <- c(1, 100)
pred <- c(201, 500)
```

`lib <- c(1, 100)` indicates that rows 1 through 100 of the data are used for fitting the model, and `pred <- c(201, 500)` indicates that rows 201 through 500 of the data are used for testing the model.

Since the time series come from a discrete map, a time lag of `tau = 1` is appropriate (and is the default parameter value). In addition, because we know the dynamics are fairly simple, the default range of `E = 1:10` for the embedding dimension will suffice. More generally, in the absence of prior knowledge about the system, it is reasonable to test values up to $\sim \sqrt{n}$, where $n$ is the length of the time series [@Cheng_1994].
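
For instance, a sketch of that rule of thumb (a heuristic, not a package requirement):

```r
# Heuristic upper bound on the embedding dimensions to test
n <- length(tentmap_del)
E_max <- floor(sqrt(n))
E_range <- 1:E_max
```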

```r
simplex_output <- simplex(time_series = tentmap_del,
                          lib = lib, pred = pred,
                          tau = 1, E = 1:10)
```

The simplex() function makes forecasts, employing Simplex Projection to infer the dynamical function $G$ from the data using nearest neighbor approximation [@Sugihara_1990].

```r
str(simplex_output)
```

The output is a data.frame with one row for each individual model run. Here, there are 10 rows because we chose to vary the embedding dimension from 1 to 10. The first four columns are the model parameters: `E`, embedding dimension; `tau`, time lag between successive dimensions; `tp`, time to prediction; and `nn`, number of nearest neighbors. Next are the forecast statistics: `num_pred`, the number of predictions made; `rho`, Pearson's correlation coefficient between predictions and observations; `mae`, mean absolute error of predictions; `rmse`, root mean squared error of predictions; `perc`, the percent of predictions that have the same sign as the observations; and `p_val`, the p-value for `rho` being significantly greater than 0, using Fisher's transformation [-@Fisher_1915]. For the purpose of comparison, the same forecast statistics are provided for a naive constant predictor, $\hat{x}_{t+\text{tp}} = x_t$.
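
Rather than reading the optimum off a plot, the best embedding dimension can be extracted directly from this output:

```r
# E with the highest forecast skill (rho)
best_E <- simplex_output$E[which.max(simplex_output$rho)]
best_E
```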

(ref:simplex-caption) Forecast skill (rho) vs. Embedding Dimension (E) for the tentmap_del time series. A peak at E = 2 indicates that 2 total lags are optimal for reconstructing the dynamics.

```r
par(mar = c(4, 4, 1, 4), mgp = c(2.5, 1, 0))
plot(rho ~ E, data = simplex_output, type = "l",
     xlab = "Embedding Dimension (E)", ylab = "Forecast Skill (rho)")
```

Figure \@ref(fig:tentmap-simplex) shows that forecast skill peaks at E = 2, indicating that the dynamics of the tentmap_del time series are best represented using 2 lags. Note that the optimal embedding dimension may not correspond to the dimensionality of the underlying system. Indeed, because forecast skill is affected by observational noise, process error, time series length, etc., the optimal embedding dimension is best viewed as an emergent property of the data.

## Identifying Nonlinearity

One concern is that the ability to forecast may result from temporal autocorrelation in the time series. Because we infer a function, $G$, that relates lags of a time series to its next value, this approach will also produce forecasts for temporally autocorrelated data (i.e. red noise). To distinguish between time series that contain deterministic nonlinear dynamics and red noise, we employ S-maps [@Sugihara_1994].

Whereas Simplex Projection approximates $G$ using nearest neighbors in the reconstruction space [@Sugihara_1990], the S-map uses local linear maps [@Sugihara_1994]. For each individual forecast, a local linear map is fitted with the data points assigned weights based on their distance (typically the Euclidean distance) to the point being forecast and the nonlinear tuning parameter, $\theta$. This allows the S-map to approximate any potential function, with $\theta$ controlling how nonlinear $G$ can be.

When $\theta = 0$, all points receive equal weights, and the local linear map is the same across the reconstruction space, making $G$ equivalent to a single global linear map. For $\theta > 0$, nearby points receive larger weights, and the local linear map depends more on similar system states. Larger values of $\theta$ allow the forecast function, $G$, to vary more across the state space, analogous to the length-scale of a locally weighted regression [@Cleveland_1979].
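
The weighting itself can be sketched in a few lines. The function below implements the exponential distance weighting described in @Sugihara_1994 for a toy vector of distances; it is a sketch for intuition, not the package's internal code:

```r
# S-map weighting: library points are down-weighted exponentially with
# their distance to the target point, scaled by the mean distance
smap_weights <- function(distances, theta) {
    exp(-theta * distances / mean(distances))
}

d <- c(0.1, 0.5, 1, 2)
smap_weights(d, theta = 0)  # equal weights: one global linear map
smap_weights(d, theta = 2)  # nearby points dominate: local linear maps
```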

To distinguish between red noise and deterministic nonlinear dynamics, we evaluate how forecast skill depends on $\theta$. For red noise, a single global map ($\theta = 0$) should produce the best forecasts, because it is fit to all of the data points, thereby minimizing the effects of observational noise. However, for nonlinear dynamics, the true forecast function, $G$, will vary as a function of system state, and forecast skill will improve when $\theta > 0$.

### Example

The function s_map() implements the S-map method. We use the same data as the previous example, and fix E = 2 based on the results from simplex projection. Similarly, the default values for tau and tp remain appropriate -- if we had used different values when determining the optimal embedding dimension, we would do the same here. We also use the default set of values for theta, ranging from 0 to 8.

```r
smap_output <- s_map(time_series = tentmap_del,
                     lib = lib, pred = pred,
                     E = 2)
```

Again, the result is a data.frame with rows for each model run, and columns for the model parameters and forecast statistics. Note that `num_neighbors = 0`: this parameter controls how many of the nearest neighbors the S-map is fit to (restricting the fit to a subset can reduce computation when there are many data points); a value of 0 tells the S-map to use all of the points.
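
As with the embedding dimension, the optimal value of theta can be pulled straight from the output:

```r
# theta with the highest forecast skill
best_theta <- smap_output$theta[which.max(smap_output$rho)]
best_theta
```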

(ref:smap-caption) Forecast skill (rho) vs. Nonlinearity (theta) for the tentmap_del time series. Increased forecast skill for theta > 0 suggests nonlinear dynamics.

```r
par(mar = c(4, 4, 1, 4), mgp = c(2.5, 1, 0))
plot(smap_output$theta, smap_output$rho, type = "l",
     xlab = "Nonlinearity (theta)", ylab = "Forecast Skill (rho)")
```

Figure \@ref(fig:tentmap-smap) shows that forecast skill substantially improves as $\theta$ increases, indicating nonlinear dynamics. Typically, we would expect forecast skill to decrease as $\theta$ continues to increase, because the model will become overfit to individual points. However, because the data in this example are observed without any noise, the S-map continues to converge closer to the true function as $\theta$ increases.

Simulating the S-map analysis with additive observational error gives a result that is more typical of real data (Figure \@ref(fig:tentmap-smap-noise)).

```r
ts_err <- tentmap_del + rnorm(length(tentmap_del),
                              sd = sd(tentmap_del) * 0.2)
smap_output_err <- s_map(time_series = ts_err,
                         lib = lib, pred = pred,
                         E = 2)
```

(ref:smap-noise-caption) With added observational noise, forecast skill (rho) increases and then decreases with theta.

```r
par(mar = c(4, 4, 1, 4), mgp = c(2.5, 1, 0))
plot(smap_output_err$theta, smap_output_err$rho, type = "l",
     xlab = "Nonlinearity (theta)", ylab = "Forecast Skill (rho)")
```

## Generalized Takens' Theorem

Rather than representing the system state with lags of a single time series [@Takens_1981], we can use different time series observed from the same system [@Sauer_1991; @Deyle_2011]. These multivariate representations can be more effective when data are limited and noisy, or necessary when there are stochastic drivers in the system that need to be explicitly included (see @Casdagli_1991 for more details).

The block_lnlp() function generalizes the simplex() and s_map() functions: any combination of time series (and their lags) can be used with either Simplex Projection or the S-map. Instead of lags of one time series, the coordinates of the reconstruction are manually specified as columns of the input data (the `columns` argument), and similarly for the variable to be forecast (the `target_column` argument). If lags of a time series are desired as coordinates, they need to be precomputed as additional columns (e.g. via the make_block() function).
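
As a hedged illustration of precomputing lags by hand (make_block() automates this; see `?make_block` for its exact interface), the lag-1 and lag-2 columns of a univariate series can be built with base R:

```r
# Manually construct lagged columns; leading NAs mark times where the
# lagged value was not observed (such rows are excluded automatically)
ts_x <- as.numeric(tentmap_del)
n <- length(ts_x)
lag_block <- data.frame(x_t   = ts_x,
                        x_t_1 = c(NA, ts_x[-n]),
                        x_t_2 = c(NA, NA, ts_x[seq_len(n - 2)]))
head(lag_block)
```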

### Example

We begin with an example dataset from a coupled 3-species model system.

```r
data(block_3sp)
str(block_3sp)
```

block_3sp is a 10-column data.frame, consisting of time and 3 lags of each of the variables: unlagged (`_t`), lag-1 (`_t-1`), and lag-2 (`_t-2`). Note that the lagged columns begin with NA values, because those entries correspond to unobserved values of the variables (e.g. $x$ at time 0 or -1). Points that contain missing values are automatically excluded from fitting and forecasting.

Columns can be referred to by numerical index (starting with 1) or column name. The optional argument `first_column_time` indicates that the first column of the input data is a time index, and should be skipped. For example, to use the coordinates $\left(x_t, x_{t-1}, y_t\right)$ to predict future values of $x_t$, we would specify `columns = c(1, 2, 4)`, `target_column = 1`, and `first_column_time = TRUE`.

```r
lib <- c(1, 100)
pred <- c(101, 200)

block_lnlp_output <- block_lnlp(block_3sp, lib = lib, pred = pred,
                                columns = c(1, 2, 4),
                                target_column = 1,
                                first_column_time = TRUE,
                                stats_only = FALSE)
```

We can also refer to columns by name. Here, specifying first_column_time = TRUE is unnecessary, but will aid in labeling the predictions with the correct time value.

```r
block_lnlp_alt <- block_lnlp(block_3sp, lib = lib, pred = pred,
                             columns = c("x_t", "x_t-1", "y_t"),
                             target_column = "x_t",
                             first_column_time = TRUE,
                             stats_only = FALSE)

# test for equality
stopifnot(identical(block_lnlp_output, block_lnlp_alt))
```

As in simplex() and s_map(), the default value for the `tp` argument is 1---predictions are 1-step ahead. That is, the forecast model is $G: \left(x_t, x_{t-1}, y_t\right) \rightarrow x_{t+\texttt{tp}}$. Note that in cases where the dependent variable is not forecast, but is included as a separate column, `tp` should be set to 0.

```r
str(block_lnlp_output)
```

With stats_only = FALSE, the output also includes a list column (model_output), where each element is a data.frame that contains observed and predicted values for that model run. We can extract these values to inspect how individual predictions compare to observed values (Figure \@ref(fig:block-pred-vs-obs)).

```r
list_of_model_predictions <- block_lnlp_output$model_output
first_model_predictions <- list_of_model_predictions[[1]]

observed <- first_model_predictions$obs
predicted <- first_model_predictions$pred
```
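
As a quick consistency check, the reported forecast skill should match the correlation between these predictions and observations, up to how missing values are handled:

```r
# Recompute rho by hand from the extracted predictions
rho_manual <- cor(observed, predicted, use = "complete.obs")
all.equal(rho_manual, block_lnlp_output$rho[1])
```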

(ref:block-pred-vs-obs-caption) The predictions of variable $x$ against the observed values of $x$; perfect predictions will fall on the 1:1 line (blue dashed line).

```r
par(pty = "s")
plot_range <- range(c(observed, predicted), na.rm = TRUE)
plot(observed, predicted, xlim = plot_range, ylim = plot_range,
     xlab = "Observed", ylab = "Predicted", asp = 1)
abline(a = 0, b = 1, lty = 2, col = "blue")
```

## S-map Coefficients

As described in @Deyle_2016, the S-map coefficients can be interpreted as dynamic, time-varying interaction strengths, given an appropriate choice of reconstruction coordinates. We demonstrate this using $\left(x_t, y_t, z_t\right)$ to predict $x_{t+1}$ for the 3-species simulation described previously.

```r
data(block_3sp)
lib <- c(1, 100)
pred <- c(101, 200)

block_smap <- block_lnlp(block_3sp, lib = lib, pred = pred,
                         columns = c("x_t", "y_t", "z_t"),
                         target_column = "x_t",
                         method = "s-map", theta = 2,
                         stats_only = FALSE,
                         first_column_time = TRUE,
                         save_smap_coefficients = TRUE)
```

As with model_output, the smap_coefficients column is a list-column, where the elements are data.frames with the S-map coefficients.

```r
smap_coeffs <- block_smap$smap_coefficients[[1]]
str(smap_coeffs)
```

(ref:smap-coeffs-caption) A. Time series of $x$ and its predictions (points). B-D. The inferred effects of $x$, $y$, and $z$ from the S-map model.

```r
par(mfrow = c(4, 1), mar = c(2, 4, 1, 4), oma = c(2, 0, 0, 0),
    mgp = c(2.5, 1, 0), yaxs = "i", xaxs = "i")

cols <- viridis::viridis(4, option = "D")
predictions <- block_smap$model_output[[1]]
make_panel <- function(y, col = "black", ..., label = "A")
{
    plot(predictions$time, y, xlim = c(100, 200), ylim = c(-2, 2),
         type = "l", col = col, xlab = "", lwd = 1.5, ...)
    mtext(label, side = 3, at = 88, line = -0.2)
}

make_panel(predictions$obs, ylab = "x", col = "gray")
points(predictions$time, predictions$pred, lty = 2, pch = 3)
legend(x = 200, y = 2, legend = "predictions", pch = 3, bty = "n",
       xjust = 1, yjust = 0.75, adj = c(0, 0.5), xpd = TRUE)

make_panel(smap_coeffs[, 1], col = cols[1], ylab = "effect of x", label = "B")
make_panel(smap_coeffs[, 2], col = cols[2], ylab = "effect of y", label = "C")
make_panel(smap_coeffs[, 3], col = cols[3], ylab = "effect of z", label = "D")
mtext("Time", side = 1, outer = TRUE, line = 0.5, cex = 0.8)
```

Here, `smap_coeffs` is a data.frame with 100 rows (the number of predictions) and 4 columns (one for each predictor, plus a constant). Figure \@ref(fig:smap-coeffs) shows the observed and predicted values of $x_{t+1}$, along with the inferred effects of $x_t$, $y_t$, and $z_t$ on the predictions.

## Causal Inference and Cross Mapping

One corollary to Takens' Theorem [-@Takens_1981] is that smooth maps exist between multiple reconstructions of the same system. Consider a dynamic system that contains two interacting variables, $x$ and $y$. The univariate reconstructions, based on lags of $x$ ($\mathbf{x}_t = \left( x_t, x_{t-\tau}, \dots, x_{t-(E_x-1)\tau} \right)$) and lags of $y$ ($\mathbf{y}_t = \left( y_t, y_{t-\tau}, \dots, y_{t-(E_y-1)\tau} \right)$) respectively, each map smoothly to the original state space. Thus, there is also a smooth map between $\mathbf{x}_t$ and $\mathbf{y}_t$ (Figure \@ref(fig:cross-mapping)). We can test for the map between $\mathbf{x}_t$ and $\mathbf{y}_t$ by measuring the predictive skill of mapping from $\mathbf{x}_t$ to $y_t$ and from $\mathbf{y}_t$ to $x_t$.

(ref:cross-mapping-caption) Smooth maps between reconstructions of the Lorenz Attractor. The one on the left uses lags of $x$ and the one on the right uses lags of $y$. By Takens' Theorem [-@Takens_1981], both reconstructions map smoothly to the original Lorenz Attractor (gray), and therefore to each other (magenta).

```r
knitr::include_graphics("figure_3.pdf")
```

### Directionality

When causation occurs in only one direction, e.g. $x$ influences $y$, but $y$ does not influence $x$, the mapping between $\mathbf{x}_t$ and $\mathbf{y}_t$ is only assured in one direction. Specifically, if $x$ influences $y$, then there is a mapping from $\mathbf{y}_t$ to $x_t$ (i.e. from the affected variable to the causal variable). Although the direction is counterintuitive, this occurs because $\mathbf{y}_t$ must capture all of the dynamical rules for changes in $y$, which includes the influence of $x$. Consequently, $\mathbf{y}_t$ can recover the values of $x$. Conversely, $\mathbf{x}_t$ is only guaranteed to capture the dynamical rules for changes in $x$, which does not require any information about $y$. Thus, there is no guarantee of a mapping from $\mathbf{x}_t$ to $y_t$.

Nevertheless, because $x$ does influence $y$, there is typically some predictive skill in the mapping from $\mathbf{x}_t$ to $y_t$. Except in cases where $y$ is synchronized to $x$ [@Rulkov_1995], the quality of the mapping will be limited, while the mapping from $\mathbf{y}_t$ to $x_t$ will continue to improve in skill with more and more data. This convergence is a critical property for inferring causality, and can be tested by measuring the cross mapping skill with larger and larger subsamples of the data. For a more detailed description of using cross mapping to infer causation, see @Sugihara_2012 and @Ye_2015a.

### Convergent Cross Mapping (CCM)

The ccm() function is a wrapper to compute cross map skill for different sized subsamples of the data. In the following example, we examine the relationship between sardine landings in California and Scripps Pier sea-surface temperature (SST) as in @Sugihara_2012 (but see also [@Deyle_2013]).

To identify convergence, we compute cross-map skill over many random subsamples of the time series. lib_sizes specifies the sizes of the library set subsamples, and num_samples specifies the number of subsamples at each library size. random_libs and replace specify how to generate the subsamples. Here, setting both to TRUE enables random sampling with replacement. We use E = 3, following the supplementary materials for @Sugihara_2012.

```r
data(sardine_anchovy_sst)
sardine_sst <- ccm(sardine_anchovy_sst, E = 3,
                   lib_column = "sardine", target_column = "sio_sst",
                   lib_sizes = seq(10, 80, by = 10), num_samples = 100,
                   random_libs = TRUE, replace = TRUE, silent = TRUE)
sst_sardine <- ccm(sardine_anchovy_sst, E = 3,
                   lib_column = "sio_sst", target_column = "sardine",
                   lib_sizes = seq(10, 80, by = 10), num_samples = 100,
                   random_libs = TRUE, replace = TRUE, silent = TRUE)
str(sardine_sst)
```

As before, the output is a data.frame with rows for each model run (in this case, 100 models at each of 8 library sizes). To interpret the results, we use the ccm_means() function to compute a mean value of the forecast statistics at each unique library size.

```r
sardine_sst_ccm_means <- ccm_means(sardine_sst)
sst_sardine_ccm_means <- ccm_means(sst_sardine)
str(sardine_sst_ccm_means)
```
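
A crude convergence check (a heuristic only; formal tests compare against null distributions, e.g. from surrogate time series) is whether skill at the largest library size clearly exceeds skill at the smallest:

```r
# Change in mean cross-map skill from smallest to largest library size
rho_gain <- with(sardine_sst_ccm_means,
                 rho[which.max(lib_size)] - rho[which.min(lib_size)])
rho_gain
```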

Figure \@ref(fig:sardine-sst) shows the cross mapping between sardines and sea-surface temperature. Increasing cross-map skill with library size suggests a causal influence; here, cross mapping from sardines to SST indicates an effect of temperature on sardines. As expected, there is no cross mapping in the opposite direction, because sardines do not influence temperature.

(ref:sardine-sst-caption) Cross mapping between California sardine landings and Scripps Pier sea-surface temperature suggests an influence of temperature on sardines, but not vice-versa.

```r
par(mar = c(4, 4, 1, 4), mgp = c(2.5, 1, 0))

plot(sardine_sst_ccm_means$lib_size, pmax(0, sardine_sst_ccm_means$rho),
     type = "l", col = "red",
     xlab = "Library Size", ylab = "Cross Map Skill (rho)", ylim = c(0, 0.4))
lines(sst_sardine_ccm_means$lib_size, pmax(0, sst_sardine_ccm_means$rho),
      col = "blue")
legend(x = "topleft", legend = c("sardine xmap SST", "SST xmap sardine"),
       col = c("red", "blue"), lwd = 1, cex = 0.8)
```

## Data Formats

rEDM is designed to work with common data structures in R. Functions that operate on single-variable time-series data (e.g. simplex(), s_map()) will accept numeric vectors or ts time-series objects. Functions that operate on multivariate time-series data (e.g. block_lnlp(), ccm()) will accept numeric matrices, data.frames, or mts time-series objects. Please see the documentation associated with individual functions for more details.

Missing data can be recorded using NA or NaN values. These missing values are automatically ignored as appropriate. For example, when using Simplex Projection, nearest neighbors must be complete vectors, so reconstructed vectors that are missing one or more coordinates will be ignored.

Note that it is possible to make forecasts as long as there is a complete set of predictor variables. Thus, forecasting unobserved (future) states is possible by substituting NA for unknown future values. However, these forecasts are not subsequently re-used as input, so users will need to write their own code for multi-step forward projections, if desired.
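
For example, here is a minimal sketch of a one-step-ahead forecast past the end of the tent map series, assuming `E = 2` and `tau = 1` (the indexing of `lib` and `pred` depends on these parameter choices):

```r
# Append an NA placeholder so the last observed state has a forecast target
ts_future <- c(tentmap_del, NA)
n <- length(ts_future)
fwd <- simplex(time_series = ts_future, E = 2,
               lib = c(1, n - 1),   # fit using only observed data
               pred = c(n - 2, n),  # includes the last complete lag vector
               stats_only = FALSE)
# the final row holds the prediction for the unobserved point (obs is NA)
tail(fwd$model_output[[1]], 1)
```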

# Conclusion

Here, we provided examples of using the rEDM package for modeling, forecasting, and inferring causal relationships among time series. In addition, rEDM includes several vignettes that provide further details and other use cases: `rEDM-algorithms` gives a formal description of the Simplex Projection and S-map methods, `rEDM-coprediction` describes how to quantify dynamic similarity, and `rEDM-time-delay-ccm` explains how to assess time delays in causal effects. The implementation of these methods and associated documentation in rEDM will save time for ecologists, and provide a useful toolbox for a wide variety of applications in modeling ecological time series.

# Acknowledgments {-}

rEDM is an implementation of EDM as an R package, and benefits from past versions developed by George Sugihara, Alan Trombla, Richard Penner, Victor Wong, Martin Casdagli, Jerome Cartagena, Mohsen Azarbayejani, Ava Pierce, Jennifer Trezzo, and Hao Ye.

We thank Jun Cai, Jane Cowles, Yair Daon, Andrew Edwards, Oliver Keyes, Steve Munch, James Stagge, Masayuki Ushio, and Ethan White for their suggestions and contributions to the package.

This project is supported by a Gordon and Betty Moore Foundation Data-Driven Discovery Initiative through Grant GBMF4563 (to Ethan P. White), NSF grant DEB-1655203 (GS), NSF grant DBI-1667584 (GS), U.S. Department of Defense Strategic Environmental Research and Development Program 15 RC-2509 (GS), Lenfest Ocean Program award 00028335 (GS), the Deutsche Bank-Jameson Complexity Studies Fund (GS), the Sugihara Family Trust (GS), the Leslie and John McQuown Gift and the McQuown Chair in Natural Sciences, UCSD (GS).

Data collection for the Cedar Creek LTER was funded by NSF grant DEB-9411972 (to G. David Tilman), DEB-0080382 (to G. David Tilman), DEB-0620652 (to G. David Tilman), and DEB-1234162 (to Eric Seabloom).

# Authors’ contributions {-}

H.Y. and G.S. conceived the package; H.Y. designed the package, with assistance from A.T.C. and E.R.D. on coding, testing, and documentation; all authors were involved with writing and editing the paper.

# Data accessibility {-}

Example datasets used here are included in the rEDM package [@Ye_2018].

# References


