One important interest of the GPoM package comes from its potential to tackle chaotic dynamics (that is, deterministic behaviors presenting a high sensitivity to initial conditions). This ability can be of first importance when dealing with biophysical and environmental systems, and more generally with most of real world behaviors because most of these dynamics are (or can be assumed to be) governed by determinisitic processes, and, at the same time, are found to be poorly predictable at long term. To model such type of dynamics is a difficult task because it requires using an approach which formulation can guaranty the independence to the initial conditions. Furthermore, to model real dynamical behaviors, the approach should also allow working with observational time series, that is, with data contaminated by noise, with short time series, with data which behavior may be perturbed by external forcing, subsampled measurements, etc.

The aim of this vignette is to give some insights of
some of these difficulties.
To use a theoretical case study can be very useful
for this purpose in order to keep a full control
on the experiments,
and also to enable to consider each source of difficulty,
one by one, independently.
For the sake of consistency, the same chaotic system
(the Rössler system) is used for all the experiments
considered in the present vignette.
The sensitivity to system algebraic structure will be
addressed in another vignette (`7 Retromodelling`

).

To illustrate the potential of the GPoM package to deal
with time series independently to initial conditions,
the global modelling technique is applied to two data sets
presenting completely different initial conditions.
The two sets of time series have been generated by
the same dynamical system (see vignette `1 Conventions`

).
The global modelling technique is then applied
(see vignette `3 Modelling`

) to check if the
original model is retrieved in both cases.

The Rössler system is taken as case study
for the present experimentations.
For $(a = 0.52, b = 2, c = 4)$, this system
can be described by the matrix `K`

as follows:

# parameters a = 0.52 b = 2 c = 4 # equations Eq1 <- c(0,-1, 0,-1, 0, 0, 0, 0, 0, 0) Eq2 <- c(0, 0, 0, a, 0, 0, 1, 0, 0, 0) Eq3 <- c(b,-c, 0, 0, 0, 0, 0, 1, 0, 0) K = cbind(Eq1, Eq2, Eq3) visuEq(K, substit = c("x", "y", "z"))

Two sets of initial conditions will be used from which two simulations are obtained.

# equations v1 <- c(0.6, -0.6, 0.4) v2 <- c(-4, 2, 0.) nVar = 3 dMax = 2 outNumi1 <- numicano(nVar, dMax, Istep = 2000, onestep = 1/50, KL = K, v0 = v1) outNumi2 <- numicano(nVar, dMax, Istep = 2000, onestep = 1/50, KL = K, v0 = v2) plot(outNumi2$reconstr[,1], outNumi2$reconstr[,2], type = 'l', xlab = 't', ylab = 'x(t)') lines(outNumi1$reconstr[,1], outNumi1$reconstr[,2], type = 'l', col = 'red') legend(3,-2.5, c("i.c. v1", "i.c. v2"), col=c('red', 'black'), lty=1, cex = 0.8)

The phase portraits of the two data sets are presented hereafter:

plot(outNumi2$reconstr[,2], outNumi2$reconstr[,3], type = 'l', xlab = 'x(t)', ylab = 'y(t)') lines(outNumi1$reconstr[,2], outNumi1$reconstr[,3], type = 'l', col = 'red') lines(outNumi2$reconstr[1:400,2], outNumi2$reconstr[1:400,3], type = 'p', col = 'orange', cex = 0.6) # initial condition lines(outNumi2$reconstr[1,2], outNumi2$reconstr[1,3], type = 'p', lwd = 3) lines(outNumi1$reconstr[1,2], outNumi1$reconstr[1,3], type = 'p', col = 'red', lwd = 3) legend(4,2, c("i.c. v1", "i.c. v2", "transient v2"), col=c('red', 'black', 'orange'), lty=c(1,1, NA), pch = c(NA,NA,1), cex = 0.6)

The trajectory in black is issued from the initial
condition `v2`

(black circle), it starts with a transient
(in orange) before reaching the attractor.
The trajectory in red is issued from the other initial
condition `v1`

(red circle) and has already converged to the attractor.

The global modelling technique is applied to these two sets of time series, from which it is expected to retrieve the original Rössler system.

# time vector tin <- outNumi1$reconstr[,1] # modelling from time series starting from initial condition v1 out1 <- gPoMo(data = outNumi1$reconstr[,2:4], tin = tin, dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 1500, nPmin = 7, nPmax = 7, method = 'rk4')

# modelling from time series starting from initial condition v1 out2 <- gPoMo(data = outNumi2$reconstr[,2:4], tin = tin, dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 1500, nPmin = 7, nPmax = 7, method = 'rk4')

# visualize all the models obtained from i.c. v1 visuOutGP(out1) # visualize all the models obtained from i.c. v2 visuOutGP(out2)

When applying the model search, models can be categorized as
(1) diverging trajectories (these will be directly rejected),
(2) fixed points (these will be kept aside once categorized),
(3) period-1 cycles or (4) not identified.
In the present version of the package, the obtained
models may belong either to (3) or to (4),
this latter being the most interesting category for us
because it can include all the non-trivial solutions
(categories (2) or (3) may also present some interest
in the case that converging solution or period cycles
should be expected).
Depending on the quality of the measured time series,
but also on the observability of the system, none,
few or many models may be obtained using `GPoM`

.
A selection criterion will have to be used if several
models are obtained.
In practice, the models characterized by a phase
portrait similar to the original phase portrait
should be preferred.
To do so, a selection based on a visual checking
among the obtained models is found to be very
efficient and generally quite obvious.
The more terms being used in the model, the better
the model should be. However, a high number of terms
may also foster numerical instabilities.
As a consequence, models with less algebraic
terms should be preferred as far as they enable a better
consistency with the original data set.
For this reason, our selection will systematically foster
the simplest model formulations among the obviously
'good' models.

Based on these criteria, the smallest obtained model able to reproduce the original phase portrait is found to be the model #5 in both cases. Obtained equations respectively read

# Edit the equations of model obtained from time series generated from v1 visuEq(out1$models$model5, substit = c("x", "y", "z"), approx = 2)

# Edit the equations of model obtained from time series generated from v1 visuEq(data_vignetteVI$out1model5, substit = c("x", "y", "z"), approx = 2)

for the first data set, and for the other one

# Edit the equations of model obtained from time series generated from v2 visuEq(out2$models$model5, substit = c("x", "y", "z"), approx = 2)

# Edit the equations of model obtained from time series generated from v2 visuEq(data_vignetteVI$out1model5, substit = c("x", "y", "z"), approx = 2)

The original equations of the Rössler system can thus be retrieved in both cases with data sets of different initial conditions (and even with very different trajectories, the first one being based on the transient only).

# modelling from time series starting from initial condition v1 out3 <- gPoMo(data = outNumi2$reconstr[1:400,2:4], tin = tin[1:400], dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 1500, nPmin = 7, nPmax = 7, method = 'rk4')

The error on the model coefficients varies from 0.1 % to 0.7 % for the observational time series located close to the attractor. The level of error doubles more or less when using the transient as input.

# min values min(abs((out1$models$model5 - K) / K) * 100, na.rm = TRUE) min(abs((out2$models$model5 - K) / K) * 100, na.rm = TRUE) min(abs((out3$models$model5 - K) / K) * 100, na.rm = TRUE) # max values max(abs((out1$models$model5 - K) / K) * 100, na.rm = TRUE) max(abs((out2$models$model5 - K) / K) * 100, na.rm = TRUE) max(abs((out3$models$model5 - K) / K) * 100, na.rm = TRUE)

The approach is thus able to retrieve the exact equations form, independently to the initial conditions. Despite a rather high level of precision (the error is lower than 1%), parameter identification remains slightly dependent to the initial conditions.

To test the sensitivity to the signal length, eight time series of decreasing length are considered in this experiment (small shifts have been applied to the plots along the vertical abscissa to help their visusalization).

plot(outNumi1$reconstr[,1], outNumi1$reconstr[,2], type = 'l', xlab = 't', ylab = 'x(t)', lwd = 3) lines(outNumi1$reconstr[1:1000,1], outNumi1$reconstr[1:1000,2]+0.25, col = 2, lwd = 3) lines(outNumi1$reconstr[1:900,1], outNumi1$reconstr[1:900,2]+0.5, col = 3, lwd = 3) lines(outNumi1$reconstr[1:800,1], outNumi1$reconstr[1:800,2]+1., col = 4, lwd = 3) lines(outNumi1$reconstr[1:700,1], outNumi1$reconstr[1:700,2]+1.25, col = 5, lwd = 3)

Corresponding phase portaits:

plot(outNumi1$reconstr[,2], outNumi1$reconstr[,3], type = 'l', xlab = 'x(t)', ylab = 'y(t)', lwd = 3) lines(outNumi1$reconstr[1:1000,2], outNumi1$reconstr[1:1000,3], col = 2, lwd = 3) lines(outNumi1$reconstr[1:900,2], outNumi1$reconstr[1:900,3], col = 3, lwd = 3) lines(outNumi1$reconstr[1:800,2], outNumi1$reconstr[1:800,3], col = 4, lwd = 3) lines(outNumi1$reconstr[1:700,2], outNumi1$reconstr[1:700,3], col = 5, lwd = 3)

The global modelling is applied to the corresponding time series of varying length, each time considering three time series corresponding to the three variables $x$, $y$ and $z$ of the Rössler system.

# modelling using half the initial time series out4 <- gPoMo(data = outNumi1$reconstr[1:1000,2:4], tin = tin[1:1000], dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 7, nPmax = 7, method = 'rk4') # modelling using 45% of the initial time series out5 <- gPoMo(data = outNumi1$reconstr[1:900,2:4], tin = tin[1:900], dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 7, nPmax = 7, method = 'rk4') # modelling using 40% of the initial time series out6 <- gPoMo(data = outNumi1$reconstr[1:800,2:4], tin = tin[1:800], dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 7, nPmax = 7, method = 'rk4') # modelling using 35% of the initial time series out7 <- gPoMo(data = outNumi1$reconstr[1:700,2:4], tin = tin[1:700], dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 7, nPmax = 7, method = 'rk4')

The original formulation is retrieved in all the cases. As an example, the equations obtained with the shortest time series are the following:

# visualize all the models obtained from i.c. v1 visuOutGP(out7)

For shorter time series, it becomes more difficult to detect the model.

# min values min(abs((out1$models$model5 - K) / K) * 100, na.rm = TRUE) min(abs((out4$models$model5 - K) / K) * 100, na.rm = TRUE) min(abs((out5$models$model5 - K) / K) * 100, na.rm = TRUE) min(abs((out6$models$model5 - K) / K) * 100, na.rm = TRUE) min(abs((out7$models$model5 - K) / K) * 100, na.rm = TRUE) # max values max(abs((out1$models$model5 - K) / K) * 100, na.rm = TRUE) max(abs((out4$models$model5 - K) / K) * 100, na.rm = TRUE) max(abs((out5$models$model5 - K) / K) * 100, na.rm = TRUE) max(abs((out6$models$model5 - K) / K) * 100, na.rm = TRUE) max(abs((out7$models$model5 - K) / K) * 100, na.rm = TRUE)

In the present case, the signal length variations do not lead to significant differences in the precision of the parameters: levels of error remains between 0.07 % and 1.0 %.

It was shown that subsampling could have a very
big influence on the phase space
reconstruction^[S. Mangiarotti, 2018. The global
modelling classification technique applied to the detection
of chaotic attractors. *Supplementary Material A* to "Can the
global modelling technique be used for crop classification?"
by S. Mangiarotti, A.K. Sharma, S. Corgne, L. Hubert-Moy,
L. Ruiz, M. Sekhar, Y. Kerr, 2018.
*Chaos, Solitons & Fractals*, **106**, 363-378.]
(see also vignette `2 Preprocessing`

).
In the present section, we will investigate
the possibility to retrieve the original Rössler model
from the subsampled and resampled time series.

Original (in gray), subsampled (in red) and resampled (in green) time series are presented hereafter:

plot(outNumi1$reconstr[,1], outNumi1$reconstr[,2], type = 'l', col = 'gray', xlab = 't', ylab = 'x(t)', lwd = 3) # subsample sub1 <- 50 tsub1 <- outNumi1$reconstr[seq(1, 2000, by = sub1),][,1] datsub1 <- outNumi1$reconstr[seq(1, 2000, by = sub1),][,2:4] lines(tsub1, datsub1[,1], type = 'p', col = 'red', lwd = 3) # resample xout <- seq(min(tsub1), max(tsub1), by = 0.01) rspl1x <- spline(tsub1, y = datsub1[,1], method = "fmm", xout = xout) rspl1y <- spline(tsub1, y = datsub1[,2], method = "fmm", xout = xout) rspl1z <- spline(tsub1, y = datsub1[,3], method = "fmm", xout = xout) lines(rspl1x$x, rspl1x$y, type='l', col='green') legend(0, 5, c("original", "subsampled", "resampled"), col=c('gray', 'red', 'green'), lty=c(1, NA, 1), pch = c(NA,1,NA), cex = 0.8)

The global modelling was first applied to the subsampled data set.

# modelling using half the initial time series out8 <- gPoMo(data = datsub1, tin = tsub1, dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 2, nPmax = 12, method = 'rk4')

None of the models enabled to reproduce the original phase portrait satisfyingly: the original model could not be retrieved from this subsampled data set, even partially.

The global modelling was then applied to the resampled data set:

# modelling using half the initial time series out9 <- gPoMo(data = cbind(rspl1x$y, rspl1y$y, rspl1z$y), tin = rspl1x$x, dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 2, nPmax = 12, method = 'rk4')

The simplest model presenting an appropriate phase portrait is the model #44.

# model #44 visuOutGP(out9, selecmod = 44, prioMinMax = "model")

# Equations visuEq(out9$models$model44, substit = 1, approx = 4)

# Equations visuEq(data_vignetteVI$out9model44 , substit = 1, approx = 4)

This model has one more algebraic term than the original system. This means that interpretation should be considered with more caution when considering time series under subsampled conditions. One important source of error in the case of resampled time series can result from the computation of derivatives. It may thus be interesting to investigate the possibility to obtain simpler formulation by tuning the window length used to compute the derivatives as follows:

# modelling using half the initial time series out10 <- gPoMo(data = cbind(rspl1x$y, rspl1y$y, rspl1z$y), tin = rspl1x$x, dMax = 2, nS = c(1,1,1), show = 0, winL = 13, IstepMin = 10, IstepMax = 400, nPmin = 2, nPmax = 7, method = 'rk4')

Using a slightly stronger smoothing
(`winL = 13`

instead of `winL = 9`

by default),
a simpler formulation can be found
which effectively corresponds to the original system:

# model #25 visuOutGP(out10, selecmod = 25, prioMinMax = "model")

# Equations visuEq(out10$models$model25, substit = 1, approx = 4)

# Equations visuEq(data_vignetteVI$out10model25, substit = 1, approx = 4)

The error on the model parameters is very heterogeneous and much higher than what was found in previous tests (from 0.2 % to 50 %). By experience, other parameters than the smoothing window may be considered to try to obtain better and/or simpler models. In particular, alternative windows of the time series may be a good way to avoid some local inconsistencies that may result from noise, resampling, or potentially from local nonstationary conditions.

Measurement noise can be another source of perturbation. In the present section, we will investigate the possibility to retrieve the original Rössler model from the time series contaminated by measurement noise.

Original time series (in gray) and noisy (in red) time series (5% of noise of the original signal variance)

plot(outNumi1$reconstr[,1], outNumi1$reconstr[,2], type = 'l', col = 'gray', xlab = 't', ylab = 'x(t)', lwd = 3) # variance of (x, y, z): (5.795879, 5.629243, 4.654374) # add noise of 5% of the variance An1 <- 0.05 t <- outNumi1$reconstr[,1] datn1 <- outNumi1$reconstr[,2:4] datn1[,1] <- datn1[,1] + rnorm(dim(datn1)[1], mean = 0, sd = sqrt(5.795879) * An1) datn1[,2] <- datn1[,2] + rnorm(dim(datn1)[1], mean = 0, sd = sqrt(5.629243) * An1) datn1[,3] <- datn1[,3] + rnorm(dim(datn1)[1], mean = 0, sd = sqrt(4.654374) * An1) lines(t, datn1[,1], type = 'l', col = 'red', lwd = 1) bf <- butter(2, 1/50, type="low") b1x <- as.vector(filter(bf, datn1[,1])) b1y <- as.vector(filter(bf, datn1[,2])) b1z <- as.vector(filter(bf, datn1[,3])) b1 <- cbind(b1x, b1y, b1z) points(t[100:2000] - 0.5, b1[100:2000,1], col="black", pch=20) legend(0, 5, c("original", "noisy (5%)"), col=c('gray', 'red'), lty=1, cex = 0.8)

The time series (in black) was smoothed using a low pass filter
(the butter function of the `R`

package `signal`

can be used
for this purpose).

The same type of perturbation was applied with a higher level of noise (10%) added.

plot(outNumi1$reconstr[,1], outNumi1$reconstr[,2], type = 'l', col = 'gray', xlab = 't', ylab = 'x(t)', lwd = 3) # add noise of 10% of the variance An2 <- 0.2 t <- outNumi1$reconstr[,1] datn2 <- outNumi1$reconstr[,2:4] datn2[,1] <- datn2[,1] + rnorm(dim(datn2)[1], mean = 0, sd = sqrt(5.795879) * An2) datn2[,2] <- datn2[,2] + rnorm(dim(datn2)[1], mean = 0, sd = sqrt(5.629243) * An2) datn2[,3] <- datn2[,3] + rnorm(dim(datn2)[1], mean = 0, sd = sqrt(4.654374) * An2) lines(t, datn2[,1], type = 'l', col = 'red', lwd = 1) bf <- butter(2, 1/70, type="low") b2x <- as.vector(filter(bf, datn2[,1])) b2y <- as.vector(filter(bf, datn2[,2])) b2z <- as.vector(filter(bf, datn2[,3])) b2 <- cbind(b2x, b2y, b2z) points(t[50:2000] - 0.5, b2[50:2000,1], col="black", pch=20) legend(0, 5, c("original", "noisy (10%)"), col=c('gray', 'red'), lty=1, cex = 0.8)

The global modelling technique is then applied to these noisy data sets

# modelling using half the initial time series out11 <- gPoMo(data = b1, tin = t, dMax = 2, nS = c(1,1,1), show = 0, IstepMin = 10, IstepMax = 400, nPmin = 2, nPmax = 12, method = 'rk4')

# model #25 visuOutGP(out11, selecmod = 25, prioMinMax = "model")

The simplest model presenting an appropriate phase portrait is the model #25 which has the same formulation as the original system:

# Equations visuEq(out11$models$model25, substit = 1, approx = 4)

# Equations visuEq(data_vignetteVI$out11model25, substit = 1, approx = 4)

The error on the parameters can vary from 0.7 % to 17 %.

# min values min(abs((out11$models$model25 - K) / K) * 100, na.rm = TRUE) # max values max(abs((out11$models$model25 - K) / K) * 100, na.rm = TRUE)

The same analysis could also be performed with a level
of noise of 20% of variance compared to the original
signal.
In practice some variations can happen in the present
experiment since each simulation will depend on
the particular realization of noise generated by
the random generator (the `rnorm`

function was used here).
The original equations could be retrieved in both cases
with a maximum level of parameter error of around 24%
(a smoother filter had to be applied for these cases).

In the present vignette, it was shown that, up to a certain extent, the approach can be considered as robust to (1) initial conditions changes, (2) measurement noise, (3) subsampling and (4) short length time series. When the conditions become more difficult, the precision of the model parameters will first decrease. Some spurious terms will start to appear in the model formulation under harder conditions. Further, no model at all will generally be obtained.

Because it is useful to try to separate the difficulties when attempting to understand a problem, these limitations were considered separately, one by one. Of course, more problems will be found when several of these difficulties will take place at the same time, which is generally the case when a data set from real world conditions is considered. In such cases, more investigations may be required (for choosing the period of analysis, the resampling and smoothing parameters, etc.), and intepretation should be considered with more cautions.

Many other limitations should be considered.
The same chaotic system (the Rössler system) was
systematically considered in order to keep
the coherency between the various experiments.
Based on the previous illustrations,
what would be the difficulty when considering
other dynamical systems remains unexplored.
This question will start to be investigated
in the next vignette `7 Retro-modelling`

.

**Any scripts or data that you put into this service are public.**

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.