Using the Data2LD Package

Introduction: What does Data2LD do?

Data2LD estimates the parameters that define a linear differential equation that models observations of one or more curves evolving over time, space or another continuum.

Below we offer a basic introduction to systems of linear differential equations, but it would be useful to obtain the text Ramsay and Hooker (2017) for additional background on linear differential equations and examples of data analyses using models of this nature.

You will need a working knowledge of the functional data software package fda in order to implement a Data2LD analysis. If you are a little shaky on the subject of functional data analysis, you can consult Kokoszka and Reimherr (2017) and Ramsay, Hooker and Graves (2009) as introductory references, or Ramsay and Silverman (2005) as a more complete source.

These notes are intended only to introduce the idea of a differential equation and how to set up a data analysis using such an equation as a model. The examples here are elementary. More sophisticated sample analyses are to be found in the demo files AverageTemp, CruiseControl, Fdascript, HeadImpact and Refinery.

The parameter cascading approach

We won't go into the math here, but we want to explain the key idea behind how Data2LD estimates the parameters that define the differential equation. The values of these parameters are contained in a one-column matrix that we call theta in our code examples.

Actually, there are two sets of parameters that are needed to define the solution of a differential equation. Although we focus on the typically small number that define the equation itself, we also need a larger number of parameters to define each variable $x(t)$ in the differential equation. We represent each of these variables at the computational level by choosing a set of basis functions, $\phi_k(t), k=1,\ldots,K$, that is sufficiently rich to accommodate however complex the shape of $x(t)$ may be. Choosing the right basis is a key step, on which we will comment in these notes.

Each of these basis functions $\phi_k(t)$ is multiplied by a coefficient $c_k$, and the resulting terms are then added to define a linear expansion that approximates $x(t)$. That is, $$x(t) = \sum_{k=1}^K c_k \phi_k(t).$$ Depending on how much curvature and how many other shape features there are in the variable, the number $K$ of these coefficients is apt to be large, and if there are multiple variables in a set of equations, this will be particularly so.
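For instance, here is a minimal sketch of such an expansion using the fda package; the interval and the coefficient values are arbitrary choices purely for illustration:

library(fda)
#  set up K = 13 cubic B-spline basis functions over [0,10]
basisobj <- create.bspline.basis(c(0,10), 13)
#  arbitrary coefficients c_k, for illustration only
cvec <- rnorm(13)
#  x(t) = sum over k of c_k phi_k(t), as a functional data object
xfd <- fd(cvec, basisobj)
plot(xfd)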

Our central idea is to define these coefficients as functions $c_k(\theta)$ of the parameters defining the equation. In this way, the linear expansion coefficients are no longer free to vary as they please, and ultimately the variables themselves are completely determined by the parameter values that we estimate or that we use along the way to the estimation.

This parameter cascading estimation strategy has two important advantages. First, it means that our fit to the data is low-dimensional in the sense that it depends only on the small fixed number of equation parameters. The fit also depends on a single constant $0 \le \rho < 1$ that controls how much emphasis is placed on satisfying the equation exactly.
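Schematically, for a single variable and omitting normalizing constants (see Ramsay and Hooker (2017) for the precise form), the inner criterion minimized with respect to the coefficients $c_k$ is roughly $$J(\mathbf{c} \mid \theta, \rho) = (1-\rho) \sum_{j=1}^{n} [y_j - x(t_j)]^2 + \rho \int_0^T \left[ D^m x(t) - \mbox{right side}(t) \right]^2 \, dt,$$ so that $\rho = 0$ corresponds to ordinary least squares smoothing and $\rho \rightarrow 1$ forces $x(t)$ toward an exact solution of the equation.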

The second advantage is that the computation is greatly sped up and is much more stable relative to older approaches to parameter estimation. We also derive additional stability in the estimation by how we control $\rho$. For small $\rho$ values, the emphasis is strongly on fitting the data, and as such the computation tends to proceed relatively smoothly. We steadily increase $\rho$ towards one, as you will see below, and with each increase use as starting parameter values those computed for the previous value. As a consequence, you will see in our two examples that typically only two or three optimization steps are required for each value of $\rho.$ We are usually particularly interested in how well we fit the data for $\rho$ very close to one, when the variable(s) are very close to being exact solutions to the equation. Thus, we ultimately use values like $\rho = 0.999$.

The architecture of a linear differential equation

A single differential equation and its terms

A single linear differential equation has a derivative of order $m$ on the left side of the equation, which we shall denote by $D^m x(t)$ instead of using the older and more cumbersome notation $d^m x/d t^m.$ Although $D^m x(t)$ is a function of time or some other continuum, we will often minimize clutter in our notation by dropping "$(t)$" from our expressions.

Here are some simple examples that may help to clarify the structure of linear differential equation systems, and illustrate the notation that we use.

The right side of the equation consists of a sum of terms, each being the product of a coefficient, $\alpha$ or $\beta$, and a variable, $x(t)$ or $u(t)$.

The coefficient or rate may be a constant, or it may be a function of time and/or of some variable that is external to the system of equations. What the coefficient or rate function multiplies can be one of two things. If it multiplies a derivative of the variable $x(t)$ in the equation, we call the product a homogeneous term in the equation, and we will use the notation $\beta$ or $\beta(t)$ for the coefficient. The homogeneous coefficient or multiplier functions $\beta_j$ can be functions of $t$, or of any other external variable, but the most frequent case in practice is that they are constants. If the system is linear, which we assume here, what the coefficient cannot be is a function of the value of $x$ or of any of its derivatives.

If the coefficient is a function of a system variable, or if what it multiplies is not just the simple value of a variable or one of its derivatives, the equation is nonlinear, and not an object that this package handles. A contrasting example is the differential equation $Dx(t) = \beta x(t)[K - x(t)]$, which is nonlinear because it involves the square of $x$. Nonlinear equations are treated in the package CollocInfer.

A coefficient may also multiply a function of time that is not a variable in the system, that is, not one of the variable derivatives $D^j x(t)$. The term then represents external input that modifies the behavior of the derivative of the variable $D^m x(t)$ on the left side. We call such a product a forcing term, and we use the notation $\alpha$ or $\alpha(t)$ for the coefficient and $u(t)$ for an external function.

Often a term has a third multiplier, which is a fixed constant. This we call a factor in the term. The factor is especially often -1, and in such cases it is usually understood that the coefficient function is non-negative.

The general formulation for a homogeneous term on the right side of the equation is $b \beta(t) D^j x(t)$, where $b$ is a fixed known constant, such as the -1 that we used above, and $j < m$. A forcing term is $a \alpha(t) u_{\ell}(t)$, where $a$ is the fixed factor, which is dropped if it is one.
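Putting these pieces together, a single linear differential equation of order $m$ with $L$ forcing functions has the general form $$D^m x(t) = \sum_{j=0}^{m-1} b_j \beta_j(t) D^j x(t) + \sum_{\ell=1}^{L} a_\ell \alpha_\ell(t) u_\ell(t).$$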

In the simplest situation, the derivative order on the left side is $m = 1$, there is only a single term on the right side and $\beta_j$ is a constant. For example, the differential equation defining exponential decay over time is $Dx(t) = -\beta x(t)$ or, dropping "$(t)$", $Dx = -\beta x$. Here the single constant parameter $\beta > 0$ determines the rate of decay, and is therefore called a "rate constant." The factor is -1, as we indicated above. The time taken for the decay to be about 98% complete is $4/\beta$.

A simple order $m = 2$ example is the equation for harmonic motion, $D^2x = -\beta x$ with $\beta > 0$ and factor -1, the solution to which is a weighted sum of a sine and a cosine function. The period of the oscillation is $T = 2 \pi / \sqrt{\beta}$, so that if $\beta = 4\pi^2$ the solution oscillates once per time unit.
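As a quick numerical check of this period (a sketch using base R only; $x(t) = \sin(\sqrt{\beta}\, t)$ is one member of the solution family):

#  one member of the solution family of D^2 x = -beta x
beta <- 4*pi^2
tvec <- seq(0, 2, len=201)
xvec <- sin(sqrt(beta)*tvec)
#  the plot shows one full oscillation per time unit
plot(tvec, xvec, type="l", xlab="t", ylab="x(t)")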

A slightly more complex equation is that for damped harmonic motion $D^2x = -\beta_0 x - \beta_1 Dx, \beta_0 > 0$. The solution combines sinusoidal behavior with an exponential evolution of the amplitude. Depending on the sign and size of the damping coefficient $\beta_1$, the evolution is exponential decrease or increase.

To sum up, what makes the equation linear is (1) that it is a sum of terms, and (2) each term is a product of a multiplier or rate function and the value of a variable or its derivative (homogeneous), or an external variable (forcing). Such is the jargon of dynamic modelling.

Chapter 3 of Ramsay and Hooker (2017) can be consulted for more details.

Systems of linear differential equations

A system of differential equations involves $d$ variables $x_i, i=1,\ldots,d$. We allow the left side of equation $i$ to have an order of derivative $m_i$ that varies from one equation to another.

The right side of the $i$th equation is a sum of terms involving derivatives $D^k x_j$ of orders $k$ less than $m_j$. Although there is the potential for a very large number of such terms if $d$ is at all large, this usually does not happen. Rather the terms for equation $i$ involve only one or two variables other than the variable $x_i$ on the left side of the equation, as well as terms involving that variable itself.

More example systems

All these examples involve only a single variable, and we want to extend them to allow for more than one variable with mutual additive effects on each other. Now we will need two subscripts on each coefficient, the first of which pertains to the variable being affected, and the second to the variable producing the effect. In fact, for higher order equations, we will also need a third index that specifies the order of derivative.

But, thankfully, most systems used in practice will not need this level of complexity. In fact, typically, the majority of these contributions are not in the equation at all. Moreover, some of the elements in the equation will be associated with known rate functions, for example, with values that may have been determined by earlier experiments.

Often a specific rate function $\beta$ or forcing coefficient function $\alpha$ may appear more than once in a system of equations, in which case the notation for specifying the equations may have to be modified so as to make this clear. But it is still important to keep the three-index notation $\beta_{i_1,i_2,j}$ in mind since, in summary, on the right side of equation $i_1$ there is the potential for a contribution from the $j$th derivative of variable $i_2$, as well as from forcing functions $u_{i_1 \ell}, \ell = 1, \ldots, L_{i_1}$, each modulated by a multiplicative coefficient $\alpha_{i_1 \ell}$.
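In this notation, equation $i_1$ of the system has the general form $$D^{m_{i_1}} x_{i_1}(t) = \sum_{i_2=1}^{d} \sum_{j=0}^{m_{i_2}-1} b_{i_1 i_2 j}\, \beta_{i_1 i_2 j}(t)\, D^j x_{i_2}(t) + \sum_{\ell=1}^{L_{i_1}} a_{i_1 \ell}\, \alpha_{i_1 \ell}(t)\, u_{i_1 \ell}(t),$$ where, as noted above, most of the potential homogeneous terms are absent in any real application.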

Here, in summary, are the facets of a linear differential equation system that a Data2LD analysis can handle: any number $d$ of linear equations, derivative orders $m_i$ that vary from equation to equation, rate and forcing coefficients that are constants or functions of time, fixed factors such as -1 multiplying terms, one or more forcing functions per equation, and variables that are observed only in part or not at all.

Future versions of Data2LD will allow other possibilities, such as multiple observations at the same time points, and smoothing or regularization of estimated rate or coefficient functions.

Functional observations

A single functional observation is a set of $n$ values $y_j$ at a set of time points $t_j$ contained within some interval that we shall assume, without loss of generality, ranges from 0 to $T$. For simplicity, we shall also assume that the observations are univariate, that is, each $y_j$ is a single real number. We also assume that the times are strictly ordered, so that $t_j < t_{j+1}$. What makes the observations functional is that a smooth function $x(t)$ underlies these data, and that $y_j$ reflects the function value $x(t_j)$.

Data2LD implicitly assumes the classic relation $y_j = x(t_j) + e_j$, where the "errors" or "residuals" have a distribution with mean $\mu = 0$ and finite variance $\sigma^2$, and are distributed independently of the $x(t_j)$'s. These assumptions justify the least squares approximation to the data that Data2LD implements.

Note that not all of the $d$ functions $x_i, i=1,\ldots,d$ may be observed, a situation that is quite common in applications.

What are the steps in a typical Data2LD analysis?

Here we take you through the steps and decisions required to execute a Data2LD analysis and explore the results. You might also be helped by following through the code in the next section for either the single-variable refinery data or the two-variable cruise control example. The distribution of Data2LD contains a number of sample analyses that illustrate the use of Data2LD in live data analyses.

Define the problem

This is where you set up the differential equation, perhaps the most important step of all.

Your greatest hazard will be exuberance! It is really easy to over-parameterize a model defined by a differential equation, and in the writer's experience, a large portion of published equations are just that. An over-parameterized equation is one for which certain coefficients and combinations of coefficients cannot be estimated given the data that one is using. Whether or not a model is over-parameterized will depend on the configuration of the data. For multi-equation systems it is common to have data available for only a subset of the equations. It may not be obvious which combinations of parameters are under-determined by the data, and a good deal of experimentation with alternative models may be required before the final equation is defined.

So begin with the simplest possible equation or system that you can imagine that has anything to say about the data, even if it is known to be only a caricature. Check the estimated system using methods described below to see that it is well defined by the data, and then consider adding only a few elements to the equation, preferably one at a time, again accompanied by careful confirmation of identifiability. You have been warned!

Specify the observation interval, the observation times, and load the observations

The observation values and the times of observation are specified in a list array of length $d$, the number of variables in the system. A list array is an un-named list defined by the R code listArray <- vector("list", d). A specific $j$th element in the array is accessed by the R code listArray[[j]].

A specific member of listArray is itself a named list, with two members:

argvals: This contains a vector of increasing observation times $t_j$.

y: This contains the same number of observation values.

It is usual for some variables not to be observed, and the corresponding lists will be empty as a consequence. The number and spacing of the times can vary from one list to another, but even if they are the same for all variables, this information is required in each list corresponding to an observed variable.
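As a small hypothetical illustration, here is the set-up for a two-variable system in which only the first variable is observed, at three time points:

dataList <- vector("list", 2)
#  variable 1: observation times and values
dataList[[1]] <- list(argvals=c(0, 0.5, 1.0), y=c(1.2, 0.8, 0.9))
#  dataList[[2]] is left NULL because variable 2 is unobserved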

It can happen that observations do not have a one-to-one correspondence with variables. For example, an observation may be of the sum of two variables. Unfortunately, function Data2LD cannot accommodate such data configurations, but this is a target for future development.

Select one or more basis systems for representing the solution(s)

There are two classes of basis systems in a Data2LD analysis: the rate or coefficient function bases and the variable bases. The variable bases $\phi_{ik}(t)$ define the estimates of the solution functions $x_i$. Here some care and thought is required. The basis system is not determined so much by the data, as it would be in normal smoothing, but rather by what behaviour the variable must exhibit in order to allow an accurate solution to the equations. Often, for example, highly localized curvature is present, especially where a forcing function is discontinuous, as it often is. In this situation a high knot density is required for a B-spline expansion, or even multiple knots at a known specific location. Both the cruise control and the head impact examples exhibit this problem. A good final choice of a variable basis may require considerable experimentation.
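The effect of coincident knots can be previewed directly with the fda package. In this small sketch (the interval and knot values are arbitrary illustrations), order 4 B-splines with three coincident knots at $t = 5$ produce a basis that can represent a function that is continuous at 5 but whose first derivative jumps there:

library(fda)
#  multiplicity 3 at t = 5 with order 4 leaves continuity of
#  order 4 - 1 - 3 = 0: a continuous value, free first derivative
knotbasis <- create.bspline.basis(c(0,10), norder=4,
                                  breaks=c(0, 2.5, rep(5,3), 7.5, 10))
plot(knotbasis)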

The second class of basis systems defines the coefficient or rate functions. These are usually functional data basis objects and, where constant coefficients are involved, constant basis functions. Normally the coefficient bases are much smaller in size than those for representing $x_i(t)$, since we expect a lot of curvature in the equation solution but only mild curvature in a rate function $\beta(t)$. There is also a provision for coefficient functions that the user defines.

Setting up vectors of list objects defining each term in each differential equation

Each rate or coefficient function is defined by one or more parameters. Some functions will have parameters that must be estimated by Data2LD, and some others will be defined by parameter values treated as fixed. All parameter values will need to be specified, however, since those to be estimated must have initial values that will be modified in the course of optimizing the fits to the data. It will often be the case that rate or coefficient functions are simple constants, and thus require only a single parameter value. A more general but still simple case is where the function is defined by a basis function expansion, and therefore can be considered as a functional data object (see references above for more information on functional data objects if needed).

The most general case is one in which the user defines the function by providing a function definition. In this case, if these parameters require estimation, a second function definition is also required for the partial derivative of the function with respect to the parameters that define it.
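For instance, a user-defined coefficient function and its partial derivative might look like the following sketch. The calling convention shown here, a function of a time vector and a parameter vector, is only illustrative; consult the package documentation for the exact interface expected:

#  a positive coefficient function defined by two parameters
betafun  <- function(tvec, bvec) exp(bvec[1] + bvec[2]*tvec)
#  its partial derivatives with respect to bvec[1] and bvec[2],
#  returned as a matrix with one column per parameter
Dbetafun <- function(tvec, bvec) {
  fvec <- exp(bvec[1] + bvec[2]*tvec)
  cbind(fvec, tvec*fvec)
}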

Setting up the list vectors defining the system of equations

A list vector of length $d$, which we will call modelList, specifies the structure of the system of $d$ differential equations in roughly the same way that you would write down the equations one after the other. The vector is defined by the code modelList <- vector("list", d).

Each list in modelList contains in turn list and other objects that define the essential information about the corresponding equation. These seven items are named members or fields, and are as follows:

name: a string identifying the variable in displays.

order: the order $m$ of the derivative on the left side of the equation.

XList: a list vector defining the homogeneous terms on the right side.

FList: a list vector defining the forcing terms, or NULL if there are none.

weight: a weight controlling the relative influence of this variable in the fitting criterion; it can be set manually if needed.

nallXterm: the number of homogeneous terms, set automatically.

nallFterm: the number of forcing terms, set automatically.

The XList object

Now let's look at the list object that defines each homogeneous term within XList.

The following six fields are required to be set up by the user:

funobj: a functional data or functional parameter object, or a user-supplied function, defining the coefficient $\beta(t)$.

parvec: the initial or fixed values of the parameters defining funobj.

estimate: TRUE if the parameters are to be estimated, FALSE if they are to be held fixed.

variable: the index of the variable whose derivative appears in the term.

derivative: the order $j$ of that derivative.

factor: the fixed known constant multiplying the term, often -1.

We supply a single-line command, make.Xterm(funobj, parvec, estimate, variable, derivative, factor), that does this set-up of a single homogeneous term for you.

The FList object

The list objects contained in the lists of the FList list array specify the forcing functions and the coefficients $\alpha(t)$ that multiply them.

There are five required fields for each forcing function:

funobj: a functional data or functional parameter object, or a user-supplied function, defining the coefficient $\alpha(t)$.

parvec: the initial or fixed values of the parameters defining funobj.

estimate: TRUE if the parameters are to be estimated, FALSE if they are to be held fixed.

Ufd: a functional data object defining the forcing function $u(t)$.

factor: the fixed known constant multiplying the term.

The command that will set this up automatically is make.Fterm(fun, parvec, estimate, Ufd, factor).

Finally, these two list vectors, plus the other three fields name, order and weight that define a single linear differential equation, must be combined into a list object that becomes the corresponding element of the overall model list array defining the complete system. A single-line command that will do this for you is make.Variable(name, order, XList, FList, weight). The weight field can be set manually if you need to do so, and the remaining two fields nallXterm and nallFterm are set automatically when you use the make.Model function described below.

The function checkModel should be used immediately after the model list array is specified; it will also be called automatically within Data2LD. It can be invoked by the command checkModel(XbasisList, modelList).

Compute starting values for estimated coefficients using gradient matching

This is an optional step, since Data2LD often works fine with simple initial coefficient values like zero, and supplying good initial values often only saves one or two iterations.
However, if good initial values are considered important, these can be constructed by a process that we call gradient matching. This involves an initial smoothing of the data for each observed variable, with a view to estimating the highest required derivative. Then the derivatives are converted to functional data objects, and these are used in the arguments of the fRegress function that fits a concurrent linear model. Ramsay, Hooker and Graves (2009) can be consulted for further details, and the cruise control examples show how this is done in that context.
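Here is a rough sketch of gradient matching applied to the refinery example of the next section, assuming the objects TimeData, TrayData, TrayBasis and Valvfd defined there; the smoothing penalty value 0.1 is an arbitrary illustrative choice:

#  smooth the data and extract the first derivative
Trayfd0  <- smooth.basis(TimeData, TrayData, fdPar(TrayBasis, 2, 0.1))$fd
DTrayfd0 <- deriv.fd(Trayfd0, 1)
#  regress Dx(t) on x(t) and u(t) with the concurrent linear model
conbasis <- create.constant.basis(c(0,193))
xfdlist  <- list(Trayfd0, Valvfd)
betalist <- list(fdPar(conbasis), fdPar(conbasis))
fRegList <- fRegress(DTrayfd0, xfdlist, betalist)
#  the constant regression coefficients give initial values for
#  -beta and alpha respectively
beta0  <- -fRegList$betaestlist[[1]]$fd$coefs[1]
alpha0 <-  fRegList$betaestlist[[2]]$fd$coefs[1]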

Preliminary computations of four-way tensors in checkModel()

In this as in many algorithms, certain quantities need only be computed once and, once set up, can be reused each time a function is invoked. In the case of Data2LD, integrals of products of four basis functions are required over and over again for all pairs of terms, and the approximation of these integrals can be time-consuming. But this computational load can be greatly reduced by computing once and for all the four-way arrays of integrals of products of four basis functions, where the basis functions involved are those of the $\beta$ and/or $\alpha$ coefficient bases paired with those of the corresponding solution bases.

This is done in the checking function checkModel, and this function should always be invoked before doing any analysis. In addition, the function displays a useful summary of the structure of the model as a means of further checking.

A simple example: The refinery data

Our simplest example is the analysis of the refinery data described in our references and available for analysis in our distribution package.

The single differential equation is $Dx(t) = -\beta x(t) + \alpha u(t)$. Note that both rate coefficient $\beta$ and forcing coefficient $\alpha$ are constants. By convention, a minus sign appears in front of $\beta$, which is expected to be positive. The homogeneous part of the equation, $Dx(t) = -\beta x(t)$, is the differential equation for exponential decay. The data are observations of fluid level in a tray in a refinery cracking tower, and the forcing function is a valve setting which is closed until time 67 minutes when it is opened. The forcing function is specified as a functional data object Ufd, and is a step function. The observations are recorded until 193 minutes. Both coefficients will be estimated.

These commands set up the data that we want to analyze:

library("Data2LD")
N <- 194
TimeData <- RefineryData[,1]
TrayData <- RefineryData[,2]
ValvData <- RefineryData[,3]

Now plot the data:

plot(TimeData, TrayData, type="p", xlab="Time (min)", ylab="Tray level") 
lines(c(67,67), c(0,4.0), type="l")
plot(TimeData, ValvData, type="p", xlab="Time (min)", ylab="Control")
lines(c(67,67), c(0,0.5), type="l")

We now load the observation times and the data into a list object. Here and elsewhere our un-named list objects containing named lists are of length 1. This seems unnecessary, but we stick to this format because we will often be working with more than one variable.

  TrayList <- list(argvals=RefineryData[,1], y=RefineryData[,2])
  TrayDataList  <- vector("list",1)
  TrayDataList[[1]] <- TrayList

This model assumes that a single forcing function $u(t)$ is driving the output. This is the valve setting, which is a step function that changes its level at time 67. The following code sets up a functional data object for this forcing function $u(t)$.

Valvbreaks <- c(0,67,193)
Valvnbasis <- 2
Valvnorder <- 1
Valvbasis  <- create.bspline.basis(c(0,193), Valvnbasis, Valvnorder, Valvbreaks)
Valvfd     <- smooth.basis(TimeData, ValvData, Valvbasis)$fd

Here we define and plot the spline basis object that will be used to represent the solution $x(t)$, and load it into a list object of length one. A little explanation of our order and knot choices for representing $x(t)$ is needed. We use order 5 B-splines as a basis because we want the first derivative to be smooth nearly everywhere, except at the time 67 of the valve opening, when an abrupt change in the input forcing function takes place. There we have to allow the function to remain continuous while its first derivative becomes discontinuous. We achieve this by putting four knots at the same value, namely 67 minutes. After time 67, 13 equally spaced interior knots are used, but before then, where the function is flat, we have no interior knots.

  Traynorder <- 5
  Traybreaks <- c(0, rep(67,3), seq(67, 193, len=15))
  Traynbasis <- 22
  TrayBasis  <- create.bspline.basis(c(0,193), Traynbasis, Traynorder, 
                                   Traybreaks)

We plot the basis so that we can see that placing four coincident knots or breakpoints at time 67 leaves the tray level function $x(t)$ continuous at 67 but allows its first derivative to be discontinuous. The vertical dashed lines indicate knot placements.

  par(mfrow=c(1,1))
  plot(TrayBasis, xlab="", ylab="B-spline basis functions")
  TrayBasisList    <- vector("list",1)
  TrayBasisList[[1]] <- TrayBasis

Now we turn to defining the right side of the equation. Both of the coefficient functions are constants, but we should still define them as functional data objects in order to remind ourselves that coefficients can be functions of time. In this code we define a constant basis over the observation interval, make it into a functional data object with value zero everywhere, and then make two fdPar objects, one for the homogeneous term, and one for the forcing term.

conbasis   <- create.constant.basis(c(0,193))
confd      <- fd(0,conbasis)
betafdPar  <- fdPar(confd)
alphafdPar <- fdPar(confd)

This is overkill; we could just put conbasis in as the coefficient function for both terms, but we do want to keep the process in its most general form.

Next we define the list object of length 1 containing the specification of the single homogeneous term of the model. In the definition, we supply a reasonable positive value for the coefficient, specify that its value will be optimized in the estimation process, that it will multiply the variable itself rather than one of its derivatives, and that there will be a fixed factor -1 multiplying the term. How did we get the value 0.04? We know that an exponential decay is pretty much completed in $4/\beta$ time units (minutes), and from our plot of the data we could see that this took about 193 - 67 = 126 minutes. Using $4/0.04 = 100$ minutes is close enough in this context.

  #  Xterm Fields:    funobj     parvec  estimate  variable deriv. factor
  XTerm <- make.Xterm(betafdPar, 0.04,   TRUE,     1,       0,     -1)
  XList <- vector("list", 1)
  XList[[1]] <- XTerm

Then we define the list object containing the specification of the single forcing term.

  #  Fterm Fields:    funobj      parvec  estimate  Ufd      factor
  FTerm <- make.Fterm(alphafdPar, 1.0,    TRUE,     Valvfd,  1)
  FList <- vector("list", 1)
  FList[[1]] <- FTerm

Now we assemble these terms into the definition of the first order forced linear differential equation.

  TrayVariable <- make.Variable(XList=XList, FList=FList, 
                                name="Tray level", order=1)
  TrayModelList <- vector("list",1)
  TrayModelList[[1]] <- TrayVariable

Finally we define the model object by combining the basis system with the variable definitions and the coefficient definitions. This step is also mandatory because it not only checks the structure of the objects in the argument list, but also computes some four-way arrays or tensors that are used repeatedly in an analysis. It also displays the structure of the model so that we can make a visual check as well.

checkModelList <- checkModel(TrayBasisList, TrayModelList, summarywrd=TRUE)
TrayModelList  <- checkModelList$model
nparam         <- checkModelList$nparam
print(paste("Number of parameters =",nparam))

We invoke function Data2LD() once to ensure that everything is working. To do this, we must define a value of the smoothing parameter $\rho$ that controls the degree of smoothness in the solution function.

rhoVec <- 0.5
Data2LDList <- Data2LD(TrayDataList, TrayBasisList, TrayModelList, rhoVec)
MSE  <- Data2LDList$MSE    # Mean squared error for fit to data
DMSE <- Data2LDList$DpMSE  #  gradient with respect to parameter values

We can now proceed to the optimizations of data fits for a sequence of values of $\rho$. The following code sets up values that control the optimization, and defines a sequence of $\rho$ values:

dbglev   <-  1    #  debugging level
iterlim  <- 50    #  maximum number of iterations
convrg   <- 1e-6  #  convergence criterion

gammavec <- 0:7
nrho     <- length(gammavec)
rhoMat   <- matrix(exp(gammavec)/(1+exp(gammavec)),nrho,1)
dfesave  <- matrix(0,nrho,1)
gcvsave  <- matrix(0,nrho,1)
MSEsave  <- matrix(0,nrho,1)
ISEsave  <- matrix(0,nrho,1)
thesave  <- matrix(0,nrho,nparam)

The value of $\rho$ provided by the argument rho requires special comment, and a more detailed explanation of its meaning can be found in Ramsay and Hooker (2017). The value of $\rho$ may be either a single number greater than or equal to zero and less than one, or a vector of such numbers. The role of these values is to control the relative emphasis on fitting the data versus satisfying the differential equation. For lower values such as 0.5, the data will be closely approximated, but the resulting $x_i(t)$ will not satisfy the differential equation particularly well, and the estimates of the parameters in theta.opt will be relatively poor. On the other hand, high values like 0.999 will place strong emphasis on fitting the differential equation and provide good parameter estimates, but at the expense of a poorer fit to the data.

For smaller values of $\rho$ the optimization proceeds rapidly, but if we proceed directly to use large values, the optimization can become more difficult, and may terminate in a local minimum that is not globally optimal.

We therefore recommend using a vector of values, and the optimization code will use these one after another, feeding the parameter estimates at each value on as initial values for the next optimization. The values of rho should be increasing, but not by equal steps. Instead, we use the formula $\rho = \exp(\gamma)/[1 + \exp(\gamma)]$, where the increasing values of $\gamma$ can be an integer sequence such as 0, 1, ..., 7, which yields the values of $\rho$ 0.5, 0.73, 0.88, 0.95, 0.98, 0.99, 0.997 and 0.999.

Here we cycle through the increasing values of $\rho$ and optimize the fit using function Data2LD.opt. Note that we pass the parameter estimates at each value on as initial values for the next optimization.

TrayModelList.opt <- TrayModelList

for (irho in 1:nrho) {
  rhoi <- rhoMat[irho]
  print(paste("rho <- ",round(rhoi,5)))
  Data2LDResult <- Data2LD.opt(TrayDataList, TrayBasisList, 
                               TrayModelList.opt, rhoi, convrg, 
                               iterlim, dbglev)
  theta.opti <- Data2LDResult$thetastore
  TrayModelList.opt <- modelVec2List(TrayModelList, theta.opti)
  Data2LDList <- Data2LD(TrayDataList, TrayBasisList, TrayModelList.opt, rhoi)
  MSE       <- Data2LDList$MSE 
  df        <- Data2LDList$df
  gcv       <- Data2LDList$gcv 
  ISE       <- Data2LDList$ISE 
  Var.theta <- Data2LDList$Var.theta
  thesave[irho,] <- theta.opti
  dfesave[irho]   <- df
  gcvsave[irho]   <- gcv
  MSEsave[irho]   <- MSE
  ISEsave[irho]   <- ISE
}

We display for each value of $\rho$ some summary measures. Value df is a measure of degrees of freedom in the fit. As $\rho$ nears one, this value approaches 2, the number of parameters in the model. Value gcv is called the generalized cross-validation value, and is often used to choose the best value of $\rho$. Value MSE is the mean-squared error of the fit to the data, and value ISE is the integrated squared error between the left and right sides of the differential equation.

print("    rho      df         gcv,       MSE       ISE")
for (irho in 1:nrho) {
  print(round(c(rhoMat[irho,1], dfesave[irho], gcvsave[irho], 
                MSEsave[irho], ISEsave[irho]),5))
}

As we indicated in the introduction, we often want to see how well we fit the data using a nearly exact solution to the equation. This code evaluates that fit at the highest value $\rho = 0.999.$

irho <- nrho  #  evaluate the solution for the highest rho value
TrayModelList <- modelVec2List(TrayModelList, thesave[irho,])
Data2LDList <- Data2LD(TrayDataList, TrayBasisList, TrayModelList, rhoMat[irho,])
df        <- Data2LDList$df
gcv       <- Data2LDList$gcv 
MSE       <- Data2LDList$MSE 
ISE       <- Data2LDList$ISE 
Var.theta <- Data2LDList$Var.theta
beta  <- thesave[irho,1]
alpha <- thesave[irho,2]
print(paste("beta  =",round(beta,3)))
print(paste("alpha =",round(alpha,3)))
print(paste("df  = ", round(df,1)))
print(paste("gcv = ", round(gcv,5)))
print(paste("MSE = ", round(MSE,5)))
print(paste("ISE = ", round(ISE,5)))

This code plots the fit to the tray level data along with the data.

# plot fit of solution to data
Trayfd   <- Data2LDList$XfdParList[[1]]$fd
tfine    <- seq(0,193,len=101)
Trayfine <- eval.fd(tfine, Trayfd)
Ufine    <- eval.fd(tfine, Valvfd)
par(mfrow=c(1,1))
plot(tfine, Trayfine, type="l", lwd=2, 
     xlab="Time", ylab="Tray level", xlim=c(0,193), ylim=c(-0.5,4.5))
points(TimeData, TrayData)

We have estimated the two constant coefficients in the homogeneous and forcing terms. How well does the data define our estimated values? This requires estimates of standard errors of these estimates, which we can easily extract from the matrix Var.theta returned by function Data2LD using the highest value of $\rho$.

theta   <- thesave[nrho,]
stderr  <- sqrt(diag(Var.theta))
thetaUP <- theta + 2*stderr
thetaDN <- theta - 2*stderr
print(round(c(theta[1], thetaDN[1], thetaUP[1], stderr[1]),4))
print(round(c(theta[2], thetaDN[2], thetaUP[2], stderr[2]),4))

At the highest value of $\rho$ we hope (but cannot guarantee) that the linear differential equation will be almost exactly solved by the function fitting the data. If so, and if we are satisfied with the fit to the data, then we can celebrate having found the right model for the data. In the following code, we compute values of the left and right sides of the equation and display them plotted over each other.

# compute the derivative Dx(t) and the right side of the equation
theta <- thesave[irho,]
DTrayData <- eval.fd(TimeData, Trayfd, 1)
DTrayfit  <- -theta[1]*Trayfine + theta[2]*Ufine
# plot the left and right sides of the equation
plot(TimeData, DTrayData, type="b", lwd=2,
     xlab="Time", ylab="DTray level", xlim=c(0,193), ylim=c(-0.01,0.12))
lines(tfine, DTrayfit)

We see that at the highest value of $\rho$ we have very nearly satisfied the differential equation, since the first derivative values (circles), estimated by smoothing the data, and the fit (smooth curve) nearly overlap.

What have we achieved here? This simplest of differential equations is able to capture how tray level responds to a change in valve setting by using only two parameters. The fit to the data is about as good as we could wish.

Further details are available in both Ramsay and Hooker (2017) and in the code that can be invoked by the command demo("Refinery").

A two-variable system: The cruise control model

This example is not much more complicated, but does illustrate how one rate function can appear in more than one place in a system. It models how a simple feedback system works, in this case the cruise control in a car. The two variables are the speed ($S$) of the car and the feedback control level ($C$) that responds as the driver changes the target speed ($S_0$). The data consist of 41 simulated observations for each variable, distributed uniformly over the time interval [0,80]. The input variable is the desired speed or set point $S_0(t)$, and it is changed at times 0, 20, 40 and 60 to speeds 60, 40, 80 and 60 kilometres per hour, respectively.

The two differential equations are: $$DS(t) = -\beta_{SS} S(t) + \beta_{SC} C(t)$$ $$DC(t) = -\beta_{CS} S(t) - \beta_{CC} C(t) + \alpha_{C} S_0(t)$$ where $\beta_{SS} = 1$, $\beta_{SC} = 1/4$, $\beta_{CS} = 1$, $\beta_{CC} = 0$ and $\alpha_C = 1.$ The homogeneous part of the speed equation is again that of exponential decay, and it is forced by the control variable. The homogeneous part of the control equation is held fixed at zero, and it is forced by the difference between the set point and the current speed. The system works by increasing $C$ when the current speed is less than the target and decreasing it when it is greater. The change is passed into the speed equation as an additive forcing by $C$ modulated by the value of $\beta_{SC}$. Because of the exponential decay dynamics of $S$, the speed will approach the new speed target with an exponentially decreasing rate.

T     <- 80  #  seconds
rng   <- c(0,T)
n     <- 41
tobs  <- seq(0,T,len=n)
tfine <- seq(0,T,len=501)

We now set up the set-point forcing function. The set-point function uses an order 1 B-spline basis, whose basis functions are step functions, in order to define a function that is piecewise constant; order 1 implies that a single parameter defines each basis function. The knots are placed at the points where the set-point changes values.

steporder  <- 1  #  step function basis
stepnbasis <- 4  #  four basis functions
stepbreaks <- c(0,20,40,60,80)
stepbasis  <- create.bspline.basis(rng, stepnbasis, steporder, stepbreaks)
stepcoef   <- c(60,40,80,60)  # target speed for each step
SetPtfd    <- fd(stepcoef, stepbasis)  # define the set point function

In this example we simulate data, and this will require that we construct the true solutions to the two equations, and then add some noise to simulate data with error.

In order to compute the solutions, we first define a function that evaluates the right side of the equation for a time value:

cruise0 <- function(t, y, parms) {
  DSvec    <- matrix(0,2,1)
  Uvec     <- eval.fd(t, parms$SetPtfd)
  DSvec[1] <- -y[1] + y[2]/4
  DSvec[2] <-  Uvec - y[1]
  return(list(DSvec=DSvec))
}

We first have a look at the solution on a fine mesh of values by solving the equation at the points in tfine using the differential equation solver lsoda in the deSolve package. We confess we ran into a small glitch here: if we solved the equation over the full interval [0,80], lsoda tried to solve the equation for a time value slightly larger than 80. So we solved the equation for all but the last time value, and then tacked on the true solution values at the end. The matrix ytrue that is returned has the time values in the first column and the values of the speed and control variables in the remaining two columns.

library(deSolve)  #  provides the initial value solver lsoda
y0    <- matrix(0,2,1)
parms <- list(SetPtfd=SetPtfd)                   #  define the input to the cruise0 function above
ytrue <- lsoda(y0, tfine[1:500], cruise0, parms) #  solve the equation using an initial value solver
ytrue <- rbind(ytrue, matrix(c(80,60,240),1,3))  #  combine with the final values

Now we use the interpolation function approx to get true values at the 41 observation times.

speedObs    <- approx(tfine, ytrue[,2], seq(0,80,len=41))$y
controlObs  <- approx(tfine, ytrue[,3], seq(0,80,len=41))$y

Next we simulate noisy data at the observation points by adding a random zero-mean Gaussian deviate to each curve value. The deviates for speed have a standard deviation 2 kilometers per hour, and those for the control level have a standard deviation of eight control units.

sigerr <- 2
yobs     <- matrix(0,length(tobs),2)
yobs[,1] <- as.matrix(  speedObs + rnorm(41)*sigerr)
yobs[,2] <- as.matrix(controlObs + rnorm(41)*sigerr*4)

Now plot the data along with the true solution:

plot(tfine, ytrue[,2], type="l", xlab="Time (secs)", ylab="Speed")
lines(c(0,T), c(60,60), lty=3)
points(tobs, yobs[,1], pch="o")
plot(tfine, ytrue[,3], type="l", xlab="Time (secs)", ylab="Control level")
lines(c(0,T), c(240,240), lty=3)
points(tobs, yobs[,2], pch="o")

We now load the simulated noisy data values, observed at the times in tobs, into a list vector of length 2; the true values over the fine mesh tfine will be used later for plotting.

Sdata <- list(argvals=tobs, y=yobs[,1])
Cdata <- list(argvals=tobs, y=yobs[,2])
cruiseDataList <- vector("list",2)
cruiseDataList[[1]] <- Sdata
cruiseDataList[[2]] <- Cdata

The next step is to set up basis objects for the values of the two variables. The acceleration of the speed variable will be smooth, but the acceleration of the control variable will change discontinuously at the points where the forcing function changes values in a stepwise fashion. For this reason the speed B-spline basis is of order five but the control basis is of order four. We place two coincident knots at each point of input change in order to capture these degrees of smoothness, and we also position equally spaced knots two seconds apart within each interval between change points.

cruiseBasisList <- vector("list",2)
delta <- 2*(1:10)
breaks   <- c(0, delta, 20, 20+delta, 40, 40+delta, 60, 60+delta)
nbreaks  <- length(breaks)
nSorder <- 5
nSbasis <- nbreaks + nSorder - 2
Sbasis  <- create.bspline.basis(c(0,80), nSbasis, nSorder, breaks)
cruiseBasisList[[1]] <- Sbasis
nCorder <- 4
nCbasis <- nbreaks + nCorder - 2
Cbasis  <- create.bspline.basis(c(0,80), nCbasis, nCorder, breaks)
cruiseBasisList[[2]] <- Cbasis

Now that we have the data and the bases for representing the solution, we can turn to the task of specifying the five constant coefficient functions $\beta_{SS}, \beta_{SC}, \beta_{CS}, \beta_{CC}$ and $\alpha_C$. Here we use the handy functions make.Xterm and make.Fterm to set up each rate or forcing coefficient. We make the homogeneous coefficient $\beta_{CC}$ zero and not estimated, leaving only four parameters to be estimated from the data.

First set up a constant functional parameter object defined over the whole interval. This will be used to define each of the constant coefficient functions.

conbasis    <- create.constant.basis(rng)
confdPar    <- fdPar(conbasis)

There are two homogeneous terms for each equation, and the second equation has a forcing function which sets the target speeds. Here we define each of the five terms.

# List object for the speed equation: a term for speed and a term for control, but no forcing
SList.XList <- vector("list",2)
#  Fields:                     funobj    parvec  estimate  variable deriv. factor
SList.XList[[1]] <- make.Xterm(confdPar, 1,      TRUE,     1,       0,     -1)
SList.XList[[2]] <- make.Xterm(confdPar, 1/4,    TRUE,     2,       0,      1)
SList.FList <- NULL
SList <- make.Variable("Speed", 1, SList.XList, SList.FList)
# List object for the control equation:  a term for speed, and a zero-multiplied term for control plus a term for the forcing function SetPtfd
CList.XList <- vector("list",2)
#  Fields:                     funobj    parvec  estimate  variable deriv. factor
CList.XList[[1]] <- make.Xterm(confdPar, 1,      TRUE,     1,       0,     -1)
CList.XList[[2]] <- make.Xterm(confdPar, 0,      FALSE,    2,       0,      1)
CList.FList <- vector("list",1)
#  Fields:                     funobj    parvec  estimate  Ufd      factor
CList.FList[[1]] <- make.Fterm(confdPar, 1,      TRUE,     SetPtfd, 1)
CList <- make.Variable("Control", 1, CList.XList, CList.FList)

Now set up the model object.

cruiseModelList <- vector("list",2)
cruiseModelList[[1]] <- SList
cruiseModelList[[2]] <- CList

We run a check on the set up of the model. It is so easy to slip up somewhere!

cruiseModelCheckList <- checkModel(cruiseBasisList, cruiseModelList)
cruiseModelList <- cruiseModelCheckList$modelList
nparam <- cruiseModelCheckList$nparam
print(paste("total number of parameters = ", nparam))

We do a preliminary evaluation of the least squares criterion at the parameter values that we have set up in cruiseModelList. Even after running the check above, it's possible that there is an error that the checking function did not catch.

rhoVec <- 0.5*matrix(1,1,2)
Data2LDList <- Data2LD(cruiseDataList, cruiseBasisList, cruiseModelList, rhoVec)
print(Data2LDList$MSE)
print(Data2LDList$DpMSE)

Notice that the gradient for the fourth parameter is exactly 0, so that the parameter never changes value.

We are now ready to optimize the criterion with respect to the parameter values. This involves a call to function Data2LD.opt of the form Data2LD.opt(cruiseDataList, cruiseBasisList, cruiseModelList, rhoVec, conv, iterlim, dbglev). Here is the setup for the optimization:

conv    <- 1e-4  
iterlim <- 20    
dbglev  <-  1   

Gvec    <- c(0:7)
nrho    <- length(Gvec)
rhoMat  <- matrix(exp(Gvec)/(1+exp(Gvec)),nrho,2)
dfesave <- matrix(0,nrho,1)
gcvsave <- matrix(0,nrho,1)
MSEsave <- matrix(0,nrho,2)
thesave <- matrix(0,nrho,nparam)

Here are the analyses for each value of $\rho.$ Note that we start by defining a copy of the model list, so that the optimal values estimated for any value of $\rho$ can be passed on as the initial values to be used in the optimization for the next $\rho$ value.

You will now see how quickly the solution is achieved for each value of $\rho$.

cruiseModelList.opt <- cruiseModelList  #  initialize the optimizing coefficient values
for (irho in 1:nrho) {   
  rhoVeci <- rhoMat[irho,]
  print(paste(" ------------------  rhoVeci <- ", round(rhoVeci[1],4), 
              " ------------------"))
  Data2LDOptList <- Data2LD.opt(cruiseDataList, cruiseBasisList, cruiseModelList.opt, 
                                rhoVeci, conv, iterlim, dbglev)
  theta.opti <- Data2LDOptList$theta  # optimal parameter values
  cruiseModelList.opt <- modelVec2List(cruiseModelList, theta.opti) 
  #  evaluate fit at optimal values and store the results
  Data2LDList    <- Data2LD(cruiseDataList, cruiseBasisList, cruiseModelList.opt, rhoVeci,
                            TRUE)
  thesave[irho,]  <- theta.opti
  dfesave[irho]   <- Data2LDList$df
  gcvsave[irho]   <- Data2LDList$gcv
  x1fd            <- Data2LDList$XfdParList[[1]]$fd
  x1vec           <- eval.fd(tobs,  x1fd)
  msex1           <- mean((x1vec - speedObs)^2)
  x2fd            <- Data2LDList$XfdParList[[2]]$fd
  x2vec           <- eval.fd(tobs,  x2fd)
  msex2           <- mean((x2vec - controlObs)^2)
  MSEsave[irho,1] <- msex1
  MSEsave[irho,2] <- msex2
}

Here are the summary measures for these analyses. Value RMSE is the square root of the mean squared error.

ind <- 1:nrho
#  print df, gcv and MSEs
print("    rho      df        gcv        RMSE:")
print(cbind(round(rhoMat,4),  round(dfesave,1), 
            round(gcvsave,1), round(sqrt(MSEsave),2)))

It can be interesting to see how the estimated parameters change over the sequence of values of $\rho$, noting that the fourth fixed parameter never changes value.

matplot(rhoMat, thesave, type="b", xlab="rho", ylab="theta(rho)")

Now let's look more closely at the solution for the maximum value of $\rho$, where the fit to the data should be also a near-solution to the differential equation.

rho.opt   <- rhoMat[nrho,]
theta.opt <- thesave[nrho,]
#  convert the optimal parameter values to optimal cruiseModelList
cruiseModelList.opt <- modelVec2List(cruiseModelList, theta.opt)
#  evaluate the solution at the optimal parameter values
Data2LDList <- Data2LD(cruiseDataList, cruiseBasisList, cruiseModelList.opt, rho.opt,
                      TRUE)

Plot the solutions as estimated from the data, rather than the true solutions computed by the initial value solver in the code above.

XfdParList <- Data2LDList$XfdParList
Xfd1 <- XfdParList[[1]]$fd
Xfd2 <- XfdParList[[2]]$fd
Xvec1 <- eval.fd(tfine, Xfd1)
Xvec2 <- eval.fd(tfine, Xfd2)
Uvec  <- eval.fd(tfine, SetPtfd)
cruiseDataList1 <- cruiseDataList[[1]]
plot(tfine, Xvec1, type="l", xlim=c(0,80), ylim=c(0,100), 
     ylab="Speed",
     main=paste("RMSE =",round(sqrt(MSEsave[nrho,1]),4)))
lines(tfine, ytrue[,2], lty=2)
points(cruiseDataList1$argvals, cruiseDataList1$y, pch="o")
#lines(tfine, Uvec, lty=2)
cruiseDataList2 <- cruiseDataList[[2]]
plot(tfine, Xvec2, type="l", 
     xlim=c(0,80), ylim=c(0,400),
     xlab="Time (sec)", ylab="Control",
     main=paste("RMSE =",round(sqrt(MSEsave[nrho,2]),4)))
lines(tfine, ytrue[,3], lty=2)
points(cruiseDataList2$argvals, cruiseDataList2$y, pch="o")

Display parameters with 95% confidence limits

stddev.opt <- sqrt(diag(Data2LDList$Var.theta))
theta.tru <- c(1, 1/4,  1, 0,  1)
print("    True      Est.      Std. Err. Low CI    Upr CI:")
for (i in 1:nparam) {
  print(round(c(theta.tru[i], 
                theta.opt[i], 
                stddev.opt[i], 
                theta.opt[i]-2*stddev.opt[i], 
                theta.opt[i]+2*stddev.opt[i]), 4))
}

Fitting the constrained cruise control model $\beta_{CS} = \alpha$

In our first analysis we ignored the fact that the coefficients $\beta_{CS}$ and $\alpha$ are identical, since together they define the difference between the set point and the current speed. Thus the four-parameter model over-fits the data by one degree of freedom. We saw that these two estimated values were close, but not identical. Now we re-analyze the data by incorporating the constraint into the analysis, and in this way reduce the number of parameters to be estimated to three.

Suppose that we have $n$ parameters and these are subject to $k$ constraints. We can write this in matrix terms as $A'p = 0$ where $A$ is an $n$ by $k$ matrix, each column of which defines a constraint on the $n$ parameter values in column vector $p$.

We deal with linear parameter constraints in the optimization by defining an $n$ by $n-k$ matrix $P$ in such a way that $q = P'p$ is a vector of length $n-k$ whose values are unconstrained, and $p = Pq$ is a vector of length $n$ that satisfies the $k$ constraints.

There are several ways of defining a matrix $P$ with this property, but perhaps the easiest is to use the QR decomposition $A = QR$. The resulting matrix $Q$ is $n$ by $n$ and has orthonormal columns, so that $Q'Q = I$. Matrix $R$, not used here, is upper triangular. Matrix $P$ is defined as the last $n-k$ columns of $Q$, and therefore $P'P = I.$
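In R, this construction can be carried out with the qr function; here is a sketch for $n = 5$ and $k = 1$, using the constraint matrix that appears below:

#  the single-column constraint matrix A
A <- matrix(c(0, 0, 1, 0, -1), 5, 1)
#  the full n by n orthogonal factor of the QR decomposition
Q <- qr.Q(qr(A), complete=TRUE)
#  P is the last n - k = 4 columns of Q
P <- Q[, 2:5]
#  checks: crossprod(P) is the identity and t(A) %*% P is zero
round(crossprod(P), 10)
round(t(A) %*% P, 10)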

In our example, matrix $A$ has a single column ($k = 1$), and its values are defined by the command parMap <- matrix(c(0, 0, 1, 0, -1), 5, 1). It is not necessary to re-define the model and run checkModel. Instead, we just add the matrix parMap as an additional argument to the call to function Data2LD.opt, so that

#  parMap = matrix(c(0, 0, 1, 0, -1),5,1)  
#  Data2LDOptList <- Data2LD.opt(cruiseDataList, cruiseBasisList, cruiseModelList.opt, 
#                                rhoVeci, conv, iterlim, dbglev, parMap)

If you re-run the above code, you will discover that convergence is faster for each value of $\rho$ and that these two values are now equal. Moreover, the estimates of the other parameters are also improved.

See Ramsay and Hooker (2017) for much more explanation of these analyses.

References

Kokoszka, P. and Reimherr, M. (2017) Introduction to Functional Data Analysis. Boca Raton: Chapman & Hall/CRC.

Ramsay, J. O. and Hooker, G. (2017) Dynamic Data Analysis: Modeling Data with Differential Equations. New York: Springer.

Ramsay, J. O., Hooker, G. and Graves, S. (2009) Functional Data Analysis with R and Matlab. New York: Springer.

Ramsay, J. O. and Silverman, B. W. (2005) Functional Data Analysis, Second Edition. New York: Springer.
