Calculates the Discrepancy of a Model Solution with Observations
Description
Given a solution of a model and observed data, estimates the residuals, and the variable and model costs (sum of squared residuals).
Usage
1 2 
Arguments
model 
model output, as generated by the integration routine or the steadystate solver, a matrix or a data.frame, with one column per dependent and independent variable. 
obs 
the observed data, either in long (database) format (name, x, y), a data.frame, or in wide (crosstable, or matrix) format  see details. 
x 
the name of the independent variable; it should be a
name occurring both in the 
y 
either 
err 
either 
cost 
if not 
weight 
only if 
scaleVar 
if 
... 
additional arguments passed to Rfunction 
Details
This function compares model output with observed data.
It computes
the weighted residuals, one for each data point.
the variable costs, i.e. the sum of squared weight residuals per variable.
the model cost, the scaled sum of variable costs .
There are three steps:
1. For any observed data point, i, the weighted residuals are estimated as:
res_i=(mod_iobs_i)/err_i
with weight_i = 1/err_i and where Mod_i and Obs_i are the modeled, respectively observed value of data point i.
The weights are equal to 1/error, where the latter can be inputted,
one for each data point by specifying err
as an extra column in
the observed data.
This can only be done when the data input is in long (database) format.
When err
is not inputted, then the weights are specified via argument
weight
which is either:

"none"
, which sets the weight equal to 1 (the default) 
"std"
, which sets the weights equal to the reciprocal of the standard deviation of the observed data (can only be used if there is more than 1 data point) 
"mean"
, which uses 1/mean of the absolute value of the observed data (can only be used if not 0).
2. Then for each observed variable, j, a variable cost is estimated as the sum of squared weighted residuals for this variable:
Cost_varj=sum(for i=1,n_j) (res_i^2)
where n_j is the number of observations for observed variable j.
3. Finally, the model Cost is estimated as the scaled sum of variable costs:
sum(Cost_varj/scale_j)
and where scale_j allows to scale the variable
costs relative to the number of observations. This is set by
specifying argument scaleVar
. If TRUE
, then the variable
costs are rescaled. The default is NOT to rescale
(i.e. scale_j=1).
The models typically consist of (a system of) differential equations, which are either solved by:
integration routines, e.g. the routines from package
deSolve
,steadystate estimators, as from package
rootSolve
.
The data can be presented in two formats:

data table (long) format; this is a two to four column data.frame that contains the
name
of the observed variable (always the FIRST column), the (optional)value of the independent variable
(default column name = "time"), thevalue of the observation
and the (optional)value of the error
. For data presented in this format, the names of the column(s) with the independent variable (x
) and the name of the column that has the value of the dependent variabley
must be passed to functionmodCost
. 
crosstable (wide) format; this is a matrix, where each column denotes one dependent (or independent) variable; the column name is the name of the observed variable. When using this format, only the name of the column that contains the dependent variable must be specified (
x
).
As an example of both formats consider the data, called Dat
consisting
of two observed variables, called "Obs1" and "Obs2", both containing two
observations, at time 1 and 2:
name  time  val  err 
Obs1  1  50  5 
Obs1  2  150  15 
Obs2  1  1  0.1 
Obs2  2  2  0.2 
for the long format and
time  Obs1  Obs2 
1  50  1 
2  150  2 
for the crosstab format. Note, that in the latter case it is not possible to provide separate errors per data point.
By calling modCost several consecutive times (using the cost
argument),
it is possible to combine both types of data files.
Value
a list of type modCost
containing:
model 
one value, the model cost, which equals the sum of scaled variable costs (see details). 
minlogp 
one value, log(model probablity), where it is assumed
that the data are normally distributed, with standard deviation =

var 
the variable costs, a data.frame with, for each observed variable the following (see details):

residuals 
the data residual, a data.frame with several columns:

Note
In the future, it should be possible to have more than one independent variable present. This is not yet implemented, but it should allow e.g. to fit time series of spatially dependent variables.
Author(s)
Karline Soetaert <karline.soetaert@nioz.nl>
References
Soetaert, K. and Petzoldt, T., 2010. Inverse Modelling, Sensitivity and Monte Carlo Analysis in R Using Package FME. Journal of Statistical Software 33(3) 1–28. http://www.jstatsoft.org/v33/i03
Examples
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115  ## =======================================================================
## Type 1 input: name, time, value
## =======================================================================
## Create new data: two observed variables, "a", "b"
Data < data.frame(name = c(rep("a", 4), rep("b", 4)),
time = c(1:4, 2:5), val = c(runif(4), 1:4))
## "a nonsense model"
Mod < function (t, y, par) {
da < 0
db < 1
return(list(c(da, db)))
}
out < ode(y = c(a = 0.5, b = 0.5), times = 0:6, func = Mod, parms = NULL)
Data # Show
out
## The cost function
modCost(model = out, obs = Data, y = "val")
## The cost function with a data error added
Dat2 < cbind(Data, Err = Data$val*0.1) # error = 10% of value
modCost(model = out, obs = Dat2, y = "val", err = "Err")
## =======================================================================
## Type 2 input: Matrix format; column names = variable names
## =======================================================================
## logistic growth model
TT < seq(1, 100, 2.5)
N0 < 0.1
r < 0.5
K < 100
## analytical solution
Ana < cbind(time = TT, N = K/(1 + (K/N0  1) * exp(r*TT)))
## numeric solution
logist < function(t, x, parms) {
with(as.list(parms), {
dx < r * x[1] * (1  x[1]/K)
list(dx)
})
}
time < 0:100
parms < c(r = r, K = K)
x < c(N = N0)
## Compare several numerical solutions
Euler < ode(x, time, logist, parms, hini = 2, method = "euler")
Rk4 < ode(x, time, logist, parms, hini = 2, method = "rk4")
Lsoda < ode(x, time, logist, parms) # lsoda is default method
Ana2 < cbind(time = time, N = K/(1 + (K/N0  1) * exp(r * time)))
## the SSR and residuals with respect to the "data"
cEuler < modCost(Euler, Ana)$model
cRk4 < modCost(Rk4 , Ana)$model
cLsoda < modCost(Lsoda, Ana)$model
cAna < modCost(Ana2 , Ana)$model
compare < data.frame(method = c("euler", "rk4", "lsoda", "Ana"),
cost = c(cEuler, cRk4, cLsoda, cAna))
## Plot Euler, RK and analytic solution
plot(Euler, Rk4, col = c("red", "blue"), obs = Ana,
main = "logistic growth", xlab = "time", ylab = "N")
legend("bottomright", c("exact", "euler", "rk4"), pch = c(1, NA, NA),
col = c("black", "red", "blue"), lty = c(NA, 1, 2))
legend("right", ncol = 2, title = "SSR",
legend = c(as.character(compare[,1]),
format(compare[,2], digits = 2)))
compare
## =======================================================================
## Now suppose we do not know K and r and they are to be fitted...
## The "observations" are the analytical solution
## =======================================================================
## Run the model with initial guess: K = 10, r = 2
parms["K"] < 10
parms["r"] < 2
init < ode(x, time, logist, parms)
## FITTING algorithm uses modFit
## First define the objective function (model cost) to be minimised
## more general: using modFit
Cost < function(P) {
parms["K"] < P[1]
parms["r"] < P[2]
out < ode(x, time, logist, parms)
return(modCost(out, Ana))
}
(Fit<modFit(p = c(K = 10, r = 2), f = Cost))
summary(Fit)
## run model with the optimized value:
parms[c("K", "r")] < Fit$par
fitted < ode(x, time, logist, parms)
## show results, compared with "observations"
plot(init, fitted, col = c("green", "blue"), lwd = 2, lty = 1,
obs = Ana, obspar = list(col = "red", pch = 16, cex = 2),
main = "logistic growth", xlab = "time", ylab = "N")
legend("right", c("initial", "fitted"), col = c("green", "blue"), lwd = 2)
Cost(Fit$par)
