gaussianWeights: A kernel average smoother function to weigh residuals...

View source: R/gaussianWeights.R

gaussianWeightsR Documentation

A kernel average smoother function to weigh residuals according to a Gaussian density function This function is still experimental... use with care

Description

A calibration dataset in database format (cf. modCost for the database format) is extended in order to fit model output using a weighted least squares approach. To this end, the observations are replicated for a certain number of times, and weights are assigned to the replicates according to a Gaussian density function. This density has the relevant observation as mean value. The standard deviation, provided as a parameter, determines the number of inserted replicate observations (see Detail).

This weighted regression approach may be interesting when discontinuities exist in the observational data. Under these circumstances small changes in the timing (or more general the position along the axis of the independent variable) of the model output may have a disproportional impact on the overall goodness-of-fit (e.g. timing of nutrient depletion). Additionally, this approach may be used to model uncertainty in the independent variable (e.g. slices of sediment profiles, or the timing of a sampling).

Usage

gaussianWeights (obs, x = x, y = y, xmodel, spread, weight = "none",
                 aggregation = x ,ordering)

Arguments

obs

dataset in long (database) format as is typically used by modCost

x

name of the independent variable (typically x, cf. modCost) in obs. Defaults to x (not given as character string; cf. subset)

y

name of the dependent variable in obs. Defaults to y.

xmodel

an ordered vector of unique times at which model output is produced. If not given, the independent variable of the observational dataset is used.

spread

standard deviation used to calculate the weights from a normal density function. This value also determines the number of points from the model output that are compared to a specific observa- tion in obs (2 * 3 * spread + 1; containing 99.7% of the Gaussian distribution, centered around the observation of interest).

weight

scaling factor of the modCost function ("sd", "mean", or "none"). The Gaussian weights are multiplied by this factor to account for differences in units.

aggregation

vector of column names from the dataset that are used to aggregate observations while calculating the scaling factor. Defaults to the variable name, "name".

ordering

Optional extra grouping and ordering of observations. Given as a vector of variable names. If none given, ordering will be done by variable name and independent variable. If both aggregation and ordering variables are given, ordering will be done as follows: x within ordering (in reverse order) within aggregation (in reverse order). Aggregation and ordering should be disjoint sets of variable names.

Details

Suppose: spread = 1/24 (days; = 1 hour) x = time in days, 1 per hour

Then: obs_i is replicated 7 times (spread = observational periodicity = 1 hour):

=> obs_i-3 = ... = obs_i-1 = obs_i = obs_i+1 = ... = obs_i+3

The weights (W_i+j, for j = -3 ...3) are calculated as follows: W'_i+j = 1/(spread * sqrt(2pi)) * exp(-1/2 * ((obs_i+j - obs_i)/spread)^2

W_i+j = W'_i+j/sum(W_i-3,...,W_i+3) (such that their sum equals 1)

Value

A modified version of obs is returned with the following extensions:

1. Each observation obs[i] is replicated n times were n represents the number of modelx values within the interval [obs_i - (3 * spread), obs_i + 3 * spread)].

2. These replicate observations get the same x values as their modeled counterparts (xmodel).

3. Weights are given in column, called "err"

The returned data frame has the following columns:

  • "name" or another name specified by the first element of aggregation. Usually this column contains the names of the observed variables.

  • "x" or another name specified by x

  • "y" or another name specified by y

  • "err" containing the calculated weights

  • The rest of the columns of the data frame given by obs in that order.

Author(s)

Tom Van Engeland <tom.vanengeland@nioz.nl>

Examples

## =======================================================================
## A Sediment example
## =======================================================================

## Sediment oxygen concentration is measured every
## centimeter in 3 sediment types
depth <- 0:7
observations <- data.frame(
                    profile = rep(c("mud","silt","sand"), each=8),
                    depth   = depth,
                    O2      = c(c(6,1,0.5,0.1,0.05,0,0,0),
                                c(6,5,3,2,1.5,1,0.5,0),
                                c(6,6,5,4,3,2,1,0)
                              )
                )

## A model generates profiles with a depth resolution of 1 millimeter
modeldepths <- seq(0, 9, by = 0.05)

## All these model outputs are compared with  weighed observations.
gaussianWeights(obs = observations, x = depth, y = O2,
                xmodel = modeldepths,
                spread = 0.1, weight = "none", 
                aggregation = profile)



# Weights of one observation in silt at depth 2:
Sub <- subset(observations, subset = (profile == "silt" & depth == 2))
plot(Sub[,-1])
SubWW <- gaussianWeights(obs = Sub, x = depth, y = O2, 
                xmodel = modeldepths, spread = 0.5, 
                weight="none", aggregation = profile)
SubWW[,-1]

FME documentation built on July 9, 2023, 3:07 p.m.