ctsmTMB: Methods for the 'ctsmTMB' R6 class

ctsmTMBR Documentation

Methods for the 'ctsmTMB' R6 class

Description

The following public methods are used to construct a stochastic state space model system, consisting of a set of stochastic differential equations (SDEs), and one or more algebraic observation equations (AOEs). The AOEs are used to infer information about the value of the (latent) states governed by the SDEs, and thus must be functions of at least one state.

Value

The function returns an object of class R6 and ctsmTMB, which can be used to define a stochastic state space system.

Methods

Public methods


Method new()

Initialize private fields

Usage
ctsmTMB$new()

Method .private()

Extract the private fields of a ctsmTMB model object. Primarily used for debugging.

Usage
ctsmTMB$.private()

Method addSystem()

Define stochastic differential equation(s) on the form

d<state> ~ f(t,<states>, <inputs>) * dt + g(t, <states>, <inputs>) * dw

Usage
ctsmTMB$addSystem(form, ...)
Arguments
form

a formula specifying a stochastic differential equation

...

additional formulas similar to form for specifying multiple equations at once.


Method addObs()

Define algebraic observation equations on the form

<observation> ~ h(t, <states>, <inputs>) + e)

where h is the observation function, and e is normally distributed noise with zero mean.

This function only specifies the observation name, and its mean through h.

Usage
ctsmTMB$addObs(form, ..., obsnames = NULL)
Arguments
form

a formula specifying an observation equation

...

additional formulas similar to form for specifying multiple equations at once.

obsnames

character vector specifying the name of the observation. This is used when the left-hand side of form consists of more than just a single variable (of class 'call').


Method setVariance()

Specify the variance of an observation equation.

A defined observation variable y in e.g. addObs(y ~ h(t,<states>,<inputs>) is perturbed by Gaussian noise with zero mean and variance to-be specified using setVariance(y ~ p(t,<states>,<inputs>). We can for instance declare setVariance(y ~ sigma_x^2 where sigma_x is a fixed effect parameter to be declared through setParameter.

Usage
ctsmTMB$setVariance(form, ...)
Arguments
form

formula class specifying the observation equation to be added to the system.

...

additional formulas identical to form to specify multiple observation equations at a time.


Method addInput()

Declare variables as data inputs

Declare whether a variable contained in system, observation or observation variance equations is an input variable. If e.g. the system equation contains an input variable u then it is declared using addInput(u). The input u must be contained in the data.frame .data provided when calling the estimate or predict methods.

Usage
ctsmTMB$addInput(...)
Arguments
...

variable names that specifies the name of input variables in the defined system.


Method setParameter()

Declare which variables that are (fixed effects) parameters in the specified model, and specify the initial optimizer guess, as well as lower / upper bounds during optimization. There are two ways to declare parameters:

  1. You can declare parameters using formulas i.e. setParameter( theta = c(1,0,10), mu = c(0,-10,10) ). The first value is the initial value for the optimizer, the second value is the lower optimization bound and the third value is the upper optimization bound.

  2. You can provide a 3-column matrix where rows corresponds to different parameters, and the parameter names are provided as rownames of the matrix. The columns values corresponds to the description in the vector format above.

Usage
ctsmTMB$setParameter(...)
Arguments
...

a named vector or matrix as described above.


Method setAlgebraics()

Add algebraic relations.

Algebraic relations is a convenient way to transform parameters in your equations. In the Ornstein-Uhlenbeck process the rate parameter theta is always positive, so estimation in the log-domain is a good idea. Instead of writing exp(theta) directly in the system equation one can transform into the log domain using the algebraic relation setAlgebraics(theta ~ exp(logtheta)). All instances of theta is replaced by exp(logtheta) when compiling the C++ function. Note that you must provide values for logtheta now instead of theta when declaring parameters through setParameter

Usage
ctsmTMB$setAlgebraics(form, ...)
Arguments
form

algebraic formula

...

additional formulas


Method setInitialState()

Declare the initial state values i.e. mean and covariance for the system states.

Usage
ctsmTMB$setInitialState(initial.state)
Arguments
initial.state

a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state.


Method setInitialVarianceScaling()

A scalar value that is multiplied onto the estimated initial state covariance matrix. The scaling is only applied when the initial state/cov is estimated, not when it is set by the user.

Usage
ctsmTMB$setInitialVarianceScaling(scaling)
Arguments
scaling

a numeric scalar value.


Method setLamperti()

Set a Lamperti Transformation

If the provided system equations have state dependent diffusion in a few available ways then it is advantageous to perform a transformation to remove the state dependence. This comes at the cost of a more complicated drift function. The following types of state-dependence is currently supported

  1. 'identity' - when the diffusion is state-independent (default)

  2. 'log' - when the diffusion is proportional to to x * dw

  3. 'logit' - when the diffusion is proportional to x * (1-x) * dw

  4. 'sqrt-logit' - when the diffusion is proportional to sqrt(x * (1-x)) * dw

Usage
ctsmTMB$setLamperti(transforms, states = NULL)
Arguments
transforms

character vector - one of either "identity, "log", "logit", "sqrt-logit"

states

a vector of the state names for which the specified transformations should be applied to.


Method setModelname()

Set modelname used to create the C++ file for TMB

When calling TMB::MakeADFun the (negative log) likelihood function is created in the directory specified by the setCppfilesDirectory method with name <modelname>.cpp

Usage
ctsmTMB$setModelname(name)
Arguments
name

string defining the model name.


Method setMAP()

Enable maximum a posterior (MAP) estimation.

Adds a maximum a posterior contribution to the (negative log) likelihood function by evaluating the fixed effects parameters in a multivariate Gaussian with mean and covariance as provided.

Usage
ctsmTMB$setMAP(mean, cov)
Arguments
mean

mean vector of the Gaussian prior parameter distribution

cov

covariance matrix of the Gaussian prior parameter distribution


Method getSystems()

Retrieve system equations.

Usage
ctsmTMB$getSystems()

Method getObservations()

Retrieve observation equations.

Usage
ctsmTMB$getObservations()

Method getVariances()

Retrieve observation variances

Usage
ctsmTMB$getVariances()

Method getAlgebraics()

Retrieve algebraic relations

Usage
ctsmTMB$getAlgebraics()

Method getInitialState()

Retrieve initially set state and covariance

Usage
ctsmTMB$getInitialState()

Method getParameters()

Get initial (and estimated) parameters.

Usage
ctsmTMB$getParameters(type = "all", value = "all")
Arguments
type

one of "all", free" or "fixed" parameters.

value

one of "all", initial", "estimate", "lower" or "upper"


Method getEstimate()

Retrieve initially set state and covariance

Usage
ctsmTMB$getEstimate()

Method getLikelihood()

Retrieve initially set state and covariance

Usage
ctsmTMB$getLikelihood()

Method getPrediction()

Retrieve initially set state and covariance

Usage
ctsmTMB$getPrediction()

Method getSimulation()

Retrieve initially set state and covariance

Usage
ctsmTMB$getSimulation()

Method estimate()

Estimate the fixed effects parameters in the specified model.

Usage
ctsmTMB$estimate(
  data,
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  loss = "quadratic",
  loss_c = NULL,
  control = list(trace = 1, iter.max = 1e+05, eval.max = 1e+05),
  use.hessian = FALSE,
  laplace.residuals = FALSE,
  unconstrained.optim = FALSE,
  estimate.initial.state = FALSE,
  silent = FALSE
)
Arguments
data

data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

method

character vector specifying the filtering method used for state/likelihood calculations. Must be one of either "lkf", "ekf", "laplace".

ode.solver

Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".

ode.timestep

numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided data. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:

  1. Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.

  2. TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

loss

character vector. Sets the loss function type (only implemented for the kalman filter methods). The loss function is per default quadratic in the one-step residuals as is natural when the Gaussian (negative log) likelihood is evaluated, but if the tails of the distribution is considered too small i.e. outliers are weighted too much, then one can choose loss functions that accounts for this. The three available types available:

  1. Quadratic loss (quadratic).

  2. Quadratic-Linear (huber)

  3. Quadratic-Constant (tukey)

The cutoff for the Huber and Tukey loss functions are determined from a provided cutoff parameter loss_c. The implementations of these losses are approximations (pseudo-huber and sigmoid approximation respectively) for smooth derivatives.

loss_c

cutoff value for huber and tukey loss functions. Defaults to c=3

control

list of control parameters parsed to nlminb as its control argument. See ?stats::nlminb for more information

use.hessian

boolean value. The default (TRUE) causes the optimization algorithm stats::nlminb to use the fixed effects hessian of the (negative log) likelihood when performing the optimization. This feature is only available for the kalman filter methods without any random effects.

laplace.residuals

boolean - whether or not to calculate one-step ahead residuals using the method of oneStepPredict.

unconstrained.optim

boolean value. When TRUE then the optimization is carried out unconstrained i.e. without any of the parameter bounds specified during setParameter.

estimate.initial.state

boolean value. When TRUE the initial state and covariance matrices are estimated as the stationary solution of the linearized mean and covariance differential equations. When the system contains time-varying inputs, the first element of these is used.

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.


Method likelihood()

Construct and extract function handlers for the negative log likelihood function.

The handlers from TMB's MakeADFun are constructed and returned. This enables the user to e.g. choose their own optimization algorithm, or just have more control of the optimization workflow.

Usage
ctsmTMB$likelihood(
  data,
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  loss = "quadratic",
  loss_c = NULL,
  estimate.initial.state = FALSE,
  silent = FALSE
)
Arguments
data

a data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

method

character vector specifying the filtering method used for state/likelihood calculations. Must be one of either "lkf", "ekf", "laplace".

ode.solver

Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".

ode.timestep

the time-step used in the filtering schemes. The time-step has two different uses depending on the chosen method.

  1. Kalman Filters: The time-step is used when numerically solving the moment differential equations.

  2. Laplace Approximation: The time-step is used in the Euler-Maruyama simulation scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided data. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:

  1. Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.

  2. TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

loss

character vector. Sets the loss function type (only implemented for the kalman filter methods). The loss function is per default quadratic in the one-step residuals as is natural when the Gaussian (negative log) likelihood is evaluated, but if the tails of the distribution is considered too small i.e. outliers are weighted too much, then one can choose loss functions that accounts for this. The three available types available:

  1. Quadratic loss (quadratic).

  2. Quadratic-Linear (huber)

  3. Quadratic-Constant (tukey)

The cutoff for the Huber and Tukey loss functions are determined from a provided cutoff parameter loss_c. The implementations of these losses are approximations (pseudo-huber and sigmoid approximation respectively) for smooth derivatives.

loss_c

cutoff value for huber and tukey loss functions. Defaults to c=3

estimate.initial.state

boolean value. When TRUE the initial state and covariance matrices are estimated as the stationary solution of the linearized mean and covariance differential equations. When the system contains time-varying inputs, the first element of these is used.

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.


Method predict()

Perform prediction/filtration to obtain state mean and covariance estimates. The predictions are obtained by solving the moment equations n.ahead steps forward in time when using the current step posterior state estimate as the initial condition.

Usage
ctsmTMB$predict(
  data,
  pars = NULL,
  method = "ekf",
  ode.solver = "euler",
  ode.timestep = diff(data$t),
  k.ahead = nrow(data) - 1,
  return.k.ahead = 0:k.ahead,
  return.covariance = TRUE,
  initial.state = self$getInitialState(),
  estimate.initial.state = private$estimate.initial,
  silent = FALSE,
  use.cpp = FALSE
)
Arguments
data

data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

pars

fixed parameter vector parsed to the objective function for prediction/filtration. The default parameter values used are the initial parameters provided through setParameter, unless the estimate function has been run, then the default values will be those at the found optimum.

method

The prediction method

ode.solver

Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".

ode.timestep

numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided data. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:

  1. Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.

  2. TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

k.ahead

integer specifying the desired number of time-steps (as determined by the provided data time-vector) for which predictions are made (integrating the moment ODEs forward in time without data updates).

return.k.ahead

numeric vector of integers specifying which n.ahead predictions to that should be returned.

return.covariance

boolean value to indicate whether the covariance (instead of the correlation) should be returned.

initial.state

a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state

estimate.initial.state

bool - stationary estimation of initial mean and covariance

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.

use.cpp

a boolean to indicate whether to use C++ to perform calculations

Returns

A data.frame that contains for each time step the posterior state estimate at that time.step (k = 0), and the prior state predictions (k = 1,...,n.ahead). If return.covariance = TRUE then the state covariance/correlation matrix is returned, otherwise only the marginal variances are returned.


Method simulate()

Perform prediction/filtration to obtain state mean and covariance estimates. The predictions are obtained by solving the moment equations n.ahead steps forward in time when using the current step posterior state estimate as the initial condition.

Usage
ctsmTMB$simulate(
  data,
  pars = NULL,
  use.cpp = FALSE,
  method = "ekf",
  ode.solver = "rk4",
  ode.timestep = diff(data$t),
  simulation.timestep = diff(data$t),
  k.ahead = nrow(data) - 1,
  return.k.ahead = 0:k.ahead,
  n.sims = 100,
  initial.state = self$getInitialState(),
  estimate.initial.state = private$estimate.initial,
  silent = FALSE
)
Arguments
data

data.frame containing time-vector 't', observations and inputs. The observations can take NA-values.

pars

fixed parameter vector parsed to the objective function for prediction/filtration. The default parameter values used are the initial parameters provided through setParameter, unless the estimate function has been run, then the default values will be those at the found optimum.

use.cpp

a boolean to indicate whether to use C++ to perform calculations

method
  1. The natural TMB-style formulation where latent states are considered random effects and are integrated out using the Laplace approximation. This method only yields the gradient of the (negative log) likelihood function with respect to the fixed effects for optimization. The method is slower although probably has some precision advantages, and allows for non-Gaussian observation noise (not yet implemented). One-step / K-step residuals are not yet available in the package.

  2. (Continuous-Discrete) Extended Kalman Filter where the system dynamics are linearized to handle potential non-linearities. This is computationally the fastest method.

  3. (Continuous-Discrete) Unscented Kalman Filter. This is a higher order non-linear Kalman Filter which improves the mean and covariance estimates when the system display high nonlinearity, and circumvents the necessity to compute the Jacobian of the drift and observation functions.

All package features are currently available for the kalman filters, while TMB is limited to parameter estimation. In particular, it is straight-forward to obtain k-step-ahead predictions with these methods (use the predict S3 method), and stochastic simulation is also available in the cases where long prediction horizons are sought, where the normality assumption will be inaccurate

ode.solver

Sets the ODE solver used in the Kalman Filter methods for solving the moment differential equations. The default "euler" is the Forward Euler method, alternatively the classical 4th order Runge Kutta method is available via "rk4".

ode.timestep

numeric value. Sets the time step-size in numerical filtering schemes. The defined step-size is used to calculate the number of steps between observation time-points as defined by the provided data. If the calculated number of steps is larger than N.01 where N is an integer, then the time-step is reduced such that exactly N+1 steps is taken between observations The step-size is used in the two following ways depending on the chosen method:

  1. Kalman filters: The time-step is used as the step-size in the numerical Forward-Euler scheme to compute the prior state mean and covariance estimate as the final time solution to the first and second order moment differential equations.

  2. TMB method: The time-step is used as the step-size in the Euler-Maruyama scheme for simulating a sample path of the stochastic differential equation, which serves to link together the latent (random effects) states.

simulation.timestep

timestep used in the euler-maruyama scheme

k.ahead

integer specifying the desired number of time-steps (as determined by the provided data time-vector) for which predictions are made (integrating the moment ODEs forward in time without data updates).

return.k.ahead

numeric vector of integers specifying which n.ahead predictions to that should be returned.

n.sims

number of simulations

initial.state

a named list of two entries 'x0' and 'p0' containing the initial state and covariance of the state

estimate.initial.state

bool - stationary estimation of initial mean and covariance

silent

logical value whether or not to suppress printed messages such as 'Checking Data', 'Building Model', etc. Default behaviour (FALSE) is to print the messages.

return.covariance

boolean value to indicate whether the covariance (instead of the correlation) should be returned.

Returns

A data.frame that contains for each time step the posterior state estimate at that time.step (k = 0), and the prior state predictions (k = 1,...,n.ahead). If return.covariance = TRUE then the state covariance/correlation matrix is returned, otherwise only the marginal variances are returned.


Method print()

Function to print the model object

Usage
ctsmTMB$print()

Method clone()

The objects of this class are cloneable with this method.

Usage
ctsmTMB$clone(deep = FALSE)
Arguments
deep

Whether to make a deep clone.

Examples

library(ctsmTMB)
model <- ctsmTMB$new()

# adding a single system equations
model$addSystem(dx ~ theta * (mu+u-x) * dt + sigma_x*dw)

# adding an observation equation and setting variance
model$addObs(y ~ x)
model$setVariance(y ~ sigma_y^2)

# add model input
model$addInput(u)

# add parameters
model$setParameter(
  theta   = c(initial = 1, lower=1e-5, upper=50),
  mu      = c(initial=1.5, lower=0, upper=5),
  sigma_x = c(initial=1, lower=1e-10, upper=30),
  sigma_y = 1e-2
)

# set the model initial state
model$setInitialState(list(1,1e-1))

# extract the likelihood handlers
nll <- model$likelihood(data=Ornstein)

# calculate likelihood, gradient and hessian w.r.t parameters
nll$fn(nll$par)
nll$gr(nll$par)
nll$he(nll$par)

# estimate the parameters using an extended kalman filter
fit <- model$estimate(data=Ornstein)

# perform moment predictions
pred <- model$predict(data=Ornstein)

# perform stochatic simulations
sim <- model$simulate(data=Ornstein, n.sims=10)


ctsmTMB documentation built on April 12, 2025, 1:45 a.m.