ddm: Estimation of 5-Parameter DDM

View source: R/ddm.R

ddmR Documentation

Estimation of 5-Parameter DDM

Description

Fit the 5-parameter DDM (Diffusion Decision Model) via maximum likelihood estimation. The model for each DDM parameter can be specified symbolically using R's formula interface. With the exception of the drift rate (which is always estimated) all parameters can be either fixed or estimates.

Usage

ddm(
  drift,
  boundary = ~1,
  ndt = ~1,
  bias = 0.5,
  sv = 0,
  data,
  optim = "nlminb",
  args_optim = list(init = NULL, lo_bds = NULL, up_bds = NULL, control = list()),
  args_ddm = list(err_tol = 1e-06),
  use_gradient = TRUE,
  compiled_model = TRUE,
  model = TRUE,
  mmatrix = TRUE,
  response = TRUE,
  na.action,
  subset,
  contrasts = NULL
)

Arguments

drift

Two-sided formula. The left-hand side describes the response, the right-hand side provides a symbolic description of the regression model underlying the drift rate (v). The left-hand side needs to specify the column in the data containing response time and corresponding binary decision, concatenated by + , e.g., rt + response ~ ...

boundary, ndt, bias, sv

Either a one-sided formula providing a symbolic description of the regression model or a scalar number given the value this parameter should be fixed to. Boundary separation (a), non-decision time (t0), relative initial bias (w), or inter-trial variability in the drift rate (sv)

data, na.action, subset

arguments controlling formula processing via model.frame.

optim

character string or fitting function indicating which numerical optimization method should be used. The default "nlminb" uses the corresponding function.

args_optim

named list of additional arguments for the optimization function. The available options are:

  • init vector containing the initial values to be used in the optimization for each coefficient. Note that the number and type of coefficients used (i.e., intercept or difference) is determined by the model matrices, which are in turn determined by the formulas assigned to each DDM parameter (see the Details section for an overview of formulas and coefficients). For an example, run the ddm() function with parameter set to its default value, and view the initial value vector via the slot $optim_info$args_optim$init. By default for intercept coefficients, the initial values for the drift rate (v), boundary separation (a), and non-decision time (t0) will be generated using EZ-Diffusion (Wagenmakers et al. 2007); bias (w) is initialized to 0.5; inter-trial variability in the drift rate (sv) is initialized to 0.0. Difference coefficients are initialized to 0.0.

  • lo_bds vector containing the lower bounds to be used in the optimization for each coefficient. The same consideration of coefficient types occurs here as it does with init. The defaults for intercept coefficients are: drift rate v > -Inf, boundary separation a > 0, non-decision time t0 > 0, bias w > 0, inter-trial variability in the drift rate sv \le 0. All difference coefficients have a lower bound of -Inf.

  • up_bds vector containing the upper bounds to be used in the optimization for each coefficient. The same consideration of coefficient types occurs here as it does with init. The defaults for intercept coefficients are: drift rate v < Inf, boundary separation a < Inf, non-decision time t0 < Inf, bias w < 1, inter-trial variability in the drift rate sv < Inf. All difference coefficients have a upper bound of Inf.

  • control additional control arguments passed to control argument of optimization function specified in optim

args_ddm

named list of additional arguments passed to density function calculation. Currently, the only option for this is err_tol, the desired error tolerance used in the density function calculation.

use_gradient

logical. Should gradient be used during numerical optimization? Default is TRUE.

compiled_model, model, mmatrix, response

logicals. If TRUE the corresponding components of the fit (the compiled model object, the model frame, the model matrix, the response matrix) are returned.

contrasts

optional list. See the contrasts.arg of model.matrix.default

Details

ddm uses model.matrix for transforming the symbolic description of the regression model underlying each parameter into estimated coefficients. The following provides a few examples:

  • ~ 1 estimates a single coefficient, named (Intercept)

  • ~ condition estimates the intercept plus k - 1 coefficients for a factor with k levels (e.g., intercept plus one coefficient if condition has two levels). The interpretation of the coefficients depend on the factor contrasts employed, which are usually based on the contrasts specified in options("contrasts"). For the default treatment contrasts (contr.treatment), the intercept corresponds to the first factor level and the additional coefficients correspond to the difference from the intercept (i.e., first factor level). When using contr.sum the intercept corresponds to the grand mean and the additional coefficients correspond to the differences from the grand mean.

  • ~ 0 + condition estimates no intercept but one coefficient per factor level. This specification can also be used to get one coefficient per cell for a multi-factorial design, e.g., ~ 0 + condition1:condition2.

  • ~ 0 + condition1 + condition1:condition2 estimates one "intercept" per level of condition1 factor (which is not called intercept) plus k - 1 difference parameters from the condition-specific intercept for the k-levels of condition2. The interpretation of the difference parameters again depends on the contrasts used (e.g., treatment vs. sum-to-zero contrasts, see examples). This formula specification can often make sense for the drift rate when condition1 is the factor (such as item type) mapped to upper and lower response boundary of the DDM and condition2 is another factor by which we want the drift rate to differ. This essentially gives one overall drift rate per response boundary plus the differences from this overall one (note again that with treatment contrasts this overall drift rate is the first factor level of condition2).

To get meaningful results it is necessary to estimate separate drift rates for the different condition/item-types that are mapped onto the upper and lower boundary of the diffusion model.

If a non-default fitting function is used, it needs to minimize the negative log-likelihood, accept the following arguments, init, objective, gradient, lower, upper, control , and return a list with the following arguments coefficients, loglik, converged, optim (where converged is boolean and optim can be an arbitrary list with additional information).

Value

Object of class ddm (i.e., a list with components as listed below) for which a number of common methods such as print, coef, and logLik are implemented, see ddm-methods.

  • coefficients a named list whose elements are the values of the estimated model parameters

  • dpar a character vector containing the names of the estimated model parameters

  • fixed_dpar a named list whose elements are the values of the fixed model parameters

  • loglik the value of the log-likelihood function at the optimized parameter values

  • hessians a named list whose elements are the individual Hessians for each of the model parameters

  • vcov a named list whose elements are the individual variance-covariance matrices for each of the model parameters

  • nobs the number of observations in the data used for fitting

  • npar the number of parameters used to fit the model (i.e., the estimated model parameters plus any hyperparameters)

  • df.residual the residual degrees of freedom (the number of observations - the number of parameters)

  • call the original function call to ddm

  • formula the formulas used in the model (1 indicates that the model parameter was estimated with a single coefficient; 0 indicates that the model parameter was fixed)

  • dpar_formulas a named list whose elements are the formulas for the model parameters (1 indicates that the model parameter was estimated with a single coefficient; 0 indicates that the model parameter was fixed)

  • na.action na.action

  • terms a named list whose elements are , and whose last element is named full and shows the breakdown of the model with all model parameters

  • levels a named list whose elements are the levels associated with any parameters that are factors (the elements are NULL if the parameter is not a factor), and whose last element is named FULL and shows all of the levels used in the model

  • contrasts a named list whose elements are the type of contrasts used in the model

  • args_ddm a named list whose elements are the optional arguments used in the calculation of the DDM log-likelihood function

  • link a named list whose elements show information about the link function used for each model parameter (currently the only link function is the identity function)

  • converged a logical indicating whether the optimization converged (TRUE) or not (FALSE)

  • optim_info a named list whose elements are information about the optimization process (e.g., the name of the algorithm used, the final value of the objective function, the number of evaluations of the gradient function, etc.)

  • model the data used in the model (might need to check this)

  • response the response data used in the model

  • mmatrix a named list whose elements are the model matrices for each of the estimated parameters

  • compiled_model C++ object that contains the compiled model (see list below for more details)

The C++ object accessible via the compiled_model component of the above R object of class ddm contains the following components:

  • rt a numeric vector of the response time data used in the model

  • response an integer vector of the response data used in the model (coded such that 1 corresponds to the "lower" boundary and 2 corresponds to the "upper" boundary)

  • err_tol the error tolerance used in the calculations for fitting the DDM

  • coefficients a numeric vector containing the current set of coefficients for the formulas provided to the ddm() function call; the coefficients correspond to the DDM parameters in the following order: v, a, t0, w, sv

  • likelihood a double containing the log-likelihood for the current set of coefficients (note this can be changed by calling the function calculate_loglik() below)

  • modmat_v a numeric matrix containing the model matrix for v, the drift rate, determined by the formula input to the argument drift in the ddm() function call

  • modmat_a a numeric matrix containing the model matrix for a, the boundary separation, determined by the formula input to the argument boundary in the ddm() function call

  • modmat_t0 a numeric matrix containing the model matrix for t0, the non-decision time, determined by the formula input to the argument ndt in the ddm() function call

  • modmat_w a numeric matrix containing the model matrix for w, the inital bias, determined by the formula input to the argument bias in the ddm() function call

  • modmat_sv a numeric matrix containing the model matrix for sv, the inter-trial variability in the drift rate, determined by the formula input to the argument sv in the ddm() function call

  • hess_v a numeric matrix containing the Hessian for v, the drift rate, whose dimensions are determined by the formula input to the argument drift in the ddm() function call

  • hess_a a numeric matrix containing the Hessian for a, the boundary separation, whose dimensions are determined by the formula input to the argument drift in the ddm() function call

  • hess_t0 a numeric matrix containing the Hessian for t0, the non-decision time, whose dimensions are determined by the formula input to the argument drift in the ddm() function call

  • hess_w a numeric matrix containing the Hessian for w, the initial bias, whose dimensions are determined by the formula input to the argument drift in the ddm() function call

  • hess_sv a numeric matrix containing the Hessian for sv, the inter-trial variability in the drift rate, whose dimensions are determined by the formula input to the argument drift in the ddm() function call

  • vcov_v a numeric matrix containing the variance-covariance matrix for v, the drift rate, whose dimensions are determined by the formula input to the argument drift in the ddm() function call

  • vcov_a a numeric matrix containing the variance-covariance matrix for a, the boundary separation, whose dimensions are determined by the formula input to the argument boundary in the ddm() function call

  • vcov_t0 a numeric matrix containing the variance-covariance matrix for t0, the non-decision time, whose dimensions are determined by the formula input to the argument ndt in the ddm() function call

  • vcov_w a numeric matrix containing the variance-covariance matrix for w, the inital bias, whose dimensions are determined by the formula input to the argument bias in the ddm() function call

  • vcov_sv a numeric matrix containing the variance-covariance matrix for v, the inter-trial variability in the drift rate, whose dimensions are determined by the formula input to the argument sv in the ddm() function call

  • calculate_loglik calculates and returns a double containing the negated log-likelihood (note that this will overwrite the likelihood component of the C++ object)

  • calculate_gradient calculates and returns a numeric vector of the negated gradients for the provided coefficient values; the gradients are stored in the same manner as their corresponding coefficents (note that this will overwrite the likelihood) component of the C++ object)

  • calculate_hessians calculates and returns a named list of the negated Hessians for each model parameter for the provided coefficient values (note that this will overwrite the likelihood, hess_v, hess_a, hess_t0, hess_w, and hess_sv components of the C++ object)

  • calculate_vcov calculates and returns a named list of the variance-covariance matrices for each model parameter for the stored coefficients (note that this will overwrite the likelihood, hess_v, hess_a, hess_t0, hess_w, hess_sv, vcov_v, vcov_a, vcov_t0, vcov_w, and vcov_sv components of the C++ object)

  • calculate_standard_error calculates and returns a numeric vector of the standard errors of the stored coefficients; the standard errors are stored in the same manner as their corresponding coefficients (note that this will overwrite the likelihood, hess_v, hess_a, hess_t0, hess_w, hess_sv, vcov_v, vcov_a, vcov_t0, vcov_w, and vcov_sv components of the C++ object)

Examples

# prepare data
data(med_dec, package = "fddm")
med_dec <- med_dec[which(med_dec[["rt"]] >= 0), ] ## only use valid RTs
## select data from one participant
p1 <- med_dec[med_dec[["id"]] == 2 & med_dec[["group"]] == "experienced", ]
head(p1)


##----------------------------------------------
##        Easiest: Fitting using emmeans       -
##----------------------------------------------

## Because we use an ANOVA approach, we set orthogonal sum-to-zero contrasts
op <- options(contrasts = c('contr.sum', 'contr.poly'))

fit0 <- ddm(rt + response ~ classification*difficulty, data = p1)
summary(fit0)

## for more tests, we use emmeans:
if (requireNamespace("emmeans")) {
# for ANOVA table:
emmeans::joint_tests(fit0)
# get conditional main effects of difficulty (for each level of classification):
emmeans::joint_tests(fit0, by = "classification")

# get mean drift rates per condition:
em1 <- emmeans::emmeans(fit0, "difficulty", by = "classification")
em1
# compare mean drift rates per condition
pairs(em1)
update(pairs(em1), by = NULL, adjust = "holm")
}

options(op) # reset contrasts


##----------------------------------------------------------------
##              Fitting with custom parametrisation              -
##----------------------------------------------------------------

## one drift rate per classification by difficulty design cell
fit1 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1)
summary(fit1)

## set default contrasts (just in case contrasts have been changed)
op <- options(contrasts = c('contr.treatment', 'contr.poly'))
## one drift rate "intercept" per classification condition (blast vs. non-blast)
## corresponding to first level of difficulty factor (easy)
## plus one further coefficient per classification condition corresponding to
## difference from "intercept" (hard - easy)
fit1b <- ddm(rt + response ~ 0 + classification + classification:difficulty,
             data = p1, args_optim = list(control = list(iter.max = 1000,
                                                         eval.max = 1000)))
summary(fit1b)
options(op) # reset contrasts

## set orthogonal sum-to-zero contrasts
op <- options(contrasts = c('contr.sum', 'contr.poly'))
## one drift rate "intercept" per classification condition (blast vs. non-blast)
## corresponding to mean drift rate for the classification condition
## plus one further coefficient per classification condition corresponding to
## difference from "intercept" (hard/easy - mean drift rate)
fit1c <- ddm(rt + response ~ 0 + classification + classification:difficulty,
             data = p1)
summary(fit1c)
options(op) ## reset contrasts

## all variants produce same fit, only meaning of parameters differs
logLik(fit1)
logLik(fit1b)
logLik(fit1c)
logLik(fit0) ## also model above

## all models estimate same drift rates, but in different parametrisation:
coef(fit1)  ## drift rates per design cell
## same drift rates based on fit1b:
c(coef(fit1b)[1:2],
  coef(fit1b)[1] + coef(fit1b)[3], coef(fit1b)[2] + coef(fit1b)[4])
## same drift rates based on fit1c:
c(coef(fit1c)[1] + coef(fit1c)[3], coef(fit1c)[2] + coef(fit1c)[4],
  coef(fit1c)[1] - coef(fit1c)[3], coef(fit1c)[2] - coef(fit1c)[4])

# we can estimate a model that freely estimates response bias
# (instead of fixing it at 0.5)
fit2 <- ddm(rt + response ~ 0 + classification:difficulty, bias = ~1, data = p1)
summary(fit2)
## Note: estimating bias only makes sense in situations like here where the
## response boundaries do not correspond to correct/incorrect but to the
## actual responses participants gave (here: blast vs. non-blast classification)

## now let's perform a likelihood ratio test to check if estimating response
## bias freely leads to a significant increase in model fit?
if (requireNamespace("lmtest")) { ## requires package lmtest
  lmtest::lrtest(fit1, fit2)
  ## does not look like it (p = 0.1691)
}


# we can also make a DDM parameter, such as boundary, depend on a numeric
# variable, such as block number
fit3 <- ddm(rt + response ~ 0 + classification:difficulty,
            boundary = ~ block, data = p1)
summary(fit3)

## does making boundary depend on block leads to a significant increase in model
## fit?
if (requireNamespace("lmtest")) { ## requires package lmtest
  lmtest::lrtest(fit1, fit3)
  ## does not look like it (p = 0.198)
}


##----------------------------------------------------------------
##              Fitting with optimization arguments              -
##----------------------------------------------------------------
## example of how to use your own initial values and bounds for the optimization
## of the coefficients (determined by the model matrices/formulas)
options(op) # reset contrasts

# this uses the default generated initial values and bounds
fitex0 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1)

# we can see the number of coefficients (and thus the number of initial values
# and bounds) required if we wish to use our own
fitex0$optim_info$args_optim$init
fitex0$optim_info$args_optim$lo_bds
fitex0$optim_info$args_optim$up_bds
# -1.9270279  2.5408864 -1.9270279  0.3181987  1.7907438  0.1264035
# -Inf        -Inf      -Inf        -Inf       0          0
# Inf         Inf       Inf         Inf        Inf        0.461
# the first four coefficients are for the drift rate (given by the formula we
# provided), and the last two are for the boundary separation and non-decision
# time, respectively (note that the default is to estimate these two model
# parameters with a single coefficient).

# to use our own initial values, we can include them in the `args_optim` list,
# but note that they must be within the generated bounds
fitex1 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
              args_optim = list(init = c(-1.5, 1, -1, 1, 1.5, 0.3)))

# to use our own bounds, we again can include them in the `args_optim` list,
# but note that they must contain the generated initial values
fitex2 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
              args_optim = list(lo_bds = c(-20, -20, -20, -20, 0, 0),
                                up_bds = c(20, 20, 20, 20, 10, 0.5)))

# to use both our own initial values and bounds, we include them in the
# `args_optim` list, and the bounds must contain the initial values
fitex3 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
              args_optim = list(init = c(-1.5, 1, -1, 1, 1.5, 0.3),
                                lo_bds = c(-20, -20, -20, -20, 0, 0),
                                up_bds = c(20, 20, 20, 20, 10, 0.5)))

# the only other option in the `args_optim` list is the control parameters
# directly used by the optimization function (e.g., the maximum number of
# iterations); here we'll set the maximum number of iterations and function
# evaluations
fitex4 <- ddm(rt + response ~ 0 + classification:difficulty, data = p1,
              args_optim = list(control = list(iter.max = 1000,
                                               eval.max = 1000)))

fddm documentation built on June 26, 2024, 5:10 p.m.