Fitting a dynamical confidence model to a data set"
In dynConfiR: Dynamic Models for Confidence and Response Time Distributions

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)

library(dynConfiR)
library(dplyr)
library(ggplot2)

This vignette shows how to fit the models and use model parameters to predict and simulate new data. It covers only how to get a quick fit to a single participant (i.e. only one set of fitted parameters) for a single model. The wrapper function fitRTConfModels may be used to fit several models to several independent participants in parallel and with one command, which usually takes quite a while (around several days depending also on the number of trials per participant).

The dynamic weighted evidence and visibility model (dynWEV) is one possible model. See https://osf.io/9jfqr/ for the theoretical description of this and the other models.

Import Data

The ConfidenceOrientation dataset is included in the package and can be loaded after the package is loaded. We subset to work only with the data from one participant and pick only the theoretically relevant columns.

data("ConfidenceOrientation")
part8 <- ConfidenceOrientation %>%
  filter(participant == 8) %>%
  select(SOA, stimulus, response, rt, disc_rating)
head(part8)

Call fitting function

A specific model, in this example the dynWEV model, is fitted With a simple call to the function fitRTConf.

The fitting functions either require the columns condition, stimulus, response (or correct), rt, and rating to be present in the data argument or alternatively arguments that specify the mapping of column names to the required variables with e.g. condition="SOA". For simplicity we rename the column names before fitting, to have congruent column names in data and later predictions.

As the fitting procedure takes some time to deliver reasonable fits, we load previously fitted parameters into the environment (hidden code) and comment the actual function call.

To get a fast (but probably rather inaccurate) fit that is likely to represent only a locally minimizing maximum likelihood estimation one may use several strategies: - the argument grid_search=FALSE prevents the time consuming grid search for the best starting values - adjusting the values in the opts argument, which is a list, reduces the time of the actual optimization procedure - "nAttempts"=1 forces only the best set of initial parameters to be optimized - "nRestarts"=1 if one does not want to restart the optimization routine several times - one may also restrict some parameters that thus not have to be fit. Note, that this has theoretical implications on what the model may predict. There are several common restrictions - an unbiased starting point: z=0.5 - no variation in starting point: sz=0 - symmetric confidence thresholds for both possible responses: sym_thetas = TRUE - no variation in non-decision time: st0=0

Finally, as in this experiment confidence was reported simultaneously with the discrimination response, we set restr_tau= "simult_conf" (see documentation).

parfit <- data.frame(v1 = 0.0372688024414027, v2 = 0.0297559327941849, 
    v3 = 0.228682139296959, v4 = 0.907332624555809, v5 = 1.51928135797365, 
    sv = 0.703366746805957, a = 1.9909337760955, z = 0.484042545790362, 
    sz = 0.968085010668281, t0 = 0.013792576932083, st0 = 0.50473451191079839, 
    thetaLower1 = 0.977932449428557, thetaLower2 = 1.4002660381916, 
    thetaLower3 = 1.65136147933687, thetaLower4 = 1.82689584748271, 
    thetaUpper1 = 0.894965128071132, thetaUpper2 = 1.21972431731305, 
    thetaUpper3 = 1.67435525566997, thetaUpper4 = 1.85889858426025, 
    tau = 1.49783458056779, w = 0.632631960506194, svis = 0.00228472967185629, 
    sigvis = 0.0698971790853653, fixed = "sym_thetas = TRUE", 
    negLogLik = 3130.4865270751, N = 1611L, k = 23L, BIC = 6430.81909296327, 
    AICc = 6307.61073530962, AIC = 6306.9730541502)

part8 <- part8 %>% rename(condition=SOA, 
                          rating = disc_rating)
# parfit <- fitRTConf(part8, "dynWEV",
#                     restr_tau="simult_conf")
parfit

Prediction

After fitting the parameters to the empirical data, these parameters are used to compute the response and response time distribution predicted by the model. One could use the high-level functions predictConf and predictRT here (again for several participants and/or models there are the wrappers predictConfModels and predictRTModels), but we use predictWEV_Conf and predictWEV_Conf to specify the precision argument to speed up the computations.

predictedResponses <- 
  predictWEV_Conf(parfit, "dynWEV", simult_conf = TRUE, 
                  precision = 1e-3, maxrt = 5, subdivisions = 50)
predictedRTdist <- 
  predictWEV_RT(parfit, "dynWEV", simult_conf = TRUE, 
  maxrt = 5, precision = 1e-3, subdivisions = 50,
  scaled=TRUE, DistConf = predictedResponses)
print(head(predictedResponses))
print(head(predictedRTdist))

Visual comparison

The predicted distributions may be visually compared to the empirical distributions to check how accurately the model fits the data. Therefore, we transform the condition column in the prediction data sets to fit the one in the empirical data and aggregate the data sets over stimulus and response identity and distinguish only correct and incorrect responses.

part8 <- part8 %>% 
  mutate(condition = as.factor(condition), 
         correct = as.numeric(stimulus==response))
empirical_response_dist <- part8 %>% 
  group_by(condition) %>% 
  mutate(ntrials = n()) %>%
  group_by(correct, condition, rating) %>%
  summarise(p = n()/ntrials[1], .groups = "drop")
predictedResponses <- predictedResponses %>%
  mutate(condition = factor(condition, labels=levels(part8$condition))) %>%
  group_by(correct, condition, rating) %>%
  summarise(p = mean(p), .groups = "drop") 

predictedRTdist <- predictedRTdist %>%
  mutate(condition = factor(condition, labels=levels(part8$condition))) %>%
  group_by(correct, rating, rt) %>%
  summarise(dens = mean(dens),
            densscaled = mean(densscaled), .groups = "drop")

ggplot(empirical_response_dist, aes(x=rating, y=p)) +
  geom_bar(aes(fill=as.factor(correct)), stat="identity")+
  geom_point(data=predictedResponses) +
  scale_fill_discrete(name="Accuracy")+
  facet_grid(cols=vars(correct), rows=vars(condition))

ggplot(subset(part8, rt<18), aes(x=rt, color=as.factor(correct))) +
  geom_density(aes(linetype="Observed"), size=1.2)+
  geom_line(data = predictedRTdist, 
            aes(y=densscaled, linetype="Prediction"), 
            size=1.2)+
  scale_color_discrete(name="Accuracy")+
  scale_linetype_discrete(name="")+
  theme(legend.position = "bottom")+
  xlim(0, 5)+
  facet_grid(rows=vars(rating), cols=vars(correct))