cross_val_tmax: Cross-validation for t_max estimation

View source: R/model_setup_helpers.R

cross_val_tmax    R Documentation

Cross-validation for t_max estimation

Description

Denison et al. (2020) report large between-subject variance in the optimal t_max parameter. This function can therefore be used to recover the optimal parameter for individual subjects using cross-validation. The procedure follows Denison et al. (2020): a fraction 1/n of the trials is held out and the model is fitted on the remaining trials. The error between the average of the held-out trials and the model prediction is taken as the cross-validation error. The held-out set cycles through the entire data set, resulting in n repetitions and n cross-validation errors. Here n denotes the number of folds, not the argument n. The average cross-validation error for each candidate t_max is then reported.
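
The hold-out logic described above can be sketched in base R. This is only an illustration, not the papss implementation: the stand-in model (a single response pulse at time 0 whose amplitude is fitted by least squares), the simulated data, the trial column name, and the helpers prf and cv_error are all assumptions made for this example.

```r
set.seed(1)

# Stand-in response function in the Hoeks & Levelt form, normalised to peak at 1
# (assumption: normalisation replaces the scaling factor f for readability).
prf <- function(t, t_max, n = 10.1) {
  (t^n * exp(-n * t / t_max)) / (t_max^n * exp(-n))
}

# Simulated data: 20 trials sampled every 20 ms, true t_max = 930 ms.
time <- seq(0, 3000, by = 20)
trial_data <- do.call(rbind, lapply(1:20, function(tr) {
  data.frame(trial = tr, time = time,
             pupil = 2 * prf(time, t_max = 930) + rnorm(length(time), sd = 0.2))
}))

# Five folds, each holding the trial values to be held out in that fold.
folds <- unname(split(sample(1:20), rep(1:5, length.out = 20)))

# Average cross-validation error for one candidate t_max.
cv_error <- function(t_max, folds, dat) {
  fold_errors <- vapply(folds, function(held_out) {
    is_test <- dat$trial %in% held_out
    train <- dat[!is_test, ]
    test <- dat[is_test, ]
    # Fit the pulse amplitude on the remaining (training) trials.
    x <- prf(train$time, t_max)
    b <- sum(x * train$pupil) / sum(x^2)
    # Compare the held-out average per time point with the model prediction.
    held_out_avg <- tapply(test$pupil, test$time, mean)
    pred <- b * prf(as.numeric(names(held_out_avg)), t_max)
    sqrt(mean((held_out_avg - pred)^2))
  }, numeric(1))
  mean(fold_errors)
}

cand_tmax <- seq(730, 1130, by = 100)
errors <- vapply(cand_tmax, cv_error, numeric(1), folds = folds, dat = trial_data)
best_tmax <- cand_tmax[which.min(errors)]
```

Each fold contributes one error, and the candidate with the lowest average error across folds is selected. The root-mean-square error used here is one possible error measure; the measure used by the package is not stated on this page.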

Usage

cross_val_tmax(
  cand_tmax,
  folds,
  pulse_spacing,
  trial_data,
  factor_id = "subject",
  model = "WIER_SHARED",
  n = 10.1,
  f = 1/(10^24),
  drop_last = 500,
  maxiter_inner = 10000,
  maxiter_outer = 25,
  convergence_tol = 1e-08,
  start_lambda = 0.1,
  should_accum_H = F,
  init_cf = NULL,
  expand_by = 800,
  sample_length = 20,
  time_id = "time",
  pupil_id = "pupil",
  should_plot = T
)

Arguments

cand_tmax

vector with all t_max values to be considered

folds

list of vectors; each vector corresponds to one fold and contains the trial values to be held out in that fold

pulse_spacing

The model places a pulse every 'pulse_spacing' samples. Setting this to 1 places one pulse at every sample

trial_data

trial-level data with a time and pupil column. Also needs a factor column

factor_id

Name of the factor column. Model will estimate demand trajectory for each level of this factor

model

Model template.

n

Choice for parameter defined by Hoeks & Levelt (number of layers)

f

Choice for parameter defined by Wierda et al. (scaling factor), can also be a vector with values for each t_max candidate

drop_last

Drop pulses that would happen in the last drop_last ms

maxiter_inner

Maximum steps taken by inner optimizer

maxiter_outer

Maximum steps taken by outer optimizer

convergence_tol

Tolerance for the convergence check used to terminate early

start_lambda

Initial lambda value. Must be > 0 if a penalty should be used. Setting this to 0 together with maxiter_outer = 1 leads to estimation of an unpenalized additive model, i.e., recovers the traditional NNLS estimate used by Wierda et al. (2012) and Denison et al. (2020).

should_accum_H

Whether the Hessian should be approximated using the BFGS update rule. If not, the least-squares Hessian matrix is used. In our simulations, models fitted with the BFGS rule were much smoother, so this should be set to true if under-smoothing is observed. Note, however, that the BFGS update is considerably more costly and takes much more time.

init_cf

NULL or vector with initial coefficient estimate

expand_by

Time in ms by which to expand the time series into the past, so that pulses that happened before the recorded time window can still be approximated. See the artificial_data_analysis vignette for details.

sample_length

Duration in ms of a single sample. If the pupil dilation time course was down-sampled to 50 Hz, set this to 20

time_id

Name of time column in trial_data

pupil_id

Name of pupil column in trial_data

should_plot

Whether or not fit plots should be generated as well.

t_max

Choice for parameter defined by Hoeks & Levelt (response maximum in ms)
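
To illustrate what n, t_max, and f control, the sketch below evaluates a pupil response function in the commonly cited Hoeks & Levelt form, h(t) = f * t^n * exp(-n * t / t_max). Whether papss uses exactly this form internally is an assumption here; the function name pupil_response and the value t_max = 930 are chosen for the example only.

```r
# Response function in the Hoeks & Levelt form with a Wierda-style scaling factor.
# Assumption: this mirrors the role of n, t_max, and f described above.
pupil_response <- function(t, t_max = 930, n = 10.1, f = 1 / (10^24)) {
  f * t^n * exp(-n * t / t_max)
}

time <- seq(0, 3000, by = 10)            # time after the pulse, in ms
h <- pupil_response(time, t_max = 930)

# The response reaches its maximum at t = t_max, independent of n and f.
peak_time <- time[which.max(h)]
```

Changing t_max shifts the peak of the response, n changes its shape, and f only rescales its amplitude.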

Details

Note that different forms of cross-validation are possible depending on the experimental design and one's assumptions. It is possible to optimize t_max for each subject for each condition individually (then only data from one subject and one condition should be passed to the function) or across conditions (then data from all conditions should be passed to the function). Based on the findings by Denison et al. (2020), the latter is likely sufficient and more appropriate.
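
A per-subject call across conditions might look like the sketch below. The simulated placeholder data, the trial and condition column names, and the helper make_folds are assumptions for illustration; the argument names are taken from the Usage section. The call is guarded so that the snippet also runs when papss is not installed, and the structure of the returned object is not described on this page.

```r
set.seed(42)

sampling_rate_hz <- 50
sample_length <- 1000 / sampling_rate_hz   # duration of one sample in ms

# Placeholder data: 2 subjects, 20 trials each, 2 conditions (hypothetical columns).
time <- seq(0, 2980, by = sample_length)
trial_data <- expand.grid(time = time, trial = 1:20, subject = c("s1", "s2"))
trial_data$condition <- ifelse(trial_data$trial <= 10, "easy", "hard")
trial_data$pupil <- rnorm(nrow(trial_data))   # noise only, stands in for real data

# Hypothetical helper: randomly assign trial values to n_folds held-out sets.
make_folds <- function(trials, n_folds) {
  shuffled <- trials[sample.int(length(trials))]
  unname(split(shuffled, rep(seq_len(n_folds), length.out = length(shuffled))))
}
example_folds <- make_folds(unique(trial_data$trial), n_folds = 5)

cand_tmax <- seq(700, 1100, by = 100)
results <- list()

if (requireNamespace("papss", quietly = TRUE)) {
  for (s in levels(trial_data$subject)) {
    # Pass data from one subject but all conditions (optimization across conditions).
    subject_data <- droplevels(trial_data[trial_data$subject == s, ])
    folds <- make_folds(unique(subject_data$trial), n_folds = 5)
    results[[s]] <- papss::cross_val_tmax(
      cand_tmax = cand_tmax,
      folds = folds,
      pulse_spacing = 1,
      trial_data = subject_data,
      factor_id = "subject",
      sample_length = sample_length,
      time_id = "time",
      pupil_id = "pupil",
      should_plot = FALSE
    )
  }
}
```

To optimize t_max for each subject and condition individually, subject_data would additionally be restricted to a single condition before the call.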


JoKra1/papss documentation built on June 15, 2022, 8:57 a.m.