```{r}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

```{r}
library(TEfits)
```
One context in which a time-evolving model would be fit is within cognitive psychology. For instance, within a test of working memory span, participants may complete a set of trials (e.g., 80) each with some set size (e.g., memory load of 6 items). The task is novel and difficult, and it is possible that they are learning within the task. This would manifest as an increase in accuracy with increasing trial number.
As a rough simulation of one participant's data, we will generate by-trial accuracies that increase exponentially from a starting accuracy of .4 to an asymptotic accuracy of .9.
```{r}
set.seed(111)
dat <- data.frame(
  # exponential change from .4 to .9 with a rate
  # [half-change time constant] of log2(5+1)=2.58
  accuracy = rbinom(80, 6, .9 - .5 * 2^(-rnorm(80, 1:80, 3) / 5)) / 6,
  trial_number = 1:80
)
```
The typical analytical approach would be to find the average accuracy (`r round(mean(dat$accuracy),3)`). One alternative would be to remove some early set of trials (e.g., 20) as so-called "practice" and average the remaining trials (`r round(mean(dat$accuracy[21:80]),3)`).
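For concreteness, these two conventional estimates could be computed directly from the simulated data (a minimal sketch using the `dat` object created above):

```r
# Conventional (non-time-evolving) summaries of the simulated data
mean(dat$accuracy)         # average over all 80 trials
mean(dat$accuracy[21:80])  # average after discarding the first 20 "practice" trials
```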
This latter method implicitly acknowledges the possibility of nonstationarity in the measure of interest (accuracy), but it addresses the issue by arbitrarily reducing the amount of data being considered. Alternatively, if the asymptotic accuracy is desired, `TEfits` can be used to fit a time-evolving model and find the asymptotic accuracy:
```{r}
m_simple <- TEfit(dat[,c('accuracy','trial_number')])

m_simple$model$par['pAsym']
```
This estimate is closer to the true asymptotic accuracy of .9 than either average, while utilizing all of the data. A more general summary can be viewed:
```{r}
summary(m_simple)
```
- **Formula** shows the equation that was fit in the model.
- **Converged** indicates whether the model converged, as measured by similar parameter values being found by the optimization runs with the lowest errors. This is a heuristic, and should ideally be corroborated with parameter distributions from bootstrapped fits.
- **Fit Values** prints the fit parameter values. In 3-parameter exponential change, `rate` is log~2~ of the amount of time taken for 50% of change to occur.
- **Goodness-of-fit** prints several fit indices (e.g., the `err` and `BIC` of the best fit). Also printed are the corresponding fit indices for the null model (i.e., the model that is not time-evolving). The row name indicates the error function (e.g., `ols` or `logcosh`).
- **Test of change in nonindependence** prints the rank correlation between the null model residuals and the time variable, the rank correlation between the full model residuals and the time variable, the ratio between the absolute values of these correlations, and the p value as calculated with `psych::r.test()`. If the `tseries` package is available, the p-value from `kpss.test` of stationarity will also be included.
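Because the summary is returned as a list, individual components can also be pulled out programmatically. A minimal sketch, assuming the element names used later in this vignette (`param_vals` and `GoF`):

```r
# Extract pieces of the summary without printing the full report
m_simple_summary <- summary(m_simple, printOutput = FALSE)

m_simple_summary$param_vals  # fit parameter values
m_simple_summary$GoF         # goodness-of-fit indices
```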
A plot can also be viewed:
```{r}
plot(m_simple)
```
By default `TEfit` minimizes the sum of squared error (`ols`) between model predictions and the response variable. This fits the conditional mean of the response variable, and it is directly comparable to "normal" methods of linear regression (e.g., `lm`; see `?TElm`). However, analysts' knowledge about their data may allow for more specific choices. For example, accuracy on a memory span task is bounded at 0 and 1 (i.e., 0% and 100%). A `bernoulli` error function is therefore appropriate whereas an `ols` error function is not (e.g., due to skewed residuals and the possibility of evaluating error at predictions above 1 or below 0). The previous model can be fit using a `bernoulli` error function instead:
```{r}
m_bernoulli <- TEfit(dat[,c('accuracy','trial_number')], errFun = 'bernoulli')

summary(m_bernoulli, printOutput = F)[c("param_vals", 'GoF')]
```
Note the different scale of error values here, when using the negative log-likelihood of the `bernoulli` distribution, as compared to the `ols` fit.
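To make the difference in scale concrete, here is a minimal sketch of the two error functions using their standard definitions (the internal TEfits implementations may differ in detail):

```r
# Sum of squared error vs. bernoulli negative log-likelihood,
# for observed accuracies y and model predictions p (both in [0, 1])
ols_err       <- function(y, p) sum((y - p)^2)
bernoulli_err <- function(y, p) -sum(y * log(p) + (1 - y) * log(1 - p))
```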
Parameter distributions can be described by resampling the data with replacement and fitting the model to each resampled dataset (see `?tef_bootList`). Here we will use 50 resamples for the sake of speed; using fewer than 500 for final inferences is not recommended, and an order of magnitude more may be necessary if categorical decisions (e.g., parameter "significance") are near decision threshold values.
```{r}
m_bootstrap <- TEfit(dat[,c('accuracy','trial_number')],
                     errFun = 'bernoulli',
                     bootPars = tef_bootList(resamples = 50))

summary(m_bootstrap)

plot(m_bootstrap)
```
Relative to the previous model, additional information is provided regarding the .025 and .975 quantiles of the parameter estimates, as well as the pseudo standard error (calculated by assuming that this CI came from a normal distribution; see `?summary.TEfit`). Three new outputs are also included, which should be somewhat self-explanatory: **Percent of resamples predicting an increase in values** and **Timepoint at which resampled estimates diverge from timepoint 1, with Cohen's d>1** both aid in interpreting the strength of time-evolving trends, while **Bootstrapped parameter correlations** provides a glimpse at the dimensionality of the parameterization.
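As an illustration of the pseudo standard error described above (a sketch based on the normal-CI assumption, not taken from the TEfits internals), the .025 and .975 quantiles can be converted to a standard error as follows:

```r
# Pseudo standard error from a bootstrap 95% interval, assuming normality
pseudo_se <- function(q025, q975) (q975 - q025) / (2 * qnorm(0.975))
```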
If we imagine that our working memory task involves 9 possible options with replacement (e.g., the numerals 1 through 9), we know that the guessing rate is 1/9. Just as `bernoulli` distributional information can be incorporated into a TEfits model, a priori bounds on predicted values can be incorporated as well. If accuracies are theoretically unable to fall below 1/9 or above 1, we can re-fit the previous model with this information.
```{r}
m_bounded <- TEfit(dat[,c('accuracy','trial_number')],
                   errFun = 'bernoulli',
                   bootPars = tef_bootList(resamples = 50),
                   control = tef_control(y_lim = c(1/9, 1)))

summary(m_bounded, printOutput = F)[c("param_vals", 'GoF')]

plot(m_bounded)
```
Given the current dataset, this should not make much of a difference. However, a priori prediction bounds can often provide powerful theory-based constraints on parameters.