In dmbates/RePsychLing: Data sets from Psychology and Linguistics experiments

library(RePsychLing)
library(knitr)
library(rstan)
library(parallel)
library(xtable)
opts_chunk$set(comment=NA)
options(width=92,
        show.signif.stars = FALSE)

Structure of the data

The data from Kronmüller and Barr (2007) is available as the kb07 data frame in the RePscyhLing package for R

str(kb07)

As is common with factorial designs the experimental factors and their interactions are given short names: a single character (S,P,C) for the main effects, two-character names for the two-factor interactions and a three-character name for the three-factor interaction.

The data are from 56 subjects who responded to 32 iterms. Specifically, subjects had to select one of several objects presented on a monitor with a cursor. The manipulations involved

auditory instructions that maintained or broke a precedent of reference for the objects established over prior trials, (P)
with the instruction being presented by the speaker who established the precedent (i.e., an old speaker) or a new speaker (S) a
whether the task had to be performed without or with a cognitive load consisting of six random digits. (C)

All factors were expressed using a ±1 encoding which ensures that the scale of the interactions is the same as the scale of the main effects and of the intercept term. The columns of the full-factorial model matrix, X

X <- unname(model.matrix(~ 1+S+P+C+SP+SC+PC+SPC, kb07))
attr(X,"assign") <- NULL
str(X)

have the same length and are nearly orthogonal.

crossprod(X)   # X'X

The response, RTtrunc, is the reaction time after truncating some unusually large values.

Formulating the Stan model

Stan is a system for creating Markov chain Monte Carlo (MCMC) samplers for statistical models.

In the model specification the data, including dimensions of arrays, are described first.

standat <- '
data {
  int<lower=0>  N; // num observations
  int<lower=1>  K; // length of fixed-effects vector
  int<lower=0>  M; // num subjects
  int<lower=1>  J; // length of subj vector-valued random effects
  int<lower=0>  L; // num items
  int<lower=1>  I; // length of item vector-values random effects
  int<lower=1,upper=M> subj[N]; // subject indicator
  int<lower=1,upper=L> item[N]; // item indicator
  row_vector[K] X[N]; // model matrix for fixed-effects parameters
  row_vector[J] Zs[N]; // generator model matrix for subj random effects
  row_vector[I] Zi[N]; // generator model matrix for item random effects 
  vector[N]     y; // response vector (reaction time)
}

'

The response vector, y, is expressed as a vector of N elements.
The integer vectors subj and item are the indicators of subject and of item for each response.

The fixed-effects model matrix, X, the subject-specific random effects model matrix, Zs, and the item-specific random effects model matrix, Zi, are stored as vectors of N row_vector's. The reason for storing the model matrices like this is to provide fast access to individual rows when formulating the model.

For generality the number of columns in these matrices is specified separately as K, J and I. For the maximal model these will all be eight. The numbers of subjects and items are M and L, respectively.

Model parameters

In a Bayesian formulation all the random variables are considered model parameters; there is no distinction between the random effects and the fixed-effects coefficients. The covariance matrices for the within-subject and within-item random effects are specified according to the Cholesky factor of the correlation matrix, a cholesky_factor_corr type and the standard deviations, which are called taus and taui, respectively.

The spherical random effects, often written as $u$, are called us and ui respectively.
These are matrices that are stored as vectors of vectors.

stanpars <- '
parameters {
  cholesky_factor_corr[J] Ls; // Cholesky factor of subj r.e. correlations
  cholesky_factor_corr[I] Li; // Cholesky factor of item r.e. correlations
  vector<lower=0>[J] taus; // standard deviations of unconditional subj r.e. dist
  vector<lower=0>[I] taui; // standard deviations of unconditional item r.e. dist
  vector[J] us[M];     // spherical subj random effects
  vector[I] ui[L];     // spherical item random effects
  vector[K] beta;      // fixed-effects
  real<lower=0> sigma; // standard deviation of response given random effects
}

'

The correlation matrices formed from the Cholesky factors are considered transformed parameters

stantrans <- '
transformed parameters {
  matrix[J,J] corrs;
  matrix[I,I] corri;
  corrs <- tcrossprod(Ls);  // for monitoring subj correlations
  corri <- tcrossprod(Li);  // for monitoring item correlations
}

'

Finally, the model is expressed in terms of square matrices Lambdas and Lambdai that are the Cholesky factors of the covariance matrices for the subject-specific and item-specific random effects, respectively.
The prior distributions for the parameters (taus, taui, Ls, Li, us, and ui) are also specified at this point. No prior distribution is provided for beta, implying that this vector has a "flat" or "locally uniform" improper prior distribution.
It could be given, say, a diffuse multivariate Gaussian distribution, but doing so has no effect on inferences.

stanmod <- '
model {
  matrix[J,J] Lambdas; 
  vector[J] bs[M];
  matrix[I,I] Lambdai; 
  vector[I] bi[L];
  taus ~ cauchy(0,2.5);
  taui ~ cauchy(0,2.5);
  Ls ~ lkj_corr_cholesky(2);
  Li ~ lkj_corr_cholesky(2);
  Lambdas <- diag_pre_multiply(taus,Ls);
  Lambdai <- diag_pre_multiply(taui,Li);
  for (m in 1:M) {
    us[m] ~ normal(0,1);
    bs[m] <- Lambdas * us[m];
  }
  for (l in 1:L) {
    ui[l] ~ normal(0,1);
    bi[l] <- Lambdai * ui[l];
  }
  for (n in 1:N)
    y[n] ~ normal(X[n] * beta + Zs[n] * bs[subj[n]] + Zi[n] * bi[item[n]], sigma);
}

'

In the last loop the linear predictor for y[n] is evaluated as the sum of three products of row_vectors and vectors. A row_vector multiplied by a vector is a scalar, which is the reason for storing X, Zs and Zi as vectors of row_vectors.

These pieces are concatenated to form the model

model <- paste(standat, stanpars, stantrans, stanmod)

Compiling the model

The model is compiled via a call to stan that includes the model and the data. The data must be specified as a list or as an environment.

For the maximal model, X, Zs and Zi are the model matrix for the full factorial

maxdat <- 
  within(list(), {
    N <- nrow(X)
    K <- J <- I <- ncol(X)
    M <- length(levels(kb07$subj))
    L <- length(levels(kb07$item))
    X <- Zs <- Zi <- unname(X)
    y <- kb07$RTtrunc
    subj <- as.integer(kb07$subj)
    item <- as.integer(kb07$item)
    }
    )
str(maxdat)

Initially we set the number of chains to zero to check that the model compiles properly

maxmodel <- stan(model_name="maxmodel", model_code=model, data=maxdat, chains=0)

Creating the chains.

We use mclapply from the parallel package to generate the chains in parallel.

system.time(KB07_stan <-
  sflist2stanfit(
    mclapply(1:4, mc.cores = 4,    # adjust number of cores to suit 
      function(i) stan(fit = maxmodel, 
                       data = maxdat,
                       iter=2000,
                       chains = 1, 
                       chain_id = i, 
                       refresh = -1))
    )
  )

We see that the elapsed time is considerably less than the user time. This is because four processes are running in parallel.

A close examination of the timing shows that much more time is spent in the "warmup" phase than in actually generating samples. Stan defaults to using a form of Hamiltonian Monte Carlo (HMC) sampling called a "No U-Turn Sampler" (NUTS) and it is tuning these samplers that is taking most of the time.

KB07_results<- summary(KB07_stan,
                       pars=c("beta", "sigma",
                              "taus","taui",
                              "corrs","corri"),
                       probs = c(0.025,  0.975), digits_summary = 3)
rownames(KB07_results$summary)

Note that all of the correlation matrix elements are monitored even though

?stan the diagonal elements are, by definition, unity and the matrix must be symmetric. An arduous extraction provides the table

print(xtable(KB07_results$summary[c(1:25,27:33,36:41,45:49,55:57,62:64,81,91:97,100:105),c(1,4,5)]), type="html")

Note that most of the correlations, especially those for the item-specific random effects, have a mean close to zero and upper and lower limits that are more-or-less symmetric. This is strong evidence that these could be zero.

Because of the way the priors are defined the taus and taui values cannot become zero. However, many of these values are very close to zero. The only standard deviations that are substantion are the by-subject intercept and the by-item intercept and coefficient for P.

Reduced model

The final, reduced model has a single column (the intercept) in Zs and two columns (intercept and main-effect for P) in Zi.

finaldat <- 
  within(list(), {
    N <- nrow(X)
    K <- ncol(X)
    J <- 1L
    I <- 2L
    M <- length(levels(kb07$subj))
    L <- length(levels(kb07$item))
    X <- X
    Zs <- X[, 1, drop=FALSE]
    Zi <- X[, 1:2]
    y <- kb07$RTtrunc
    subj <- as.integer(kb07$subj)
    item <- as.integer(kb07$item)
    }
    )
str(finaldat)

system.time(KB07_finalstan <-
  sflist2stanfit(
    mclapply(1:4, mc.cores = 4,    # adjust number of cores to suit 
      function(i) stan(fit = maxmodel, 
                       data = finaldat,
                       iter=2000,
                       chains = 1, 
                       chain_id = i, 
                       refresh = -1))
    )
  )