In mdzeig/errorstat: Compute error statistics (e.g., MSE, bias)


Overview

The errorstat package provides convenient routines for computing mean-squared error (MSE), bias and other error statistics. Additional statistics can be supplied by the user as two-argument functions, where the first argument is the true parameter values and the second is the estimated parameter values. The package provides three functions -- errorstat, aggregate_errors and flatten -- which together form a workflow for computing estimation error for individual simulations and aggregating it across a study.

The `errorstat` function

The errorstat function computes the desired error statistics. Its first two arguments, truth and estimate, correspond to the true and estimated values. Its third argument, uppers, allows the user to specify a partition via the upper bound of each constituent interval (except $\infty$, which is included by default). When truth and estimate are vectors, errorstat computes each error statistic for each partition element. For example, if uppers = -2:2, then errorstat will compute the desired error statistics for $(-\infty, -2]$, $(-2, -1]$, $(-1, 0]$, $(0, 1]$, $(1, 2]$ and $(2, \infty)$.
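To make the partition concrete, the following base-R sketch shows which interval each value falls into when uppers = -2:2. It uses cut() purely for illustration and is not meant to describe how errorstat is implemented internally; the vector x is made up for the example.

```r
# Base-R illustration of the partition implied by uppers = -2:2.
# Assumption: this mirrors the intervals described above, not errorstat's own code.
uppers <- -2:2
breaks <- c(-Inf, uppers, Inf)  # the upper bound of infinity is added automatically

x <- c(-2.5, -2, -0.3, 0, 0.7, 1.5, 3)  # made-up values to be binned

# right = TRUE gives intervals that are open on the left and closed on the right
ival <- cut(x, breaks = breaks, right = TRUE)

# Show each value next to the interval that contains it
data.frame(x = x, ival = ival)
```

Note that -2 and 0 land in the intervals whose upper bound they equal, because each interval is closed on the right.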

The arguments truth and estimate can also be lists of vectors. The two lists must have the same length, and each pair of corresponding vector elements must have the same length. Lists of depth greater than one will be evaluated recursively, resulting in a list whose structure is identical to that of truth and estimate and whose leaves are the result of applying errorstat to each corresponding pair of leaves in truth and estimate. This is useful for computing error statistics for multiple variables.
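The recursive evaluation described above can be sketched in base R. The helper map_leaves below is hypothetical and not part of errorstat; it only shows the idea of walking two identically structured lists and applying a two-argument statistic to each pair of leaves.

```r
# Hypothetical helper (not part of errorstat): apply f to matching leaves
# of two identically structured lists, preserving the list structure.
map_leaves <- function(truth, estimate, f) {
  if (is.list(truth)) {
    stopifnot(is.list(estimate), length(truth) == length(estimate))
    Map(function(tr, es) map_leaves(tr, es, f), truth, estimate)
  } else {
    stopifnot(length(truth) == length(estimate))
    f(truth, estimate)
  }
}

# A two-argument statistic: true values first, estimates second
mse <- function(truth, estimate) mean((estimate - truth)^2)

# Made-up nested inputs with matching structure
truth_demo <- list(a = c(0, 1), b = list(b1 = c(2, 3), b2 = 4))
estimate_demo <- list(a = c(0, 2), b = list(b1 = c(2, 5), b2 = 4.5))

res <- map_leaves(truth_demo, estimate_demo, mse)
str(res)
```

The result has the same nesting and names as the inputs, with one statistic per leaf.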

The following example illustrates how errorstat might be applied using the Rasch model. The Rasch model describes test-taker performance using two parameter vectors. The first, $\theta$, is a length-$P$ vector of the abilities of the $P$ test takers, and the second, $\beta$, is a length-$I$ vector of the difficulties of the $I$ test items. The model gives the probability that test taker $p$ correctly responds to item $i$ as the inverse logit of $\theta_p - \beta_i$.

library(errorstat)
library(eRm) # sim.rasch, RM, person.parameter functions
ability <- rnorm(100)
difflty <- rnorm(20)
resp <- sim.rasch(ability, difflty)
rmfit <- RM(resp)
ppfit <- person.parameter(rmfit)
errorstat(list(ability = ability, difflty = difflty),
          list(ability = coef(ppfit), difflty = -coef(rmfit)),
          -2:2)
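For readers unfamiliar with the Rasch model, the response probability described above can be written out in a few lines of base R. This is only a sketch of the model formula, independent of eRm; the names theta, beta, prob and sim_resp are chosen for this illustration.

```r
# Base-R sketch of the Rasch response probabilities; independent of eRm.
set.seed(1)
theta <- rnorm(5)  # abilities of P = 5 test takers
beta <- rnorm(3)   # difficulties of I = 3 items

# P x I matrix whose (p, i) entry is the inverse logit of theta_p - beta_i
prob <- plogis(outer(theta, beta, "-"))

# Simulate 0/1 responses from these probabilities
sim_resp <- matrix(rbinom(length(prob), size = 1, prob = prob),
                   nrow = nrow(prob))
```

A higher ability or a lower difficulty both increase the probability of a correct response.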

For a more complicated example, suppose we would like to compute error statistics separately for items 1-10 and 11-20. With errorstat, this can be accomplished by

errorstat(list(ability = ability, 
              difflty = list(first10 = difflty[1:10], 
                             last10 = difflty[11:20])),
          list(ability = coef(ppfit), 
               difflty = list(first10 = -coef(rmfit)[1:10],
                              last10 = -coef(rmfit)[11:20])),
          -2:2)

The `aggregate_errors` function

The aggregate_errors function is used to combine error statistics across multiple simulations. Each pair of error statistics is combined as follows. Let $T_1$ and $T_2$ be error statistics corresponding to two different simulations and let $n_1$ and $n_2$ be their corresponding sample sizes. Then, the combined statistic is

$$ T_{1 + 2} = \frac{n_1}{n_1 + n_2} T_1 + \frac{n_2}{n_1 + n_2} T_2. $$

For $m$ simulations $T_1, \ldots, T_m$, the corresponding statistic will be

$$ T_{1 + \ldots + m} = \frac{n_1}{n_1 + \ldots + n_m} T_1 + \ldots + \frac{n_m}{n_1 + \ldots + n_m} T_m. $$

When $T_1, \ldots, T_m$ can be interpreted as sample means of some quantity -- as is the case with MSE and bias -- then $T_{1 + \ldots + m}$ is equal to the value of the statistic that would have been computed for the whole sample.
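The equivalence with the whole-sample statistic can be checked numerically. The following base-R sketch uses made-up errors from two simulations of different sizes and does not call aggregate_errors; it only verifies the weighting formula for the MSE.

```r
set.seed(42)
# Made-up estimation errors from two simulations of different sizes
err1 <- rnorm(30)
err2 <- rnorm(70)
n1 <- length(err1)
n2 <- length(err2)

# MSE within each simulation
mse1 <- mean(err1^2)
mse2 <- mean(err2^2)

# Sample-size-weighted combination of the two statistics
mse_combined <- n1 / (n1 + n2) * mse1 + n2 / (n1 + n2) * mse2

# MSE computed directly on the pooled sample
mse_pooled <- mean(c(err1, err2)^2)

all.equal(mse_combined, mse_pooled)
```

The same check would fail in general for statistics that are not sample means, such as a median absolute error.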

As an example, we simulate and fit three different data sets and compute the combined MSE and bias in each of the regions $(-\infty, -2], \ldots, (2, \infty)$.

# simulate 3 datasets
truth <- replicate(3, simplify = FALSE, {
  ability <- rnorm(100)
  difflty <- rnorm(20)
  list(ability = ability, 
       difflty = difflty,
       resp = sim.rasch(ability, difflty))
})
# fit each data set
ests <- lapply(truth, 
               function(x) {
                 rmfit <- RM(x$resp)
                 ppfit <- person.parameter(rmfit)
                 list(ability = coef(ppfit),
                      difflty = -coef(rmfit))
               })
# compute & aggregate the statistics 
aggregate_errors(
  errorstat(lapply(truth, "[", -3), ests, -2:2)
)

The aggregate_errors function can also aggregate across more complicated list structures. For example, we could separate the MSE and bias of items 1-10 from 11-20 as follows.

split_difflty <- function(x) {
  list(ability = x$ability,
       difflty = list(first10 = x$difflty[1:10],
                      last10 = x$difflty[11:20]))
}
(split_error <- aggregate_errors(
  errorstat(lapply(truth, split_difflty),
            lapply(ests, split_difflty),
            -2:2)
))

The `flatten` function

The flatten function creates a data.frame from a list of errors. The resulting data.frame will contain the error statistics of each list element concatenated vertically and one or more columns of labels. The names of these columns can be provided by the user via the labels argument. The value associated with each element is determined either by the names attribute or list index.
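The idea of vertical concatenation with a label column can be sketched in base R. The code below does not call flatten and makes no claim about its internals; the input list, its values and the column names (ival and mse, with a label column called items) are assumptions made for illustration.

```r
# Hypothetical stand-in for a named list of per-group error tables.
# The values are made up; only the shape matters here.
errors_demo <- list(
  first10 = data.frame(ival = c("(-1,0]", "(0,1]"), mse = c(0.20, 0.25)),
  last10  = data.frame(ival = c("(-1,0]", "(0,1]"), mse = c(0.30, 0.15))
)

# Add each element's name as a label column, then stack the tables vertically
labelled <- Map(function(df, nm) data.frame(items = nm, df),
                errors_demo, names(errors_demo))
flat_demo <- do.call(rbind, labelled)
rownames(flat_demo) <- NULL
flat_demo
```

Each row of the result carries the name of the list element it came from, which is what makes the table convenient for grouped plots.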

Suppose, for example, that we would like to combine and plot the MSE statistics in split_error. We can achieve this using ggplot2 as follows.

(flat_errors <- flatten(split_error$difflty, labels = "items"))
library(ggplot2)
ggplot(flat_errors, aes(ival, mse, fill = items)) + 
  geom_bar(stat = "identity", position = "dodge")