APCs

Predictive Comparisons Generalize Regression Coefficients

At the heart of this package is the idea of a predictive comparison: we vary the input of interest, holding the other inputs constant, and look at the differences in predicted values. Let $u$ represent the input of interest and $v$ the (vector of) other inputs. Let $f$ be a function making predictions, so

$$\hat{y} = f(u,v)$$

Our $f$ could come from any predictive model. If we have a statistical model, we would probably choose

$$f(u,v) = \mathcal{E}[y \mid u, v, \theta]$$

(where $\theta$ are the parameters of the model). But we need not have a statistical model at all. The prediction function $f$ could come from a random forest, or a support vector machine.

Given the function $f$ and a choice of $u_1$, $u_2$, and $v$, we can compute

$$\delta_{u_1 \rightarrow u_2, v} = \frac{f(u_2, v) - f(u_1, v)}{u_2-u_1}$$

If $f$ were a linear model with no interactions, the quantity above would not depend on the particular choices of $u_1$, $u_2$, and $v$: it would simply be the regression coefficient corresponding to $u$. This is the formal sense in which predictive comparisons generalize regression coefficients. For more complicated models this quantity varies as the inputs vary, so we will take an average across a well-chosen set of inputs.
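To make this concrete, here is a minimal sketch (not part of the package) of computing $\delta_{u_1 \rightarrow u_2, v}$ for an arbitrary fitted R model; fit, df, and the column name u are placeholders:

# Sketch only: per-unit predictive comparison for moving the column named u
# from u1 to u2, holding each row's other inputs fixed at their observed values.
PredictiveComparison <- function(fit, df, u, u1, u2) {
  df1 <- df; df1[[u]] <- u1   # every row with the input of interest set to u1
  df2 <- df; df2[[u]] <- u2   # ... and set to u2
  (predict(fit, newdata = df2) - predict(fit, newdata = df1)) / (u2 - u1)
}
# For a linear model with no interactions, every element of the result is the
# coefficient of u; for other models it varies with v (and with u1, u2).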

Choice of Inputs

The APC is

$$\frac{\mathcal{E}[\Delta_f]}{\mathcal{E}[\Delta_u]}$$

where $\Delta_f = f(u_2,v) - f(u_1,v)$, $\Delta_u = u_2 - u_1$, and $\mathcal{E}$ is expectation under the following process:

  1. sample $v$ from the (marginal) distribution of the corresponding inputs
  2. sample $u_1$ and $u_2$ independently from the distribution of $u$ conditional on $v$

The reason for this definition is that we want to use values of $v$ that are representative of our data generation process, and transitions in $u$ that are representative of what really occurs at those values of $v$.

Computing the numerator and denominator separately, rather than taking an expected value of $\delta_{u_1 \rightarrow u_2, v}$, amounts to weighting each transition by the size of $(u_2 - u_1)$. This keeps the result from being excessively influenced by small changes in $u$.
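As a toy illustration (numbers invented here), compare the two ways of averaging when one of two transitions in $u$ is tiny:

deltaU <- c(0.001, 1)      # sizes of two transitions in u
deltaF <- c(0.01, 2)       # corresponding changes in the prediction f
mean(deltaF / deltaU)      # average of per-unit ratios: 6, dominated by the tiny transition
sum(deltaF) / sum(deltaU)  # ratio of sums, as in the APC: about 2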

Estimation

The rows of our data represent samples from the joint distribution of $(u, v)$, so each row amounts to a sample of $v$ followed by a sample $u_1$ conditional on that $v$. The difficult part is drawing another sample $u_2$ conditional on $v$. We approximate such draws by assigning a weight to each other row based on how close its $v$ is to the $v$ of the row in question. The weights are:

$$\frac{1}{\text{mahalanobisConstantTerm} + \text{(mahalanobis distance)}}$$

The Mahalanobis distance is a unitless measure of distance that takes into account the correlation structure of $v$. The mahalanobisConstantTerm (which defaults to 1, though that is not always an appropriate choice) prevents all of the weight from going to the closest points. More work needs to be done in thinking about the weights.
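As a rough sketch of the weighting idea (the package's actual computation differs in some details; vColumns and the row index i are placeholders), the weights of all rows relative to row i could be computed as:

# Sketch only: weight of every row relative to row i, based on the v columns.
# stats::mahalanobis() returns the squared distance, hence the square root.
RowWeights <- function(df, vColumns, i, mahalanobisConstantTerm = 1) {
  V <- as.matrix(df[, vColumns, drop = FALSE])
  distances <- sqrt(mahalanobis(V, center = V[i, ], cov = cov(V)))
  1 / (mahalanobisConstantTerm + distances)
}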

For more details, see Gelman and Pardoe 2007 (section 4), my note explaining a small change from the weights described in the paper, or my code that computes the appropriate weights.

Absolute Version

The absolute APC (as opposed to the signed version described above) replaces $\Delta_f = f(u_2,v) - f(u_1,v)$ above with $|\Delta_f| = |f(u_2,v) - f(u_1,v)|$. See, for example, the input $u_8$ in my simulated linear model with interactions, where the signed APC is roughly 0 but the absolute APC is large.
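Here is a toy illustration of the distinction (invented here, separate from that example): with a pure interaction $f(u,v) = uv$ and $v$ symmetric about zero, the signed differences cancel on average while their absolute values do not:

set.seed(1)
v <- rnorm(1e5)                     # symmetric about zero
u1 <- rnorm(1e5); u2 <- rnorm(1e5)  # two draws of the input of interest
f <- function(u, v) u * v           # a pure interaction
mean(f(u2, v) - f(u1, v))           # roughly 0: signed differences cancel
mean(abs(f(u2, v) - f(u1, v)))      # clearly positive: absolute differences do not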

By default, I always compute and display an absolute version of the APC alongside the signed version.

A Small Example

This is an example of running APCs on a simulated linear model with independent inputs and no interactions. For more involved examples, see the examples section.

The inputs ($x_1$, $x_2$, $x_3$) are independent, with

$$y = 2x_1 - 2x_2 + x_3 + \epsilon, \qquad \epsilon \sim \mathcal{N}(0, 0.1^2)$$

First we set up the data:

n <- 200
x1 <- runif(n=n, min=0, max=1)
x2 <- runif(n=n, min=0, max=1)
x3 <- runif(n=n, min=0, max=10)
y <- 2*x1 + (-2)*x2 + 1*x3 + rnorm(n, sd=.1)
df <- data.frame(x1, x2, x3, y)

Then we fit a linear model:

fittedLm <- lm(y ~ ., data=df)
fittedLm

We can then plot the average predictive comparisons:

library(predcomps)
apcDF <- GetPredCompsDF(fittedLm, df=df)
PlotPredCompsDF(apcDF, variant="Apc") + theme_gray(base_size = 18)

Both the absolute and signed versions are plotted, distinguished by different shapes and colors. For symmetry, the absolute version is plotted with both a positive and a negative sign. Since this is a linear model with no interactions, the signed APCs match the coefficients of the fitted linear model.

This is what the returned data frame apcDF looks like:

apcDF

The columns plotted here are PerUnitInput.Signed and Apc.Absolute. The next section is about the columns labeled "Impact".


