set.seed(89890)
opts_chunk$set(tidy=FALSE)
library("plyr")
library(predcomps)
print.data.frame <- function(...) base::print.data.frame(..., row.names=FALSE)

Renormalizing Weights

In computing the APC, we assign weights to pairs of observations based on the Mahalanobis distance between the corresponding $v$'s. This note uses a toy example to argue that we must renormalize the weights so that when we group by the first element of each pair and sum the weights, each group has the same sum-of-weights.

This suggested renormalization is not discussed in Gelman and Pardoe 2007.

Toy example with exact transitions

If $u$ is an input of interest and $v$ are the other inputs, recall that the APC is the expected value of a quantity formed upon sampling from $v$, sampling twice from $u$ conditional on $v$, and computing predictive comparisons using those $u$'s. See equation (2) of Gelman and Pardoe 2007.

If there were enough pairs of points with identical $v$, we could just use the sample distribution of $u$ given $v$. As noted in the paper, we may have few (if any) pairs of points with identical $v$. Even so, let's think through an example where we do have such identical pairs:

Suppose $v$ consists of only 1 input, which can take one of two values. For simplicity, assume $u$ has exactly two possible (equally likely) values at each $v$ (which are different depending on the value of $v$), so there is only one possible transition at each $v$. Here's an example:

exampleDF <- data.frame(
  v=c(3,3,7,7),  
  u=c(10,20,12,22) 
  )[rep(c(1,2,3,4),c(40,40,10,10)),]
# Count each u/v combination:
kable(ddply(exampleDF, c("v","u"), function(df) data.frame(CountOfRows = nrow(df))))

Say we have a model $\hat{y} = f(u,v)$. I'll choose $\hat{y} = f(u,v) = uv$ for a simple example. (How the model is estimated is completely orthogonal to the questions addressed here, and I'll just pretend it's known exactly.)

Equation (2) in the paper says the numerator in the APC should be:

$$(.4)(.5)(.5)(f(20,3) - f(10, 3)) + (0.1)(.5)(.5)(f(22,7) - f(12,7)) $$

The .5's are the $p(u|v)$'s (and will cancel out in this case). (Terms with transition size 0 aren't included.)

The denominator is:

$$(.4)(.5)(.5)((20 - 10) + (.1)(.5)(.5)((22 - 12)$$

The ratio simplifies to:

$$.8 \delta_u(10 \rightarrow 20, 3, f) + 0.2 \delta_u(12 \rightarrow 22, 7, f)$$

This is all overkill for our very simple example, where it's easy to see that the APC is just $(.8)(3) + (.2)(6)$. But I wanted to be very concrete.

I'll compute it:

f <- function(u, v) return(u*v)
ApcExact <- .8*(f(20,3) - f(10,3))/10 + .2*(f(22,7) - f(12,7))/10
ApcExact

Now without exact duplicates

Now imagine we don't have any exact duplicates of $v$. To get a corresponding example like that, I'll modify the first example by adding a really tiny bit of noise to $v$: $v_{new} = v + N(0,\epsilon)$.

exampleDF2 <- transform(exampleDF, v = v + rnorm(nrow(exampleDF), sd=.001))

Now we form pairs and compute weights as described in the paper. Here's a sample of the resulting data frame of pairs, just to get a sense of what it looks like:

pairsDF <- GetPairs(exampleDF2, u="u", v="v", renormalizeWeights=FALSE)
kable(pairsDF[sample(1:nrow(pairsDF), 12), ], row.names=FALSE)

Now pairs with nearby $v$'s (which would have been the same $v$'s previously) have high weights, where pairs from far-away $v$'s (which were different $v$'s in the previous example) have low weights. That's good.

But $v$ near 3 now has more weight in the data set for two reasons:

  1. we started with more $v$'s near 3, so there are more rows with $v$ near 3 as the first element of the pair; and
  2. each time $v$ is near $3$ in the first element of each pair, there are more nearby $v$'s to pair with, so we get higher Weights.

Reason (1) is good, but reason (2) is not so good.

In the data frame of pairs, the weights are all close to 0.14 or 1. Let's look at the joint distribution of $u$ and $v$ in just the pairs with weights close to 1:

pairsDF <- data.frame(vRounded = round(pairsDF$v), pairsDF)
pairsHighWeightsDF <- subset(pairsDF, Weight > 0.9)
ddply(pairsHighWeightsDF,
      c("vRounded","u"), 
      function(df) data.frame(CountOfRows = nrow(df),
                              ProportionOfRows = nrow(df)/nrow(pairsHighWeightsDF)))

We see that $v$'s near 7 makes up only about 5.7% of the pairs. (It would be exactly $(.2)(.2) = 4$%, except that when we form pairs to compute the APC we don't pair any row with itself.)

If we form the APC based on these pairs and these weights, we weight the $v$'s near 3 too much, so our APC is too low:

pairsDF$yHat1 <- f(pairsDF$u, pairsDF$v)
pairsDF$yHat2 <- f(pairsDF$u.B, pairsDF$v)
pairsDF$uDiff <- pairsDF$u.B - pairsDF$u
ApcApprox1 <- 
  with(pairsDF,
       sum(Weight * (yHat2 - yHat1) * sign(uDiff)) / sum(Weight * uDiff * sign(uDiff)))
ApcApprox1

I showed the full computation above, but we can also use the GetAPC function from this package:

GetSingleInputApcs(function(df) return(df$u * df$v), exampleDF2, u="u", v="v", renormalizeWeights=FALSE)$PerUnitInput.Signed

Instead, we can normalize Weights so that within each first element of the pair.

pairsDFWeightsNormalized <- ddply(pairsDF, "OriginalRowNumber", transform, Weight = Weight/sum(Weight))
ApcApprox2 <- 
  with(pairsDFWeightsNormalized, 
       sum(Weight * (yHat2 - yHat1) * sign(uDiff)) / sum(Weight * uDiff * sign(uDiff)))
ApcApprox2

These renormalized Weights are the ones returned from GetPairs by default, and used in GetAPC by default:

GetSingleInputApcs(function(df) return(df$u * df$v), exampleDF2, u="u", v="v", renormalizeWeights=TRUE)$PerUnitInput.Signed


dchudz/predcomps documentation built on May 15, 2019, 1:48 a.m.