\newcommand{\yH}{\hat{y}} \newcommand{\x}{\mathbf{X}} \newcommand{\xt}{\x^\top} \newcommand{\xtx}{\xt\x} \newcommand{\xtxI}{(\xtx)^{-1}} \newcommand{\b}{\beta} \newcommand{\bt}{\b^\top} \newcommand{\xb}{\mathbf{\x\b}} \newcommand{\ep}{\epsilon} \newcommand{\bH}{\hat{\b}} \newcommand{\bHt}{\bH^\top} \newcommand{\xbH}{\x\bH} \newcommand{\xtxBH}{(\xtx)\bH} \newcommand{\xty}{\xt y} \newcommand{\E}{\mathrm{E}} \newcommand{\s}{\sigma} \newcommand{\ss}{\s^2} \newcommand{\ssH}{\hat{\ss}} \newcommand{\xo}{\x_0} \newcommand{\xof}{\x_{0,f}} \newcommand{\xot}{\xo^\top} \newcommand{\v}{\mathrm{var}} \newcommand{\a}{\mathbf{A}} \newcommand{\xoma}{(\xo - \a)} \newcommand{\xomat}{\xoma^\top}

OLS estimator

Consider linear model

[ y = \xb + \ep ]

Let $\bH$ be the estimate of the population parameter $\b$, we want to minimize the sum of squared residuals $\sum \ep_i^2$, with

\begin{align} \ep &= y - \xbH \ \ep^\top\ep &= (y - \xbH) (y - \xbH) \ &=y^\top y - 2\bHt\xt y + \bHt \xtx \bH \end{align}

Minimizing above, i.e, taking derivative w.r.t $\bH$ and equating to $0$, we have

[ \xtxBH = \xty ]

Provided $\xtxI$ exists:

\begin{align} \xtxI \xtxBH &= \xtxI \xty\ \implies \bH &= \xtxI \xty \end{align}

From the result above

\begin{align} \bH &= \xtxI \xty\ &= \xtxI \xt (\xb + \ep) \ &= \b + \xtxI \xt \ep \end{align}

Variance-covariance matrix

\begin{align} \E[(\bH - \b)(\bH - \b)^\top] &= \E[(\xtxI \xt \ep) (\xtxI \xt \ep)^T] \ &= \E[\xtxI \xt \ep\ep^\top \x \xtxI] \ &= \xtxI \xt \E[\ep\ep^\top] \x \xtxI \ &= \ss \xtxI \end{align}

and

[ \ssH = \frac{\ep^\top\ep}{n - k} ]

Predicting new data points

Let $\xo$ be the design matrix containing values of the predictors, from which we want to make predictions, then

\begin{align} \hat{y} &= \xo\bH\ &= \xo\xtxI \xty \end{align}

and

\begin{align} \v{(\hat{y})} &= \v[\xo\xtxI \xty]\ &= \xo\xtxI \xt \v(y) [\xo\xtxI \xt]^\top \ &= \ss \xo\xtxI \xt \x \xtxI \xot\ &= \ss \xo \xtxI \xot \end{align}

Anchoring and effects

Let $\xo$ be a centered model matrix containing appropriately chosen values of focal predictors and non-focal predictors as the average of the corresponding columns in the model matrix, and let $\a$ be the anchor matrix, with same dimensions as $\xo$ and values in each columns repeated $n$ times. Also, let $\a_f$ be the anchor column corresponding to the focal predictor, the rest of the columns corresponds the non-focal predictors with same values as in $\xo$. Any appropriate value (within the range of focal values) can be chosen for $\a_f$ but by default center point is chosen. Then

\begin{align} \v{(\hat{y})} &= \ss \xoma \xtxI \xomat \end{align}

We can see that, for any values $\a_f=\xof$, $\v(\hat{y}) = \boldsymbol{0}$. If $\a_f = \overline{\xof}$, then the anchor is the center point at this observation will still hold.

We need to figure out how to quantify and compare all the variances resulting from choosing other anchors, i.e., $\a_f < \overline{\xof}$ and $\a_f > \overline{\xof}$.



mac-theobio/effects documentation built on July 6, 2023, 4:19 a.m.