An Intuitive Interpretation of $\hat\beta$

In this vignette we will derive an intuitive interpretation for one of the most basic quantities in linear regression: the ordinary least squares estimator $\hat\beta$. We begin by recalling that $\hat\beta = (X^TX)^{-1}X^TY$. The question is: what exactly is $(X^TX)^{-1}X^TY$?

Assuming $X_i$ and $X_j$ are $n$-dimensional vectors with mean 0, we can compute the empirical covariance between $X_i$ and $X_j$ as $\mathrm{\hat{Cov}}(X_i, X_j)=\big(\frac{1}{n}\big)X_i^TX_j$. Let $\hat\Sigma$ denote the empirical covariance matrix of $X = (X_1| X_2|\dots|X_p)$. It follows that

\begin{equation}
\hat\Sigma = \Big(\frac{1}{n}\Big) X^TX
= \Big(\frac{1}{n}\Big)
\begin{pmatrix}
X_1^TX_1 & X_1^TX_2 & \ldots & X_1^TX_p \\
\vdots & \vdots & \ddots & \vdots \\
X_p^TX_1 & X_p^TX_2 & \ldots & X_p^TX_p
\end{pmatrix}
\end{equation}
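A minimal sketch in R of this identity, using simulated data (note that R's cov() divides by $n - 1$ rather than $n$, so we rescale for the comparison; all object names here are illustrative):

set.seed(1)
n <- 500
X <- scale(matrix(rnorm(n * 3), ncol = 3), center = TRUE, scale = FALSE)
Sigma_hat <- crossprod(X) / n  # (1/n) t(X) %*% X
all.equal(Sigma_hat, cov(X) * (n - 1) / n, check.attributes = FALSE)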

Using this simple fact, it is possible to rewrite $\hat\beta$ using only statistical terms.

\begin{align}
\hat\beta &= (X^TX)^{-1}X^TY \\
&= \frac{1}{n}\hat\Sigma^{-1} X^TY \\
&= \frac{1}{n}\hat\Sigma^{-1}
\begin{pmatrix} X_1^TY \\ X_2^TY \\ \vdots \\ X_p^TY \end{pmatrix} \\
&= \hat\Sigma^{-1}
\begin{pmatrix} \mathrm{\hat{Cov}}(X_1, Y) \\ \mathrm{\hat{Cov}}(X_2, Y) \\ \vdots \\ \mathrm{\hat{Cov}}(X_p, Y) \end{pmatrix} \\
&= \hat\Sigma^{-1}\mathrm{\hat{Cov}}(X, Y)
\end{align}

As we can see, each $\hat\beta_i$ is a linear combination of the empirical covariances between the $X_j$'s and $Y$, weighted by the elements of the $i$th row of $\hat\Sigma^{-1}$.
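Here is a minimal sketch of this identity with simulated, mean-centered data (object names are illustrative): a no-intercept lm() fit reproduces $\hat\Sigma^{-1}\mathrm{\hat{Cov}}(X, Y)$.

set.seed(2)
n <- 500
X <- scale(matrix(rnorm(n * 3), ncol = 3), center = TRUE, scale = FALSE)
colnames(X) <- c("X1", "X2", "X3")
Y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)
Y <- Y - mean(Y)
Sigma_hat <- crossprod(X) / n
cov_XY <- crossprod(X, Y) / n  # vector of Cov(X_i, Y)
# the two columns below should coincide
cbind(lm(Y ~ -1 + X)$coefficients, solve(Sigma_hat, cov_XY))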

The beauty of the relationship between $\hat\beta$ and $\hat\Sigma^{-1}$ builds upon the properties of $\hat\Sigma^{-1}$. While the covariance (or dispersion) matrix is widely known, its fraternal inverse, the precision (or concentration) matrix, is far less so.

Clearly, the diagonal elements of $\hat\Sigma$ measure how the $X_i$'s disperse around their mean, and the off-diagonal elements reflect how the $X_i$'s co-vary linearly with each other. Similarly, the diagonal elements of $\hat\Sigma^{-1}$ measure how the $X_i$'s concentrate around their mean. On the other hand, it is not exactly true that the off-diagonal elements of $\hat\Sigma^{-1}$ reflect the extent to which the $X_i$'s do not co-vary with each other; rather, they capture how two variables co-vary after removing the linear effect of all the remaining variables.
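A minimal sketch of this last point with simulated data: after suitable scaling, the off-diagonal entries of the precision matrix give (negative) partial correlations, which we can also compute by residualizing on the remaining variable.

set.seed(3)
n <- 2000
Z <- rnorm(n)
X1 <- Z + rnorm(n)
X2 <- Z + rnorm(n)
X3 <- rnorm(n)
Omega <- solve(cov(cbind(X1, X2, X3)))  # precision matrix
# partial correlation of X1 and X2 given X3, read off the precision matrix
-Omega[1, 2] / sqrt(Omega[1, 1] * Omega[2, 2])
# the same quantity, computed by removing the linear effect of X3
cor(residuals(lm(X1 ~ X3)), residuals(lm(X2 ~ X3)))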

Let's think of what would happen if we added or deleted $X_i$'s from our set of variables. If we drop an $X_i$, $\hat\Sigma$ remains exactly the same except that its $i$th row and column disappear. In contrast, every element of $\hat\Sigma^{-1}$ changes when an $X_i$ is added or deleted. This is because $\hat\Sigma^{-1}$ behaves in a truly multivariate fashion, whereas $\hat\Sigma$ itself behaves in a bivariate one, as the sketch below shows.
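A minimal sketch with simulated data:

set.seed(4)
X <- matrix(rnorm(300), ncol = 3) %*% matrix(c(1, 0.8, 0.5, 0, 1, 0.3, 0, 0, 1), 3, 3)
cov(X)[1:2, 1:2] - cov(X[, 1:2])                # exactly zero: the submatrix is unchanged
solve(cov(X))[1:2, 1:2] - solve(cov(X[, 1:2]))  # generally nonzero: every entry moves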

Let's continue finding out how $\hat\beta$ relates to the precision matrix. First we will assume that the $X_i$'s are uncorrelated, that is, that $\hat\Sigma$ is a diagonal matrix. In this particular case, it is straightforward that $\hat\Sigma^{-1}$ is also a diagonal matrix, whose diagonal elements are the empirical precisions (reciprocal variances) of the $X_i$'s. Therefore, when there is no correlation between the $X_i$'s, each $\hat\beta_i$ is exactly the covariance between $X_i$ and $Y$ weighted by the precision of $X_i$.
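A minimal sketch with simulated data, constructing two exactly uncorrelated, mean-zero predictors:

set.seed(5)
n <- 1000
X1 <- rnorm(n)
X1 <- X1 - mean(X1)
tmp <- rnorm(n)
X2 <- residuals(lm(tmp ~ X1))  # mean zero and exactly uncorrelated with X1
Y <- 2 * X1 - 3 * X2 + rnorm(n)
Y <- Y - mean(Y)
# the multivariate coefficients...
coef(lm(Y ~ -1 + X1 + X2))
# ...equal each covariance with Y weighted by the predictor's precision
c(sum(X1 * Y), sum(X2 * Y)) / c(sum(X1^2), sum(X2^2))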

We will arrive at very similar conclusions for the case where the $X_i$'s do co-vary with each other. To do this, we will use the eigendecomposition of $\hat\Sigma$.

Given $PDP^T$, the eigendecomposition of $\hat\Sigma$, we know that the columns of $P$ are the eigenvectors $P_1, P_2,\ldots, P_p$ of $\hat\Sigma$ and that the diagonal of $D$ holds the eigenvalues of $\hat\Sigma$. We also know that the columns of $XP$ are a set of transformed observations $XP_1, XP_2,\ldots, XP_p$ called principal components. Since

\begin{equation} PDP^T=\hat\Sigma=\frac{1}{n}X^TX \end{equation}

then

\begin{equation} D=\frac{1}{n}P^TX^TXP=\frac{1}{n}(XP)^TXP \end{equation}

so the empirical covariance matrix of the principal components is precisely $D$. In consequence, the principal components are uncorrelated and their variances are the eigenvalues of $\hat\Sigma$.
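A minimal sketch with simulated, mean-centered data:

set.seed(6)
n <- 1000
A <- matrix(c(1, 0.5, 0, 0, 1, 0.3, 0, 0, 1), 3, 3)  # mixing matrix to correlate the columns
X <- scale(matrix(rnorm(n * 3), ncol = 3) %*% A, center = TRUE, scale = FALSE)
eig <- eigen(crossprod(X) / n)
PC <- X %*% eig$vectors
round(crossprod(PC) / n, 10)  # diagonal: the principal components are uncorrelated
eig$values                    # ...and their variances are the eigenvalues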

It follows that $\hat\Sigma^{-1}$ is given by $PD^{-1}P^T$, therefore

\begin{align}
\hat\beta &= \hat\Sigma^{-1}\mathrm{\hat{Cov}}(X, Y) \\
&= PD^{-1}P^T\mathrm{\hat{Cov}}(X, Y) \\
&= PD^{-1}\mathrm{\hat{Cov}}(XP, Y)
\end{align}

where the last step uses $P^T\mathrm{\hat{Cov}}(X, Y) = \frac{1}{n}(XP)^TY = \mathrm{\hat{Cov}}(XP, Y)$.

If we transform $\hat\beta$ using $P^T\hat\beta$, then \begin{equation} P^T\hat\beta = D^{-1}\mathrm{\hat{Cov}}(XP, Y) \end{equation}

In consequence, it is very nice to see that each transformed coefficient, given by $P_i^T\hat\beta$, is exactly the covariance between the $i$th principal component and $Y$ weighted by the precision of the $i$th principal component.
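Before moving to real data, here is a minimal sketch of this identity with simulated, mean-centered data:

set.seed(7)
n <- 1000
A <- matrix(c(1, 0.6, 0, 0, 1, 0.4, 0, 0, 1), 3, 3)
X <- scale(matrix(rnorm(n * 3), ncol = 3) %*% A, center = TRUE, scale = FALSE)
Y <- drop(X %*% c(1, 2, -1)) + rnorm(n)
Y <- Y - mean(Y)
eig <- eigen(crossprod(X) / n)
P <- eig$vectors
beta <- solve(crossprod(X), crossprod(X, Y))  # OLS coefficients
cov_pcY <- crossprod(X %*% P, Y) / n          # Cov(XP, Y)
# the two columns below should coincide: P^T beta = D^{-1} Cov(XP, Y)
cbind(t(P) %*% beta, cov_pcY / eig$values)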

Example

We will now take the theory above and verify the results using code and data. In this example we will use financial time series.

First, let's load the required packages. To do this we use the load.packages and strspl functions from the metodosMultivariados2017 package.

library("metodosMultivariados2017")

load.packages(strspl("RCurl, 
                     lubridate, 
                     xts, 
                     dplyr, 
                     tidyr, 
                     plotly"))

Next we will fetch prices from Yahoo Finance. To do this we use the get_prices_yahoo function from the metodosMultivariados2017 package.

In this example we will get prices for ETFs that track the S&P500 and its industry sectors. The sector names are listed in the code below.

yahoo_id <- c(
  "SPY",
  "IYK", #Consumer Goods
  "IYC", #Consumer Services
  "IYE", #Energy
  "IYF", #Financials
  "IYJ", #Industrials
  "IYM", #Materials
  "IDU", #Utilities
  "IYH", #Health Care
  "IYW", #Technology
  "IYZ", #Telecomm
  "IYR" #Real State
)

price <- get_prices_yahoo(yahoo_id, "2010-01-01")

From the prices, let's compute the returns of the ETFs and plot their daily changes.

We use the get_returns and plot_returns functions from the metodosMultivariados2017 package.

returns <- get_returns(price)

plot_returns(returns["2016/", ])

If we accumulate the daily changes, we get the cumulative returns, which are plotted below. If we had invested in the S&P500 at the beginning of 2016, we would have earned an effective return of about 20%.

cum_return <- get_returns(price["2016/", ], is_cumulative = TRUE)

plot_returns(cum_return)

We will now fit a linear model that tries to explain the S&P500 with only two sectors. The $\beta$ coefficients are printed below.

model <- lm(SPY ~ -1 + IYK + IYC, data = returns)
model$coefficients

Let's do the same for all sectors. The $\beta$ coefficients are printed as well.

We can see how the $\beta$ coefficients for IYK and IYC changed after we included more variables in the model.

model <- lm(SPY ~ -1 + IYK + IYC + IYE + IYF + IYJ + IYM + IDU + IYH + IYW + IYZ + IYR, data = returns)
model$coefficients

beta <- model$coefficients

It is time to see what happens if we use principal components.

First we have to get the loadings (eigenvectors) and eigenvalues.

X <- returns[, -1]
Y <- returns[, 1]

cov_X <- cov(X, use = "pairwise.complete.obs")
eig <- eigen(cov_X)  # eigendecomposition of the covariance matrix
P <- eig$vectors     # loadings (eigenvectors)
colnames(P) <- paste("PC", 1:ncol(P), sep = "")
d <- eig$values      # eigenvalues: the variances of the principal components
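As a quick sanity check (a sketch using the objects defined above), the eigendecomposition should reconstruct cov_X, and the untransformed identity $\hat\beta = \hat\Sigma^{-1}\mathrm{\hat{Cov}}(X, Y)$ should hold at least approximately; it is only approximate here because the pairwise-complete covariances and the rows lm drops for missing values do not rest on exactly the same observations.

all.equal(P %*% diag(d) %*% t(P), cov_X, check.attributes = FALSE)
cbind(beta, solve(cov_X, cov(X, Y, use = "pairwise.complete.obs")))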

When we plot the returns of the principal components, it is interesting to see how the first principal component shows the most volatility and, evidently, the last principal component shows the least.

pp <- X %*% P
pp <- as.xts(pp, order.by = index(returns))

plot_returns(pp["2016/", ])

Take a look at the $\beta$ coefficients for a linear model that uses the first two principal components to explain the S&P500.

pp_returns <- data.frame(returns$SPY, pp)

model <- lm(SPY ~ -1 + PC1 + PC2, data = pp_returns)
model$coefficients

Doing the same with all the principal components, we can verify that the $\beta$ coefficients for PC1 and PC2 keep the same values as in the 'two principal component model'. This happens because the principal components are uncorrelated: adding uncorrelated regressors does not change the coefficients already in the model.

model <- lm(SPY ~ -1 + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 + PC7 + PC8 + PC9 + PC10 + PC11, data = pp_returns)
model$coefficients
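We can also check the transformed-coefficient relation directly (a sketch reusing the beta and P objects defined above): the coefficients of the all-components model should match $P^T\hat\beta$, again up to the handling of missing values.

cbind(model$coefficients, t(P) %*% beta)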

At this point, think of the transformed $\beta$ and remember that each coefficient is the empirical covariance of the corresponding principal component with the S&P500, weighted by the precision of that principal component.

To do this, let's compute the covariance of the principal components with the S&P500.

cov_ppY <- cov(pp, Y, use = "pairwise.complete.obs")
print(cov_ppY)

Weight those covariances by the precisions of the principal components. See how we recover the $\beta$ coefficients that were computed with the lm function.

cov_ppY/d

Finally, a nice result is to show that the first principal component of the industry sectors is essentially the market (i.e., the S&P500). To check this, we compute the correlation of the first principal component with the market. See how it is almost 1 in absolute value? (The sign is arbitrary, since eigenvectors are only defined up to sign.)

cor(pp[, 1], Y, use = "pairwise.complete.obs")

