In this vignette we derive an intuitive interpretation for one of the most basic quantities coming out of linear regression: the coefficient estimate $\hat\beta$. We begin by recalling the closed-form solution $\hat\beta = (X^TX)^{-1}X^TY$. The question is: what does $(X^TX)^{-1}X^TY$ actually mean?
Assuming $X_i$ and $X_j$ are $n$-dimensional vectors with mean 0, we can compute the empirical covariance between $X_i$ and $X_j$ as $\mathrm{\hat{Cov}}(X_i, X_j)=\frac{1}{n}X_i^TX_j$. Let $\hat\Sigma$ denote the empirical covariance matrix of $X = (X_1\,|\,X_2\,|\dots|\,X_p)$. It follows that
\begin{equation}
\hat\Sigma = \frac{1}{n} X^TX = \frac{1}{n} \begin{pmatrix} X_1^TX_1 & X_1^TX_2 & \ldots & X_1^TX_p \\ \vdots & \vdots & \ddots & \vdots \\ X_p^TX_1 & X_p^TX_2 & \ldots & X_p^TX_p \end{pmatrix}
\end{equation}
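As a quick sanity check (a sketch with simulated, centered data; the dimensions are arbitrary), we can verify in R that $\frac{1}{n}X^TX$ matches R's built-in covariance up to the $\frac{1}{n}$ versus $\frac{1}{n-1}$ scaling:

```r
set.seed(42)
n <- 1000; p <- 3

# simulate and center the columns so each X_i has mean 0
X <- scale(matrix(rnorm(n * p), n, p), center = TRUE, scale = FALSE)

# empirical covariance as defined above: (1/n) X^T X
Sigma_hat <- crossprod(X) / n

# R's cov() divides by n - 1, so rescale before comparing
all.equal(Sigma_hat, cov(X) * (n - 1) / n, check.attributes = FALSE)  # TRUE
```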
Using this simple fact, it is possible to rewrite $\hat\beta$ using only statistical terms.
\begin{align}
\hat\beta &= (X^TX)^{-1}X^TY \\
&= \frac{1}{n}\hat\Sigma^{-1} X^TY \\
&= \frac{1}{n}\hat\Sigma^{-1} \begin{pmatrix} X_1^TY \\ X_2^TY \\ \vdots \\ X_p^TY \end{pmatrix} \\
&= \hat\Sigma^{-1} \begin{pmatrix} \mathrm{\hat{Cov}}(X_1, Y) \\ \mathrm{\hat{Cov}}(X_2, Y) \\ \vdots \\ \mathrm{\hat{Cov}}(X_p, Y) \end{pmatrix} \\
&= \hat\Sigma^{-1}\mathrm{\hat{Cov}}(X, Y)
\end{align}
As we can see, each $\hat\beta_i$ is a linear combination of the empirical covariances between the $X_j$'s and $Y$, weighted by the elements of the $i$th row of $\hat\Sigma^{-1}$.
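To see this relationship numerically (a small simulation, not the financial data used later), we can check that `solve(cov(X)) %*% cov(X, Y)` reproduces the coefficients of an intercept-free `lm` fit; the $\frac{1}{n-1}$ factors in R's `cov` cancel out:

```r
set.seed(1)
n <- 500
# two correlated, mean-zero predictors and a response
X <- scale(matrix(rnorm(n * 2), n, 2) %*% matrix(c(1, 0.5, 0.5, 1), 2),
           scale = FALSE)
Y <- drop(X %*% c(2, -1)) + rnorm(n)

# intercept-free least squares
beta_lm <- coef(lm(Y ~ X - 1))

# beta = Sigma^{-1} Cov(X, Y)
beta_cov <- solve(cov(X)) %*% cov(X, Y)
all.equal(unname(beta_lm), drop(beta_cov))  # TRUE
```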
The beauty of the relationship between $\hat\beta$ and $\hat\Sigma^{-1}$ builds upon the properties of $\hat\Sigma^{-1}$. While the covariance (or dispersion) matrix is widely known, its inverse, the precision (or concentration) matrix, is far less familiar.
Clearly, the diagonal elements of $\hat\Sigma$ measure how the $X_i$'s disperse around their means, and the off-diagonal elements reflect how the $X_i$'s co-vary linearly with each other. Similarly, the diagonal elements of $\hat\Sigma^{-1}$ measure how tightly the $X_i$'s concentrate around their means. However, it is not exactly true that the off-diagonal elements of $\hat\Sigma^{-1}$ reflect the extent to which the $X_i$'s do not co-vary with each other; rather, they are related to the partial covariances, i.e., how two variables co-vary after accounting for all the others.
Let's think about what happens if we add or delete variables $X_i$ from our set. If we remove an $X_i$, $\hat\Sigma$ stays exactly the same except that the $i$th row and column are dropped. In contrast, in general all the elements of $\hat\Sigma^{-1}$ change when an $X_i$ is added or deleted. This is because $\hat\Sigma^{-1}$ behaves in a truly multivariate fashion, whereas $\hat\Sigma$ itself is built from purely bivariate quantities.
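A small simulated illustration of this non-local behavior (the variables and their dependence structure are made up): dropping a variable merely truncates $\hat\Sigma$, but the precision matrix of the reduced set is not a submatrix of the full precision matrix:

```r
set.seed(7)
n <- 300
# three correlated predictors built from shared noise
Z <- matrix(rnorm(n * 3), n, 3)
X <- cbind(Z[, 1], Z[, 1] + Z[, 2], Z[, 2] + Z[, 3])
S <- cov(X)

# dropping X_3 from Sigma simply deletes its row and column,
# but the two precision matrices below do NOT coincide
prec_reduced <- solve(S[1:2, 1:2])
prec_sub     <- solve(S)[1:2, 1:2]
max(abs(prec_reduced - prec_sub)) > 0  # TRUE: the entries differ
```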
Let's continue exploring how $\hat\beta$ relates to the precision matrix. First, assume the $X_i$'s are uncorrelated, that is, $\hat\Sigma$ is a diagonal matrix. In this case it is straightforward that $\hat\Sigma^{-1}$ is also diagonal, and its diagonal elements are the empirical precisions (reciprocal variances) of the $X_i$'s. Therefore, when there is no correlation between the $X_i$'s, each $\hat\beta_i$ is exactly the covariance between $X_i$ and $Y$ weighted by the precision of $X_i$.
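A minimal sketch of the uncorrelated case (simulated data; the second column is orthogonalized by hand so that $\hat\Sigma$ is exactly diagonal):

```r
set.seed(3)
n <- 400
X <- scale(matrix(rnorm(n * 2), n, 2), scale = FALSE)
# make the two columns exactly uncorrelated (one Gram-Schmidt step)
X[, 2] <- X[, 2] - X[, 1] * sum(X[, 1] * X[, 2]) / sum(X[, 1]^2)
Y <- drop(X %*% c(1, 4)) + rnorm(n)

beta_lm <- coef(lm(Y ~ X - 1))

# with a diagonal Sigma, beta_i = Cov(X_i, Y) / Var(X_i)
beta_diag <- c(cov(X[, 1], Y) / var(X[, 1]),
               cov(X[, 2], Y) / var(X[, 2]))
all.equal(unname(beta_lm), beta_diag)  # TRUE
```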
We arrive at very similar conclusions in the case where the $X_i$'s do co-vary with each other. To show this, we will use the eigendecomposition of $\hat\Sigma$.
Given the eigendecomposition $\hat\Sigma = PDP^T$, we know that the columns of $P$ are the eigenvectors $P_1, P_2,\ldots, P_p$ of $\hat\Sigma$ and that the diagonal of $D$ holds the eigenvalues of $\hat\Sigma$. We also know that the columns of $XP$ are a set of transformed observations $XP_1, XP_2,\ldots, XP_p$ called principal components. Since \begin{equation} PDP^T=\hat\Sigma=\frac{1}{n}X^TX \end{equation} then \begin{equation} D=\frac{1}{n}P^TX^TXP=\frac{1}{n}(XP)^TXP, \end{equation} so the empirical covariance matrix of the principal components is precisely $D$. In consequence, the principal components are uncorrelated and their variances are the eigenvalues of $\hat\Sigma$.
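We can confirm this numerically (simulated data again; the mixing matrix is arbitrary): the empirical covariance of the principal components comes out diagonal, with the eigenvalues of $\hat\Sigma$ on its diagonal:

```r
set.seed(9)
n <- 500
# centered, correlated data
X <- scale(matrix(rnorm(n * 3), n, 3) %*%
             matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), 3), scale = FALSE)

eig <- eigen(crossprod(X) / n, symmetric = TRUE)
P <- eig$vectors
XP <- X %*% P                 # principal components
S_pc <- crossprod(XP) / n     # their empirical covariance

# off-diagonal entries vanish and the diagonal recovers the eigenvalues
max(abs(S_pc[upper.tri(S_pc)]))    # numerically zero
all.equal(diag(S_pc), eig$values)  # TRUE
```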
It follows that $\hat\Sigma^{-1}$ is given by $PD^{-1}P^T$, therefore
\begin{align} \hat\beta &= \hat\Sigma^{-1}\mathrm{\hat{Cov}}(X, Y) \\ &= PD^{-1}P^T\mathrm{\hat{Cov}}(X, Y) \\ &= PD^{-1}\mathrm{\hat{Cov}}(XP, Y) \end{align}
If we transform $\hat\beta$ using $P^T\hat\beta$ then \begin{equation} P^T\hat\beta = D^{-1}\mathrm{\hat{Cov}}(XP, Y) \end{equation}
In consequence, each transformed coefficient $P_i^T\hat\beta$ is exactly the covariance between the $i$th principal component and $Y$ weighted by the precision of the $i$th principal component.
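The whole chain can be checked end to end on simulated data (a sketch; the coefficients and mixing matrix are made up): the rotated least-squares coefficients $P^T\hat\beta$ match $D^{-1}\mathrm{\hat{Cov}}(XP, Y)$:

```r
set.seed(11)
n <- 600
# centered, correlated predictors and a response
X <- scale(matrix(rnorm(n * 3), n, 3) %*%
             matrix(c(2, 1, 0, 1, 2, 1, 0, 1, 2), 3), scale = FALSE)
Y <- drop(X %*% c(1, -2, 0.5)) + rnorm(n)
beta <- coef(lm(Y ~ X - 1))

eig <- eigen(crossprod(X) / n, symmetric = TRUE)
P <- eig$vectors
d <- eig$values
XP <- X %*% P

# P^T beta versus D^{-1} Cov(XP, Y), with Cov(XP, Y) = (1/n)(XP)^T (Y - mean(Y))
lhs <- drop(t(P) %*% beta)
rhs <- drop(crossprod(XP, Y - mean(Y)) / n) / d
all.equal(lhs, rhs)  # TRUE
```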
We will now take this theory to data and verify the results with code. In this example we use financial time series.
First, let's load the required packages. To do this we use the `load.packages` and `strspl` functions from the `metodosMultivariados2017` package.

```r
library("metodosMultivariados2017")
load.packages(strspl("RCurl, lubridate, xts, dplyr, tidyr, plotly"))
```
Next we will fetch prices from finance.yahoo.com. To do this we use the `get_prices_yahoo` function from the `metodosMultivariados2017` package.
In this example we will get prices for ETFs that follow the S&P500 and Industry Sectors. Industry names are listed within the code below.
```r
yahoo_id <- c(
  "SPY",
  "IYK",  # Consumer Goods
  "IYC",  # Consumer Services
  "IYE",  # Energy
  "IYF",  # Financials
  "IYJ",  # Industrials
  "IYM",  # Materials
  "IDU",  # Utilities
  "IYH",  # Health Care
  "IYW",  # Technology
  "IYZ",  # Telecomm
  "IYR"   # Real Estate
)
price <- get_prices_yahoo(yahoo_id, "2010-01-01")
```
From prices, let's compute the returns of the ETFs and plot their daily changes.
We use the `get_returns` and `plot_returns` functions from the `metodosMultivariados2017` package.

```r
returns <- get_returns(price)
plot_returns(returns["2016/", ])
```
If we accumulate the daily changes, we get the cumulative returns, which are plotted below. If we had invested in the S&P500 at the beginning of 2016, we would have earned roughly a 20% return.
```r
cum_return <- get_returns(price["2016/", ], is_cumulative = TRUE)
plot_returns(cum_return[, ])
```
We will now fit a linear model that tries to explain the S&P500 with only two sectors. The $\beta$ is printed.
```r
model <- lm(SPY ~ -1 + IYK + IYC, data = returns)
model$coefficients
```
Let's do the same for all sectors; the $\beta$ is printed as well. We can see how the $\beta$ coefficients for `IYK` and `IYC` changed after we included more variables in the model.
```r
model <- lm(SPY ~ -1 + IYK + IYC + IYE + IYF + IYJ + IYM +
              IDU + IYH + IYW + IYZ + IYR, data = returns)
model$coefficients
beta <- model$coefficients
```
It is time to see what happens if we use principal components.
First we have to get the loadings (eigenvectors) and eigenvalues.
```r
X <- returns[, -1]
Y <- returns[, 1]
cov_X <- cov(X, use = "pairwise.complete.obs")
P <- eigen(cov_X)$vectors
colnames(P) <- paste("PC", 1:ncol(P), sep = "")
d <- eigen(cov_X)$values
```
When we plot the returns of the principal components, it is interesting to see that the first principal component shows the most volatility, while the last one shows the least.
```r
pp <- X %*% P
pp <- as.xts(pp, order.by = index(returns))
plot_returns(pp["2016/", ])
```
Take a look at the $\beta$ coefficients for a linear model that uses the first two principal components to explain the S&P500.
```r
pp_returns <- data.frame(returns$SPY, pp)
model <- lm(SPY ~ -1 + PC1 + PC2, data = pp_returns)
model$coefficients
```
Doing the same with all the principal components, we can verify that the $\beta$ coefficients for `PC1` and `PC2` keep the same values as in the two-principal-component model.
```r
model <- lm(SPY ~ -1 + PC1 + PC2 + PC3 + PC4 + PC5 + PC6 +
              PC7 + PC8 + PC9 + PC10 + PC11, data = pp_returns)
model$coefficients
```
At this point, think of the transformed $\beta$ and remember that each coefficient is the empirical covariance of each principal component with the S&P500, weighted by the precision of that principal component.
To do this, let's compute the covariance of the principal components with the S&P500.
```r
cov_ppY <- cov(pp, Y, use = "pairwise.complete.obs")
print(cov_ppY)
```
Weighting those covariances by the precision of the principal components, we recover the $\beta$ coefficients that were computed with the `lm` function.
```r
cov_ppY / d
```
Finally, a nice result is to show that the first principal component of the industry sectors is essentially the market (i.e., the S&P500). To see this, we compute the correlation of the first principal component with the market; note that it is almost 1.
```r
cor(pp[, 1], Y, use = "pairwise.complete.obs")
```