knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(rotateS21)

Intro

Stock markets are made up of many different stocks. Trying to analyze all of the stocks at once can turn into a multi-dimensional nightmare. One possible solution is 'reducing' the dimensions of the problem with a PCA. With a PCA one can summarize the data using a series of orthogonal linear combinations that capture most of the variation in the data. We will recreate example 8.10 to go through an analysis of how a PCA can be used for stock market data collected for 5 different stocks over a span of 103 weeks.

stock <- read.table("../inst/df/T8-4.DAT")
names(stock) <- c("JP", "Citi", "Wells", "Royal", "Exxon")
head(stock)

We can use the eigenPCA function to perform most of the analysis.

(pca <- eigenPCA(stock))

Part a

The first thing that we want to do is perform the PCA itself. This is done by finding the covariance matrix and analyzing the eigenvalues and eigenvectors. This is stored as 'Covariance_Matrix' in the objects returned by eigen PCA.

pca$Covariance_Matrix

The sample principal components are stored as 'Principal_Component_Matrix' where each column is a principal component linear combination and each row is a variable that makes up that linear combination. Each i,j entry is the coefficient for variable i in principal component j.

pca$Principal_Component_Matrix

Part b

The proportion of the total variance explained by the first three principal components is .8988.

pca$Cummulative_Variance[3]

Part c

The simultaneous (1-$\alpha$)% bonferroni confidence intervals for the lambda values of the different principal components are constructed using the formula:

$$ \frac{\hat{\lambda}i}{1+z{\frac{\alpha}{2m}}\sqrt\frac{2}{n}} \leq \lambda_i \leq \frac{\hat{\lambda}i}{1-z{\frac{\alpha}{2m}}\sqrt\frac{2}{n}} $$ This formula is a modification of the Anderson and Girshick confidence interval where the bonferroni confidence interval is applied to the $z$ score to include $z_{\frac{\alpha}{2m}}$ for $m$ number of simultaneous intervals.

bonCI(stock, 3)

Part d

Given the analysis, the stock-rate-return data can be summarized in 3 dimensions. The first 3 principal components capture almost 90% of the variance in the data, so we end up with a good idea of the spread of the data. However, stocks are generally compared to each other, and performing PCA is not a good way to understand how stocks compare.



s-huebler/rotateS21 documentation built on Dec. 22, 2021, 8:21 p.m.