library(ati)
library(PortfolioAnalytics)
library(matlab)
library(corrplot)
library(tidyverse)
library(RColorBrewer)
library(skimr)
library(learnr)
library(fontawesome)
tutorial_options(exercise.timelimit = 60)
tutorial_options(exercise.eval = TRUE)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
In finance, empirical covariance matrices are often numerically ill-conditioned because a small number of independent observations is used to estimate a large number of parameters. Working with these matrices directly, without treatment, is not recommended.
Even if the covariance matrix is non-singular, and therefore invertible, its small determinant all but guarantees that estimation errors will be greatly magnified by the inversion process.

{width="50%" align="center"}
The practical implication is that these estimation errors cause misallocation of assets and substantial transaction costs due to unnecessary rebalancing. Furthermore, denoising the matrix $\mathbf{XX'}$ before inverting it should help reduce the variance of regression estimates and improve the power of statistical hypothesis tests. For the same reason, covariance matrices derived from regressed factors (also known as factor-based covariance matrices) also require denoising and should not be used without numerical treatment.
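To see why inversion is so fragile, a minimal sketch (not from the tutorial; variable names are illustrative) compares the condition number of a sample covariance matrix as the number of assets N approaches the number of observations T. The larger the condition number, the more the inversion magnifies estimation error.

```r
# Condition number of a sample covariance matrix as N approaches T.
# Uses only base R: rnorm() to simulate i.i.d. returns, cov(), kappa().
set.seed(1)
T_obs <- 60                               # number of observations
for (N in c(5, 30, 55)) {
  X <- matrix(rnorm(T_obs * N), nrow = T_obs, ncol = N)
  S <- cov(X)                             # sample covariance matrix
  cat("N =", N, " condition number =", round(kappa(S), 1), "\n")
}
```

As N grows towards T the condition number explodes, even though every matrix here is technically invertible.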
If you don't want to use this instance, use the RStudio IDE on your local machine, but it is your responsibility to keep it up to date. For local set-up, see the set-up workshop here.
Engage your Yoda growth mindset
{width="30%"}
In this workshop you can learn:
I have preloaded the packages for this tutorial with
library(tidyverse) # loads dplyr, ggplot2, and others
library(PortfolioAnalytics)
library(matlab)
library(fontawesome)
library(corrplot)
library(RColorBrewer)
library(skimr)
library(ati)
Register for an account on GitHub (https://github.com/). We recommend using a username that incorporates your name (e.g. barryquinn1, ckelly66).
If you haven't already, click on this invite https://classroom.github.com/a/GCR_J0yx to clone the repository for workshop 1.
{width="50%"}
Terminal console, then repeat step 4:

```{bash, eval=FALSE}
git config --global user.email "<your email>"
git config --global user.name "<your name>"
```
## Simulating fake data

In quantitative finance we do not have a laboratory where we can securely experiment in a controlled environment. Most financial research is carried out on *Real* or **Big World** data, which is complex, misbehaves, and is uncontrollable. Experimentation in finance is achieved by simulating **Small World** data with known statistical properties which can be controlled. Portfolio data from the **Big World** is usually insufficient to produce meaningful results; this insufficiency can be illustrated by creating some **Small World** random data.

### Ex 1: Fake portfolio data

> Create a portfolio of independently and identically distributed *fake* stock returns. Click `Run Code` to see a fake portfolio created:

```r
stocks <- 20
trading_days <- 40
fake_port <- array(
  rnorm(trading_days * stocks, mean = 0.01, sd = 0.01),
  dim = c(trading_days, stocks)) %>%
  as.tibble()
fake_port %>% skim()
```
Describe the data.
The data is a sample of independent and identically distributed stock returns for 20 stocks over 40 trading days. The sample is drawn from a random normal distribution with mean 0.01 and standard deviation 0.01. This is the assumed data generating process of daily stock returns that the analyst has postulated.
question("What do you expect the correlation matrix of this portfolio to look like if the returns are drawn to be independent and identically distributed?",
  answer("I expect there to be no pairwise correlation as the data is random"),
  answer("I expect there to be some real pairwise correlation as the data is random"),
  answer("I expect there to be some spurious pairwise correlation as the data is random", correct = TRUE),
  answer("I expect there to be some real pairwise correlation as the data is nonrandom"),
  allow_retry = TRUE
)
Firstly, I will introduce the process of piping code in R. The point of the pipe is to help you write code in a way that is easier to read and understand. To see why the pipe is so useful, we're going to explore a number of ways of writing the same code. The pipe operator in R is `%>%` from the **magrittr** package. For more details see Hadley Wickham (2020), "R for Data Science", Chapter 18.
leave_house(
  get_dressed(
    get_out_of_bed(
      wake_up(me, time = "6:30"),
      side = "left"),
    trousers = TRUE, shirt = TRUE),
  car = FALSE, bike = TRUE, pandemic = FALSE)
me %>%
  wake_up(time = "6:30") %>%
  get_out_of_bed(side = "left") %>%
  get_dressed(trousers = TRUE, shirt = TRUE) %>%
  leave_house(car = FALSE, bike = TRUE, pandemic = FALSE)
So the piping operator allows the code to be more readable and logical.
Rearrange this code using piping
## Recode this using piping
summarise(group_by(mutate(fake_port, Type = "Fake"), Type), meanV1 = mean(V1))
## Recode this using piping
fake_port %>%
  mutate(Type = "Fake")
## Recode this using piping
fake_port %>%
  mutate(Type = "Fake") %>%
  group_by(Type)
## Recode this using piping
fake_port %>%
  mutate(Type = "Fake") %>%
  group_by(Type) %>%
  summarise(meanV1 = mean(V1))
Given the fake portfolio was created by drawing independent and identically distributed random normal observations, by definition there should be no true correlation between the fake stock returns.
Write some code to evaluate and visualise the correlation of the fake portfolio returns, which can be accessed in the object `fake_port`.
cor(fake_port)
cor(fake_port) %>% corrplot()
cor(fake_port) %>% corrplot(type="upper", method = "number", order="hclust", col=brewer.pal(n=8, name="RdYlBu"))
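Beyond visualising the matrix, the size of the spurious correlations can be quantified. A minimal sketch (it regenerates a fake portfolio with the same dimensions as Ex 1, so the exact numbers differ from yours): for i.i.d. returns the off-diagonal sample correlations are pure noise with typical magnitude of order $1/\sqrt{T}$, not zero.

```r
# Spurious correlation in i.i.d. data: the off-diagonal entries of the
# sample correlation matrix are noise on the scale of 1/sqrt(T).
set.seed(42)
fake_port <- matrix(rnorm(40 * 20, mean = 0.01, sd = 0.01), 40, 20)
C <- cor(fake_port)
off_diag <- C[lower.tri(C)]        # the 190 distinct pairwise correlations
mean(abs(off_diag))                # typical spurious correlation magnitude
1 / sqrt(nrow(fake_port))          # ~0.16, the theoretical noise scale
```

This is why a correlation plot of i.i.d. data is speckled with small but visible pairwise correlations: they are estimation noise, not signal.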
## `r fa("r-project")` Functions

R, at its heart, is a high-level functional programming (FP) language. This means that it provides many tools for the creation and manipulation of functions.
Write a function to add two numbers together then test the function with numbers 1 and 2
add_numbers <- function(a, b) { }
# Write a function to add two numbers together
add_numbers <- function(a, b) {
  a + b
}
add_numbers(1, 2)
Create a function in R for Marcenko-Pastur distribution estimates
The Marcenko-Pastur distribution can be defined as:
$$\rho\left(\lambda\right) = \begin{cases} \frac{T}{N}\frac{\sqrt{\left(\lambda_{+} - \lambda\right)\left(\lambda - \lambda_{-}\right)}}{2\pi\lambda\sigma^{2}}, & \text{if } \lambda \in [\lambda_{-},\lambda_{+}] \\ 0, & \text{if } \lambda \notin [\lambda_{-},\lambda_{+}] \end{cases}$$
where the maximum expected eigenvalue is $\lambda_{+}=\sigma^2(1+\sqrt{N/T})^2$ and the minimum expected eigenvalue is $\lambda_{-}=\sigma^2(1-\sqrt{N/T})^2$
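Plugging in the fake portfolio's dimensions gives concrete bounds. With $T = 40$ trading days, $N = 20$ stocks, and $\sigma^2 = 1$, we have $q = T/N = 2$ (so $N/T = 1/q$), and the bounds can be checked numerically:

```r
# Numeric check of the Marcenko-Pastur eigenvalue bounds for q = T/N = 2
var <- 1
q <- 40 / 20
lambda_minus <- var * (1 - sqrt(1 / q))^2   # minimum expected eigenvalue
lambda_plus  <- var * (1 + sqrt(1 / q))^2   # maximum expected eigenvalue
c(lambda_minus, lambda_plus)                # approx. 0.086 and 2.914
```

Any eigenvalue of the empirical correlation matrix falling materially above $\lambda_{+} \approx 2.91$ would indicate signal rather than noise.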
The following translates the above maths into R code.
mp_pdf <- function(var, t, m, pts) {
  q <- t / m                           # q = T/N
  eMin <- var * (1 - (1 / q)^.5)^2     # minimum expected eigenvalue
  eMax <- var * (1 + (1 / q)^.5)^2     # maximum expected eigenvalue
  eVal <- linspace(eMin, eMax, pts)    # matlab::linspace
  pd <- q / (2 * pi * var * eVal) * ((eMax - eVal) * (eVal - eMin))^.5
  pdf <- tibble(pd = pd, e = eVal)
  return(pdf)
}
Test the function by creating the Marcenko-Pastur distribution for the fake portfolio when the variance = 1.
mp_pdf <- function(var, t, m, pts) {
  q <- t / m                           # q = T/N
  eMin <- var * (1 - (1 / q)^.5)^2     # minimum expected eigenvalue
  eMax <- var * (1 + (1 / q)^.5)^2     # maximum expected eigenvalue
  eVal <- linspace(eMin, eMax, pts)    # matlab::linspace
  pd <- q / (2 * pi * var * eVal) * ((eMax - eVal) * (eVal - eMin))^.5
  pdf <- tibble(pd = pd, e = eVal)
  return(pdf)
}
mp <- mp_pdf(1, trading_days, stocks, stocks)
Research how the **ggplot2** package works and then attempt to plot the distribution created earlier.
mp %>% ggplot(aes(x=e,y=pd)) + geom_line()
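The payoff of this exercise is comparing the theoretical density with the data. A minimal sketch (it regenerates a fake portfolio with the Ex 1 dimensions in base R, so it is self-contained and your eigenvalues will differ slightly) checks the empirical eigenvalues against the Marcenko-Pastur support:

```r
# Compare empirical eigenvalues of the fake portfolio's correlation
# matrix with the Marcenko-Pastur upper bound lambda_plus.
set.seed(42)
fake_port <- matrix(rnorm(40 * 20, mean = 0.01, sd = 0.01), 40, 20)
e_vals <- eigen(cor(fake_port), symmetric = TRUE)$values
lambda_plus <- (1 + sqrt(20 / 40))^2   # ~2.91 for q = T/N = 2
range(e_vals)   # for pure noise, eigenvalues should sit near [lambda_-, lambda_+]
sum(e_vals)     # trace of a correlation matrix: exactly N = 20
```

For purely random data the eigenvalues cluster inside the Marcenko-Pastur support; in real portfolio data, eigenvalues well above $\lambda_{+}$ flag genuine common factors, which is the basis of denoising.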