```r
library(ati)
library(PortfolioAnalytics)
library(matlab)
library(corrplot)
library(tidyverse)
library(RColorBrewer)
library(skimr)
library(learnr)
library(fontawesome)
tutorial_options(exercise.timelimit = 60)
tutorial_options(exercise.eval = TRUE)
knitr::opts_chunk$set(echo = FALSE, warning = FALSE, message = FALSE)
```

## Introduction

In finance, empirical covariance matrices are often numerically ill-conditioned because a small number of independent observations is used to estimate a large number of parameters. Working with these matrices directly, without treatment, is not recommended.

Even if the covariance matrix is non-singular, and therefore invertible, its small determinant all but guarantees that estimation errors will be greatly magnified by the inversion process.

The practical implication is that these estimation errors cause misallocation of assets and substantial transaction costs due to unnecessary rebalancing. Furthermore, denoising the matrix $\mathbf{XX'}$ before inverting it should help reduce the variance of regression estimates and improve the power of statistical hypothesis tests. For the same reason, covariance matrices derived from regressed factors (also known as factor-based covariance matrices) also require denoising and should not be used without numerical treatment.
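To see the ill-conditioning concretely, here is a minimal sketch (the sizes and names are illustrative, not part of the workshop data): the condition number of a sample covariance matrix explodes as the number of observations per asset shrinks towards the number of assets.

```r
set.seed(42)
n_assets <- 20

# Sample covariance of t_obs iid standard-normal observations on n assets;
# the true covariance is the identity, so any ill-conditioning is pure noise
sim_cov <- function(t_obs, n) cov(matrix(rnorm(t_obs * n), t_obs, n))

# kappa() estimates the condition number; large values signal ill-conditioning
kappa(sim_cov(2000, n_assets))  # many observations: modest condition number
kappa(sim_cov(25, n_assets))    # T barely exceeds N: condition number explodes
```

Inverting the second matrix amplifies estimation error by roughly its condition number, which is the numerical problem denoising is meant to mitigate.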

## Before we begin


### Outline

In this workshop you can learn:

### Tools you will use

I have preloaded the packages for this tutorial with:

```r
library(tidyverse) # loads dplyr, ggplot2, and others
library(PortfolioAnalytics)
library(matlab)
library(fontawesome)
library(corrplot)
library(RColorBrewer)
library(skimr)
library(ati)
```

## Git integration

### Ex 1: First time set-up

### Ex 2: Create RStudio project using Git

Use this video for guidance on the above set-up.

```{bash, eval=FALSE}
git config --global user.email ""
```

This is the email you used to register with GitHub.

```{bash, eval=FALSE}
git config --global user.name ""
```

## Simulating fake data
In quantitative finance we do not have a laboratory where we can experiment securely in a controlled environment. Most financial research is carried out on *real*, or **Big World**, data, which is complex, misbehaves, and is uncontrollable. Experimentation in finance is achieved by simulating **Small World** data with known statistical properties which can be controlled.

Portfolio data from the **Big World** is usually insufficient to produce meaningful results. This insufficiency can be illustrated by creating some **Small World** random data.

### Ex 1: Fake portfolio data

>Create a portfolio of independently and identically distributed *fake* stock returns. Click `Run Code` to see the fake portfolio created:

```r
stocks <- 20
trading_days <- 40
# Simulate iid normal daily returns: 40 days x 20 stocks
fake_port <- array(
  rnorm(trading_days * stocks,
        mean = 0.01, sd = 0.01),
  dim = c(trading_days, stocks)) %>%
  as_tibble()
fake_port %>% skim()
```

How would you describe the data?

The data is a sample of independent and identically distributed stock returns for 20 stocks over 40 trading days. The sample is drawn from a random normal distribution with mean 0.01 and standard deviation 0.01. This is the assumed data generating process of daily stock returns that the analyst has postulated.

### Ex 3: Test your knowledge

```r
question("What do you expect the correlation matrix of this portfolio to look like if the returns are drawn to be independent and identically distributed?",
  answer("I expect there to be no pairwise correlation as the data is random"),
  answer("I expect there to be some real pairwise correlation as the data is random"),
  answer("I expect there to be some spurious pairwise correlation as the data is random", correct = TRUE),
  answer("I expect there to be some real pairwise correlation as the data is nonrandom"),
  allow_retry = TRUE
)
```

**Hint:** use `?rnorm` in the console to understand the output of this function.

## Code pipes `%>%`

Firstly, I will introduce the process of piping code in R. The point of the pipe is to help you write code in a way that is easier to read and understand. To see why the pipe is so useful, we are going to explore a number of ways of writing the same code. The pipe operator in R is `%>%`, from the **magrittr** package. For more details see Hadley Wickham's *R for Data Science*, Chapter 18.

```r
# Nested calls: read inside-out
leave_house(get_dressed(get_out_of_bed(wake_up(me, time = "6:30"), side = "left"), trousers = TRUE, shirt = TRUE), car = FALSE, bike = TRUE, pandemic = FALSE)

# Piped: read top-to-bottom
me %>%
  wake_up(time = "6:30") %>%
  get_out_of_bed(side = "left") %>%
  get_dressed(trousers = TRUE, shirt = TRUE) %>%
  leave_house(car = FALSE, bike = TRUE, pandemic = FALSE)
```

So the piping operator allows the code to be more readable and logical.
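The same readability gain shows up with real functions. A toy calculation (not part of the exercises) makes the point:

```r
library(magrittr)  # provides %>%; also attached via tidyverse
x <- c(4, 9, 16)

# Nested form: read inside-out
round(mean(sqrt(x)), 2)

# Piped form: read left-to-right
x %>% sqrt() %>% mean() %>% round(2)  # both return 3
```

Both expressions are identical to R; the pipe simply rewrites the nesting so that the order on the page matches the order of the operations.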

### Your turn

Rearrange this code using piping

```r
## Recode this using piping
summarise(group_by(mutate(fake_port, Type = "Fake"), Type), meanV1 = mean(V1))

## Recode this using piping
fake_port %>%
  mutate(Type = "Fake")

## Recode this using piping
fake_port %>%
  mutate(Type = "Fake") %>%
  group_by(Type)

## Recode this using piping
fake_port %>%
  mutate(Type = "Fake") %>%
  group_by(Type) %>%
  summarise(meanV1 = mean(V1))
```

## Pairwise correlation of fake data

Given that the fake portfolio was created by drawing independent and identically distributed random normal observations, by definition there should be no true correlation between the fake stock returns; any pairwise correlation observed in the sample is spurious.

Write some code to evaluate and visualise the correlation of the fake portfolio returns, which can be accessed in the object `fake_port`.


```r
cor(fake_port)

cor(fake_port) %>%
  corrplot()

cor(fake_port) %>%
  corrplot(type = "upper",
           method = "number",
           order = "hclust",
           col = brewer.pal(n = 8, name = "RdYlBu"))
```

## Building `r fa("r-project")` functions

### Ex 1: Simple function

R, at its heart, is a high-level functional programming (FP) language. This means that it provides many tools for the creation and manipulation of functions.

Write a function to add two numbers together, then test the function with the numbers 1 and 2.

```r
add_numbers <- function(a, b) {

}
```

```r
# Write a function to add two numbers together
add_numbers <- function(a, b) {
  a + b
}
add_numbers(1, 2)
```

### Ex 2: Advanced function

Create a function in R for Marcenko-Pastur distribution estimates.

The Marcenko-Pastur distribution can be defined as:

$$\rho\left(\lambda\right) = \begin{cases} \dfrac{T}{N}\dfrac{\sqrt{\left(\lambda_{+} - \lambda\right)\left(\lambda - \lambda_{-}\right)}}{2\pi\lambda\sigma^{2}}, & \text{if } \lambda \in [\lambda_{-},\lambda_{+}] \\ 0, & \text{if } \lambda \notin [\lambda_{-},\lambda_{+}] \end{cases}$$

where the maximum expected eigenvalue is $\lambda_{+}=\sigma^2(1+\sqrt{N/T})^2$ and the minimum expected eigenvalue is $\lambda_{-}=\sigma^2(1-\sqrt{N/T})^2$

The following translates the above maths into R code.

```r
mp_pdf <- function(var, t, m, pts) {
  q <- t / m                          # q = T/N
  eMin <- var * (1 - (1 / q)^.5)^2    # lambda_minus: minimum expected eigenvalue
  eMax <- var * (1 + (1 / q)^.5)^2    # lambda_plus: maximum expected eigenvalue
  eVal <- linspace(eMin, eMax, pts)   # evaluation grid (matlab::linspace)
  pd <- q / (2 * pi * var * eVal) * ((eMax - eVal) * (eVal - eMin))^.5
  pdf <- tibble(pd = pd, e = eVal)
  return(pdf)
}
```

### Ex 3: Test `mp_pdf`

Test the function by creating the Marcenko-Pastur distribution for the fake portfolio when the variance = 1.

```r
mp <- mp_pdf(1, trading_days, stocks, stocks)
```

### Ex 4: Plot distribution

Research how the package **ggplot2** works and then attempt to plot the distribution created earlier.

```r
mp %>%
  ggplot(aes(x = e, y = pd)) +
  geom_line()
```
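As an optional follow-up, we can compare the theoretical Marcenko-Pastur support with the eigenvalues of a simulated iid portfolio. This is a self-contained sketch (it recreates the fake data rather than reusing `fake_port`, so the exact numbers are illustrative):

```r
set.seed(1)
stocks <- 20
trading_days <- 40
# Recreate iid fake returns with the same dimensions as the workshop data
fake <- matrix(rnorm(trading_days * stocks, mean = 0.01, sd = 0.01),
               trading_days, stocks)

q <- trading_days / stocks
lambda_minus <- (1 - sqrt(1 / q))^2  # minimum expected eigenvalue (sigma^2 = 1)
lambda_plus  <- (1 + sqrt(1 / q))^2  # maximum expected eigenvalue

# Eigenvalues of the sample correlation matrix
eig <- eigen(cor(fake), symmetric = TRUE, only.values = TRUE)$values

range(eig)                    # empirical eigenvalue range
c(lambda_minus, lambda_plus)  # theoretical Marcenko-Pastur support
```

In finite samples the empirical eigenvalues fall roughly, but not exactly, within $[\lambda_{-}, \lambda_{+}]$; eigenvalues well above $\lambda_{+}$ would be the signature of genuine (non-noise) structure, which pure iid data should not exhibit.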


barryquinn1/ATI documentation built on May 10, 2021, 10:47 a.m.