knitr::opts_chunk$set(out.width="100%", fig.height = 4.5, split=FALSE, fig.align = 'default', comment = NA)
options(dplyr.summarise.inform = FALSE)
knitr::opts_chunk$set(echo=TRUE, error=FALSE, message=FALSE, warning=FALSE)
options(scipen=999)

Introduction

This Markdown document is part of a package that explores financial data from S&P 500 companies using a Shiny app. Its aim is to perform all core back-end analysis of the application without any reactivity, so it serves as the template for developing the Shiny app. The goal of the whole package is to learn how to develop a Shiny app as a package using the golem framework.

The dataset which is used in this project includes 505 rows with 14 variables.

Goals:

Preparations {.tabset}

Load Packages

First, relevant packages are loaded. We load a range of libraries for general data wrangling and general visualisation.

library(here)         #used for folder navigation
library(readr)        #used to read data as tibble
library(skimr)        #used to get overview of data
library(janitor)      #used to clean variable names
library(dplyr)        #used for data wrangling
library(ggplot2)      #used for graphs
library(gridExtra)    #used for graphs (arrange in grids)

Load Data

The data was downloaded from Kaggle (login required) and stored locally. The here package is used to locate the files relative to the project root. 10 out of 505 observations have at least one missing value.

financials <- read_csv(here("data-raw","company_financials.csv"))
cat(sum(!complete.cases(financials)), "out of", nrow(financials), "observations have at least one missing value.")
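
A count of incomplete rows can be broken down per column. The following is a minimal sketch on a small made-up data frame (the `toy` object and its values are assumptions for illustration, not part of the project data). It uses `complete.cases()` to count incomplete rows and `colSums(is.na())` to count missing values per column:

```r
# Toy data frame (hypothetical values, for illustration only)
toy <- data.frame(
  name   = c("A", "B", "C", "D"),
  price  = c(10.5, NA, 32.0, 18.2),
  ebitda = c(1e6, 2e6, NA, NA)
)

# Number of rows with at least one missing value
n_incomplete <- sum(!complete.cases(toy))

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))

cat(n_incomplete, "out of", nrow(toy), "rows have at least one missing value.\n")
print(na_per_column)
```

The per-column view makes it easier to see which variables drive the missingness before deciding how to handle them.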

Data Overview and Preprocessing {.tabset}

Overview

We have a first look at the dataset:

skim(financials)

Notes:

Preprocessing

Based on the notes from the Overview, we clean the variable names and add a sales and a net profit variable to the data. We then look at the final variable names again:

#clean names and derive the sales and net profit variables
names_original <- c(names(financials), "Sales", "Net Profit")
names(financials) <- make_clean_names(names(financials))
financials <- financials %>% mutate(sales = market_cap/price_sales, net_profit = earnings_share * (market_cap/price))
names(financials)
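
The two derived variables can be checked by hand. The sketch below assumes the ratios follow their usual definitions (price/sales as market capitalisation divided by sales, and earnings per share as net profit divided by shares outstanding). All numbers are invented for illustration:

```r
# Hypothetical company (values invented for illustration)
market_cap     <- 50e9   # market capitalisation
price          <- 100    # share price
price_sales    <- 2.5    # price/sales ratio
earnings_share <- 4      # earnings per share

# price/sales = market_cap / sales, so sales = market_cap / price_sales
sales <- market_cap / price_sales

# shares outstanding = market_cap / price
shares <- market_cap / price

# net profit = earnings per share * shares outstanding
net_profit <- earnings_share * shares

cat("Sales:", sales, "\nShares:", shares, "\nNet profit:", net_profit, "\n")
```

Because both derived values are quotients, a missing or zero `price_sales` or `price` propagates into `sales` or `net_profit` as `NA`, `Inf` or `NaN`.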

Univariate Data Exploration {.tabset}

Numeric Variables

First, let us have a look at the distribution of the numeric variables (Company Financials):

#Save indicator whether variable is numeric or not
nums <- unlist(lapply(financials, is.numeric))
#save cleaned names and original names in vector (original names are used as title of x axis)
nums_names <- names(financials)[nums]
nums_original_names <- names_original[nums]

#loop through numeric variable names to plot each numeric variable (distribution)
p <- list()
j = 1
for(i in c(nums_names)){
  p[[j]] <- ggplot(financials, aes(x=.data[[i]])) + 
              geom_histogram(aes(y=after_stat(density)), colour="black", fill="white")+
              geom_density(alpha=.2, fill="#009966")+ 
              labs(x = nums_original_names[j], y = "") +
              theme_classic() + 
              theme(axis.text.x = element_blank(),
                    axis.text.y = element_blank())
  j = j+1
}
do.call(grid.arrange,p)

Notes:

Character Variable

Second, let us have a closer look at the character variables containing company information. We only look at sector, since it is the only character variable that does not have a unique value for each data point.

ggplot(financials, aes(sector)) +
  geom_bar(fill = "#009966", alpha = 0.2, color = "black") +
  theme_classic() +
  labs(x = "Sector", y = "Count") +
  coord_flip()

Notes:

Multivariate Data Exploration

In this section, we explore the relationships between our character variable sector and the company financials, as well as the relationships among the company financials themselves. The goal is to give a quick overview of the relationships in the data; no further statistics such as effect sizes are computed. Since we are dealing with many extreme values and considerable skewness, a cube root transformation is used for some analyses. We concentrate only on financials that are meaningful for our purposes. For example, price is not explored, since it only gives the price per share, which is not meaningful for our analysis without also looking at the number of shares issued.

Cube root transformation:

The cube root transformation converts x to x^(1/3). It is a fairly strong transformation with a substantial effect on the shape of a distribution, but it is weaker than the logarithm. Unlike the logarithm, it can also be applied to zero and negative values.
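
One practical caveat in R: raising a negative number to a fractional power with `^` returns `NaN`, so a sign-preserving form is needed for negative values. The sketch below compares the naive power, a sign-preserving cube root and the logarithm on a small vector. The function name `signed_cube_root` is a placeholder chosen for this example:

```r
# Sign-preserving cube root; the name signed_cube_root is a placeholder for this sketch
signed_cube_root <- function(x) {
  sign(x) * abs(x)^(1/3)
}

x <- c(-8, 0, 27)

naive  <- x^(1/3)                    # NaN for the negative entry
signed <- signed_cube_root(x)        # defined for every entry
logged <- suppressWarnings(log(x))   # NaN for the negative entry, -Inf for zero

print(naive)
print(signed)
print(logged)
```

Only the sign-preserving version keeps all three observations, which matters for financials such as earnings that can be negative.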

Top/Bottom 10 Companies {.tabset}

First, let us look at the top and bottom 10 companies of several financials:

selected_financials <- c("price_earnings", "market_cap", "ebitda", "price_sales","price_book", "sales","net_profit")
indicator <- nums_names %in%  selected_financials
selected_original_names <- nums_original_names[indicator]

top <- list()
bot <- list()
j = 1
for(i in c(selected_financials)){
  top[[j]] <- financials %>% arrange(desc(!!sym(i))) %>% head(10) %>%
                  ggplot(aes(x=reorder(name, !!sym(i)), y = !!sym(i))) +
                  geom_bar(stat = "identity", fill = "#009966", alpha = 0.2, color = "black") + 
                  labs(x = "", y = "") +
                  ggtitle(paste("Top 10 -", selected_original_names[j])) +
                  theme_classic() +
                  coord_flip()
  j = j+1
}

j = 1
for(i in c(selected_financials)){
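  #drop missing values first: arrange() always sorts NA last, so tail(10) would otherwise return them
  bot[[j]] <- financials %>% filter(!is.na(!!sym(i))) %>% arrange(desc(!!sym(i))) %>% tail(10) %>%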
                  ggplot(aes(x=reorder(name, !!sym(i)), y = !!sym(i))) +
                  geom_bar(stat = "identity", fill = "indianred", alpha = 0.2, color = "black") + 
                  labs(x = "", y = "") +
                  ggtitle(paste("Bottom 10 -", selected_original_names[j])) +
                  theme_classic() +
                  coord_flip()
  j = j+1
}
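
When ranking with `arrange()` and `tail()`, it helps to know that dplyr's `arrange()` places missing values at the end even inside `desc()`. Taking the last rows can therefore return `NA` entries instead of the smallest observed values. A minimal sketch on made-up data (the `toy` data frame is an assumption for illustration):

```r
library(dplyr)

# Toy data (hypothetical values, for illustration only)
toy <- data.frame(
  name  = c("A", "B", "C", "D", "E"),
  value = c(5, NA, 1, 3, NA)
)

# Without filtering: NA rows are sorted last, so tail() returns them
bottom_with_na <- toy %>% arrange(desc(value)) %>% tail(2)

# With filtering: the two smallest observed values are returned
bottom_clean <- toy %>%
  filter(!is.na(value)) %>%
  arrange(desc(value)) %>%
  tail(2)

print(bottom_with_na)
print(bottom_clean)
```

Filtering before sorting keeps the "bottom" ranking restricted to companies that actually report the figure.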

Price/Earnings

top[[1]]
bot[[1]]

Market Cap

top[[2]]
bot[[2]]

EBITDA

top[[3]]
bot[[3]]

Price/Sales

top[[4]]
bot[[4]]

Price/Book

top[[5]]
bot[[5]]

Sales

top[[6]]
bot[[6]]

Net Profit

top[[7]]
bot[[7]]

Relationship Sector/Financials {.tabset}

#helper function for the sign-preserving cube root transformation
cube_root <- function(x) {
  sign(x) * abs(x)^(1/3)
}

#loop through numeric variable names to plot each cube root transformed numeric variable by sector
pt <- list()
j = 1
for(i in c(nums_names)){
  pt[[j]] <- financials %>% filter(sector != "Telecommunication Services") %>% 
                  ggplot(aes(x=sector, y = cube_root(!!sym(i)))) +
                  geom_boxplot(outlier.shape = NA) + 
                  geom_jitter(width=0.1, alpha=0.2, color = "#009966") +
                  labs(x = "", y = paste(nums_original_names[j], "- Cube Root transformed")) +
                  theme_classic() +
                  coord_flip()
  j = j+1
}

Price/Earnings

pt[[2]]

Dividend Yield

financials %>% filter(sector != "Telecommunication Services") %>% 
                  ggplot(aes(x=sector, y = dividend_yield)) +
                  geom_boxplot(outlier.shape = NA) + 
                  geom_jitter(width=0.1, alpha=0.2, color = "#009966") +
                  labs(x = "", y = "Dividend Yield") +
                  theme_classic() +
                  coord_flip()

Market Cap

pt[[7]]

EBITDA

pt[[8]]

Price/Sales

pt[[9]]

Price/Book

pt[[10]]

Sales

pt[[11]]

Net Profit

pt[[12]]

Relationship Financials {.tabset}

Similar to the graphs with sector, we now plot, for each relevant financial variable, its relationship to the others. The cube root transformation is used for all graphs in this section. In general, we see a positive relationship between the market cap, revenue (sales) and profit (EBITDA and net profit) variables, as expected.

Price/Earnings

indicator <- nums_names %in%  selected_financials[selected_financials != "price_earnings"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except price_earnings and plot each one against price_earnings (both cube root transformed)
#cube_root() is used on both axes because x^(1/3) returns NaN for negative values
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "price_earnings"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(price_earnings), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Price/Earnings vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Market Cap

indicator <- nums_names %in%  selected_financials[selected_financials != "market_cap"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except market_cap and plot each one against market_cap (both cube root transformed)
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "market_cap"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(market_cap), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Market Cap vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

EBITDA

indicator <- nums_names %in%  selected_financials[selected_financials != "ebitda"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except ebitda and plot each one against ebitda (both cube root transformed)
#cube_root() is used on both axes because x^(1/3) returns NaN for negative values
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "ebitda"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(ebitda), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("EBITDA vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Price/Sales

indicator <- nums_names %in%  selected_financials[selected_financials != "price_sales"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except price_sales and plot each one against price_sales (both cube root transformed)
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "price_sales"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(price_sales), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Price/Sales vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Price/Book

indicator <- nums_names %in%  selected_financials[selected_financials != "price_book"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except price_book and plot each one against price_book (both cube root transformed)
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "price_book"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(price_book), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Price/Book vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Sales

indicator <- nums_names %in%  selected_financials[selected_financials != "sales"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except sales and plot each one against sales (both cube root transformed)
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "sales"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(sales), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Sales vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Net Profit

indicator <- nums_names %in%  selected_financials[selected_financials != "net_profit"]
selected_original_names <- nums_original_names[indicator]

#loop through the selected financials except net_profit and plot each one against net_profit (both cube root transformed)
#cube_root() is used on both axes because x^(1/3) returns NaN for negative values
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "net_profit"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=cube_root(net_profit), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Net Profit vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)
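
The blocks in this section all follow the same pattern, which could be wrapped in a single helper when the logic is moved into the Shiny app. The following is only a sketch under assumptions: `plot_against()` and its arguments are hypothetical names that do not exist in the package, and the `toy` data frame is simulated for illustration.

```r
library(ggplot2)

# Sign-preserving cube root, defined here so the sketch is self-contained
cube_root <- function(x) {
  sign(x) * abs(x)^(1/3)
}

# Hypothetical helper: plot x_var against each variable in y_vars,
# with both axes cube root transformed. Returns a list of ggplot objects.
plot_against <- function(data, x_var, y_vars, x_label = x_var) {
  lapply(y_vars, function(y_var) {
    ggplot(data, aes(x = cube_root(.data[[x_var]]), y = cube_root(.data[[y_var]]))) +
      geom_point(colour = "black", alpha = 0.2) +
      geom_smooth(method = "lm", formula = y ~ x, colour = "#009966") +
      labs(x = "", y = "") +
      ggtitle(paste(x_label, "vs", y_var)) +
      theme_classic()
  })
}

# Simulated toy data (hypothetical values, for illustration only)
set.seed(1)
toy <- data.frame(
  market_cap = rlnorm(50, meanlog = 10),
  sales      = rlnorm(50, meanlog = 9),
  net_profit = rnorm(50, mean = 100, sd = 500)  # can be negative
)

plots <- plot_against(toy, "market_cap", c("sales", "net_profit"), x_label = "Market Cap")
length(plots)
```

The returned list can be passed to `do.call(gridExtra::grid.arrange, plots)` in the same way as the `ps` lists above. A function with explicit arguments is also easier to call from a reactive context than a copied loop.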


Stefan1896/Company_Financials documentation built on March 19, 2023, 1:05 p.m.