knitr::opts_chunk$set(out.width="100%", fig.height = 4.5, split=FALSE, fig.align = 'default', comment = NA) options(dplyr.summarise.inform = FALSE) knitr::opts_chunk$set(echo=TRUE, error=FALSE, message=FALSE, warning=FALSE) options(scipen=999)
This Markdown is part of a package where company financial data from S&P 500 companies is explored using a Shiny App. The aim of this Markdown is to perform all core back-end analysis of the application without any reactivity. This document will therefore be the template then for developing the Shiny App. The goal of the whole package is to learn how to develop an Shiny App as a package using the golem framework.
The dataset which is used in this project includes 505 rows with 14 variables.
Goals:
First, relevant packages are loaded. We load a range of libraries for general data wrangling and general visualisation.
library(here) #used for folder navigation library(readr) #used to read data as tibble library(skimr) #used to get overview of data library(janitor) #used to clean variable names library(dplyr) #used for data wrangling library(ggplot2) #used for graphs library(gridExtra) #used for graphs (arrange in gridges)
The data was downloaded from kaggle and stored locally (login necessary). The here package is used to locate the files relative to the project root. 10 out of 505 observations have at least one missing value.
financials <- read_csv(here("data-raw","company_financials.csv")) cat(dim(financials[!complete.cases(financials),])[1], "out of", nrow(financials), "observations have at least one missing value.")
We have a first look at the dataset:
skim(financials)
Notes:
There is some valuable information in this quick overview. The missing datapoints are in the variables Price/Earnings and Price/Book.
The variables are already classified as numeric and character. From the character variables, Sector is the only one which has not all unique characters.
Nearly all of the company Financials are right skewed.
The variable names are not clean, since they include spaces and variables which start with a number We clean them directly using the janitor package.
It is possible to calculate a sales variable, using the market cap and price/sales information. We can also calculate a Net Profit variable using earnings per share, market cap and price per share.
Based on the notes from the Overview, we will clean the names and add a sales and a net profit variable to the data. We then again look at the final variable names:
#clean names and get sales variable names_original <- c(names(financials), "Sales", "Net Profit") names(financials) <- make_clean_names(names(financials)) financials <- financials %>% mutate(sales = market_cap/price_sales, net_profit = earnings_share * (market_cap/price)) names(financials)
First, let us have a look at the distribution of the numeric variables (Company Financials):
#Save indicator whether variable is numeric or not nums <- unlist(lapply(financials, is.numeric)) #save cleaned names and original names in vector (original names are used as title of x axis) nums_names <- names(financials)[nums] nums_original_names <- names_original[nums] #loop through numeric variable names to plot each numeric variable (distribution) p <- list() j = 1 for(i in c(nums_names)){ p[[j]] <- ggplot(financials, aes_string(x=i)) + geom_histogram(aes(y=..density..), colour="black", fill="white")+ geom_density(alpha=.2, fill="#009966")+ labs(x = nums_original_names[j], y = "") + theme_classic() + theme(axis.text.x = element_blank(), axis.text.y = element_blank()) j = j+1 } do.call(grid.arrange,p)
Notes:
Again, we see that many of the company financials are right skewed.
We will use cube root transformation in exploring relationships later.
Second, let us have a closer look at the character variables containing company Information. We will only look at sector, since this is the only character variable which not only has unique values for each datapoint.
ggplot(financials, aes(sector)) + geom_bar(fill = "#009966", alpha = 0.2, color = "black") + theme_classic() + labs(x = "Sector", y = "Count") + coord_flip()
Notes:
We have for all sectors at least 20 datapoints except for Telecommunication Services.
When exploring the relationship of sector with company financials, we will exclude Telecommunaction Services since the number of datapoints is too low.
In this sections, we will explore the relationships of our character variable sector with the company financials as well as the relationships between different company financials. The goal here is to give a quick overview of the relationships in the data, there will be no further statistics like effect sizes to further explore the size of effects. Since we are dealing with a lot of extreme values and skeweness, a Cube Root Transfomration will be used for some analysis. We will only concentrate on financials which do have a desired meaning for us. For example, price will not be explored, since this is only gives information about the price per share, which is not meaningful for our analysis without looking also at the number of shares issued.
Cube root transformation:
The cube root transformation involves converting x to x^(1/3). This is a fairly strong transformation with a substantial effect on distribution shape: but is weaker than the logarithm. It can be applied to negative and zero values too.
First, let us look at the top and botton 10 companies of several financials:
selected_financials <- c("price_earnings", "market_cap", "ebitda", "price_sales","price_book", "sales","net_profit") indicator <- nums_names %in% selected_financials selected_original_names <- nums_original_names[indicator] top <- list() bot <- list() j = 1 for(i in c(selected_financials)){ top[[j]] <- financials %>% arrange(desc(!!sym(i))) %>% head(10) %>% ggplot(aes(x=reorder(name, !!sym(i)), y = !!sym(i))) + geom_bar(stat = "identity", fill = "#009966", alpha = 0.2, color = "black") + labs(x = "", y = "") + ggtitle(paste("Top 10 -", selected_original_names[j])) + theme_classic() + coord_flip() j = j+1 } j = 1 for(i in c(selected_financials)){ bot[[j]] <- financials %>% arrange(desc(!!sym(i))) %>% tail(10) %>% ggplot(aes(x=reorder(name, !!sym(i)), y = !!sym(i))) + geom_bar(stat = "identity", fill = "indianred", alpha = 0.2, color = "black") + labs(x = "", y = "") + ggtitle(paste("Bottom 10 -", selected_original_names[j])) + theme_classic() + coord_flip() j = j+1 }
top[[1]]
bot[[1]]
top[[2]]
bot[[2]]
top[[3]]
bot[[3]]
top[[4]]
bot[[4]]
top[[5]]
bot[[5]]
top[[6]]
bot[[6]]
top[[7]]
bot[[7]]
#make function for cube_root formatting cube_root <- function(x) { sign(x) * abs(x)^(1/3) } #loop through numeric variable names to plot each numeric variable cube root transformatted with sector pt <- list() j = 1 for(i in c(nums_names)){ pt[[j]] <- financials %>% filter(sector != "Telecommunication Services") %>% ggplot(aes(x=sector, y = cube_root(!!sym(i)))) + geom_boxplot(outlier.shape = NA) + geom_jitter(width=0.1, alpha=0.2, color = "#009966") + labs(x = "", y = paste(nums_original_names[j], "- Cube Root transformatted")) + theme_classic() + coord_flip() j = j+1 }
pt[[2]]
financials %>% filter(sector != "Telecommunication Services") %>% ggplot(aes(x=sector, y = dividend_yield)) + geom_boxplot(outlier.shape = NA) + geom_jitter(width=0.1, alpha=0.2, color = "#009966") + labs(x = "", y = "Dividend Yield") + theme_classic() + coord_flip()
pt[[7]]
pt[[8]]
pt[[9]]
pt[[10]]
pt[[11]]
pt[[12]]
Similar to the graphs with sector, we will now for each relevant financial variable plot its relationship to the others. For all graphs in this section Cube root transformation is used. In general, we see a positiv relationship between the market cap, revenue (sales) and profit (EBITDA and net profit) variables, as expected.
indicator <- nums_names %in% selected_financials[selected_financials != "price_earnings"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "price_earnings"])){ ps[[j]] <- financials %>% ggplot(aes(x=price_earnings^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("Price/Earnings vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
indicator <- nums_names %in% selected_financials[selected_financials != "market_cap"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "market_cap"])){ ps[[j]] <- financials %>% ggplot(aes(x=market_cap^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("Market Cap vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
indicator <- nums_names %in% selected_financials[selected_financials != "ebitda"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "ebitda"])){ ps[[j]] <- financials %>% ggplot(aes(x=ebitda^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("EBITDA vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
indicator <- nums_names %in% selected_financials[selected_financials != "price_sales"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "price_sales"])){ ps[[j]] <- financials %>% ggplot(aes(x=price_sales^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("Price/Sales vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
indicator <- nums_names %in% selected_financials[selected_financials != "price_book"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "price_book"])){ ps[[j]] <- financials %>% ggplot(aes(x=price_book^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("Price/Book vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
indicator <- nums_names %in% selected_financials[selected_financials != "sales"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "sales"])){ ps[[j]] <- financials %>% ggplot(aes(x=sales^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("Sales vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
indicator <- nums_names %in% selected_financials[selected_financials != "net_profit"] selected_original_names <- nums_original_names[indicator] #loop through selected financial variable names to plot each numeric variable cube root transformated with sector ps <- list() j = 1 for(i in c(selected_financials[selected_financials != "net_profit"])){ ps[[j]] <- financials %>% ggplot(aes(x=net_profit^(1/3), y = cube_root(!!sym(i)))) + geom_point(col = "black", alpha = 0.2) + geom_smooth(method='lm', formula= y~x, color = "#009966") + labs(x = "", y = "") + ggtitle(paste("Net Profit vs", selected_original_names[j])) + theme_classic() j = j+1 } do.call(grid.arrange,ps)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.