In Stefan1896/Company_Financials: Company Financials of S&P 500 companies

knitr::opts_chunk$set(out.width="100%", fig.height = 4.5, split=FALSE, fig.align = 'default', comment = NA)
options(dplyr.summarise.inform = FALSE)
knitr::opts_chunk$set(echo=TRUE, error=FALSE, message=FALSE, warning=FALSE)
options(scipen=999)

Introduction

This Markdown is part of a package where company financial data from S&P 500 companies is explored using a Shiny App. The aim of this Markdown is to perform all core back-end analysis of the application without any reactivity. This document will therefore be the template then for developing the Shiny App. The goal of the whole package is to learn how to develop an Shiny App as a package using the golem framework.

The dataset which is used in this project includes 505 rows with 14 variables.

Goals:

Exploratory Overview of S&P 500 companies using Markdown and Shiny
Learning the Golem Framework for developing Shiny Apps as a package

Preperations {.tabset}

Load Packages

First, relevant packages are loaded. We load a range of libraries for general data wrangling and general visualisation.

library(here)         #used for folder navigation
library(readr)        #used to read data as tibble
library(skimr)        #used to get overview of data
library(janitor)      #used to clean variable names
library(dplyr)        #used for data wrangling
library(ggplot2)      #used for graphs
library(gridExtra)    #used for graphs (arrange in gridges)

Load Data

The data was downloaded from kaggle and stored locally (login necessary). The here package is used to locate the files relative to the project root. 10 out of 505 observations have at least one missing value.

financials <- read_csv(here("data-raw","company_financials.csv"))
cat(dim(financials[!complete.cases(financials),])[1], "out of", nrow(financials), "observations have at least one missing value.")

Data Overview and Preprocessing {.tabset}

Overview

We have a first look at the dataset:

skim(financials)

Notes:

There is some valuable information in this quick overview. The missing datapoints are in the variables Price/Earnings and Price/Book.
The variables are already classified as numeric and character. From the character variables, Sector is the only one which has not all unique characters.
Nearly all of the company Financials are right skewed.
The variable names are not clean, since they include spaces and variables which start with a number We clean them directly using the janitor package.
It is possible to calculate a sales variable, using the market cap and price/sales information. We can also calculate a Net Profit variable using earnings per share, market cap and price per share.

Preprocessing

Based on the notes from the Overview, we will clean the names and add a sales and a net profit variable to the data. We then again look at the final variable names:

#clean names and get sales variable
names_original <- c(names(financials), "Sales", "Net Profit")
names(financials) <- make_clean_names(names(financials))
financials <- financials %>% mutate(sales = market_cap/price_sales, net_profit = earnings_share * (market_cap/price))
names(financials)

Univariate Data Exploration {.tabset}

Numeric Variables

First, let us have a look at the distribution of the numeric variables (Company Financials):

#Save indicator whether variable is numeric or not
nums <- unlist(lapply(financials, is.numeric))
#save cleaned names and original names in vector (original names are used as title of x axis)
nums_names <- names(financials)[nums]
nums_original_names <- names_original[nums]

#loop through numeric variable names to plot each numeric variable (distribution)
p <- list()
j = 1
for(i in c(nums_names)){
  p[[j]] <- ggplot(financials, aes_string(x=i)) + 
              geom_histogram(aes(y=..density..), colour="black", fill="white")+
              geom_density(alpha=.2, fill="#009966")+ 
              labs(x = nums_original_names[j], y = "") +
              theme_classic() + 
              theme(axis.text.x = element_blank(),
                    axis.text.y = element_blank())
  j = j+1
}
do.call(grid.arrange,p)

Notes:

Again, we see that many of the company financials are right skewed.
We will use cube root transformation in exploring relationships later.

Character Variable

Second, let us have a closer look at the character variables containing company Information. We will only look at sector, since this is the only character variable which not only has unique values for each datapoint.

ggplot(financials, aes(sector)) +
  geom_bar(fill = "#009966", alpha = 0.2, color = "black") +
  theme_classic() +
  labs(x = "Sector", y = "Count") +
  coord_flip()

Notes:

We have for all sectors at least 20 datapoints except for Telecommunication Services.
When exploring the relationship of sector with company financials, we will exclude Telecommunaction Services since the number of datapoints is too low.

Multivariate Data Exploration

In this sections, we will explore the relationships of our character variable sector with the company financials as well as the relationships between different company financials. The goal here is to give a quick overview of the relationships in the data, there will be no further statistics like effect sizes to further explore the size of effects. Since we are dealing with a lot of extreme values and skeweness, a Cube Root Transfomration will be used for some analysis. We will only concentrate on financials which do have a desired meaning for us. For example, price will not be explored, since this is only gives information about the price per share, which is not meaningful for our analysis without looking also at the number of shares issued.

Cube root transformation:

The cube root transformation involves converting x to x^(1/3). This is a fairly strong transformation with a substantial effect on distribution shape: but is weaker than the logarithm. It can be applied to negative and zero values too.

Top/Bottom 10 Companies {.tabset}

First, let us look at the top and botton 10 companies of several financials:

selected_financials <- c("price_earnings", "market_cap", "ebitda", "price_sales","price_book", "sales","net_profit")
indicator <- nums_names %in%  selected_financials
selected_original_names <- nums_original_names[indicator]

top <- list()
bot <- list()
j = 1
for(i in c(selected_financials)){
  top[[j]] <- financials %>% arrange(desc(!!sym(i))) %>% head(10) %>%
                  ggplot(aes(x=reorder(name, !!sym(i)), y = !!sym(i))) +
                  geom_bar(stat = "identity", fill = "#009966", alpha = 0.2, color = "black") + 
                  labs(x = "", y = "") +
                  ggtitle(paste("Top 10 -", selected_original_names[j])) +
                  theme_classic() +
                  coord_flip()
  j = j+1
}

j = 1
for(i in c(selected_financials)){
  bot[[j]] <- financials %>% arrange(desc(!!sym(i))) %>% tail(10) %>%
                  ggplot(aes(x=reorder(name, !!sym(i)), y = !!sym(i))) +
                  geom_bar(stat = "identity", fill = "indianred", alpha = 0.2, color = "black") + 
                  labs(x = "", y = "") +
                  ggtitle(paste("Bottom 10 -", selected_original_names[j])) +
                  theme_classic() +
                  coord_flip()
  j = j+1
}

Price/Earnings

top[[1]]

bot[[1]]

Market Cap

top[[2]]

bot[[2]]

EBITDA

top[[3]]

bot[[3]]

Price/Sales

top[[4]]

bot[[4]]

Price/Book

top[[5]]

bot[[5]]

Sales

top[[6]]

bot[[6]]

Net Profit

top[[7]]

bot[[7]]

Relationship Sector/Financials {.tabset}

#make function for cube_root formatting
cube_root <- function(x) {
  sign(x) * abs(x)^(1/3)
}

#loop through numeric variable names to plot each numeric variable cube root transformatted with sector
pt <- list()
j = 1
for(i in c(nums_names)){
  pt[[j]] <- financials %>% filter(sector != "Telecommunication Services") %>% 
                  ggplot(aes(x=sector, y = cube_root(!!sym(i)))) +
                  geom_boxplot(outlier.shape = NA) + 
                  geom_jitter(width=0.1, alpha=0.2, color = "#009966") +
                  labs(x = "", y = paste(nums_original_names[j], "- Cube Root transformatted")) +
                  theme_classic() +
                  coord_flip()
  j = j+1
}

Price/Earnings

pt[[2]]

Dividend Yield

financials %>% filter(sector != "Telecommunication Services") %>% 
                  ggplot(aes(x=sector, y = dividend_yield)) +
                  geom_boxplot(outlier.shape = NA) + 
                  geom_jitter(width=0.1, alpha=0.2, color = "#009966") +
                  labs(x = "", y = "Dividend Yield") +
                  theme_classic() +
                  coord_flip()

Market Cap

pt[[7]]

EBITDA

pt[[8]]

Price/Sales

pt[[9]]

Price/Book

pt[[10]]

Sales

pt[[11]]

Net Profit

pt[[12]]

Relationship Financials {.tabset}

Similar to the graphs with sector, we will now for each relevant financial variable plot its relationship to the others. For all graphs in this section Cube root transformation is used. In general, we see a positiv relationship between the market cap, revenue (sales) and profit (EBITDA and net profit) variables, as expected.

Price/Earnings

indicator <- nums_names %in%  selected_financials[selected_financials != "price_earnings"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "price_earnings"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=price_earnings^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Price/Earnings vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Market Cap

indicator <- nums_names %in%  selected_financials[selected_financials != "market_cap"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "market_cap"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=market_cap^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Market Cap vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

EBITDA

indicator <- nums_names %in%  selected_financials[selected_financials != "ebitda"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "ebitda"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=ebitda^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("EBITDA vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Price/Sales

indicator <- nums_names %in%  selected_financials[selected_financials != "price_sales"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "price_sales"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=price_sales^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Price/Sales vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Price/Book

indicator <- nums_names %in%  selected_financials[selected_financials != "price_book"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "price_book"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=price_book^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Price/Book vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Sales

indicator <- nums_names %in%  selected_financials[selected_financials != "sales"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names without sales to plot each numeric variable cube root transformated with selected variable
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "sales"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=sales^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Sales vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Net Profit

indicator <- nums_names %in%  selected_financials[selected_financials != "net_profit"]
selected_original_names <- nums_original_names[indicator]

#loop through selected financial variable names to plot each numeric variable cube root transformated with sector
ps <- list()
j = 1
for(i in c(selected_financials[selected_financials != "net_profit"])){
  ps[[j]] <- financials %>% 
                  ggplot(aes(x=net_profit^(1/3), y = cube_root(!!sym(i)))) +
                  geom_point(col = "black", alpha = 0.2) + 
                  geom_smooth(method='lm', formula= y~x, color = "#009966") +
                  labs(x = "", y = "") +
                  ggtitle(paste("Net Profit vs", selected_original_names[j])) +
                  theme_classic() 
  j = j+1
}
do.call(grid.arrange,ps)

Stefan1896/Company_Financials documentation built on March 19, 2023, 1:05 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Stefan1896/Company_Financials Company Financials of S&P 500 companies

In Stefan1896/Company_Financials: Company Financials of S&P 500 companies

Introduction

Preperations {.tabset}

Load Packages

Load Data

Data Overview and Preprocessing {.tabset}

Overview

Preprocessing

Univariate Data Exploration {.tabset}

Numeric Variables

Character Variable

Multivariate Data Exploration

Top/Bottom 10 Companies {.tabset}

Price/Earnings

Market Cap

EBITDA

Price/Sales

Price/Book

Sales

Net Profit

Relationship Sector/Financials {.tabset}

Price/Earnings

Dividend Yield

Market Cap

EBITDA

Price/Sales

Price/Book

Sales

Net Profit

Relationship Financials {.tabset}

Price/Earnings

Market Cap

EBITDA

Price/Sales

Price/Book

Sales

Net Profit

R Package Documentation

Browse R Packages

We want your feedback!

Stefan1896/Company_Financials
Company Financials of S&P 500 companies