---
title: "Handling of company hierarchies"
author: "Mark Zwart"
date: "2019-04-17"
output:
  rmarkdown::html_vignette:
    css: graydon.css
    df_print: paged
vignette: >
  %\VignetteIndexEntry{Vignette Title}
  %\VignetteEngine{knitr::knitr}
  %\VignetteEncoding{UTF-8}
---
In our database the relations between companies are stored as parent-child relations, but a lot more information can be gained from the total tree of company relations. This package contains several functions that are useful for extracting this kind of information and for plotting the total company hierarchy.
First let's load the package:
library(graydon.package)
The package contains a data frame, tbl_company_relations, which is an example of how company relations are typically represented in our database (id_company, id_company_parent), with some extra data about each of the companies. The data frame contains the following columns:

|Column names      |
|:-----------------|
|id_company        |
|id_company_parent |
|code_sbi          |
|size_company      |
|qty_employees     |
Company relations in our database are represented as parent-child relations, but this doesn't give you an overview of how companies fit together in their complete hierarchy. [Graphs](https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)) provide a way of getting those overviews. Let's load the igraph library first:
library(igraph)
With a simple function call we can create a graph, based on those parent-child relationships expressed in a table. Make sure the child and the parent ID columns are the first and second column in the table respectively.
graph_company_hierarchies <- create_graph_company_hierarchies(tbl_company_relations)
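For readers who want a feel for what such a call involves, here is a minimal sketch in plain igraph. It is not the package's implementation: `tbl_toy` and its IDs are made up, and the sketch assumes that edges run from child to parent and that companies without a parent simply contribute no edge.

```r
library(igraph)

# Hypothetical parent-child table; the first two columns are child and parent.
tbl_toy <- data.frame(
  id_company        = c("A", "B", "C", "D", "E"),
  id_company_parent = c(NA, "A", "A", "B", NA),
  qty_employees     = c(1, 0, 5, 2, 7),
  stringsAsFactors  = FALSE
)

# Companies without a parent contribute no edge.
tbl_edges <- tbl_toy[!is.na(tbl_toy$id_company_parent),
                     c("id_company", "id_company_parent")]

# Every company becomes a vertex; the remaining columns become vertex attributes.
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_toy)
```

Passing the full table as `vertices` keeps companies without any relation (like "E") in the graph as a hierarchy of one.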
The resulting graph represents a network of all individual company hierarchies in one big 'space', which can be graphically represented as shown below. This plot is pretty useless in itself, but this 'space' of hierarchies can be used as a 'database' from which you can easily retrieve the company hierarchies you need for plotting and for doing calculations.
The plot_graydon_graph function is used to plot graphs using Graydon colors. The extra parameters vertex.label, vertex.size and edge.arrow.size are set here to override the defaults.
plot_graydon_graph(graph_company_hierarchies,
vertex.label = "",
vertex.size = 4,
edge.arrow.size = 0)
There are a few ways in which we can select the distinct company hierarchies so we can make useful calculations or plots about them: you can select them either individually or in bulk.
If you have a single company that you want to find out more about, you can select it using the function find_company_hierarchy:
id_company_selected <- "931238099"
graph_company_hierarchy <- find_company_hierarchy(graph_company_hierarchies, id_company_selected)
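As an illustration of the idea only (the package's own implementation may differ), a single hierarchy can be pulled out of a larger graph with igraph's `subcomponent` and `induced_subgraph`. The toy graph and its IDs below are hypothetical, and edges are assumed to run from child to parent.

```r
library(igraph)

# Two toy hierarchies in one graph; edges run from child to parent (assumption).
tbl_edges <- data.frame(child  = c("B", "C", "D", "F"),
                        parent = c("A", "A", "B", "E"),
                        stringsAsFactors = FALSE)
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE)

# All vertices connected to "D" when edge direction is ignored.
vs_hierarchy <- subcomponent(graph_toy, "D", mode = "all")
graph_hierarchy <- induced_subgraph(graph_toy, vs_hierarchy)

# Flag the company we searched for.
V(graph_hierarchy)$is_searched_company <- V(graph_hierarchy)$name == "D"
```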
The resulting graph has the logical attribute is_searched_company, which is set to TRUE for the searched-for company. Let's plot this graph, highlighting the searched-for company:
igraph::V(graph_company_hierarchy)$color <- ifelse(igraph::V(graph_company_hierarchy)$is_searched_company,
col_graydon[2],
col_graydon[4])
plot_graydon_graph(graph_company_hierarchy)
A customer file contains multiple companies for which we might want to calculate or show (parts of) their hierarchy. The select_graph_hierarchies function can be used to put the graphs of all these companies in a list. In this example we have a data frame tbl_customers with the column id_company, for which we want to retrieve all the graphs.
list_selected_hierarchies <- select_graph_hierarchies(graph_company_hierarchies,
tbl_customers$id_company)
Note that the tbl_customers data frame contains 300 companies, while list_selected_hierarchies contains 248 graphs; this is because some customers fall within the same company hierarchy. The companies in the graphs in this list now also have an extra logical attribute, is_searched_company, which indicates whether the company was in the id_company column. Let's take a look at one of the graphs that contains multiple customers. The customer 'vertices' are colored orange here:
igraph::V(graph_example)$color <- ifelse(igraph::V(graph_example)$is_searched_company,
col_graydon[2],
col_graydon[4])
plot_graydon_graph(graph_example, vertex.label = "")
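The following sketch shows, under assumptions, why fewer graphs than customers can come back and how a graph with several customers might be picked out. It uses plain igraph on a hypothetical toy graph; the variable names and the way `graph_example` is chosen here are illustrative, not taken from the package.

```r
library(igraph)

tbl_edges <- data.frame(child  = c("B", "C", "D", "F"),
                        parent = c("A", "A", "B", "E"),
                        stringsAsFactors = FALSE)
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE)
id_customers <- c("B", "D", "F")   # B and D share a hierarchy

# Label each vertex with the hierarchy (weakly connected component) it is in.
membership <- components(graph_toy, mode = "weak")$membership
names(membership) <- V(graph_toy)$name

# One graph per distinct hierarchy that contains at least one customer.
list_graphs <- lapply(unique(unname(membership[id_customers])), function(id) {
  g <- induced_subgraph(graph_toy, unname(which(membership == id)))
  V(g)$is_searched_company <- V(g)$name %in% id_customers
  g
})

# Pick the hierarchy that contains the most customers.
qty_customers <- vapply(list_graphs,
                        function(g) sum(V(g)$is_searched_company),
                        integer(1))
graph_example <- list_graphs[[which.max(qty_customers)]]
```

Three customers yield two graphs here, because two of them sit in the same hierarchy.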
You can also select parts of company hierarchies by selecting the neighbourhood of a company. These kinds of graphs are called neighbourhood or ego graphs. You can find the neighbourhood of companies with the function select_ego_graphs. In this example we are looking for two ego graphs, where the neighbourhood is defined as being at most two steps away (set by the function argument distance, whose default is 1), and we only want the companies' (grand)children (set by the argument only_children, whose default value is FALSE). Both requested company IDs come from the same company hierarchy, but they result in two ego graphs nonetheless.
lst_results <- select_ego_graphs(graph_company_hierarchies,
id_companies = c("169072", "910716048"),
distance = 2,
only_children = TRUE)
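For comparison, igraph itself offers `make_ego_graph`, which can express a similar selection. The sketch below is an assumption about how such a selection could look, not the package's code: the toy graph is hypothetical, edges are assumed to run from child to parent, and `mode = "in"` is therefore used to follow only (grand)children.

```r
library(igraph)

tbl_edges <- data.frame(child  = c("B", "C", "D", "F"),
                        parent = c("A", "A", "B", "E"),
                        stringsAsFactors = FALSE)
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE)

# Ego graphs of A and B: at most two steps away, following incoming edges only,
# which are the (grand)children when edges point from child to parent.
list_ego <- make_ego_graph(graph_toy, order = 2, nodes = c("A", "B"), mode = "in")
```

Although A and B belong to the same hierarchy, each requested company gets its own ego graph.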
For whatever reason, you might want to create a list of all of the separate company hierarchies out of the complete company hierarchy 'space'. This will take some time, so grab some coffee.
list_all_graphs <- list_company_hierarchy_graphs(graph_company_hierarchies)
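In plain igraph, splitting a graph into its separate connected parts is what `decompose` does. The sketch below shows this on a hypothetical toy graph; whether the package function works this way internally is an assumption.

```r
library(igraph)

tbl_edges <- data.frame(child  = c("B", "C", "D", "F"),
                        parent = c("A", "A", "B", "E"),
                        stringsAsFactors = FALSE)
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE)

# One graph per weakly connected component, i.e. per company hierarchy.
list_toy_graphs <- decompose(graph_toy, mode = "weak")
qty_companies <- vapply(list_toy_graphs, vcount, numeric(1))
```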
Having graphs is nice for creating beautiful plots and doing useful calculations, but for further processing it is handy to get the data from the graph into a data frame.
To turn a graph into a data frame you can use the function hierarchy_as_data_frame. We created the graph graph_company_hierarchy by searching for a company ID and then colored the vertices according to whether they are the searched-for company; this is reflected in the is_searched_company and color columns respectively.
df_single_hierarchy <- hierarchy_as_data_frame(graph_company_hierarchy)
|name      | id_company_parent|code_sbi | size_company| qty_employees|is_searched_company |color   |
|:---------|-----------------:|:--------|------------:|-------------:|:-------------------|:-------|
|1003667   |         931238099|7022     |            2|            15|FALSE               |#8BB0DE |
|931238099 |         933898487|6420     |            1|             0|TRUE                |#EB6E08 |
|933898487 |         911277358|7022     |            1|             0|FALSE               |#8BB0DE |
|936298138 |         931238099|7311     |            2|             2|FALSE               |#8BB0DE |
|911277358 |                NA|6420     |            2|             1|FALSE               |#8BB0DE |
A list of graphs can also be converted into a data-frame using hierarchy_list_as_data_frame:
df_selected_hierarchies <- hierarchy_list_as_data_frame(list_selected_hierarchies)
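A rough equivalent in plain igraph, given here only as a sketch, is `igraph::as_data_frame(what = "vertices")`, combined with `rbind` for a list of graphs. The toy data is hypothetical, and the package functions may add or arrange columns differently.

```r
library(igraph)

tbl_vertices <- data.frame(name          = c("A", "B", "C", "D", "E", "F"),
                           qty_employees = c(1, 0, 5, 2, 7, 3),
                           stringsAsFactors = FALSE)
tbl_edges <- data.frame(child  = c("B", "C", "D", "F"),
                        parent = c("A", "A", "B", "E"),
                        stringsAsFactors = FALSE)
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_vertices)

# A single graph: one row per company, one column per vertex attribute.
df_single <- igraph::as_data_frame(graph_toy, what = "vertices")

# A list of graphs: convert each one and stack the results.
list_graphs <- decompose(graph_toy, mode = "weak")
df_all <- do.call(rbind, lapply(list_graphs, igraph::as_data_frame, what = "vertices"))
```

The explicit `igraph::` prefix avoids picking up a function of the same name from another attached package.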
Totalling values from across the network can sometimes come in handy for further calculations. You can use the function total_hierarchy_value in combination with aggregation functions like sum, mean, min or median for this:
graph_company_hierarchy <- total_hierarchy_value(graph_company_hierarchy,
name_attribute = "qty_employees", name_total = "qty_employees_sum",
FUN = sum, na.rm = TRUE)
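Conceptually, this amounts to aggregating one vertex attribute over the whole graph and writing the result back to every vertex. The sketch below does that with plain igraph on the five companies from the table above; it illustrates the idea and is not the package's code.

```r
library(igraph)

# The five companies from the table above (only the columns needed here).
tbl_companies <- data.frame(
  name              = c("1003667", "931238099", "933898487", "936298138", "911277358"),
  id_company_parent = c("931238099", "933898487", "911277358", "931238099", NA),
  qty_employees     = c(15, 0, 0, 2, 1),
  stringsAsFactors  = FALSE
)
tbl_edges <- tbl_companies[!is.na(tbl_companies$id_company_parent),
                           c("name", "id_company_parent")]
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_companies)

# Aggregate over all companies and store the same total on every vertex.
V(graph_toy)$qty_employees_sum <- sum(V(graph_toy)$qty_employees, na.rm = TRUE)
```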
The result can be shown in a graph, where we can see the original values and the totalled values.
igraph::V(graph_company_hierarchy)$label <- paste0("# ",
igraph::V(graph_company_hierarchy)$qty_employees,
" -> Sum # ",
igraph::V(graph_company_hierarchy)$qty_employees_sum)
plot_graydon_graph(graph_company_hierarchy)
It might come in handy to cumulate values in the hierarchy throughout the network in specific directions (bottom-up, top-down or all directions) and across specific distances. For this purpose you can use the propagate_hierarchy_value function.
In this example the companies will get a new attribute qty_employees_cum, which will contain the sum of the qty_employees of all the companies that are below the company within the hierarchy.
graph_company_hierarchy <- propagate_hierarchy_value(graph = graph_company_hierarchy,
name_attribute = "qty_employees",
name_propagate = "qty_employees_cum",
distance = Inf,
direction = "in",
FUN = sum,
na.rm = TRUE)
The result can be shown in a graph, where we can see the original values and the rolled-up values.
In this example the companies will get a new attribute qty_employees_cum, which will contain the sum of the qty_employees of the companies that are up to two steps upstream within the hierarchy.
graph_company_hierarchy <- propagate_hierarchy_value(graph = graph_company_hierarchy,
name_attribute = "qty_employees",
name_propagate = "qty_employees_cum",
distance = 2,
direction = "out",
FUN = sum,
na.rm = TRUE)
In this example the companies will get a new attribute qty_employees_cum, which will contain the sum of the qty_employees of the companies that are one step away within the hierarchy, in any direction (parents as well as children).
graph_company_hierarchy <- propagate_hierarchy_value(graph = graph_company_hierarchy,
name_attribute = "qty_employees",
name_propagate = "qty_employees_cum",
distance = 1,
direction = "all",
FUN = sum,
na.rm = TRUE)
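To make the roles of distance and direction more concrete, here is a minimal sketch built on igraph's `ego`. The helper `propagate_sketch` is hypothetical and only mimics the argument names used above. It assumes that edges run from child to parent and that a company's own value is included in its total; both are assumptions, not statements about the package.

```r
library(igraph)

# The five companies from the table above (only the columns needed here).
tbl_companies <- data.frame(
  name              = c("1003667", "931238099", "933898487", "936298138", "911277358"),
  id_company_parent = c("931238099", "933898487", "911277358", "931238099", NA),
  qty_employees     = c(15, 0, 0, 2, 1),
  stringsAsFactors  = FALSE
)
tbl_edges <- tbl_companies[!is.na(tbl_companies$id_company_parent),
                           c("name", "id_company_parent")]
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_companies)

# Hypothetical helper: aggregate an attribute over each company's neighbourhood.
propagate_sketch <- function(graph, name_attribute, distance, direction, FUN, ...) {
  order  <- min(distance, vcount(graph))   # cap Inf at the graph size
  values <- vertex_attr(graph, name_attribute)
  # ego() returns, per vertex, the vertices within `order` steps (itself included).
  neighbourhoods <- ego(graph, order = order, mode = direction)
  vapply(neighbourhoods, function(vs) FUN(values[as.integer(vs)], ...), numeric(1))
}

cum_below <- propagate_sketch(graph_toy, "qty_employees", Inf, "in", sum, na.rm = TRUE)
cum_near  <- propagate_sketch(graph_toy, "qty_employees", 1, "all", sum, na.rm = TRUE)
```

With distance 1 and direction "all", this sketch gives 15, 17, 1, 2 and 1 for the five companies, which happens to match the qty_employees_cum column in the statistics table further down.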
Some overall statistics about the hierarchy can be added to the individual companies by using the add_company_hierarchy_stats function. This function will add the following attributes:

- is_tree - A logical indicating whether the company hierarchy is a tree (which should always be the case).
- qty_hierarchy_companies - The number of companies in the hierarchy.
- id_company_top - The company ID of the highest company in the hierarchy (ultimate mother).
- distance_to_top - The number of steps between the current company and the ultimate mother company.
- qty_child_companies - The number of child companies.
- qty_sister_companies - The number of sibling companies.
graph_company_hierarchy <- add_company_hierarchy_stats(graph_company_hierarchy)
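Some of these statistics can be approximated with a few igraph calls, as sketched below on the five companies from the earlier table. This is an illustration under assumptions (edges run from child to parent, exactly one top company), not the package's implementation, and it covers only part of the attributes.

```r
library(igraph)

tbl_companies <- data.frame(
  name              = c("1003667", "931238099", "933898487", "936298138", "911277358"),
  id_company_parent = c("931238099", "933898487", "911277358", "931238099", NA),
  stringsAsFactors  = FALSE
)
tbl_edges <- tbl_companies[!is.na(tbl_companies$id_company_parent),
                           c("name", "id_company_parent")]
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_companies)

# The top company is the one without an outgoing (child-to-parent) edge.
is_top         <- degree(graph_toy, mode = "out") == 0
id_company_top <- V(graph_toy)$name[is_top]

# Steps from each company up to the top, following edges towards the parent.
distance_to_top <- distances(graph_toy, to = unname(which(is_top)), mode = "out")[, 1]

# Direct children are the incoming edges.
qty_child_companies <- degree(graph_toy, mode = "in")
qty_hierarchy_companies <- vcount(graph_toy)
```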
Let's look at the statistics:
df_single_hierarchy <- hierarchy_as_data_frame(graph_company_hierarchy)
|name      | id_company_parent|code_sbi | size_company| qty_employees|is_searched_company | qty_employees_sum| qty_employees_cum|is_tree | qty_hierarchy_companies|id_company_top | distance_to_top| qty_child_companies| qty_sister_companies|
|:---------|-----------------:|:--------|------------:|-------------:|:-------------------|-----------------:|-----------------:|:-------|-----------------------:|:--------------|---------------:|-------------------:|--------------------:|
|1003667   |         931238099|7022     |            2|            15|FALSE               |                18|                15|TRUE    |                       5|911277358      |               3|                   0|                    2|
|931238099 |         933898487|6420     |            1|             0|TRUE                |                18|                17|TRUE    |                       5|911277358      |               2|                   2|                    1|
|933898487 |         911277358|7022     |            1|             0|FALSE               |                18|                 1|TRUE    |                       5|911277358      |               1|                   1|                    1|
|936298138 |         931238099|7311     |            2|             2|FALSE               |                18|                 2|TRUE    |                       5|911277358      |               3|                   0|                    2|
|911277358 |                NA|6420     |            2|             1|FALSE               |                18|                 1|TRUE    |                       5|911277358      |               0|                   1|                    0|

You can mark companies within a graph that meet certain categorical criteria with the function mark_companies_logical. This can be useful for derivations or for coloring graphs. As a somewhat trivial example, let's mark the companies that have SBI codes (from code_sbi) that are associated with holdings (the codes in the vector vec_sbi_holdings). The new company attribute will be named is_holding.
vec_sbi_holdings <- c("64", "642", "6420")
graph_company_hierarchy <- mark_companies_logical(graph_company_hierarchy,
name_logical = "is_holding",
name_filter = "code_sbi",
set_criteria = vec_sbi_holdings)
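In plain igraph terms, marking of this kind boils down to a set-membership test on a vertex attribute. The sketch below applies it to the five companies from the earlier table; it is an illustration of the idea, not the package's implementation.

```r
library(igraph)

tbl_companies <- data.frame(
  name              = c("1003667", "931238099", "933898487", "936298138", "911277358"),
  id_company_parent = c("931238099", "933898487", "911277358", "931238099", NA),
  code_sbi          = c("7022", "6420", "7022", "7311", "6420"),
  stringsAsFactors  = FALSE
)
tbl_edges <- tbl_companies[!is.na(tbl_companies$id_company_parent),
                           c("name", "id_company_parent")]
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_companies)

vec_sbi_holdings <- c("64", "642", "6420")

# TRUE for every company whose SBI code is in the set of holding codes.
V(graph_toy)$is_holding <- V(graph_toy)$code_sbi %in% vec_sbi_holdings
qty_holdings <- sum(V(graph_toy)$is_holding)
```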
You can get the sibling IDs of a company using the function get_sibling_ids, which in turn can be used to mark companies and plot them.
id_siblings <- get_sibling_ids(graph_company_hierarchy, "1003667")
graph_company_hierarchy <- mark_companies_logical(graph_company_hierarchy,
name_logical = "is_sibling",
name_filter = "id_company",
set_criteria = id_siblings
)
igraph::V(graph_company_hierarchy)$label <- igraph::V(graph_company_hierarchy)$name
V(graph_company_hierarchy)$color <- ifelse(V(graph_company_hierarchy)$is_sibling,
col_graydon[2],
col_graydon[4])
plot_graydon_graph(graph_company_hierarchy)
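One way to think about siblings is "the other children of my parent". The hypothetical helper `get_siblings_sketch` below expresses that with igraph's `neighbors`, assuming edges run from child to parent. It excludes the company itself from the result, which may or may not be how the package counts siblings.

```r
library(igraph)

tbl_companies <- data.frame(
  name              = c("1003667", "931238099", "933898487", "936298138", "911277358"),
  id_company_parent = c("931238099", "933898487", "911277358", "931238099", NA),
  stringsAsFactors  = FALSE
)
tbl_edges <- tbl_companies[!is.na(tbl_companies$id_company_parent),
                           c("name", "id_company_parent")]
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_companies)

# Hypothetical helper: IDs of the other children of a company's parent.
get_siblings_sketch <- function(graph, id_company) {
  parent <- neighbors(graph, id_company, mode = "out")
  if (length(parent) == 0) return(character(0))   # top company: no parent
  children <- neighbors(graph, parent, mode = "in")$name
  setdiff(children, id_company)
}

id_siblings_toy <- get_siblings_sketch(graph_toy, "1003667")
```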
Remember the table of customers? If we want to get all siblings of those companies we can use the function get_siblings_df.
tbl_siblings <- get_siblings_df(graph_company_hierarchies, tbl_customers$id_company)
Below you can see a sample of the resulting data frame, showing each company with its siblings; qty_siblings indicates the total number of siblings that the company in id_company has.

|id_company |id_sibling | qty_siblings|
|:----------|:----------|------------:|
|942286189  |940892499  |            3|
|942286189  |942694163  |            3|
|942286189  |942695313  |            3|
|917758536  |929312430  |            9|
|917758536  |931364582  |            9|
|917758536  |933317948  |            9|

When dealing with customers you'll often find that they have economic activity codes that mark them as financial holdings. This is often the case because the holding is the company you send the bills to. But this is not really informative when analysing the type of companies you have as customers. In this case you might want to recode them to better reflect what kind of business they are really in.
Here you see an example of a company hierarchy with some holdings, which are colored orange here:
vec_sbi_holdings <- c("64", "642", "6420")
graph_company_hierarchy <- mark_companies_logical(graph_company_hierarchy,
name_logical = "is_holding",
name_filter = "code_sbi",
set_criteria = vec_sbi_holdings)
igraph::V(graph_company_hierarchy)$label <- igraph::V(graph_company_hierarchy)$code_sbi
V(graph_company_hierarchy)$color <- ifelse(V(graph_company_hierarchy)$is_holding,
col_graydon[2],
col_graydon[4])
plot_graydon_graph(graph_company_hierarchy)
vec_sbi_holdings <- c("64", "642", "6420")
library("png")
img_holding <- readPNG("~/R scripts/hierarchy_changes/money-svg-hand-icon-png-3.png")
img_regular <- readPNG("~/R scripts/hierarchy_changes/vector-apartments-business-building-6.png")
graph_company_hierarchy <- mark_companies_logical(graph_company_hierarchy,
name_logical = "is_holding",
name_filter = "code_sbi",
set_criteria = vec_sbi_holdings)
V(graph_company_hierarchy)$raster <- list(img_regular, img_holding)[V(graph_company_hierarchy)$is_holding+1]
plot_graydon_graph(graph_company_hierarchy,
vertex.shape="raster",
vertex.label=NA,
vertex.size=24,
vertex.size2=24,
edge.width=2)
You can count the number of financial holdings in a company hierarchy like this:
count_companies_by_set(graph = graph_company_hierarchy,
name_filter = "code_sbi",
set_criteria = vec_sbi_holdings)
#> [1] 2
You can use the function recode_holding_codes to recode the holdings so they reflect the economic activity codes of the majority of the children:
graph_company_hierarchy <- recode_holding_codes(graph_company_hierarchy,
name_activity_code = "code_sbi",
vec_holding_codes = c("64", "642", "6420"))
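The "majority of the children" rule can be sketched as follows. The helper `recode_sketch` is hypothetical: it makes a single pass, looks only at direct children, ignores children that are holdings themselves, and breaks ties by taking the first code in sorted order. The package function may handle these cases differently.

```r
library(igraph)

# Hypothetical hierarchy: holding "H" with three operating children.
tbl_companies <- data.frame(
  name             = c("H", "C1", "C2", "C3"),
  code_sbi         = c("6420", "4711", "4711", "7022"),
  stringsAsFactors = FALSE
)
tbl_edges <- data.frame(child  = c("C1", "C2", "C3"),
                        parent = c("H", "H", "H"),
                        stringsAsFactors = FALSE)
graph_toy <- graph_from_data_frame(tbl_edges, directed = TRUE, vertices = tbl_companies)

# Hypothetical helper: give each holding the most common code of its children.
recode_sketch <- function(graph, name_activity_code, vec_holding_codes) {
  codes <- vertex_attr(graph, name_activity_code)
  for (i in which(codes %in% vec_holding_codes)) {
    child_codes <- codes[as.integer(neighbors(graph, i, mode = "in"))]
    child_codes <- child_codes[!child_codes %in% vec_holding_codes]
    if (length(child_codes) > 0) {
      codes[i] <- names(which.max(table(child_codes)))
    }
  }
  set_vertex_attr(graph, name_activity_code, value = codes)
}

graph_recoded <- recode_sketch(graph_toy, "code_sbi", c("64", "642", "6420"))
```

A holding without any non-holding children keeps its original code in this sketch.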
This would result in this recoded company hierarchy: