This R package helps you extract data from Companies House: https://www.gov.uk/government/organisations/companies-house
In particular, it provides a way to search for companies and extract a set of company numbers. These company numbers can then be used to identify company directors.
The package also provides functions for building a network of interlocking directorates, that is, a network of individuals and companies linked by board membership. Two further networks can be created. A director network is a set of individuals linked by sitting on at least one of the same company boards. A company network is a set of companies linked by having at least one director in common.
To install, follow these steps:
#Install CompaniesHouse:
library(devtools)
devtools::install_github("MatthewSmith430/CompaniesHouse")
library(CompaniesHouse)
To extract data from Companies House (with the API), you will need an authorisation key.
The instructions on how to obtain your key can be found at: https://developer.company-information.service.gov.uk/
When using the package, save your key as mkey:
mkey<-"ENTER YOUR AUTHORISATION KEY"
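Hard-coding the key in a script makes it easy to share by accident. One alternative, sketched below, is to read it from an environment variable. The variable name COMPANIES_HOUSE_KEY and the helper get_ch_key() are illustrative assumptions and are not part of the CompaniesHouse package.

```r
# Minimal sketch: read the API key from an environment variable.
# "COMPANIES_HOUSE_KEY" and get_ch_key() are hypothetical names.
get_ch_key <- function(var = "COMPANIES_HOUSE_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Environment variable '", var, "' is not set.")
  }
  key
}

# Usage, after adding COMPANIES_HOUSE_KEY=... to your ~/.Renviron:
# mkey <- get_ch_key()
```

The rest of the examples work the same way, since they only need mkey to hold the key as a character string.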
The following function allows you to search for companies in Companies
House (using the API). You supply a search term and your authorisation
key, and it returns a list of companies that match the search term. It
also gives the Companies House company number, the company address and
various other information. The company number is important, as it
identifies the firm and is used in many of the other package
functions. There are three versions of this command:
1. CompanySearch_limit_first
This returns the first company from the search results
2. CompanySearch_limit
This returns the first page of search results
3. CompanySearch
This returns all search results
In the following example I will use CompanySearch_limit (although I will
only display the first three results to save space).
#Search for a "COMPANY SEARCH TERM"
#In this example we use "unilever"
CompanySearchList<-CompanySearch_limit("unilever",mkey)
| id.search.term | company.name | company.number | Date.of.Creation | company.type | company.status | address | Locality | postcode |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| unilever | UNILEVER PLC | 00041424 | 1894-06-21 | plc | active | Port Sunlight, Wirral, Merseyside, CH62 4ZD | Merseyside | CH62 4ZD |
| unilever | UNILEVER AUSTRALIA INVESTMENTS LIMITED | 00137659 | 1914-09-12 | ltd | active | Unilever House, 100 Victoria Embankment, London, EC4Y 0DY | London | EC4Y 0DY |
| unilever | UNILEVER AUSTRALIA PARTNERSHIP LIMITED | 00315312 | 1936-06-17 | ltd | active | Unilever House, 100 Victoria Embankment, London, EC4Y 0DY | London | EC4Y 0DY |
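Because the company number is the key used by the other functions, it helps to keep it as a character string so that leading zeros survive. The sketch below uses a toy data frame that copies a few columns from the example output above. It does not call the API, and the padding step assumes an all-digit, eight-character company number.

```r
# Toy data frame with a few of the columns shown above (values copied from the example output)
search_results <- data.frame(
  company.name   = c("UNILEVER PLC",
                     "UNILEVER AUSTRALIA INVESTMENTS LIMITED",
                     "UNILEVER AUSTRALIA PARTNERSHIP LIMITED"),
  company.number = c("00041424", "00137659", "00315312"),
  company.status = c("active", "active", "active"),
  stringsAsFactors = FALSE
)

# Look up the company number for one company by name
plc_number <- search_results$company.number[
  search_results$company.name == "UNILEVER PLC"
]

# If a number has lost its leading zeros (for example after a spreadsheet
# round trip), pad it back to eight characters.
# Assumption: the company number consists of digits only.
restored <- sprintf("%08d", as.integer("41424"))
```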
This function extracts director information for a company number. It returns a dataframe containing the list of directors and director information for that company number. In this example, I will output a small selection of the directors from Unilever Plc.
#We continue to use Unilever as an example. We know that the company
#number for Unilever Plc is "00041424".
#Therefore we can extract the director information
#for Unilever Plc
DirectorInformation<-company_ExtractDirectorsData("00041424", mkey)
| company.id | director.id | directors | start.date | end.date | occupation | role | residence | postcode | nationality | birth.year | birth.month | former.name | download.date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 00041424 | i-a1nTc06VZikEBTLGW9DYwuANM | SOTAMAA, Ritva | 2018-01-01 | NA | NA | secretary | NA | EC4Y 0DY | NA | NA | NA | NULL | 2020-10-11 |
| 00041424 | ZI4TtLjPrlcnIckJGNlqCLV2s_Y | ANDERSEN, Nils Smedegaard | 2015-04-30 | NA | None | director | Denmark | EC4 0DY | Danish | 1958 | 7 | NULL | 2020-10-11 |
| 00041424 | E3FTMwTYyFn9_AXshohmRCws23c | CHA, Laura May Lung | 2013-05-15 | NA | Deputy Chairman Hsbc Asia Pacific | director | Hong Kong | EC4Y 0DY | Chinese | 1949 | 12 | NULL | 2020-10-11 |
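A common next step is to keep only the people currently serving as directors. The sketch below works on a toy data frame built from a few columns of the output above. It assumes that an NA in end.date means the appointment is still open, which is a reading of the example data rather than documented behaviour.

```r
# Toy data frame with a subset of the columns shown above
directors <- data.frame(
  directors  = c("SOTAMAA, Ritva", "ANDERSEN, Nils Smedegaard", "CHA, Laura May Lung"),
  role       = c("secretary", "director", "director"),
  start.date = c("2018-01-01", "2015-04-30", "2013-05-15"),
  end.date   = c(NA_character_, NA_character_, NA_character_),
  stringsAsFactors = FALSE
)

# Keep rows whose role is "director" and whose appointment has no end date.
# Assumption: NA in end.date means the person is still in post.
current_directors <- directors[
  directors$role == "director" & is.na(directors$end.date),
]
```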
## Company Sector Code
This function finds the sector a company operates in by returning its SIC code. The function requires the company number.
#Again we use Unilever Plc as an example - using their company number
CompanySIC<-CompanySIC("00041424", mkey)
CompanySIC
| x |
| --- |
| 70100 |
In CompaniesHouse you can also examine the boards that a director sits
on, if you have the director id. The indiv_ExtractDirectorsData
function returns a list of firms where the individual has served as a
member of the board.
You can also search for directors by name, in a similar way to company
searches. As with the company search, there are three options:
1. DirectorSearch_limit_first
This returns the first director from the search results
2. DirectorSearch_limit
This returns the first page of search results
3. DirectorSearch
This returns all search results
The director search function can be used, for example, to examine Boris Johnson and the firms where he has previously acted as a director.
The first step is to use the function to identify his director id. He was born in June 1964, and this information can be used to identify the correct person and id.
##Make use of the tidyverse package to process the dataframe
library(tidyverse)
boris_search<-DirectorSearch_limit("boris johnson",mkey) %>%
filter(month.of.birth==6 &
year.of.birth==1964)
| id.search.term | director.id | person.name | addess.snippet | locality | month.of.birth | year.of.birth |
| --- | --- | --- | --- | --- | --- | --- |
| boris johnson | EZWa9WI6ur100VnMhfHT6EP4twA | Alexander Boris De Pfeffel JOHNSON | 13 Furlong Road, London, N7 8LS | London | 6 | 1964 |
Now that we have his id, we can extract the list of firms where he has been a director.
boris_info<-indiv_ExtractDirectorsData(boris_search$director.id,mkey)
| company.id | comapny.name | director.id | directors | director.forename | director.surname | start.date | end.date | occupation | role | residence | postcode | nationality | birth.year | birth.month | appointment.kind | download.date |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 05774105 | FINLAND STATION LIMITED | EZWa9WI6ur100VnMhfHT6EP4twA | Alexander Boris De Pfeffel JOHNSON | Alexander | JOHNSON | 2006-04-07 | 2008-05-23 | Editor/Politician | director | NA | N7 8LS | British | 1964 | 6 | personal-appointment | 11/10/2020 19:09:53 |
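The start.date and end.date columns can be used to work out how long an appointment lasted. The sketch below is a base R illustration on a toy data frame holding the dates from the row above. Measuring open appointments up to today's date is an assumption made for the example, and the column name company.name is simplified.

```r
# Toy data frame with the appointment dates from the example output
appointment <- data.frame(
  company.name = "FINLAND STATION LIMITED",
  start.date   = "2006-04-07",
  end.date     = "2008-05-23",
  stringsAsFactors = FALSE
)

start <- as.Date(appointment$start.date)
end   <- as.Date(appointment$end.date)

# Assumption: an appointment with no end date is still running,
# so it is measured up to today.
end[is.na(end)] <- Sys.Date()

# Subtracting two Date vectors gives the difference in days
tenure_days  <- as.numeric(end - start)
tenure_years <- tenure_days / 365.25
```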
The package can be used to create a set of networks.
- Interlocking directorates network: a set of companies and individuals, where individuals are tied to companies where they sit on the board of directors.
- Director network: a set of directors, where they are linked if they sit on the same company board.
- Company network: a set of companies, where they are linked if they share a director.
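To see how these three networks relate, the sketch below builds them in base R from a small made-up table of board appointments. The director and company labels are invented for illustration. The matrix approach shows the general idea of projecting a two-mode network and is not necessarily how the package computes its networks.

```r
# Made-up board appointments: one row per director-company tie
appointments <- data.frame(
  director = c("A", "A", "B", "B", "C"),
  company  = c("X", "Y", "Y", "Z", "Z"),
  stringsAsFactors = FALSE
)

# Interlocking directorates network as an incidence matrix:
# rows are directors, columns are companies, 1 = sits on the board
incidence <- unclass(table(appointments$director, appointments$company))

# Director network: cell [i, j] counts the boards shared by directors i and j
director_net <- incidence %*% t(incidence)
diag(director_net) <- 0

# Company network: cell [i, j] counts the directors shared by companies i and j
company_net <- t(incidence) %*% incidence
diag(company_net) <- 0
```

In this toy example, directors A and B are linked because both sit on the board of Y, and companies X and Y are linked because they share director A.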
The following functions create the various networks. A list of company numbers is required to create these networks.
When creating the interlock network, you need to specify the years that you want to cover: a start year and an end year. There are two ways to create the interlocking directorates network:
1.) From a list of company numbers
INTERLOCKS1<-InterlockNetwork(List.of.company.numbers,mkey)
##Example for all company numbers associated with the
##Unilever search term for 2015-2020
##The first step is to remove companies that are no longer active from the list, then create the interlock network
library(tidyverse)
COMP_LIST<-CompanySearchList%>%
filter(company.status=="active")
INTERLOCKS1<-InterlockNetwork(COMP_LIST$company.number,mkey,start = 2015,end = 2020)
2.) From a data frame produced using the indiv_ExtractDirectorsData
function. This dataframe can be edited manually to use company names (or
perhaps another id system) in the network.
INTERLOCKS2<-make_interlock(DataFrame)
##Example for all company numbers associated with the
##Unilever search term - the dataframe created with indiv_ExtractDirectorsData
INTERLOCKS2<-make_interlock(MultilpleDirectorInfo)
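The start and end arguments used in option 1 restrict the network to a range of years. One plausible reading is that an appointment counts if it overlaps that range at any point. The base R sketch below applies that rule to made-up appointments. It illustrates the idea only, and the package may apply a different rule internally.

```r
# Made-up appointments with start and end dates (NA = still in post)
appointments <- data.frame(
  director   = c("A", "B", "C"),
  start.date = as.Date(c("2010-03-01", "2016-07-15", "2021-02-01")),
  end.date   = as.Date(c("2014-12-31", NA, NA)),
  stringsAsFactors = FALSE
)

# Year window 2015 to 2020, expressed as dates
window_start <- as.Date("2015-01-01")
window_end   <- as.Date("2020-12-31")

# An appointment overlaps the window if it started before the window closed
# and either has not ended or ended after the window opened
overlaps <- appointments$start.date <= window_end &
  (is.na(appointments$end.date) | appointments$end.date >= window_start)

in_window <- appointments[overlaps, ]
```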
The next network that can be created with the CompaniesHouse package
is the company network. This is a one-mode projection of the interlocking
directorates network. It is a set of companies that are linked when they
share a director.
CompanyNET<-CompanyNetwork(List.of.company.numbers,mkey,start = 2015,end = 2020)
##Example for all company numbers associated with the
##Unilever search term:
CompanyNET<-CompanyNetwork(COMP_LIST$company.number,mkey,start = 2015,end = 2020)
The next network that can be created with the CompaniesHouse package
is the director network. This is a one-mode projection of the
interlocking directorates network, but for directors instead of
companies. It is a set of directors that are linked when they
sit on the same board.
DirNET<-DirectorNetwork(List.of.company.numbers,mkey,start = 2015,end = 2020)
##Example for all company numbers associated with the
##Unilever search term:
DirNET<-DirectorNetwork(COMP_LIST$company.number,mkey,start = 2015,end = 2020)
The network (igraph object) is required for these functions. The networks are
created using the commands from the “Create Networks” section.
For each network we can calculate a range of centrality measures. The director and company networks are one-mode networks, so a wider range of centrality measures can be calculated.
INTERLOCKcent<-InterlockCentrality(INTERLOCKS1)
| | NAMES | Degree.Centrality |
| --- | --- | --- |
| 00041424 | 00041424 | 20 |
| 00137659 | 00137659 | 4 |
| 00315312 | 00315312 | 4 |
COMPANYcent<-one_mode_centrality(CompanyNET)
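| | name | Weighted.Degree | Binary.Degree | Betweenness | Closeness | Eigenvector | NAMES |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 00041424 | 00041424 | 2 | 2 | 0.0000 | 0.0110 | 0.0018 | 00041424 |
| 00137659 | 00137659 | 25 | 11 | 3.1333 | 0.0152 | 1.0000 | 00137659 |
| 00315312 | 00315312 | 25 | 11 | 3.1333 | 0.0152 | 1.0000 | 00315312 |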
DIRcent<-one_mode_centrality(DirNET)
| | name | Weighted.Degree | Binary.Degree | Betweenness | Closeness | Eigenvector | NAMES |
| --- | --- | --- | --- | --- | --- | --- | --- |
| p9NWLNpKrF1rsf9hRuxo6j0YbJQ | p9NWLNpKrF1rsf9hRuxo6j0YbJQ | 19 | 19 | 0 | 0.0012 | 0.9103 | p9NWLNpKrF1rsf9hRuxo6j0YbJQ |
| bJK4sl0SPT-Zxzq88lC1ouqrtl8 | bJK4sl0SPT-Zxzq88lC1ouqrtl8 | 19 | 19 | 0 | 0.0012 | 0.9103 | bJK4sl0SPT-Zxzq88lC1ouqrtl8 |
| -dJ_v_xnd71ByCzbr1g-uLqafak | -dJ_v_xnd71ByCzbr1g-uLqafak | 19 | 19 | 0 | 0.0012 | 0.9103 | -dJ_v_xnd71ByCzbr1g-uLqafak |
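The tables for the one-mode networks report both a weighted and a binary degree. As a rough illustration of the difference, the sketch below computes both from a made-up weighted adjacency matrix in base R. It assumes that weighted degree is the sum of tie weights and binary degree is the number of distinct neighbours, which is the usual meaning of these terms but is not confirmed here as the package's exact definition.

```r
# Made-up weighted adjacency matrix for three companies:
# X and Y share two directors, Y and Z share one
adj <- matrix(
  c(0, 2, 0,
    2, 0, 1,
    0, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("X", "Y", "Z"), c("X", "Y", "Z"))
)

# Weighted degree: total weight of the ties attached to each node
weighted_degree <- rowSums(adj)

# Binary degree: number of distinct neighbours, ignoring tie weights
binary_degree <- rowSums(adj > 0)
```

When the two measures match for every node, as in the director rows above, each pair of connected nodes is tied with weight one.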
We can calculate the properties of the director and company networks.
COMPANYprop<-CompanyNetworkProperties(CompanyNET)
| | One-Mode Company Network |
| --- | --- |
| Size | 17.0000 |
| Density | 0.4338 |
| Diameter | 5.0000 |
| Average.path.lenth | 1.7905 |
| Average.node.stregnth | 6.1176 |
| Average.Degree | 3.4706 |
| Betweenness.Centralisation | 0.2502 |
| Closeness.Centralisation | 0.1160 |
| Eigenvector.Centralisation | 0.4333 |
| Degree.Centralisation | 0.2537 |
| Clustering.coefficent.transitivity | 0.8491 |
| Clustering.Weighted | 0.9158 |
DIRprop<-DirectorNetworkProperties(DirNET)
| | One-Mode Director Network |
| --- | --- |
| Size | 66.0000 |
| Density | 0.1716 |
| Diameter | 6.0000 |
| Average.path.lenth | 2.6284 |
| Average.node.stregnth | 6.0606 |
| Average.Degree | 5.5758 |
| Betweenness.Centralisation | 0.3776 |
| Closeness.Centralisation | 0.0298 |
| Eigenvector.Centralisation | 0.7157 |
| Degree.Centralisation | 0.3669 |
| Clustering.coefficent.transitivity | 0.8848 |
| Clustering.Weighted | 0.8231 |
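For readers unfamiliar with these measures, the sketch below shows how the first two, size and density, are conventionally defined for an undirected network, using a made-up adjacency matrix in base R. The other measures in the tables may follow definitions specific to the package, so this is an illustration only and not a reimplementation of CompanyNetworkProperties or DirectorNetworkProperties.

```r
# Made-up undirected network of four nodes with ties 1-2, 1-3, 2-3 and 3-4
adj <- matrix(
  c(0, 1, 1, 0,
    1, 0, 1, 0,
    1, 1, 0, 1,
    0, 0, 1, 0),
  nrow = 4, byrow = TRUE
)

# Size: number of nodes
size <- nrow(adj)

# Count each undirected tie once by looking at the upper triangle only
n_ties <- sum(adj[upper.tri(adj)] > 0)

# Density: observed ties divided by the number of possible ties
density <- n_ties / (size * (size - 1) / 2)
```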
The following functions create plots of the various networks. The TRUE/FALSE option indicates whether node labels should be included in the plots. The network plots are created from a list of company numbers for a quick inspection of the networks. A number of other commands and packages can be used to create high-quality network visualisations from the network objects in R. You can also specify the node size; in the following examples we use size 6.
InterlockNetworkPLOT(COMP_LIST$company.number,mkey,FALSE,NodeSize = 6,start = 2015,end = 2020)
#Directors Plot
DirectorNetworkPLOT(COMP_LIST$company.number,mkey,FALSE,NodeSize = 6,start = 2015,end = 2020)
#Company Plot
CompanyNetworkPLOT(COMP_LIST$company.number,mkey,FALSE,NodeSize = 6,start = 2015,end = 2020)
You can also create grid plots, showing all three networks on a
single grid, using the cowplot library. In the example below we plot
the networks in a grid, setting node size to degree centrality.
library(cowplot)
library(CompaniesHouse)
##Create plot objects with node size based on centrality
interlock.plot<-InterlockNetworkPLOT(COMP_LIST$company.number,
mkey,FALSE,NodeSize = "CENTRALITY",
start = 2015,end = 2020)
director.plot<-DirectorNetworkPLOT(COMP_LIST$company.number,
mkey,FALSE,NodeSize = "CENTRALITY",
start = 2015,end = 2020)
company.plot<-CompanyNetworkPLOT(COMP_LIST$company.number,
mkey,FALSE,NodeSize = "CENTRALITY",
start = 2015,end = 2020)
##Plot as a grid
plot_grid(interlock.plot,director.plot,company.plot,
labels=c("Interlocks","Directors","Companies"))
If your research requires examining the gender of directors, and how patterns of interlocking directorates differ for men and women, you will need additional information, as Companies House does not provide gender information. However, there are a number of R packages that estimate the likelihood that an individual is male or female based on their first name. Although this is restricted to English first names, it remains a useful tool to proxy gender information.
The available packages include gender and genderizeR. In the
following example, we make use of the genderizeR package. We extract
the gender information for all actors in the example Unilever director
network, and then plot this network with the gender information.
This example shows how you can identify gender for a network of
directors where the director name is present. These commands cannot be
applied directly to the object created by the CompaniesHouse package, as the
directors are identified only by their id in this network.
##Load the relevant packages
library(igraph)
library(magrittr)
library(intergraph)
library(network)
library(GGally)
#devtools::install_github("kalimu/genderizeR")
library(genderizeR)
##Create name dataframe from director network
directornames<-V(DirNET)$name%>%as.data.frame(.,stringsAsFactors=FALSE)
colnames(directornames)<-"Names"
##Split the names into first and last names
names.split <- strsplit(unlist(directornames$Names), ",")
name.last <- sapply(names.split, function(x) x[1])
name.first <- sapply(names.split, function(x)
# this deals with empty name slots in your original list, returning NA
if(length(x) == 0) {
NA
} else if (x[length(x)] %in% c("Jr.", "Jr", "Sr.", "Sr",
"Dr", " Dr", " Baron","Dr.", " Dr.",
"Professor", " Professor")) {
gsub("[[:punct:]]", "", x[length(x) - 1])
} else {
x[length(x)]
})
##Create new names dataframe
nameDF<-data.frame(id=1:length(name.first),
name.first=name.first,
name.last=name.last)
nameDF$name.first<-as.character(nameDF$name.first)
nameDF$name.first<-trimws(nameDF$name.first)
nameDF$name.last<-as.character(nameDF$name.last)
##Extract first word of first name vectors (as this can also include middle/multiple names etc)
##This will be used as matching key later.
name1<-gsub(" .*", '', nameDF$name.first)
nameDF<-cbind(nameDF,name1)
nameDF$name1<-as.character(nameDF$name1)%>%tolower()
##Implement genderizeR
xPrepared = textPrepare(nameDF$name.first)
givenNames = findGivenNames(xPrepared, progress = FALSE) %>% as.data.frame(.,stringsAsFactors=FALSE)
##From this create a gender-name key
nameKEY<-givenNames
nameKEY$probability<-NULL
nameKEY$count<-NULL
colnames(nameKEY)<-c("name1","gender")
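##Merge this with the director name dataframe
nameDF <- merge(nameDF, nameKEY, by = "name1",all.x = TRUE)
##merge() sorts rows by the key, so restore the original vertex order
nameDF<-nameDF[order(nameDF$id),]
nameDF$name1<-NULL
nameDF[is.na(nameDF)]<-"na"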
##Add these as igraph network attributes
V(DirNET)$gender<-nameDF$gender
numericgender<-as.factor(nameDF$gender)%>%as.numeric() #Add numeric attribute
V(DirNET)$gendernumeric<-numericgender
##plot director network with gender information
DIRnetwork<-asNetwork(DirNET)
ggnet2(DIRnetwork,color.palette="Set1",
node.size=4,color.legend = "Gender",
node.color = get.vertex.attribute(DIRnetwork,"gender"),
label = FALSE,edge.color = "grey50",arrow.size=0)
####NOTE
##This can be implemented for other languages (not just English names),
##if the locale is set as in the following example.
Sys.setlocale("LC_ALL", "Polish") #Polish example
##see the genderizeR documentation for further details
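To make the name-splitting step above easier to follow, here is a reduced base R sketch run on three director names taken from the earlier Unilever output. It covers only the simple "SURNAME, Forenames" case and leaves out the title handling (Dr, Professor and so on) from the full code. It does not call genderizeR.

```r
# Director names in the "SURNAME, Forenames" form returned by Companies House
director_names <- c("ANDERSEN, Nils Smedegaard",
                    "CHA, Laura May Lung",
                    "SOTAMAA, Ritva")

# Split each name at the comma
parts <- strsplit(director_names, ",")

# The first piece is the surname, the last piece holds the forenames
surname   <- trimws(sapply(parts, function(x) x[1]))
forenames <- trimws(sapply(parts, function(x) x[length(x)]))

# Keep only the first forename, in lower case, as the matching key
first_word <- tolower(gsub(" .*", "", forenames))
```

The resulting first_word vector plays the role of the name1 column that is merged with the gender key in the full example.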