As part of the Companies House Data Strategy, Data Governance and Data Quality are being reviewed. One of the issues this review identified with Companies House data is that around 1% of companies do not supply an accurate postcode, or do not provide a postcode at all.
Current legislation requires companies to provide a physical address; however, it is not a legal requirement to provide a postcode.
Companies House also states the below regarding the information available for public use: “We carry out basic checks on documents received to make sure that they have been fully completed and signed, but we do not have the statutory power or capability to verify the accuracy of the information that companies send to us.”
This means some data can be input incorrectly and is not verified, leading to lower data quality, particularly when some fields on the registration form are not mandatory. To gain further insight into the postcode inaccuracies in Companies House data, a scoring matrix was created for the quality of the address and postcode data given by a company. Companies' address details were compared to the Postcode Address File (PAF), and scores were given when the address details met certain criteria.
Table: Postcode Scoring Matrix | Version 1
| Score | Definition |
|------:|:-----------------------------------------------------------|
| 0 | No postcode given |
| 1 | Postcode given but doesn’t exist in PAF |
| 2 | Partial postcode given |
| 3 | Postcode matches PAF but given in the incorrect field |
| 4 | Full correct postcode given, address doesn’t match PAF |
| 5 | Full correct postcode given, address partially matches PAF |
| 6 | Address and postcode details match PAF |
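The scoring matrix above can be sketched in code as follows. This is a minimal illustration rather than the method used in the analysis (which was carried out in Excel and Power BI). The names `PostcodeScore`, `score_record`, `other_fields`, `address_match` and `paf_postcodes` are hypothetical. The order in which the rules are checked, the treatment of a "partial" postcode as a prefix of a PAF postcode, and the test for a postcode placed in the wrong field are all assumptions; how an address is judged to match the PAF is not defined here and is passed in as a pre-computed value.

```python
from enum import IntEnum


class PostcodeScore(IntEnum):
    """Scores from the Postcode Scoring Matrix (Version 1)."""
    NO_POSTCODE = 0
    NOT_IN_PAF = 1
    PARTIAL = 2
    WRONG_FIELD = 3
    POSTCODE_ONLY = 4
    ADDRESS_PARTIAL = 5
    FULL_MATCH = 6


def _clean(value):
    # Remove all whitespace; upper-casing is an added assumption.
    return "".join((value or "").split()).upper()


def score_record(postcode, other_fields, address_match, paf_postcodes):
    """Score one company record.

    postcode       -- value of the postcode field (may be None or empty)
    other_fields   -- values of the other address fields
    address_match  -- "none", "partial" or "full" (decided elsewhere)
    paf_postcodes  -- set of whitespace-free PAF postcodes
    """
    clean = _clean(postcode)
    if not clean:
        # Assumption: score 3 applies when another field holds a PAF postcode.
        if any(_clean(v) in paf_postcodes for v in other_fields):
            return PostcodeScore.WRONG_FIELD
        return PostcodeScore.NO_POSTCODE
    if clean in paf_postcodes:
        return {
            "none": PostcodeScore.POSTCODE_ONLY,
            "partial": PostcodeScore.ADDRESS_PARTIAL,
            "full": PostcodeScore.FULL_MATCH,
        }[address_match]
    # Assumption: a partial postcode is a prefix of at least one PAF postcode.
    if any(p.startswith(clean) for p in paf_postcodes):
        return PostcodeScore.PARTIAL
    return PostcodeScore.NOT_IN_PAF
```

Using an `IntEnum` keeps the numeric scores from the table while giving each one a readable name, so results can still be summed or averaged.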
The Companies House data came from the Companies House Free Company Data Product, which contains basic details of live companies on the register.
This is saved on the Companies House website. The file used for this data set was named ‘BasicCompanyDataAsOneFile-2022-03-01.zip (415Mb)’ and is publicly available at:
http://download.companieshouse.gov.uk/en_output.html
The PAF is a database of all known postcodes in the UK and holds the Royal Mail postal addresses. This was downloaded from MongoDB.
The data used in this analysis containing the Glasgow addresses is saved in the below Companies House SharePoint page (access may be restricted to DAR).
The software used for analysis was Microsoft Excel and Microsoft Power BI.
This data reflects the position of Companies House and the PAF on 11/03/2022; any subsequent updates to the Companies House data or the Postcode Address File will not be reflected in this analysis.
The filters used within the data were:
Not all address data was complete so the filters were applied to fit one or more of the above criteria.
Import the data files into Power BI Desktop.
Open the PAF and highlight the ‘postcode.stripped’ column. Replace the spaces with nothing to remove any whitespace.
Open the CH data file and highlight the ‘RegAddress.PostCode’ column. Replace the spaces with nothing to remove any whitespace.
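The two whitespace-removal steps above could be reproduced outside Power BI with a small helper like the one below. This is a sketch only: the function name `strip_postcode` is hypothetical, and upper-casing the result is an added assumption that is not part of the steps described, included so that the two sources compare consistently.

```python
def strip_postcode(raw):
    """Return a postcode with all whitespace removed.

    Mirrors the 'replace spaces with nothing' step. Upper-casing is an
    added assumption, not part of the original instructions.
    """
    if raw is None:
        return ""
    # str.split() with no arguments splits on any run of whitespace,
    # so joining the pieces removes spaces, tabs and newlines alike.
    return "".join(raw.split()).upper()
```

Applying the same function to both the ‘postcode.stripped’ column and the ‘RegAddress.PostCode’ column means the values can be compared directly.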
Open the Companies House data in the Query Editor.
Add a conditional column and insert the conditions below, so that the column is populated with “BIRMINGHAM” when a condition is met, else “Other”.
‘RegAddress.PostCode’ begins with ‘B1’, ‘B2’, ‘B3’, ‘B4’, ‘B5’, ‘B6’, ‘B7’
OR
RegAddress.AddressLine1, RegAddress.AddressLine2, RegAddress.PostTown, RegAddress.County contains ‘Birmingham’.
Filter out any remaining rows that do not belong (in this data, the above conditions still let through a few locations on ‘Birmingham Road’ with a different postcode, and some postcodes beginning with B, such as B80, which are not in the Birmingham PAF).
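The conditional column described above can be sketched as a plain function over one row of the Companies House file. This is an illustration under stated assumptions, not the Power BI implementation: the function name `area_label` and the idea of passing a row as a dictionary are hypothetical, and the text match is made case-insensitive here, which may differ from how the Power BI condition behaved.

```python
# Prefixes and fields taken from the conditions described above.
BIRMINGHAM_PREFIXES = ("B1", "B2", "B3", "B4", "B5", "B6", "B7")
ADDRESS_FIELDS = (
    "RegAddress.AddressLine1",
    "RegAddress.AddressLine2",
    "RegAddress.PostTown",
    "RegAddress.County",
)


def area_label(row):
    """Return "BIRMINGHAM" if the row meets either condition, else "Other"."""
    # Strip whitespace first, as in the earlier clean-up steps.
    postcode = "".join((row.get("RegAddress.PostCode") or "").split()).upper()
    # str.startswith accepts a tuple and returns True if any prefix matches.
    if postcode.startswith(BIRMINGHAM_PREFIXES):
        return "BIRMINGHAM"
    for field in ADDRESS_FIELDS:
        # Assumption: case-insensitive match on the word 'Birmingham'.
        if "birmingham" in (row.get(field) or "").lower():
            return "BIRMINGHAM"
    return "Other"
```

As with the Power BI conditions, an address such as ‘12 Birmingham Road’ in another town is still labelled “BIRMINGHAM” by this sketch, so the final manual filter remains necessary.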
See Glasgow instructions
These were scored while scoring postcodes with a ‘4’. Any that did not score 4 were given a score of 5. The same conditions were applied to the address line details, but with the PAF Match column filtered to those that were ‘True’ and matched the postcode field in the Companies House data.