Conclusions from the Trial {#cons-trial}

Although the address verification trial conducted in early 2022 was a thorough and detailed analysis, it uncovered some problems and limitations, which are summarised below.

Limitation on Power BI

Power BI, like Excel, doesn't cope well with large files. Although Power BI has no file size limit in theory (Excel has a row limit of 1,048,576), anything larger than around 1GB severely affects performance and eventually stalls the application completely. For this reason the postcode address file was split into sections, so that each postal town could be worked on independently. Although this improved the performance of Power BI, it indirectly led to the second problem: not having all the data to work on.
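The split by postal town described above can also be done without loading the whole file into memory. The following is a minimal sketch only, not the method used in the trial: the function name `split_by_post_town`, the column name `post_town` and the output file naming are all assumptions made for illustration.

```python
import csv
import re
from pathlib import Path


def split_by_post_town(src_path, out_dir, town_column="post_town"):
    """Stream a large CSV and write one smaller CSV per postal town.

    Rows are read one at a time, so memory use does not grow with the
    size of the source file. Returns a dict of file key -> row count.
    The column name "post_town" is an assumption, not a known schema.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    handles, writers, counts = {}, {}, {}
    try:
        with open(src_path, newline="", encoding="utf-8") as src:
            reader = csv.DictReader(src)
            for row in reader:
                town = (row.get(town_column) or "").strip().upper()
                # Build a file-system-safe key; rows with no town go to UNKNOWN.
                key = re.sub(r"[^A-Z0-9]+", "_", town).strip("_") or "UNKNOWN"
                if key not in writers:
                    fh = open(out_dir / f"{key}.csv", "w",
                              newline="", encoding="utf-8")
                    handles[key] = fh
                    writers[key] = csv.DictWriter(
                        fh, fieldnames=reader.fieldnames)
                    writers[key].writeheader()
                writers[key].writerow(row)
                counts[key] = counts.get(key, 0) + 1
    finally:
        # Close every output file, even if reading fails part-way through.
        for fh in handles.values():
            fh.close()
    return counts
```

This sketch keeps one output file open per town, so a file with a very large number of distinct towns could hit the operating system's limit on open files. It also inherits the drawback noted above: each output file holds only part of the data.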

Not having all the data

Working on one postal town at a time certainly improved the performance of Power BI and Excel, but it raised other issues with postcode filtering. In particular, postcodes exist in both B7 (Birmingham) and B74 (not Birmingham), so filtering by this method proved problematic, as explained in Birmingham Address Matching Recommendations (paragraph 4).
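One plausible reading of the B7 and B74 problem is that a simple text filter on "B7" also matches every B74 postcode. The sketch below shows how comparing the whole outward code (the part before the final three characters) avoids that overlap. The regular expression is a simplified pattern that ignores special cases, the function name `outward_code` is hypothetical, and the sample postcodes are made up for illustration.

```python
import re

# Simplified UK postcode shape: outward code (area letters, district),
# optional space, then the inward code (one digit and two letters).
_POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9][0-9A-Z]?)\s*([0-9][A-Z]{2})$")


def outward_code(postcode):
    """Return the outward code (e.g. "B7" or "B74"), or None if no match."""
    match = _POSTCODE.match(postcode.strip().upper())
    return match.group(1) if match else None


postcodes = ["B7 4AA", "B74 2AA", "b7 5bb", "B744AA"]

# A prefix filter cannot tell B7 from B74: all four entries match.
naive = [p for p in postcodes if p.upper().startswith("B7")]

# Comparing the full outward code keeps only the B7 entries:
# ["B7 4AA", "b7 5bb"]
exact = [p for p in postcodes if outward_code(p) == "B7"]
```

Because the inward code always has three characters, the pattern can still separate the two parts when the space is missing, as in the last sample value.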

Defining what a 'good' address means

The trial exposed some questions about what exactly constitutes a 'good' address. Do we mean that the address is deliverable (in other words, a letter sent to the address would arrive), or do we mean that the address fields are consistent with each other (for example, that the postal town is present and corresponds to the postcode)? Companies House don't currently define what is meant by an address, or by any of the fields that make up the customer address. Nor do we define what we mean by a customer. These problems exist because we don't currently have a business glossary^[https://dataedo.com/blog/business-glossary-vs-data-dictionary] that defines all the terms used within Companies House.

Our current address verification process

The law says that a company must provide a registered office address when setting up a limited company, which must be:

The government website showing this information^[https://www.gov.uk/limited-company-formation/company-address#content] gives no further guidance about which elements of an address must be provided. According to our Filing Services Service Owner, "neither legislation nor back end systems require post code - I believe the minimum for an address is a building, on a road, in a place".
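The minimum described in that quote (a building, on a road, in a place, with the postcode optional) could be expressed as a simple completeness check. This is a sketch under assumptions: the `Address` class, its field names and the function `missing_minimum_fields` are hypothetical, and the rule reflects the Service Owner's stated belief rather than a formal definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    # Hypothetical field names, chosen to mirror the quoted minimum.
    building: Optional[str] = None
    road: Optional[str] = None
    place: Optional[str] = None
    postcode: Optional[str] = None  # treated as optional, per the quote


REQUIRED_FIELDS = ("building", "road", "place")


def missing_minimum_fields(address):
    """List the required fields that are absent or blank."""
    return [
        name for name in REQUIRED_FIELDS
        if not (getattr(address, name) or "").strip()
    ]
```

An address with no postcode passes this check, while one that has a postcode but no road does not. The check says nothing about whether the address is deliverable.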

Next steps

Given these limitations of the trial conducted in early 2022, we decided to move away from Excel and Power BI, partly due to the software constraints on file sizes, but also because:

For these reasons we decided to move the project from Power BI and Excel into Python.



companieshouse/DARr documentation built on Oct. 22, 2022, 8:26 p.m.