In DFE-Capital/Blockbuster-2: Deterioration Model of the School Estate

Can I have a P please, Bob?. (via)

Modelling deterioration of the School Estate using Discrete-Time Markov chains

A useful package of functions and sample data to forecast the future condition of School blocks (or buildings) in the UK.

Modelling overview

Markov chains represent a class of stochastic processes of great interest for the wide spectrum of practical applications. In particular, discrete time Markov chains (DTMC) permit modelling the transition probabilities between discrete states by the aid of matrices. The states are represented by the grade factor which is the surveyed condition of a building element, determined by a quantity surveyor through grades (states); A, B, C, D, E. The transition probabilities are 6 by 6 matrices with the average deterioration rate per timestep between states provided by expert building consultant opinion (see documentation for full details).

Business questions it can answer

This package is useful for decision makers as it provides a modelling approach for predicting the average condition of an individual building component (elementid) through time. This modelling approach can be scaled through a decision making hierarchy, with modelling the deterioration at the level of a School block (buildingid), for each site (siteid), the School (which may be made of many blocks, URN), at the Local Authority (LA) level (LAN) and at the National level given suitable input data (filter desired rows using the dplyr package is recommended).

The Blockbuster function can be used on data containing one or more components in one or more school blocks. Individual component level information is stored as a data.frame with strict conditions on columns while block-level information is held in another data.frame with strictly defined columns. It produces snapshots of the component and block conditions at each timestep. The model can be applied to individual components, buildings, collections of buildings, or the entire estate. This flexibility allows decision makers to ask many and varied questions of the output data.

Using the package

Check you have these packages installed and then load them into memory using library() or require().

library(blockbuster2)

To run a 10 year simulation of the provided simulated dataset, with no rebuilding and £1bn of repair funds per year, the command is

Blockbuster(simulated_elements, forecast.horizon = 10, rebuild.money = 0, repair.money = 1000000000)

Passing numeric vectors to the rebuild.money and repair.money arguments allows control of the amount of money available for rebuilding and repairing each year respectively, and forecast.horizon allows you to specify the number of years to simulate.

The model outputs a list containing a summary of the area and backlog at each grade by component, by building along with the raw state of all individual components at the end of the simulation. It is possible to save to file the states of individual components in interim years by passing save = TRUE as an argument.

Remember to use the R help for all of the functions and data mentioned in the vignette for extra detail (e.g. try typing ?Blockbuster into the console).

Input format

The first argument for the Blockbuster function is a data.frame containing the data for the elements that you wish to simulate. The package contains some simulated data in the correct format under the name simulated_elements.

data("simulated_elements", package = "blockbuster2")
dplyr::glimpse(simulated_elements)

For use with blockbuster, the data.frame must contain the following columns:

Column Description -------------- ----------------------------------------------------------- buildingid A number that uniquely identifies the building a component is part of

A The probability the element is at grade A

B The probability the element is at grade B

C The probability the element is at grade C

D The probability the element is at grade D

E The probability the element is at grade E

ab The probability that the component will deteriorate from grade A to grade B in one year

bc The probability that the component will deteriorate from grade B to grade C in one year

cd The probability that the component will deteriorate from grade C to grade D in one year

de The probability that the component will deteriorate from grade D to grade E in one year

unit_area A quantification of the size of the component - this is usually an area but could be a count or length

B.repair.cost The cost to repair one unit_area of this component at grade B

C.repair.cost The cost to repair one unit_area of this component at grade C

D.repair.cost The cost to repair one unit_area of this component at grade D

E.repair.cost The cost to repair one unit_area of this component at grade E

gifa The Gross Internal Floor Area of the building a component is part of. This is used to generate the cost of rebuilding the entire building

The simulated data provided with the package also contains the following column that is not used internally but could be useful when analyzing outputs:

Column Description

elementid A number that identifies the component type

The package is designed to be used with PDS data which also includes the following columns that can be useful when looking at outputs:

Column Description

URN A number that uniquely identifies a school

siteid A number that uniquely identifies a school site (schools may have more than one site)

LAN A number that uniquely identifies a Local Authority

The package also includes a block-level summary of the simulated data as simulated_blocks

data("simulated_blocks", package = "blockbuster2")
dplyr::glimpse(simulated_blocks)

This is the same format as the block-level summary outputs of the model.

Output format

The model output is a list with four entries:

element summary A tidy format data frame of area and backlog by component type, grade and year.
building summary A tidy format data frame of area and backlog by building, grade and year.
element A data frame containing the final state of all simulated element.
block A data frame containing the final state of all simulated buildings.

So - presuming the output has been saved as an object called output - the command output$"element summary" will return the element-level summary, output$"building summary" will return the block-level summary and so on.

The element-level data frame in the output will contain the same columns as the data frame passed to the element.data argument, so additional data irrelevant to the model (such as siteid) will be passed through unchanged.

The block-level data frame will contain the columns buildingid, block.rebuild.cost, *.block.repair.cost and ratio. If a data frame is passed to the block.data argument then other columns will be passed through unchanged.

If you want to access the raw element-level data for interim years, it is possible to set the Blockbuster function to save interim states to file. Setting save = TRUE will turn this feature on and the filelabel and path arguments allow you to set the path and labels for the saved files. Due to memory restrictions, separate files are saved for each year. Files can be loaded individually into your session using something akin to load("./output/blockbuster_element_2.rds") depending on where you set the save, path and filelabel arguments.

Example analysis

Here we simulate the deterioration of one building through one year.

#  output demonstration

# construct single block input
my_output <- Blockbuster(simulated_elements, save = FALSE)
dplyr::glimpse(my_output) # this function makes the output nicer to read

We start by comparing the initial overall backlog and the backlog after one year.

sum(my_output$"element summary"$backlog[my_output$"element summary"$year == 0])
sum(my_output$"element summary"$backlog[my_output$"element summary"$year == 1])

We recommend using the pipe operator %>% for coding, as used in dplyr as it results in code that is much simpler and easier to read. For example, the above code can be replaced with

my_output$"element summary" %>%
  group_by(year) %>%
  summarise(backlog = sum(backlog))

We suggest using the same style code for visualisations. The following uses the ggplot2 to plot the change in expected cost of repairing all components at the associated grade.

library(ggplot2)
my_output$"element summary" %>%
  group_by(year, grade) %>%
  summarise(backlog = sum(backlog)) %>%
  ggplot() +
  geom_line(aes(x = year, y = backlog, colour = grade))

Although the output groups summary statistics by element or building, any analysis related to component type or buildings will be highly suspect. This is because the Blockbuster model does not differentiate between different buildings, schools or components when making repair and rebuild decisions at present. In reality the decisions on what funds are spent on what projects is highly dependent on school-level factors and the deterioration and repair of components is also correlated. We therefore strongly recommend aggregating outputs across the complete simulated estate.

The default arguments

In the example above we passed the blockbuster function only one argument and it still worked. That is because if you do not explicitly state some of the arguments it will assume default values for these. We can look at the defaults using the formals function.

formals(Blockbuster)

The default behaviour of the model is to simulate deterioration over one year without any rebuilding or repairing and no inflation. The argument element.data does not have default behaviour thus we must pass it to the blockbuster function.

For more details try typing ?blockbuster into the console.

The transition probabilities and the component repair costs

The deterioration rates and component repair costs are passed to blockbuster within the input element.data data.frame, allowing for individually tailored deterioration rates for all components if desired. Let's inspect some rates and costs in simulated_elements.

simulated_elements[1:2, ] %>%
  select(elementid, ab, bc, cd, de, B.repair.cost, C.repair.cost, D.repair.cost, E.repair.cost)

Every component type is identified by elementid. The "two-letter" variables describe the transition rate or probability from the first letter condition grade to the second letter condition grade. For example,cd describes the proportion of the unit_area at grade "C" that will transition to grade D through one timestep or year.

The *.repair.cost columns contain the unit cost of repairing components at the associated grades. The individual expected component repair costs - stored in the *.repair.total columns - are computed by multiplying these by unit_area and the probability of being at that grade (columns A, B, C, D, and E).