In DFE-Capital/blockbuster: Deterioration model of the School Estate

Can I have a P please, Bob?. (via)

Modelling deterioration of the School Estate using Discrete-Time Markov chains

A useful package of functions and sample data to forecast the future condition of School blocks (or buildings) in the UK.

Modelling overview

Markov chains represent a class of stochastic processes of great interest for the wide spectrum of practical applications. In particular, discrete time Markov chains (DTMC) permit to model the transition probabilities between discrete states by the aid of matrices. The states are represented by the grade factor which is the surveyed condition of a building element, determined by a quantity surveyor through grades (states); N, A, B, C, D, E. The transition probabilities are 6 by 6 matrices with the average deterioration rate per timestep between states provided by expert building consultant opinion (see documentation for full details).

Business questions it can answer

This package is useful for decision makers as it provides a modelling approach for predicting the average condition of an individual building component (elementid) through time. This modelling approach can be scaled through a decision making hierarchy, with modelling the deterioration at the level of a School block (buildingid), for each site (siteid), the School (which may be made of many blocks, urn; omitted for anonymity), at the Local Authority (LA) level (lano) and at the National level given suitable input data (filter desired rows using the dplyr package is recommended).

The blockbuster function can be used on more than one row of what I call a blockbuster_tibble through more than one year, storing all the data of intermediary years. This flexibility allows decision makers to ask many and varied questions of the output data.

Loading the packages

Check you have these packages installed and then load them into memory using library() require().

#  You could load tidyverse or you can load the individual packages, one-at-a-time.

library(dplyr)  #  need pipe %>%
library(ggplot2)  #  for visualisation
library(markovchain)  #  Recommended for markovchain S4 object handling (advanced).

library(blockbuster)  #  Required.

library(ggthemes)  #  Make pretty output with little effort.

Inputs and Outputs Overview

Remember to use the R help for all of the functions and data mentioned in the vignette for extra detail (e.g. try typing ?blockbuster and ?blockbuster_pds into the console).

Input variables of each building component

Provided with this package is ten percent sampled data with identifying features removed (the school Unique Reference Number). We suggest loading the tidyverse package when handling this data. This object is a tibble called by blockbuster_pds. For creating a subset of this data to make it manageable for simulation, use the dplyr package.

The areafy2 function was used on data queried from the Department for Education Education Funding Authority Property Data Survey SQL data tables (see 01_read_and_tidy_data.R for details). This provides estimates for the unit_area (a quantification of how much of a building component there is) variable. This is crucial for the DTMC methodology and for quantifying the costs of repairs.

We take a look at the transpose of the data table which describes the dimensions of our sample data and the type of variables found therein. Note the cost_sum variable which is an aggregation of the initial costs estimated in the block or buildingid that that row is associated with (this variable is set to zero for timestep greater than zero).

dplyr::glimpse(blockbuster_pds)

The output variables compared to the input

The output of the blockbuster function is a list of dataframes. The number of dataframes is equivalent to the number of years simulated plus one. The output is of the custom class blockbuster_list (derivative to list) and contains all the variables from the input initial blockbuster_tibble. Here we simulate the deterioration of ten building components through one year. We are interested in the variables of a simulated PDS produced by our blockbuster model using default arguments.

#  output demonstration
dplyr::glimpse(  #  this function makes the output nicer to read
  blockbuster(blockbuster_tibble = blockbuster_pds[1:10, ],  #  ten rows for demo
                           forecast_horizon = 1)  #  We deteriorate through one year
               [[2]])  #  The initial data at timestep 0 is held in [[1]]

This output has foreign key variables with the suffix -id, that could be joined to other data from relevant SQL data tables (e.g. the user could find the urn by joining on siteid from an appropriate data table). The urn is omitted to maintain the anonymity of each school's condition included in the sample.

Importantly the simulated PDS output has an additional variable called block_rebuild_cost; that is the expected cost of rebuild for the whole buildingid associated with that building component given the expected rebuild_cost_rate argument passed to the blockbuster function (we use the default here which is £ 1274 / metre squared).

Generic methods

The generic functions plot(), boxplot() and summary() can be used on the ouput object of class blockbuster_list. These are detailed later in the vignette.

The default arguments

In the example above we passed the blockbuster function only two arguments and it still worked! That is because if you do not explicitly state some of the arguments it will assume default values for these. We can look at the defaults using the formals function.

formals(blockbuster)

The arguments blockbuster_tibble and forecast_horizon do not have defaults thus we must pass them to the blockbuster function.

For more detail try typing ?blockbuster into the console.

The transition probability data

The deterioration rates can be called by using either the tibble blockbuster_det_data or the markovchainList object blockbuster_mc_list. For example let's inspect an example of each.

blockbuster_det_data[2, ]

This identifies the building component through the element, sub_elment and const_type. The "two-letter" variables describe the transition rate or probability from the first letter condition grade to the second letter condition grade. For example, for this building component (here concated_det), the cd describes the proportion of the unit_area at grade == C that will transition to grade == D through one timestep or year (na means from New to A (excellent condition) and is not to be confused with NA).

blockbuster_mc_list@markovchains[[2]]
# isS4(blockbuster_mc_list@markovchains[[1]])  #  TRUE

blockbuster_mc_list is a S4 object so the data needs to be accessed through its slots. They hold the same data in different ways. The S4 method may be useful in the future when incorporating some functions from the markovchain package.

The blockbuster functions

We consider the first row of this sampled data which is a roof of excellent grade A condition with a unit_area of 308.4.

x <- blockbuster_pds[1, ]
dplyr::select(x, element, sub_element, const_type, grade, unit_area)

Inspecting the associated deterioration rates

To determine which transition matrix is associated with each row we use the det_what_tm() function. This produces an error if the markovchain object does not exist. This used to work by concatenating the element, sub_elment and const_type and matching with the concated_det we saw earlier. However, to speed up the efficiency of the code, the concated variable was also cached before the for loop in the blockbuster function. This required a modified input for this function prior to look up of the associated deterioration rates. This has since been replaced by using a lookup table based on elementid. This is faster as

# old method matched on strings of concated
mc1 <- det_what_tm(dplyr::mutate_(x, 
                                  concated = ~gsub(pattern = "[^[:alnum:] ]",
                                    replacement = "",
                                    paste(element, sub_element, const_type,
                                     sep = ""))))
mc1

# new method tests for equality using elementid
mc2 <- det_what_tm(x)

identical(mc1, mc2)  #  they are the same, lookup no longer requires concated

This is a markovchain object as described in the documentation of that package. It is a S4 object and we can access different components using slots and [ sub-setting.

mc1@transitionMatrix[, ]  #  [i, j] i for row, j column

Deteriorating through time

Typically the user will be interested in a subset of the data. Consult the dplyr vignette for handling and sub-setting your data based on desired filter conditions.

Here we demonstrate the deterioration of a single building component through time. We chose a critical building feature of one particular block or building we are interested in. CAVEAT: this modelling approach should be used at national level; this example is illustrative.

my_wall <- dplyr::filter(blockbuster_pds,
                         buildingid == 4382, sub_element == "Walls - Structure", 
                         const_type == "Brick / block")

my_wall_next_year <- blockbuster(my_wall, 1)
my_wall_next_year

my_wall_20year <- blockbuster(my_wall, 20)

This produces a list of tibbles. Each tibble provides the condition of the input tibble at that timestep. Notice how the number of rows has increased. That's because we have a row for each grade state and associated unit_area. The first tibble in the list is for the input data or timestep == 0. It is prudent to check the timestep by inspecting it.

my_wall_next_year[[2]]
head(my_wall_next_year[[2]]$timestep, 1)  #  1 timestep later than initial

Blockbusting through many rows

The same principle applies as above where we pass the blockbuster function an input tibble object and the number of years into the future (relative to when the data was collected) we want to model to.

y <- blockbuster_pds[1:20, ]  #  select all rows associated with decision level of interest, perhaps one block or one LA?
my_block_10years <- blockbuster(y, 10)
dplyr::select(my_block_10years[[11]],
              buildingid, elementid, grade, unit_area) %>%
  dplyr::arrange(elementid, grade)

Computation time

I suggest you spend time thinking about the precise question you are asking rather than using the whole blockbuster_pds tibble as input due to non-trivial computation time. As a rule of thumb the amount of rows you start with will be multiplied by five (as each elementid deteriorates to other states), with each of these tibbles being multiplied by the forecast_horizon. That's quite a lot of data and may tax your computer's RAM (although the unit_area, timestep and cost will be the only variables changing through time).

Plotting output basics

The tibble format is compatible with the tidyverse particularly reshape2-ing and plotting using ggplot2. The purrr package and the excellent map_df() function makes it easy to plot data from multiple dataframes across a list, as is the case with our blockbuster output.

Blockbusting

We can investigate the rate at which our critical elements are deteriorating or even what proportion of the unit_area is decommisioned. Given the flexibility of the model output we can ask anything of the data.

Here we are interested in tracking the deterioration of critical building components in our block of interest. We recycle the my_wall_20year object for this question. The wall starts as all condition grade B then gradually deteriorates.

p4 <- my_wall_20year %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(timestep, grade) %>%
  dplyr::summarise(sum_area = sum(unit_area, na.rm = TRUE)) %>%
  ggplot2::ggplot(aes(y = sum_area, x = grade)) +
  ggplot2::geom_bar(stat = "identity") +
  ggplot2::facet_wrap(
    ~timestep
  )

p4 + ylab("Estimated area (m^2)") + xlab("Condition grade") +
  govstyle::theme_gov()

Critical building elements

Typically we may be more interested in critical, structural building-features such as the roof or walls and whether the unit_area exceeds a critical threshold determined by expert opinion. For example, is our wall's total unit_area after 10 years comprised of over 20% grade D and E? We plot then inspect the numbers.

my_wall_10 <- blockbuster(my_wall, 10)
#  We are interested in the 10 th (+ 1) timestep
df <- my_wall_10[[11]]

#  Use gov style for plot, install these if you don't have them
# devtools::install_github('mangothecat/visualTest')  #  required for govstyle
# devtools::install_github("UKGov-Data-Science/govstyle")

p <- ggplot2::ggplot(data = df, aes(x = grade, y = unit_area)) +
  ggplot2::geom_bar(stat = "identity", fill = govstyle::gov_cols['turquoise']) +
  xlab("Condition") +
  ylab(expression(paste(
  "Unit area (",m^2,
  ")", sep = ""))) +
  govstyle::theme_gov()

p

We use dplyr to elucidate and determine that our wall is below our arbitrary 20% threshold, if we wanted to scale this we could create a new variable using mutate which is based on a logical test. The only limit is your imagination (and coding skills).

total_wall_area <- sum(df$unit_area)
dplyr::mutate(df, prop_area = unit_area / total_wall_area) %>%
  dplyr::select(grade, unit_area, prop_area)

Costing

The costing part of the blockbuster function is built using an internal function called blockcoster_lookup. This takes the building component elementid and condition grade information and looks this up in the blockbuster_pds_repair_costs tibble, thus finding the appropriate repair costs constant to multiply the unit_area of that building component by.

Originally it returned NA for grade E. This was due to lack of data on repairing a building component from grade E back to A, furthermore as grade E is meant to encapsulate a decommisioned status for now it was deemed more appropriate to only be able to get E back to grade N through rebuilding. However, user feedback suggested it more useful to attempt to approximate the grade E cost using a crude estimate for now. This was achieved by modifying the code in 04_read_tidy_costs_data.R. Costs for E are now simply the costs for D plus five percent. This approach was recommended by Adam Bray until we get empirical estimates.

Costs are calculated for each row. The cost_sum variable is set to zero as this can be recalculated later if required, as it requires an aggregation by buildingid step (this is somewhat technical, see the blockbuster code by the costing step, where aggregation occurs).

Blockcosting

We continue with the examples from above by considering the aggregated cost of repairs through time. Let us plot the distribution of costs by condition grade using a Tukey box plot through each year (R and ggplot make it easy to look at the data in different ways). Repair costs from grade A should always be zero as they are already in excellent condition and do not need any work. This plot shows how the median costs for each grade steadily increase year on year with some building components in particularly deteriorating quickly and in a costly fashion (the outliers indicated by the black circles). Note: the comments in the code are there to teach you the basis of this coding approach using the map_df function from the purrr package.

Boxplot interpretation

The red "dots" at the end of the boxplot represent outliers. These observations contribute to the total cost of the block much more than others. There are a number of different rules for determining if a point is an outlier, but the method that R and ggplot use is the "1.5 rule". If a data point is:
less than Q1 - 1.5 x IQR
greater than Q3 + 1.5 x IQR
then that point is classed as an "outlier". The line goes to the first data point before the "1.5" cut-off. Note: the inter-quartile range; IQR = Q3 - Q1.

We have added a scattered jitterplot on top using geom_point() so that you can see how all the observations contribute to the observed cost distribution.

#  http://www.machinegurning.com/rstats/map_df/

# 
# cool_cars <- list(mtcars, mtcars, mtcars) %>%
#   map_df(
#     ~ .x ,
#     .id = "year"
#   ) %>%
#   group_by("year") %>%
#   ggplot(aes(x = cyl, y = disp, group = cyl)) +
#   geom_boxplot() +
#   facet_wrap(
#     ~year
#   )
# cool_cars

p2 <- my_block_10years %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by("timestep") %>%
  ggplot2::ggplot(aes(x = grade, y = cost)) +
  ggplot2::geom_boxplot(outlier.colour = "red") +  #  Set to NA to avoid double plot
  ggplot2::facet_wrap(
    ~timestep
  )

p2 + ylab("Cost of repairs (£)") + xlab("Condition grade") +
  govstyle::theme_gov() + geom_point(position = position_jitter(width = 0.05),
                                     alpha = 0.15)

The same plot is produced using the generic method of boxplot().

Total Blockcosting

We take our hypothetical building and plot the total costs year on year by condition grade. Notice how the costs to repair the hypothetical block increases, accelerating as more and more unit_area gets into grade E.

p3 <- my_block_10years %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(timestep, grade) %>%
  dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>%
  ggplot2::ggplot(aes(x = grade, y = sum_cost)) +
  ggplot2::geom_bar(stat = "identity") +
  ggplot2::facet_wrap(
    ~timestep
  )

p3 + ylab("Total cost of repairs (£)") + xlab("Condition grade") +
  govstyle::theme_gov()

Line plot

The blockbuster output is flexible and amenable to data manipulation and or transformation prior to plotting. The user is only limited by their imagination.

Below we present a standard output that is seen in many reports of the School Estate. Here we plot the total costs by aggregated by condition grade for each year and give it a slightly prettier finish. Note how the condition grade A do not contribute to the cost. However, the repair cost of E (set at D repair cost plus five percent) accelerates the total cost as more building components' unit_area transitions into grade E from D. It's important users are aware of this when interpretating these graphics as the estiamted cost of B, C and D should be considered more reliable as they are based on expert opinion and bespoke costs for each building component (whereas E grade is just 5% on D which may not be appropriate for every building component).

p4 <- my_block_10years %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(timestep, grade) %>%
  dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>%
  ggplot2::ggplot(aes(x = timestep, y = sum_cost, group = grade, colour = grade)) +
  ggplot2::geom_line(size = 1.3) 


p4 + ylab("Total cost of repairs (£)") + xlab("Years after PDS") +
  govstyle::theme_gov() +
  theme(legend.position = "bottom") +
  # ggthemes::theme_hc() + ggthemes::scale_colour_hc() +
  ggtitle("Cost of repairs through time for our example block")

The same plot is produced using the generic method of plot() on the blockbuster_list object.

Digging down

Better to draw several approximate graphics saying something about the right question than to draw one precise graphic relating to the wrong question.

The blockbuster simulation is time expensive thus we do not want ot generate every possible outcome variable that a user might need. Instead we generate variables that are required for the simulation, for decision making regarding which blocks should be rebuilt or repaired. Instead you can derive variables yourself. Be imaginative and try plenty of variety in your plots.

Costing the rebuilding of a block

The repair costs had previously ignored decommisioned building components (grade E unit_area of a building component). To incorporate this into our analysis of when it is better to rebuild a block rather than to maintain it we create a new variable in the blockbuster_tibble called block_rebuild_cost. This multiplies the gifa, of the block that the building component in question is part of, by the argument (blockbuster input) rebuild_cost_rate (this has a default value). This does not fully capture the complexity of estimating the rebuild cost of a block and could be enhanced in future iterations of the blockbuster. At the moment it can accept a constant rebuild_cost_rate or varying rebuild_cost_rate (a vector the same length as the argument forecast_horizon that can adjust for inflation). This value is the same for all building components in the same block as it is a function of the block's gifa (it only makes sense considered at the block level).

two_blocks_4_years_inflation <- blockbuster(dplyr::filter(blockbuster_pds,
                                                              buildingid == 4382 |
                                                                buildingid == 4472),
                                                forecast_horizon = 4,
                                                rebuild_cost_rate = c(1274, 1281, 1289, 1296) 
                                                )

two_blocks_4_years_inflation %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(buildingid, timestep, block_rebuild_cost) %>%
  dplyr::tally()

Total costs by building element

p5 <- my_block_10years %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(timestep, element) %>%
  dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>%
  ggplot2::ggplot(aes(x = element, y = sum_cost)) +
  ggplot2::geom_bar(stat = "identity") +
  ggplot2::facet_wrap(
    ~timestep
  )

p5 + ylab("Total cost of repairs (£)") + xlab("Building element") +
  govstyle::theme_gov() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  theme(plot.margin = unit(c(20,10,10,10),"mm"))

Subsetting the blockbuster list output for specified years of interest

Perhaps we are interested in just the last couple of years of the simulation? We can subset a single year of interest or dataframe from the list using the normal R syntax.

The following graph also highlights a flaw in the earlier generation of the deterioration model which just considered the unit_area of a building component and ignored the building component element, sub_element or const_type type in its calculations. The blockbuster model considers this when estimating repair costs. In the figure below we aggregate across building element the total unit_area and cost of repairs and plot using the extra dimensions afforded by point area and colour. In short this identifies an interaction term between element and grade for cost of repairs (the trajectory of gradient is different between building components as you go between condition grades).

p6 <- my_block_10years[11] %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(timestep, grade, element_shrt = as.factor(abbreviate(element, 14))) %>%
  dplyr::summarise(sum_unit_area = sum(unit_area, na.rm = TRUE),
                   sum_cost = sum(cost, na.rm = TRUE)) %>%
  ggplot2::ggplot(aes(x = grade, y = sum_cost, colour = element_shrt, size = sum_unit_area)) +
  ggplot2::geom_point(alpha = 0.7) +
  ggplot2::facet_wrap(
    ~ timestep
  )

p6 + ylab("Cost of repairs (£)") + xlab("Condition grade")  +
  theme_economist() + 
  scale_size_continuous(breaks = c(1000, 2000, 3000), range = c(3,10))

Rebuilding

Rebuilding is carried out by the rebuild function which is internal to blockbuster. The rebuilding of the blockbuster_tibble you pass as an argument is dependent on two other arguments; the rebuild_monies and rebuild_cost_rate. Both can be passed as a vector equal in length to the number of years you are simulating (forecast_horizon) or as a number. This allows you to control for inflation in the construction industry and variable rebuilding spending from year to year. Technically the blockbuster function makes use of the rebuild_cost_rate in estimating how much it would cost to rebuild a block (rebuild_cost_rate multiplied by block gifa), this is then passed into rebuild for a decision on how to spend the available rebuild_monies. All the blocks (unique based on buildingid) in the blockbuster_tibble are ranked in descending order of the ratio of the total cost of repairs to the cost of rebuilding the block (where a score higher than one means the block is more expensive to repair than it is to rebuild). The block with the highest internal statistic of cost_to_rebuild_ratio is inspected for rebuilding first. If enough monies are available it is rebuilt by converting all its building components back to new and recalculating the unit_area (using areafy2). The leftover money is then cycled through the list rebuilding as appropriate until the remaining money is less than the cheapest block to rebuild, at which point it is discarded (NOT passed to repair_monies).

Example

We inspect the effect of rebuilding on the cost of repairs of three buildings with a non-constant rebuilding investment profile and rebuild_cost_rate adjusted for 0% (year 2) then 1% inflation (year 3) (you can pass this as a vector rather than typing it manually of course). We inspect the effect of the rebuild on total repair costs using a faceted stacked bar chart for the different building elements in all the blocks (try increasing the rebuild_monies by one pound, what happens?).

#  We filter our PDS sample for just three blocks to keep things simple
x <- dplyr::filter(blockbuster::blockbuster_pds,  buildingid == 4382 | buildingid == 4472
                   | buildingid == 4487)
#  Rebuild spending profile
y <- blockbuster(x, forecast_horizon = 8, rebuild_monies = c(0, 5e6, 0, 0, 0, 0, 0, 0),  #  five million
                 rebuild_cost_rate = c(1274, 1274 + 0, 1274 + 12.74, 1286.74, 1286.74,
                                       1286.74, 1286.74, 1286.74))  #  £/m^2

p7 <- y %>%
  purrr::map_df(
    ~ .x ,
    .id = NULL
  ) %>%
  dplyr::group_by(timestep, element, grade) %>%
  dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>%
  ggplot2::ggplot(aes(x = element, y = sum_cost, fill = grade)) +
  ggplot2::geom_bar(stat = "identity") +
  ggplot2::facet_wrap(
    ~timestep
  ) + scale_fill_discrete(drop = FALSE) + scale_x_discrete(drop = FALSE)  #  gives E a colour

p7 + ylab("Total cost of repairs (£)") + xlab("Building element") +
  govstyle::theme_gov() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
  theme(plot.margin = unit(c(20,10,10,10),"mm")) +
  theme(legend.position = "top")

This also provides a sense check. The rebuild happens during timestep 2 (because rebuild_monies = c(0, 5e6, 0)), all building components are rebuilt to grade = N thus have zero cost. During the next timestep there will be some deterioration to grade = A, however all building components of excellent condition do not contribute to the repair costs of a block, hence no cost of repairs in timestep 3. We start to see condition grade = B & D at timestep 5 as expected given our working knowledge of the model.

Repairs

Repairs are carried out by the repair function which is internal to blockbuster. The repairing of the blockbuster_tibble you pass as an argument is dependent on the other argument; repair_monies. The latter can be passed as a vector equal in length to the number of years you are simulating (forecast_horizon) or as a number. This allows you to control for inflation in the construction industry and variable repairs spending from year to year. Technically the blockbuster function makes use of the repair_monies by dividing it up evenly between each block in the input blockbuster_tibble.

All the blocks (unique based on buildingid) in the blockbuster_tibble are ranked in descending order by buildingid, grade (from E to N) and cost (expensive to cheapest repairs within grade). Each block is repaired with repair_monies divided by the number of blocks in the blockbuster_tibble. Money is spent on worse condition grade (ignoring E as it cannot be repaired [ignoring N and A also]) and most expensive repairs first until the remaining money is less than the cheapest building component to repair, at which point it is discarded (NOT passed to rebuild_monies; assumed spent by the school on other things) (see the help function what_needs_repair_within_block() for details on the decision making).

Example - compare different spending profiles on condition of blocks

The purpose of this package is to inform policy decision making to determine which out of competing spending profiles will better meet the assessment criteria. Here we aim to minimise the total condition cost of repairs in the 3 school blocks. We develop two spending profiles using the same amount of money over five years (ignoring inflation for simplicity). Remember, the user is free to develop their own assessment criteria given the flexibility of the model output (e.g. you could use the unit_area of critical building components).

We could compare using some graphics already demonstrated in the vignette. Instead we prefer to reorganise the data to answer our specific question: which of our competing spending profiles produces the lowest estimated total repairs cost in year 5? We use arbitary blocks and total repair spend profile (this is not intended to be realistic but instead demonstrative). Inspect the code chunk, specifically the repair_monies argument for each spending profile, as it is where they differ.

Drip - slow and steady - consistent amount spent each year.
Flood - sooner rather than later - spending more sooner, less later.

Run the simulations

#  We filter our PDS sample for just three blocks to keep things simple
#  using identical blocks to our rebuild example above.

x <- dplyr::filter(blockbuster::blockbuster_pds,  buildingid == 4382 | buildingid == 4472
                   | buildingid == 4487)
#  Repair spending profile of different strategies
drip_repairs <- blockbuster(x, forecast_horizon = 5,  #  sum(rep(3e4, 5))
                            repair_monies = rep(3e4, 5))  #  £10k / block / year = £ 15e5

flood_repairs <- blockbuster(x, forecast_horizon = 5,
                             repair_monies = c(6e4, 6e4,
                                               1e4, 1e4, 1e4))  
#  (20k, 20k, 5k, 5k, 5k)* 3 blocks = £ 10.5e5

Sometimes it is convenient to save these objects and then use them again later. If we were to saveRDS then readRDS we could use the attributes() function to find out the spending profile inputs of the blockbuster_list.

attributes(drip_repairs)
attributes(flood_repairs)

Manipulate data for assessment statistic of interest (total cost of repairs)

This consistent repair spend profile can be visually compared by the user to a spending profile that invests more heavily sooner rather than later. However, we need to organise the data into a convenient form for comparison.

First we convert our list of dataframes into one dataframe (for each spending profile).

drip <- dplyr::bind_rows(drip_repairs)
flood <- dplyr::bind_rows(flood_repairs)

Then we group by timestep and sum the total cost of repairs across all the blocks. We pretend you might want to compare many spending profiles here and thus demonstrate an approach that can be scaled. The model comparison of spending profiles suggests that the flood approach is slightly better when judged by the year 5 total_cost of repairs statistic generated using simple data manipulation and aggregation by the analyst. Not much difference at year 1 suggests that the "spend early" strategy may have over spent at year 1. This could be explored by further investigation or even automation to optimise the problem, given enough time or computation power.

dripping <- drip %>%  #  extracat::visna(drip, sort = "b")
  group_by(timestep) %>%  #  drip_missing <- drip %>% filter(!complete.cases(.))
  #  table(drip_missing$timestep)  #  the originals rows are not being given this variable
  #  maybe this is OK and makes sense, block can't be rebuilt at timestep 0, therefore NA cost
  summarise(total_cost = sum(cost))

flooding <- flood %>%
  group_by(timestep) %>%
  summarise(total_cost = sum(cost))

#  we can join them and create an ID variable
spending_profile_ts <- dplyr::bind_rows("drip" = dripping, "flood" = flooding,
                                        .id = "spend_profiles")

spending_profile_ts

Visualise

p8 <- ggplot(spending_profile_ts, aes(timestep,
                                total_cost,
                                colour = spend_profiles,
                                group = spend_profiles)) +
  geom_line(size = 2)

  p8 <- p8 + ylab("Total cost of repairs (£)") + xlab("Years") +
  govstyle::theme_gov() + 
  theme(plot.margin = unit(c(20,10,10,10),"mm")) +
  theme(legend.position = "top")

print(p8)

The simulations suggest that on average investing sooner rather than later with the "flood" strategy is slightly superior at reducing total repair costs in five years time.

Rebuilding versus repairing - how to best invest?

Now we have both the rebuild and repair functions we can assess spending profiles that prioritise rebuilding over repairing or vice versa. Again to compare meaningfully we need an assessment criteria to compare our different approaches.

Let's use all our money to rebuild one of the blocks or use all the money to repair each block over the five years instead.

Rebuild one block at year one then do nothing
Repair each block over the five years using block rebuild investment

We use the same arbitary grouping of blocks to demonstrate. We do our analysis in one code chunk this now that you are familiar.

#  Repair spending profile of different strategies

#  rebuild most expensive block in first year
rebuild_once <- blockbuster(x, forecast_horizon = 5,  
                            rebuild_monies = c(2440984, rep(0, 4))) %>%
  dplyr::bind_rows() %>%
  group_by(timestep) %>%
  summarise(total_cost = sum(cost, na.rm = TRUE))

#  repair funding using same funds spread over five years
drip_fix <- blockbuster(x, forecast_horizon = 5,
                             repair_monies = rep(2440984/5, 5)) %>% 
  dplyr::bind_rows() %>%
  group_by(timestep) %>%
  summarise(total_cost = sum(cost, na.rm = TRUE))

#  we can join them and create an ID variable
spending_profile_ts <- dplyr::bind_rows("Repairs only" = drip_fix,
                                        "Rebuild only" = rebuild_once,
                                        .id = "spend_profiles")

p9 <- ggplot(spending_profile_ts, aes(timestep,
                                total_cost,
                                colour = spend_profiles,
                                group = spend_profiles)) + geom_line(size = 2)

  p9 <- p9 + ylab("Total cost of repairs (£)") + xlab("Years") +
  govstyle::theme_gov() + 
  theme(plot.margin = unit(c(20,10,10,10),"mm")) +
  theme(legend.position = "top")

print(p9)

The rebuild takes care of the one rebuilt block but not the other two! As a result the costs of repair begin to accelerate without any money invested in repairing the blocks. The repair focused spending profile quickly brings the condition cost down to zero after two years. The repair profile is probably overspending in the subsequent years and funding could be reduced further with some investigation.

CAVEAT: this is just a demonstration and should not be considered a general rule. The best strategy will depend on your assessment criteria and the blocks of interest.

Summary

This modelling approach has the potential to make use of all the data.
It uses a robust statistical modelling approach.
The input structure is preserved in the output tibbles.
The model can be iteratively improved as more data is collected.
There is flexibility in the transition matrices that are used that should be estimated empirically in the future, data collection permitting.
The writing of a package allows us to enshrine business knowledge into a corpus of code that is inseparable from the documentation of that code.
Quality assurance is built in through the use of continuous integration and the test driven development.

sessionInfo()

DFE-Capital/blockbuster documentation built on May 26, 2019, 7:23 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

DFE-Capital/blockbuster Deterioration model of the School Estate