Can I have a P please, Bob?. (via)
A useful package of functions and sample data to forecast the future condition of School blocks (or buildings) in the UK.
Markov chains represent a class of stochastic processes of great interest for
the wide spectrum of practical applications. In particular, discrete time
Markov chains (DTMC) permit modelling the transition probabilities
between discrete states by the aid of matrices. The states are represented by
the grade
factor which is the surveyed condition of a building element,
determined by a quantity surveyor through grades (states); A, B, C, D, E. The
transition probabilities are 6 by 6 matrices with the average deterioration rate
per timestep between states provided by expert building consultant opinion (see
documentation for full details).
This package is useful for decision makers as it provides a modelling approach
for predicting the average condition of an individual building component
(elementid
) through time. This modelling approach can be scaled through a
decision making hierarchy, with modelling the deterioration at the level of a
School block (buildingid
), for each site (siteid
), the School (which may be
made of many blocks, URN
), at the Local Authority (LA)
level (LAN
) and at the National level given suitable input data (filter
desired rows using the dplyr
package is recommended).
The Blockbuster
function can be used on data containing one or more components
in one or more school blocks. Individual component level information is stored as a data.frame with strict conditions on columns
while block-level information is held in another
data.frame with strictly defined columns. It produces snapshots of the
component and block conditions at each timestep. The model can be applied to
individual components, buildings, collections of buildings, or the entire
estate. This flexibility allows decision makers to ask many and varied questions
of the output data.
Check you have these packages installed and then load them into memory using
library()
or require()
.
library(blockbuster2)
To run a 10 year simulation of the provided simulated dataset, with no rebuilding and £1bn of repair funds per year, the command is
Blockbuster(simulated_elements, forecast.horizon = 10, rebuild.money = 0, repair.money = 1000000000)
Passing numeric vectors to the rebuild.money
and repair.money
arguments allows control of the amount of money available for rebuilding and repairing each year respectively, and forecast.horizon
allows you to specify the number of years to simulate.
The model outputs a list containing a summary of the area and backlog at each grade by component, by building along with the raw state of all individual components at the end of the simulation. It is possible to save to file the states of individual components in interim years by passing save = TRUE
as an
argument.
Remember to use the R help for all of the functions and data mentioned in the
vignette for extra detail (e.g. try typing ?Blockbuster
into the console).
The first argument for the Blockbuster function is a data.frame containing the data for the elements that you wish to simulate. The package contains some simulated data in the correct format under the name simulated_elements
.
data("simulated_elements", package = "blockbuster2") dplyr::glimpse(simulated_elements)
For use with blockbuster, the data.frame must contain the following columns:
Column Description -------------- ----------------------------------------------------------- buildingid A number that uniquely identifies the building a component is part of
A The probability the element is at grade A
B The probability the element is at grade B
C The probability the element is at grade C
D The probability the element is at grade D
E The probability the element is at grade E
ab The probability that the component will deteriorate from grade A to grade B in one year
bc The probability that the component will deteriorate from grade B to grade C in one year
cd The probability that the component will deteriorate from grade C to grade D in one year
de The probability that the component will deteriorate from grade D to grade E in one year
unit_area A quantification of the size of the component - this is usually an area but could be a count or length
B.repair.cost The cost to repair one unit_area of this component at grade B
C.repair.cost The cost to repair one unit_area of this component at grade C
D.repair.cost The cost to repair one unit_area of this component at grade D
E.repair.cost The cost to repair one unit_area of this component at grade E
gifa The Gross Internal Floor Area of the building a component is part of. This is used to generate the cost of rebuilding the entire building
The simulated data provided with the package also contains the following column that is not used internally but could be useful when analyzing outputs:
Column Description
The package is designed to be used with PDS data which also includes the following columns that can be useful when looking at outputs:
Column Description
URN A number that uniquely identifies a school
siteid A number that uniquely identifies a school site (schools may have more than one site)
The package also includes a block-level summary of the simulated data as simulated_blocks
data("simulated_blocks", package = "blockbuster2") dplyr::glimpse(simulated_blocks)
This is the same format as the block-level summary outputs of the model.
The model output is a list with four entries:
element summary
A tidy format data frame of area and backlog by component type, grade and year.building summary
A tidy format data frame of area and backlog by building, grade and year.element
A data frame containing the final state of all simulated element.block
A data frame containing the final state of all simulated buildings.So - presuming the output has been saved as an object called output
- the command output$"element summary"
will return the element-level summary, output$"building summary"
will return the block-level summary and so on.
The element-level data frame in the output will contain the same columns as the data frame passed to the element.data
argument, so additional data irrelevant to the model (such as siteid
) will be passed through unchanged.
The block-level data frame will contain the columns buildingid
, block.rebuild.cost
, *.block.repair.cost
and ratio
. If a data frame is passed to the block.data
argument then other columns will be passed through unchanged.
If you want to access the raw element-level data for interim years, it is possible to set the Blockbuster function to save interim states to file. Setting save = TRUE
will turn this feature on and the filelabel
and path
arguments allow you to set the path and labels for the saved files. Due to memory restrictions, separate files are saved for each year. Files can be loaded individually into your session using something akin to load("./output/blockbuster_element_2.rds")
depending on where you set the save
, path
and filelabel
arguments.
Here we simulate the deterioration of one building through one year.
# output demonstration # construct single block input my_output <- Blockbuster(simulated_elements, save = FALSE) dplyr::glimpse(my_output) # this function makes the output nicer to read
We start by comparing the initial overall backlog and the backlog after one year.
sum(my_output$"element summary"$backlog[my_output$"element summary"$year == 0]) sum(my_output$"element summary"$backlog[my_output$"element summary"$year == 1])
We recommend using the pipe operator %>%
for coding, as used in dplyr
as it results in code that is much simpler and easier to read. For example, the above code can be replaced with
my_output$"element summary" %>% group_by(year) %>% summarise(backlog = sum(backlog))
We suggest using the same style code for visualisations. The following uses the ggplot2
to plot the change in expected cost of repairing all components at the associated grade.
library(ggplot2) my_output$"element summary" %>% group_by(year, grade) %>% summarise(backlog = sum(backlog)) %>% ggplot() + geom_line(aes(x = year, y = backlog, colour = grade))
Although the output groups summary statistics by element or building, any analysis related to component type or buildings will be highly suspect. This is because the Blockbuster model does not differentiate between different buildings, schools or components when making repair and rebuild decisions at present. In reality the decisions on what funds are spent on what projects is highly dependent on school-level factors and the deterioration and repair of components is also correlated. We therefore strongly recommend aggregating outputs across the complete simulated estate.
In the example above we passed the blockbuster
function only one argument and
it still worked. That is because if you do not explicitly state some of the
arguments it will assume default values for these. We can look at the defaults
using the formals
function.
formals(Blockbuster)
The default behaviour of the model is to simulate deterioration over one year
without any rebuilding or repairing and no inflation. The argument
element.data
does not have default behaviour thus we must pass it to
the blockbuster function.
For more details try typing ?blockbuster
into the console.
The deterioration rates and component repair costs are passed to blockbuster within the input element.data
data.frame, allowing for individually tailored deterioration rates for all components if desired. Let's inspect some rates and costs in simulated_elements
.
simulated_elements[1:2, ] %>% select(elementid, ab, bc, cd, de, B.repair.cost, C.repair.cost, D.repair.cost, E.repair.cost)
Every component type is identified by elementid
. The "two-letter" variables describe the transition rate or probability from the first letter condition grade to the second letter condition grade. For example,cd
describes the proportion of the unit_area
at grade "C" that will transition to grade D
through one timestep or year.
The *.repair.cost
columns contain the unit cost of repairing components at the associated grades. The individual expected component repair costs - stored in the *.repair.total
columns - are computed by multiplying these by unit_area
and the probability of being at that grade (columns A
, B
, C
, D
, and E
).
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.