Can I have a P please, Bob?. (via)
A useful package of functions and sample data to forecast the future condition of School blocks (or buildings) in the UK.
Markov chains represent a class of stochastic processes of great interest for the wide spectrum of practical applications. In particular, discrete time Markov chains (DTMC) permit to model the transition probabilities between discrete states by the aid of matrices. The states are represented by the grade
factor which is the surveyed condition of a building element, determined by a quantity surveyor through grades (states); N, A, B, C, D, E. The transition probabilities are 6 by 6 matrices with the average deterioration rate per timestep between states provided by expert building consultant opinion (see documentation for full details).
This package is useful for decision makers as it provides a modelling approach for predicting the average condition of an individual building component (elementid
) through time. This modelling approach can be scaled through a decision making hierarchy, with modelling the deterioration at the level of a School block (buildingid
), for each site (siteid
), the School (which may be made of many blocks, urn
; omitted for anonymity), at the Local Authority (LA) level (lano
) and at the National level given suitable input data (filter desired rows using the dplyr
package is recommended).
The blockbuster
function can be used on more than one row of what I call a blockbuster_tibble
through more than one year, storing all the data of intermediary years. This flexibility allows decision makers to ask many and varied questions of the output data.
Check you have these packages installed and then load them into memory using library()
require()
.
# You could load tidyverse or you can load the individual packages, one-at-a-time. library(dplyr) # need pipe %>% library(ggplot2) # for visualisation library(markovchain) # Recommended for markovchain S4 object handling (advanced). library(blockbuster) # Required. library(ggthemes) # Make pretty output with little effort.
Remember to use the R help for all of the functions and data mentioned in the vignette for extra detail (e.g. try typing ?blockbuster
and ?blockbuster_pds
into the console).
Provided with this package is ten percent sampled data with identifying features removed (the school Unique Reference Number). We suggest loading the tidyverse
package when handling this data. This object is a tibble called by blockbuster_pds
. For creating a subset of this data to make it manageable for simulation, use the dplyr package.
The areafy2
function was used on data queried from the Department for Education Education Funding Authority Property Data Survey SQL data tables (see 01_read_and_tidy_data.R for details). This provides estimates for the unit_area
(a quantification of how much of a building component there is) variable. This is crucial for the DTMC methodology and for quantifying the costs of repairs.
We take a look at the transpose of the data table which describes the dimensions of our sample data and the type of variables found therein. Note the cost_sum
variable which is an aggregation of the initial costs estimated in the block or buildingid
that that row is associated with (this variable is set to zero for timestep
greater than zero).
dplyr::glimpse(blockbuster_pds)
The output of the blockbuster
function is a list of dataframes. The number of dataframes is equivalent to the number of years simulated plus one. The output is of the custom class blockbuster_list
(derivative to list) and contains all the variables from the input initial blockbuster_tibble
. Here we simulate the deterioration of ten building components through one year. We are interested in the variables of a simulated PDS produced by our blockbuster
model using default arguments.
# output demonstration dplyr::glimpse( # this function makes the output nicer to read blockbuster(blockbuster_tibble = blockbuster_pds[1:10, ], # ten rows for demo forecast_horizon = 1) # We deteriorate through one year [[2]]) # The initial data at timestep 0 is held in [[1]]
This output has foreign key variables with the suffix -id, that could be joined to other data from relevant SQL data tables (e.g. the user could find the urn
by joining on siteid
from an appropriate data table). The urn
is omitted to maintain the anonymity of each school's condition included in the sample.
Importantly the simulated PDS output has an additional variable called block_rebuild_cost
; that is the expected cost of rebuild for the whole buildingid
associated with that building component given the expected rebuild_cost_rate
argument passed to the blockbuster
function (we use the default here which is £ 1274 / metre squared).
The generic functions plot()
, boxplot()
and summary()
can be used on the ouput object of class blockbuster_list
. These are detailed later in the vignette.
In the example above we passed the blockbuster
function only two arguments and it still worked! That is because if you do not explicitly state some of the arguments it will assume default values for these. We can look at the defaults using the formals
function.
formals(blockbuster)
The arguments blockbuster_tibble
and forecast_horizon
do not have defaults thus we must pass them to the blockbuster function.
For more detail try typing ?blockbuster
into the console.
The deterioration rates can be called by using either the tibble blockbuster_det_data
or the markovchainList
object blockbuster_mc_list
. For example let's inspect an example of each.
blockbuster_det_data[2, ]
This identifies the building component through the element
, sub_elment
and const_type
. The "two-letter" variables describe the transition rate or probability from the first letter condition grade to the second letter condition grade. For example, for this building component (here concated_det
), the cd
describes the proportion of the unit_area
at grade == C
that will transition to grade == D
through one timestep or year (na
means from New to A (excellent condition) and is not to be confused with NA
).
blockbuster_mc_list@markovchains[[2]] # isS4(blockbuster_mc_list@markovchains[[1]]) # TRUE
blockbuster_mc_list
is a S4 object so the data needs to be accessed through its slots. They hold the same data in different ways. The S4 method may be useful in the future when incorporating some functions from the markovchain
package.
We consider the first row of this sampled data which is a roof of excellent grade
A condition with a unit_area
of 308.4.
x <- blockbuster_pds[1, ] dplyr::select(x, element, sub_element, const_type, grade, unit_area)
To determine which transition matrix is associated with each row we use the det_what_tm()
function. This produces an error if the markovchain
object does not exist. This used to work by concatenating the element
, sub_elment
and const_type
and matching with the concated_det
we saw earlier. However, to speed up the efficiency of the code, the concated
variable was also cached before the for loop in the blockbuster
function. This required a modified input for this function prior to look up of the associated deterioration rates. This has since been replaced by using a lookup table based on elementid
. This is faster as
# old method matched on strings of concated mc1 <- det_what_tm(dplyr::mutate_(x, concated = ~gsub(pattern = "[^[:alnum:] ]", replacement = "", paste(element, sub_element, const_type, sep = "")))) mc1 # new method tests for equality using elementid mc2 <- det_what_tm(x) identical(mc1, mc2) # they are the same, lookup no longer requires concated
This is a markovchain
object as described in the documentation of that package. It is a S4 object and we can access different components using slots and [
sub-setting.
mc1@transitionMatrix[, ] # [i, j] i for row, j column
Typically the user will be interested in a subset of the data. Consult the dplyr
vignette for handling and sub-setting your data based on desired filter conditions.
Here we demonstrate the deterioration of a single building component through time. We chose a critical building feature of one particular block or building we are interested in. CAVEAT: this modelling approach should be used at national level; this example is illustrative.
my_wall <- dplyr::filter(blockbuster_pds, buildingid == 4382, sub_element == "Walls - Structure", const_type == "Brick / block") my_wall_next_year <- blockbuster(my_wall, 1) my_wall_next_year my_wall_20year <- blockbuster(my_wall, 20)
This produces a list of tibbles. Each tibble provides the condition of the input tibble at that timestep. Notice how the number of rows has increased. That's because we have a row for each grade
state and associated unit_area
. The first tibble in the list is for the input data or timestep == 0
. It is prudent to check the timestep by inspecting it.
my_wall_next_year[[2]] head(my_wall_next_year[[2]]$timestep, 1) # 1 timestep later than initial
The same principle applies as above where we pass the blockbuster
function an input tibble object and the number of years into the future (relative to when the data was collected) we want to model to.
y <- blockbuster_pds[1:20, ] # select all rows associated with decision level of interest, perhaps one block or one LA? my_block_10years <- blockbuster(y, 10) dplyr::select(my_block_10years[[11]], buildingid, elementid, grade, unit_area) %>% dplyr::arrange(elementid, grade)
I suggest you spend time thinking about the precise question you are asking rather than using the whole blockbuster_pds
tibble as input due to non-trivial computation time. As a rule of thumb the amount of rows you start with will be multiplied by five (as each elementid
deteriorates to other states), with each of these tibbles being multiplied by the forecast_horizon
. That's quite a lot of data and may tax your computer's RAM (although the unit_area
, timestep
and cost
will be the only variables changing through time).
The tibble format is compatible with the tidyverse
particularly reshape2
-ing and plotting using ggplot2
. The purrr
package and the excellent map_df()
function makes it easy to plot data from multiple dataframes across a list, as is the case with our blockbuster
output.
We can investigate the rate at which our critical elements are deteriorating or even what proportion of the unit_area
is decommisioned. Given the flexibility of the model output we can ask anything of the data.
Here we are interested in tracking the deterioration of critical building components in our block of interest. We recycle the my_wall_20year
object for this question. The wall starts as all condition grade B then gradually deteriorates.
p4 <- my_wall_20year %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(timestep, grade) %>% dplyr::summarise(sum_area = sum(unit_area, na.rm = TRUE)) %>% ggplot2::ggplot(aes(y = sum_area, x = grade)) + ggplot2::geom_bar(stat = "identity") + ggplot2::facet_wrap( ~timestep ) p4 + ylab("Estimated area (m^2)") + xlab("Condition grade") + govstyle::theme_gov()
Typically we may be more interested in critical, structural building-features such as the roof or walls and whether the unit_area
exceeds a critical threshold determined by expert opinion. For example, is our wall's total unit_area
after 10 years comprised of over 20% grade D and E? We plot then inspect the numbers.
my_wall_10 <- blockbuster(my_wall, 10) # We are interested in the 10 th (+ 1) timestep df <- my_wall_10[[11]] # Use gov style for plot, install these if you don't have them # devtools::install_github('mangothecat/visualTest') # required for govstyle # devtools::install_github("UKGov-Data-Science/govstyle") p <- ggplot2::ggplot(data = df, aes(x = grade, y = unit_area)) + ggplot2::geom_bar(stat = "identity", fill = govstyle::gov_cols['turquoise']) + xlab("Condition") + ylab(expression(paste( "Unit area (",m^2, ")", sep = ""))) + govstyle::theme_gov() p
We use dplyr
to elucidate and determine that our wall is below our arbitrary 20% threshold, if we wanted to scale this we could create a new variable using mutate
which is based on a logical test. The only limit is your imagination (and coding skills).
total_wall_area <- sum(df$unit_area) dplyr::mutate(df, prop_area = unit_area / total_wall_area) %>% dplyr::select(grade, unit_area, prop_area)
The costing part of the blockbuster
function is built using an internal function called blockcoster_lookup
. This takes the building component elementid
and condition grade
information and looks this up in the blockbuster_pds_repair_costs
tibble, thus finding the appropriate repair costs constant to multiply the unit_area
of that building component by.
Originally it returned NA
for grade E. This was due to lack of data on repairing a building component from grade E back to A, furthermore as grade E is meant to encapsulate a decommisioned status for now it was deemed more appropriate to only be able to get E back to grade N through rebuilding. However, user feedback suggested it more useful to attempt to approximate the grade E cost using a crude estimate for now. This was achieved by modifying the code in 04_read_tidy_costs_data.R
. Costs for E are now simply the costs for D plus five percent. This approach was recommended by Adam Bray until we get empirical estimates.
Costs are calculated for each row. The cost_sum
variable is set to zero as this can be recalculated later if required, as it requires an aggregation by buildingid
step (this is somewhat technical, see the blockbuster
code by the costing step, where aggregation occurs).
We continue with the examples from above by considering the aggregated cost of repairs through time. Let us plot the distribution of costs by condition grade using a Tukey box plot through each year (R and ggplot make it easy to look at the data in different ways). Repair costs from grade A should always be zero as they are already in excellent condition and do not need any work. This plot shows how the median costs for each grade steadily increase year on year with some building components in particularly deteriorating quickly and in a costly fashion (the outliers indicated by the black circles). Note: the comments in the code are there to teach you the basis of this coding approach using the map_df
function from the purrr
package.
The red "dots" at the end of the boxplot represent outliers. These observations contribute to the total cost of the block much more than others. There are a number of different rules for determining if a point is an outlier, but the method that R and ggplot use is the "1.5 rule". If a data point is:
less than Q1 - 1.5 x IQR
greater than Q3 + 1.5 x IQR
then that point is classed as an "outlier". The line goes to the first data point before the "1.5" cut-off. Note: the inter-quartile range; IQR = Q3 - Q1
.
We have added a scattered jitterplot on top using geom_point()
so that you can see how all the observations contribute to the observed cost
distribution.
# http://www.machinegurning.com/rstats/map_df/ # # cool_cars <- list(mtcars, mtcars, mtcars) %>% # map_df( # ~ .x , # .id = "year" # ) %>% # group_by("year") %>% # ggplot(aes(x = cyl, y = disp, group = cyl)) + # geom_boxplot() + # facet_wrap( # ~year # ) # cool_cars p2 <- my_block_10years %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by("timestep") %>% ggplot2::ggplot(aes(x = grade, y = cost)) + ggplot2::geom_boxplot(outlier.colour = "red") + # Set to NA to avoid double plot ggplot2::facet_wrap( ~timestep ) p2 + ylab("Cost of repairs (£)") + xlab("Condition grade") + govstyle::theme_gov() + geom_point(position = position_jitter(width = 0.05), alpha = 0.15)
The same plot is produced using the generic method of boxplot()
.
We take our hypothetical building and plot the total costs year on year by condition grade. Notice how the costs to repair the hypothetical block increases, accelerating as more and more unit_area
gets into grade E.
p3 <- my_block_10years %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(timestep, grade) %>% dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>% ggplot2::ggplot(aes(x = grade, y = sum_cost)) + ggplot2::geom_bar(stat = "identity") + ggplot2::facet_wrap( ~timestep ) p3 + ylab("Total cost of repairs (£)") + xlab("Condition grade") + govstyle::theme_gov()
The blockbuster output is flexible and amenable to data manipulation and or transformation prior to plotting. The user is only limited by their imagination.
Below we present a standard output that is seen in many reports of the School Estate. Here we plot the total costs by aggregated by condition grade for each year and give it a slightly prettier finish. Note how the condition grade
A do not contribute to the cost. However, the repair cost of E (set at D repair cost plus five percent) accelerates the total cost as more building components' unit_area
transitions into grade E from D. It's important users are aware of this when interpretating these graphics as the estiamted cost of B, C and D should be considered more reliable as they are based on expert opinion and bespoke costs for each building component (whereas E grade is just 5% on D which may not be appropriate for every building component).
p4 <- my_block_10years %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(timestep, grade) %>% dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>% ggplot2::ggplot(aes(x = timestep, y = sum_cost, group = grade, colour = grade)) + ggplot2::geom_line(size = 1.3) p4 + ylab("Total cost of repairs (£)") + xlab("Years after PDS") + govstyle::theme_gov() + theme(legend.position = "bottom") + # ggthemes::theme_hc() + ggthemes::scale_colour_hc() + ggtitle("Cost of repairs through time for our example block")
The same plot is produced using the generic method of plot()
on the blockbuster_list
object.
Better to draw several approximate graphics saying something about the right question than to draw one precise graphic relating to the wrong question.
The blockbuster
simulation is time expensive thus we do not want ot generate every possible outcome variable that a user might need. Instead we generate variables that are required for the simulation, for decision making regarding which blocks should be rebuilt or repaired. Instead you can derive variables yourself. Be imaginative and try plenty of variety in your plots.
The repair costs had previously ignored decommisioned building components (grade E unit_area
of a building component). To incorporate this into our analysis of when it is better to rebuild a block rather than to maintain it we create a new variable in the blockbuster_tibble
called block_rebuild_cost
. This multiplies the gifa
, of the block that the building component in question is part of, by the argument (blockbuster input) rebuild_cost_rate
(this has a default value). This does not fully capture the complexity of estimating the rebuild cost of a block and could be enhanced in future iterations of the blockbuster
. At the moment it can accept a constant rebuild_cost_rate
or varying rebuild_cost_rate
(a vector the same length as the argument forecast_horizon
that can adjust for inflation). This value is the same for all building components in the same block as it is a function of the block's gifa
(it only makes sense considered at the block level).
two_blocks_4_years_inflation <- blockbuster(dplyr::filter(blockbuster_pds, buildingid == 4382 | buildingid == 4472), forecast_horizon = 4, rebuild_cost_rate = c(1274, 1281, 1289, 1296) ) two_blocks_4_years_inflation %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(buildingid, timestep, block_rebuild_cost) %>% dplyr::tally()
p5 <- my_block_10years %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(timestep, element) %>% dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>% ggplot2::ggplot(aes(x = element, y = sum_cost)) + ggplot2::geom_bar(stat = "identity") + ggplot2::facet_wrap( ~timestep ) p5 + ylab("Total cost of repairs (£)") + xlab("Building element") + govstyle::theme_gov() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(plot.margin = unit(c(20,10,10,10),"mm"))
Perhaps we are interested in just the last couple of years of the simulation? We can subset a single year of interest or dataframe from the list using the normal R syntax.
The following graph also highlights a flaw in the earlier generation of the deterioration model which just considered the unit_area
of a building component and ignored the building component element
, sub_element
or const_type
type in its calculations. The blockbuster model considers this when estimating repair costs. In the figure below we aggregate across building element
the total unit_area
and cost
of repairs and plot using the extra dimensions afforded by point area and colour. In short this identifies an interaction term between element
and grade
for cost
of repairs (the trajectory of gradient is different between building components as you go between condition grades).
p6 <- my_block_10years[11] %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(timestep, grade, element_shrt = as.factor(abbreviate(element, 14))) %>% dplyr::summarise(sum_unit_area = sum(unit_area, na.rm = TRUE), sum_cost = sum(cost, na.rm = TRUE)) %>% ggplot2::ggplot(aes(x = grade, y = sum_cost, colour = element_shrt, size = sum_unit_area)) + ggplot2::geom_point(alpha = 0.7) + ggplot2::facet_wrap( ~ timestep ) p6 + ylab("Cost of repairs (£)") + xlab("Condition grade") + theme_economist() + scale_size_continuous(breaks = c(1000, 2000, 3000), range = c(3,10))
Rebuilding is carried out by the rebuild
function which is internal to blockbuster
. The rebuilding of the blockbuster_tibble
you pass as an argument is dependent on two other arguments; the rebuild_monies
and rebuild_cost_rate
. Both can be passed as a vector equal in length to the number of years you are simulating (forecast_horizon
) or as a number. This allows you to control for inflation in the construction industry and variable rebuilding spending from year to year. Technically the blockbuster
function makes use of the rebuild_cost_rate
in estimating how much it would cost to rebuild a block (rebuild_cost_rate
multiplied by block gifa
), this is then passed into rebuild
for a decision on how to spend the available rebuild_monies
. All the blocks (unique based on buildingid
) in the blockbuster_tibble
are ranked in descending order of the ratio of the total cost of repairs to the cost of rebuilding the block (where a score higher than one means the block is more expensive to repair than it is to rebuild). The block with the highest internal statistic of cost_to_rebuild_ratio
is inspected for rebuilding first. If enough monies are available it is rebuilt by converting all its building components back to new and recalculating the unit_area
(using areafy2
). The leftover money is then cycled through the list rebuilding as appropriate until the remaining money is less than the cheapest block to rebuild, at which point it is discarded (NOT passed to repair_monies
).
We inspect the effect of rebuilding on the cost of repairs of three buildings with a non-constant rebuilding investment profile and rebuild_cost_rate
adjusted for 0% (year 2) then 1% inflation (year 3) (you can pass this as a vector rather than typing it manually of course). We inspect the effect of the rebuild on total repair costs using a faceted stacked bar chart for the different building elements in all the blocks (try increasing the rebuild_monies
by one pound, what happens?).
# We filter our PDS sample for just three blocks to keep things simple x <- dplyr::filter(blockbuster::blockbuster_pds, buildingid == 4382 | buildingid == 4472 | buildingid == 4487) # Rebuild spending profile y <- blockbuster(x, forecast_horizon = 8, rebuild_monies = c(0, 5e6, 0, 0, 0, 0, 0, 0), # five million rebuild_cost_rate = c(1274, 1274 + 0, 1274 + 12.74, 1286.74, 1286.74, 1286.74, 1286.74, 1286.74)) # £/m^2 p7 <- y %>% purrr::map_df( ~ .x , .id = NULL ) %>% dplyr::group_by(timestep, element, grade) %>% dplyr::summarise(sum_cost = sum(cost, na.rm = TRUE)) %>% ggplot2::ggplot(aes(x = element, y = sum_cost, fill = grade)) + ggplot2::geom_bar(stat = "identity") + ggplot2::facet_wrap( ~timestep ) + scale_fill_discrete(drop = FALSE) + scale_x_discrete(drop = FALSE) # gives E a colour p7 + ylab("Total cost of repairs (£)") + xlab("Building element") + govstyle::theme_gov() + theme(axis.text.x = element_text(angle = 90, hjust = 1)) + theme(plot.margin = unit(c(20,10,10,10),"mm")) + theme(legend.position = "top")
This also provides a sense check. The rebuild happens during timestep 2
(because rebuild_monies = c(0, 5e6, 0)
), all building components are rebuilt to grade = N
thus have zero cost
. During the next timestep there will be some deterioration to grade = A
, however all building components of excellent condition do not contribute to the repair costs of a block, hence no cost of repairs in timestep 3
. We start to see condition grade = B & D
at timestep 5
as expected given our working knowledge of the model.
Repairs are carried out by the repair
function which is internal to blockbuster
. The repairing of the blockbuster_tibble
you pass as an argument is dependent on the other argument; repair_monies
. The latter can be passed as a vector equal in length to the number of years you are simulating (forecast_horizon
) or as a number. This allows you to control for inflation in the construction industry and variable repairs spending from year to year. Technically the blockbuster
function makes use of the repair_monies
by dividing it up evenly between each block in the input blockbuster_tibble
.
All the blocks (unique based on buildingid
) in the blockbuster_tibble
are ranked in descending order by buildingid
, grade
(from E to N) and cost
(expensive to cheapest repairs within grade
). Each block is repaired with repair_monies
divided by the number of blocks in the blockbuster_tibble
. Money is spent on worse condition grade (ignoring E as it cannot be repaired [ignoring N and A also]) and most expensive repairs first until the remaining money is less than the cheapest building component to repair, at which point it is discarded (NOT passed to rebuild_monies
; assumed spent by the school on other things) (see the help function what_needs_repair_within_block()
for details on the decision making).
The purpose of this package is to inform policy decision making to determine which out of competing spending profiles will better meet the assessment criteria. Here we aim to minimise the total condition cost of repairs in the 3 school blocks. We develop two spending profiles using the same amount of money over five years (ignoring inflation for simplicity). Remember, the user is free to develop their own assessment criteria given the flexibility of the model output (e.g. you could use the unit_area
of critical building components).
We could compare using some graphics already demonstrated in the vignette. Instead we prefer to reorganise the data to answer our specific question: which of our competing spending profiles produces the lowest estimated total repairs cost in year 5? We use arbitary blocks and total repair spend profile (this is not intended to be realistic but instead demonstrative). Inspect the code chunk, specifically the repair_monies
argument for each spending profile, as it is where they differ.
# We filter our PDS sample for just three blocks to keep things simple # using identical blocks to our rebuild example above. x <- dplyr::filter(blockbuster::blockbuster_pds, buildingid == 4382 | buildingid == 4472 | buildingid == 4487) # Repair spending profile of different strategies drip_repairs <- blockbuster(x, forecast_horizon = 5, # sum(rep(3e4, 5)) repair_monies = rep(3e4, 5)) # £10k / block / year = £ 15e5 flood_repairs <- blockbuster(x, forecast_horizon = 5, repair_monies = c(6e4, 6e4, 1e4, 1e4, 1e4)) # (20k, 20k, 5k, 5k, 5k)* 3 blocks = £ 10.5e5
Sometimes it is convenient to save these objects and then use them again later. If we were to saveRDS
then readRDS
we could use the attributes()
function to find out the spending profile inputs of the blockbuster_list
.
attributes(drip_repairs) attributes(flood_repairs)
This consistent repair spend profile can be visually compared by the user to a spending profile that invests more heavily sooner rather than later. However, we need to organise the data into a convenient form for comparison.
First we convert our list of dataframes into one dataframe (for each spending profile).
drip <- dplyr::bind_rows(drip_repairs) flood <- dplyr::bind_rows(flood_repairs)
Then we group by timestep and sum the total cost of repairs across all the blocks. We pretend you might want to compare many spending profiles here and thus demonstrate an approach that can be scaled. The model comparison of spending profiles suggests that the flood approach is slightly better when judged by the year 5 total_cost
of repairs statistic generated using simple data manipulation and aggregation by the analyst. Not much difference at year 1 suggests that the "spend early" strategy may have over spent at year 1. This could be explored by further investigation or even automation to optimise the problem, given enough time or computation power.
dripping <- drip %>% # extracat::visna(drip, sort = "b") group_by(timestep) %>% # drip_missing <- drip %>% filter(!complete.cases(.)) # table(drip_missing$timestep) # the originals rows are not being given this variable # maybe this is OK and makes sense, block can't be rebuilt at timestep 0, therefore NA cost summarise(total_cost = sum(cost)) flooding <- flood %>% group_by(timestep) %>% summarise(total_cost = sum(cost)) # we can join them and create an ID variable spending_profile_ts <- dplyr::bind_rows("drip" = dripping, "flood" = flooding, .id = "spend_profiles") spending_profile_ts
p8 <- ggplot(spending_profile_ts, aes(timestep, total_cost, colour = spend_profiles, group = spend_profiles)) + geom_line(size = 2) p8 <- p8 + ylab("Total cost of repairs (£)") + xlab("Years") + govstyle::theme_gov() + theme(plot.margin = unit(c(20,10,10,10),"mm")) + theme(legend.position = "top")
print(p8)
The simulations suggest that on average investing sooner rather than later with the "flood" strategy is slightly superior at reducing total repair costs in five years time.
Now we have both the rebuild
and repair
functions we can assess spending profiles that prioritise rebuilding over repairing or vice versa. Again to compare meaningfully we need an assessment criteria to compare our different approaches.
Let's use all our money to rebuild one of the blocks or use all the money to repair each block over the five years instead.
We use the same arbitary grouping of blocks to demonstrate. We do our analysis in one code chunk this now that you are familiar.
# Repair spending profile of different strategies # rebuild most expensive block in first year rebuild_once <- blockbuster(x, forecast_horizon = 5, rebuild_monies = c(2440984, rep(0, 4))) %>% dplyr::bind_rows() %>% group_by(timestep) %>% summarise(total_cost = sum(cost, na.rm = TRUE)) # repair funding using same funds spread over five years drip_fix <- blockbuster(x, forecast_horizon = 5, repair_monies = rep(2440984/5, 5)) %>% dplyr::bind_rows() %>% group_by(timestep) %>% summarise(total_cost = sum(cost, na.rm = TRUE)) # we can join them and create an ID variable spending_profile_ts <- dplyr::bind_rows("Repairs only" = drip_fix, "Rebuild only" = rebuild_once, .id = "spend_profiles") p9 <- ggplot(spending_profile_ts, aes(timestep, total_cost, colour = spend_profiles, group = spend_profiles)) + geom_line(size = 2) p9 <- p9 + ylab("Total cost of repairs (£)") + xlab("Years") + govstyle::theme_gov() + theme(plot.margin = unit(c(20,10,10,10),"mm")) + theme(legend.position = "top")
print(p9)
The rebuild takes care of the one rebuilt block but not the other two! As a result the costs of repair begin to accelerate without any money invested in repairing the blocks. The repair focused spending profile quickly brings the condition cost down to zero after two years. The repair profile is probably overspending in the subsequent years and funding could be reduced further with some investigation.
CAVEAT: this is just a demonstration and should not be considered a general rule. The best strategy will depend on your assessment criteria and the blocks of interest.
sessionInfo()
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.