I'm trying to free your mind, Neo. But I can only show you the door. You're the one that has to walk through it.

A BeginneR's Guide to blockbuster

This vignette teaches the skills needed to give the blockbuster function the right inputs, and so ask it the right question. The function returns an object of class blockbuster_list, and we show how to handle this object to answer basic business questions.

The tidyverse is a bunch of useful packages all rolled into one. We rely on it throughout this vignette as it provides a more beginner-friendly approach to R. We load it now.

require(tidyverse)  #  if not already loaded, load it
require(blockbuster)  #  we need this for obvious reasons

blockbuster basics

How to get started? blockbuster() is the main function from the blockbuster package that we will use to carry out the deterioration, rebuild and repair of the school blocks of interest. We can inspect the help using a ? prefix.

? blockbuster

Remember our business question above? How do we solve it using this function? We can get the formal arguments of the function using the formals function. This tells us that three of the five arguments have default values, but the other two need us to set their values.

formals(blockbuster)
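If you want to see what formals returns without loading the package, here is a minimal sketch on a made-up function. toy_sim and its default values are assumptions for illustration only; they are not taken from blockbuster.

```r
# Hypothetical stand-in with the same shape as blockbuster():
# two arguments without defaults, followed by three with defaults.
toy_sim <- function(data, horizon, rebuild = 0, repair = 0, rate = 1280) {
  NULL
}

toy_args <- formals(toy_sim)

# The argument names, in the order the function expects them
names(toy_args)

# A default value can be pulled out by name
toy_args$rate
```

The order of the names matters later on, because it is the order R uses when arguments are supplied without names.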

If we don't provide these two arguments, what happens?

blockbuster(blockbuster_pds[1, ], )
blockbuster(, 1)

Given our question, the second argument forecast_horizon should be obvious. Using the help, what format should this argument take?

blockbuster(, forecast_horizon = 1)

Now we need to provide the first argument which is the blockbuster_tibble. What form might this take? Look at the example in the help.

blockbuster_pds, the data

Our blockbuster_tibble can either be all of the blockbuster_pds or a subset thereof (how many rows and columns are there in this dataset?).

?blockbuster_pds

When using R it's important to expand your basic vocabulary. A useful function is str. What does it do, and what does it reveal about the blockbuster_pds object?

str(blockbuster_pds)

Using the above or any other functions:
* What is the class of the blockbuster_pds object?
* What do you think each variable means? (Use ?blockbuster_pds if you get stuck.)
* What does each row represent?
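The same inspection functions can be tried on a small made-up data frame first. This is only a sketch: toy_pds is hypothetical and borrows just the buildingid and element column names used in this vignette, so it says nothing about what the real blockbuster_pds contains.

```r
# Toy data frame standing in for blockbuster_pds (values are made up)
toy_pds <- data.frame(
  buildingid = c(4382, 4382, 4472),
  element    = c("Roofs", "Floors", "Roofs"),
  stringsAsFactors = FALSE
)

str(toy_pds)     # compact summary: dimensions, column names and types
class(toy_pds)   # the class vector of the object
dim(toy_pds)     # number of rows, then number of columns
names(toy_pds)   # the variable names
```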

filtering the data to answer the question

Given our findings above we can run a simple simulation using blockbuster. Be careful in how you subset as simulating all the rows takes a long time! What have I done below?

one_yr_later_a <- blockbuster(blockbuster_tibble = blockbuster_pds[1, ],
                            forecast_horizon = 1)

one_yr_later_b <- blockbuster(blockbuster_tibble = blockbuster_pds[1:2, ],
                            forecast_horizon = 1)

one_yr_later_c <- blockbuster(blockbuster_tibble = blockbuster_pds[c(1, 3, 3, 7), ],
                            forecast_horizon = 1)
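To check your reading of the square-bracket subsetting, it can help to try the same row indices on a tiny data frame where the result is easy to see. A sketch using a made-up object called toy:

```r
# Toy data frame with eight rows so the indices are easy to follow
toy <- data.frame(id = 1:8, letter = letters[1:8])

toy[1, ]     # the first row only, all columns
toy[1:2, ]   # rows 1 and 2

# A vector of row numbers picks those rows in that order;
# a repeated number returns the same row more than once.
picked <- toy[c(1, 3, 3, 7), ]
picked
```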

Standard subsetting has its place, but the dplyr package provides a readily interpretable and beginner-friendly system of verbs to manipulate dataframes. We use it here to filter for rows associated with our buildingid of interest.

Using the dplyr vignette, filter the blockbuster_pds dataframe object (I call it a blockbuster_tibble, as it's a tibble with the variables necessary for use in the blockbuster function) for the correct blocks. What do you think the vertical bar (|) operator does? Using the dplyr help, how might I filter for just these blocks' roofs?

dplyr::filter(blockbuster::blockbuster_pds,  (buildingid == 4382 |
                buildingid == 4472 |
                buildingid == 4487) #  & element == "Roofs"
              )
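When the list of blocks grows, repeating buildingid == ... becomes tedious. One common alternative is the %in% operator, sketched here on a made-up data frame (toy_pds and its values are hypothetical; only the column names and the "Roofs" label come from the code above).

```r
library(dplyr)

# Toy data standing in for blockbuster_pds
toy_pds <- data.frame(
  buildingid = c(4382, 4382, 4472, 4487, 9999),
  element    = c("Roofs", "Floors", "Roofs", "Roofs", "Roofs")
)

blocks <- c(4382, 4472, 4487)

# Keeps the same rows as chaining buildingid == ... with |
by_block <- filter(toy_pds, buildingid %in% blocks)

# Conditions separated by commas must all be true for a row to be kept
roofs <- filter(toy_pds, buildingid %in% blocks, element == "Roofs")
```

Keeping the block ids in their own vector also means they only need changing in one place.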

Practice

Caveat

Remember blockbuster_pds represents a ten percent sample of the entire Property Data Survey. Details of the SQL query used to produce the sample are found within the package at blockbuster/inst/sql/blockbuster_ten_pc_sample.sql.

Running a simulation

Now that we are familiar with subsetting the blockbuster_pds dataframe, we can go back to answering the original business question.

It's good practice to manipulate your data and then check it looks right. Here we assign it to object x. Assigning objects is useful as it saves time if you are likely to use the object again (particularly true for objects that take a long time to create).

#  We filter our PDS sample for just three blocks to keep things simple
x <- dplyr::filter(blockbuster::blockbuster_pds,
                   buildingid == 4382 |
                     buildingid == 4472 |
                     buildingid == 4487)

Now we can run our simulation using x as our blockbuster_tibble argument and 5 for our forecast_horizon argument for the blockbuster function. Note that the arguments don't need to be named if they are in the correct order as shown using formals earlier. See Hadley's chapter on functions for more detail.

y <- blockbuster(blockbuster_tibble = x, forecast_horizon = 5)
#  z <- blockbuster(x, 5) 
#  identical(y, z)  #  TRUE
class(y)

Using generic methods on output

The blockbuster function produces a custom output object of class "blockbuster_list" (derived from "list"). This tells the generic functions (for which I have coded a method) to handle this object in a certain way. This helps users to quickly visualise and summarise the output without needing much coding expertise.

The underlying code and details can be found in the other vignette. You can just copy and paste from there, replacing the output with your blockbuster output of interest.

Line plot

plot(y)

Boxplot

boxplot(y)

Summary

summary(y)

Handling a list

The blockbuster function produces a list of dataframes with timestep == 0 at index 1 and the last year in the simulation at index forecast_horizon + 1.

#  y[[1]]  #  timestep 0

final_year <- y[[6]]
final_year

You can manipulate the dataframes within the list as you please. For example, we can calculate Tukey's five-number summary of cost for the final_year.

fivenum(final_year$cost)
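The same idea extends to every timestep at once. Here is a minimal sketch on a made-up list of dataframes; toy_list and its numbers are hypothetical and only mimic the layout described above, with timestep 0 at index 1.

```r
# Toy list standing in for a blockbuster_list with a forecast horizon of 2
horizon <- 2
toy_list <- list(
  data.frame(timestep = 0, cost = c(10, 20)),
  data.frame(timestep = 1, cost = c(15, 25)),
  data.frame(timestep = 2, cost = c(30, 40))
)

# The final year sits at index horizon + 1
last_step <- toy_list[[horizon + 1]]

# Total cost at each timestep: one number per dataframe in the list
total_cost <- vapply(toy_list, function(df) sum(df$cost), numeric(1))
total_cost
```

vapply is used rather than sapply because it insists that each result is a single number, which catches surprises early.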

Practice

The dataframes are stored in a list for code efficiency and also because the dataframes after timestep zero have additional variables preventing straightforward binding. This isn't a problem for an experienced R user. For more sophisticated methods to handle a list of dataframes see the blockbuster_vignette.

Additional inputs

Thus far we have simulated just the counterfactual. This package is designed to compare different spending profiles for rebuilding and repairing the School Estate, and their impact on the condition of those school buildings.

Use ?blockbuster and the formals function to remind yourself of the other arguments we can provide the blockbuster function with.

Notice how the help says the other arguments must be "a numeric vector of length equal to the forecast_horizon or one." What does this mean? What are the default values (the default is adopted if left blank)?

More than one way to skin a cat

If you provide an input of length one then it is converted into a vector using rep(input, forecast_horizon) (type ?rep). Alternatively, you can provide a vector of length equal to the forecast_horizon using any of the standard methods that a Google search will reveal.
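A short sketch of what this looks like in base R. The helper is_valid_length is hypothetical; it only mirrors the rule quoted from the help and is not part of the blockbuster package.

```r
forecast_horizon <- 5

# A single value is repeated once per year of the forecast
repeated <- rep(5e3, forecast_horizon)
repeated

# Two ways to write a full-length vector yourself
by_hand  <- c(100, 200, 300, 400, 500)
by_steps <- seq(from = 100, by = 100, length.out = forecast_horizon)

# Hypothetical check: is the input of length one or of length forecast_horizon?
is_valid_length <- function(input, horizon) {
  length(input) %in% c(1, horizon)
}

is_valid_length(repeated, forecast_horizon)
is_valid_length(c(1, 2, 3), forecast_horizon)
```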

Here we demonstrate three different ways to provide the other arguments to blockbuster. The following code errors; use the error information to fix the input and re-run it.

z <- blockbuster(blockbuster_tibble = x, forecast_horizon = 5,
            rebuild_monies = c(0, 9e4, 0, 0),
            repair_monies = 5e3,
            rebuild_cost_rate = c(rep(1280, 4), 1285))

Saving for later

We can save our blockbuster_list object using saveRDS, and if we want to look at the data in the future we can load it into R using readRDS. But does this save all the arguments we used to create the blockbuster_list?

Try using the attributes() function to find out the inputs for z. How would you find out what the forecast_horizon was?

attributes(z)
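You can try the save and load round trip on a small made-up object before doing it with real output. In this sketch the class name toy_list and the attribute name horizon are invented for illustration; the attributes on a real blockbuster_list may be named differently, so use attributes() to find them.

```r
# Toy object: a list carrying a custom class and one extra attribute
toy <- structure(
  list(data.frame(timestep = 0), data.frame(timestep = 1)),
  class = c("toy_list", "list"),
  horizon = 1
)

# Save to a temporary file, then read it back in
path <- tempfile(fileext = ".rds")
saveRDS(toy, path)
restored <- readRDS(path)

identical(toy, restored)      # compare the object before and after
names(attributes(restored))   # which attributes are present
attr(restored, "horizon")     # pull out a single attribute by name
```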

Practice

Upon fixing:

Case study

Made-up numbers and purely hypothetical.

Doncaster (LA code 371) are interested in improving their spending on rebuilding and repairs on their schools over the next three-year period (imagine the question was posed when the data was collected). They are particularly interested in keeping all building components associated with their roofs in good working condition.

Currently they don't rebuild and just spend £1,000,000 a year on repairs. They want to know if they would be better off with a different spending profile, reducing the overall cost by using that money to rebuild some of the worst-off schools. As part of the projections they also want to know the total roof repair costs for different approaches. You must also consider how construction inflation will affect the rebuild_cost_rate, which is predicted to rise one percent each year.
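As a starting point, the input vectors for a three-year horizon could be built along these lines. This is a sketch under assumptions: the starting rate of 1280 is borrowed from the earlier example rather than from Doncaster's data, and the one percent rise is assumed to compound from the second year onwards.

```r
forecast_horizon <- 3
base_rate <- 1280   # assumed starting rate, borrowed from the earlier example
inflation <- 0.01   # one percent rise each year

# Year 1 uses the base rate; each later year is one percent higher than the last
rebuild_cost_rate <- base_rate * (1 + inflation)^(0:(forecast_horizon - 1))
rebuild_cost_rate

# The current profile: a flat repair budget and no rebuilding
repair_monies  <- rep(1e6, forecast_horizon)
rebuild_monies <- rep(0, forecast_horizon)
```

Alternative spending profiles can then be written as different rebuild_monies and repair_monies vectors of the same length and compared against this baseline.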



DFE-Capital/blockbuster documentation built on May 26, 2019, 7:23 a.m.