In njtierney/copertura: A Set of Tools For Solving The Maximal Covering Location Problem

knitr::opts_chunk$set(fig.width = 7,
                      fig.height = 5)

maxcovr provides tools to make it easy to solve the "maximum covering location problem", a binary optimisation problem described by Church.

maxcovr aims to provide researchers and analysts with an easy to user, fast, accurate and (eventually) extensible implementation of Church's maximal covering location problem, while adhering as best as it can to tidyverse principles.

This vignette aims to get users started with using maxcovr. In this vignette you will learn about:

The motivation behind maxcovr
How to assess current coverage of facilities
How to maximise coverage using some new proposed facilities
Explore results of the new locations
How to explore the new locations proposed

Other vignettes provided in the package include:

"Cross Validation with maxcovr", which explains how to perform and interpret cross validation results in a maximum covering location framework
"Performing the maximal coverage relocation problem"

The moviation: Why `maxcovr`

maxcovr was created to make it easy for a non expert to correctly solve the maximum covering location problem. This problem is beginning to be applied in problems in AED placement, but has been applied in many different areas. Many implementations in papers apply commercial software such as AMPL, Gurobi, or CPLEX. Additionally, the code that they use in the paper to implement the optimisation is not provided, and has to be requested. As a result, these analyses are more difficult to implement, and more difficult to reproduce.

maxcovr was created to address these shortcomings, using:

R, a free and open source language
Open source solvers, lpSolve, and glpk, that can be used on Linux, Windows, and OSX.
Real-world, open source example data.
Tidyverse principles to design it for humans make it easy to use, code, and extend.

This means results are easy to implement, reproduce, and extend.

The problem

Consider the toy example where we want to place crime surveillance towers to detect crime. We have two datasets, york, and york_crime:

york contains listed building GPS locations in the city of York, provided by the city of york
york_crime contains a set of crime data from the ukpolice package, containing crime data from September 2016.

We already have a few surveillance towers built, which are placed on top of the listed buildings with a grade of I. We will call this dataset york_existing, and the remaining building locations york_proposed. Here, york_existing is the currently locations of facilities, and york_proposed is the potential facility locations.

library(maxcovr)
library(dplyr)

# subset to be the places with towers built on them.
york_existing <- york %>% filter(grade == "I")

york_proposed <- york %>% filter(grade != "I")

We are interested in placing surveillance towers so that they are within 100m of the largest number of crimes. We are going to use the crime data that we have to help us choose ideal locations to place towers.

This can be illustrated with the following graphic, where the red circles indicate the current coverage of the building locations, so those blue crimes within the circles are within the coverage.

library(leaflet)

leaflet() %>%
    addCircleMarkers(data = york, 
                     radius = 1,
                     color = "steelblue") %>%
    addCircles(data = york_existing, 
               radius = 100,
               stroke = TRUE,
               fill = NULL,
               opacity = 0.8,
               weight = 3,
               color = "coral") %>%
    # addTiles() %>%
    addProviderTiles("CartoDB.Positron") %>%
    setView(lng = median(york$long),
            lat = median(york$lat),
            zoom = 15)

Visually, the coverage looks OK, but to get a better estimate, we can verify the coverage using the nearest() function.

nearest() takes two dataframes and returns the nearest lat/long coords from the first dataframe to the second dataframe, along with the distances between them and the appropriate columns from the building dataframe. You can think of reading this as "What is the nearest (nearest_df) to (to_df)".

dat_dist <- nearest(nearest_df = york_proposed, 
                    to_df = york_crime)

head(dat_dist)

Note that maxcovr records the positions of locations must be named "lat" and "long" for latitude and longitude, respectively.

The dataframe dat_dist contains the nearest proposed facility to each crime. This means that we have a dataframe that contains "to_id" - the ID number (labelled from 1 to the number of rows in the to dataset), "nearest_id" describes the row numer of "nearest_df" that corresponds to the row location of that data.frame. We also have the rest of the columns of york_crime. These are returned to make it easy to do other data manipulation / exploration tasks, if you wish.

You can instead return a dataframe which has every building in the rows, and the nearest crime to the building, by simply changing the order.

dat_dist_bldg <- nearest(york_crime,york_existing)

head(dat_dist_bldg)

To evaluate the coverage we use coverage, specifying the distance cutoff in distance_cutoff to be 100m.

coverage(york_proposed, 
         york_crime,
         distance_cutoff = 100)

This contains useful summary information:

distance_within - this is the distance used to determine coverae
n_cov - this is the number of events that are covered
n_not_cov - the number of events not covered
pct_cov - the proportion of events covered
pct_not_cov - the proportion of events not covered
dist_avg - the average distance from the rows to the nearest facility or user
dist_avg - the standard deviation of the distance from the rows to the nearest facility or user.

This tells us that out of all the crime, 37.76% of it is within 100m, 339 crimes are covered, but the mean distance to the surveillance camera is 1400m.

using max_coverage

Knowing this information, you might be interested in improving this coverage. To do so we can use the max_coverage function.

This function takes 5 arguments:

existing_facility = the facilities currently installed
proposed_facility = the facilities proposed to install
user = the users that you want to maximise coverage over
n_added = the number of new facilities that you can install
distance_cutoff = the distance that we consider coverage to be within.

Similar to using nearest - the data frames for existing_facility, proposed_facility, and user need to contain columns of latitude and longitude, and they must be named "lat", and "long", respectively. These are used to calculate the distance.

# mc_20 <- max_coverage(A = dat_dist_indic,
mc_20 <- max_coverage(existing_facility = york_existing,
                      proposed_facility = york_proposed,
                      user = york_crime,
                      n_added = 20,
                      distance_cutoff = 100)

We can look at a quick summary of the model with summary

summary(mc_20)

Here this tells us useful information about the distance cutoff, the number of facilities added, and the number of users covered, and previousl, and the proportion of events covered.

To get this information out we can use the information in each of the columns below. The information each of these is a list, which might seem strange, but it becomes very useful when you are assessing different levels of n_added. We will go into more detail for this soon.

Firstly, we have the data input into n_added and distance_cutoff - the same information that we entered.

mc_20$n_added[[1]]
mc_20$distance_cutoff[[1]]

We can then get summary information about the model coverage. We can first get the existing, or previous coverage with existing_coverage

mc_20$existing_coverage[[1]]

This provides us with the information that we saw previously with summarise_coverage.

We can then get the information of the coverage from the model with the added additional AEDs with model_coverage.

mc_20$model_coverage[[1]]

We can then get both pieces of information from summary

mc_20$summary[[1]]

You can drill deeper, and get more information about the facilities using facility_selected, which returns facilities selected from the proposed_facility data.

mc_20$facility_selected[[1]]

We can then get information about the users with augmented_users

mc_20$augmented_users[[1]]

This returns the dataframe of users, with the distance to their nearest AED, and at the end, provides information about the type of AED that is used.

Now try and run the code for n_added = 40, and call it "mc_40"

Interpreting results

We can assess what happens if we add 100 facilities.

mc_100 <- max_coverage(existing_facility = york_existing,
                      proposed_facility = york_proposed,
                      user = york_crime,
                      n_added = 100,
                      distance_cutoff = 100)

summary(mc_100)

So then, if we want to add information to discover the differences between 20 and 100, we can bind these two pieces together using bind_rows.

mc_20_100 <- bind_rows(mc_20, mc_100)

We can then look at the summary row, and expand the information out here using tidyr::unnest()

mc_20_100_sum <- mc_20_100 %>% select(summary) %>% tidyr::unnest()
mc_20_100_sum

This information can then be plotted, for example, like so:

library(ggplot2)
ggplot(mc_20_100_sum,
       aes(x = n_added,
           y = pct_cov)) + 
    geom_point() + 
    geom_line()

You can then produce a plot of the average distances.

library(ggplot2)
ggplot(mc_20_100_sum,
       aes(x = n_added,
           y = dist_avg)) + 
    geom_point() + 
    geom_line()

If you would like to calculate your own summaries on the distances, I would recommend something like:

mc_20$augmented_users[[1]] %>%
    summarise(mean_dist = mean(distance),
              sd_dist = sd(distance),
              median_dist = median(distance),
              lower_975_dist = quantile(distance, probs = 0.025),
              upper_975_dist = quantile(distance, probs = 0.975))

You can then package this up in a function and apply it to all rows

my_dist_summary <-  function(data){
    data %>%
    summarise(mean_dist = mean(distance),
              sd_dist = sd(distance),
              median_dist = median(distance),
              lower_975_dist = quantile(distance, probs = 0.025),
              upper_975_dist = quantile(distance, probs = 0.975))
}

This can then be used to create a new column with purrr::map().

mc_20_100 %>%
    mutate(new_summary = purrr::map(augmented_users,
                                    my_dist_summary)) %>%
    select(new_summary) %>%
    tidyr::unnest()

Other graphics options

You might be interested in plotting the existing users, the proposed facilities, and the coverage.

You can plot the existing facilities with leaflet.

york_existing %>%
    leaflet() %>%
    addTiles() %>%
    addCircles(color = "steelblue") %>% 
    setView(lng = median(york_existing$long),
            lat = median(york_existing$lat),
            zoom = 12)

You might want to then add the user information to this plot. You can add new circles based on new datasets, and then change the colour, so that they are visible.

leaflet() %>%
    addCircles(data = york_crime, 
               color = "steelblue") %>%
    addCircles(data = york_existing, 
               color = "coral") %>%
    addTiles() %>%
    setView(lng = median(york$long),
            lat = median(york$lat),
            zoom = 15)

With leaflet you can also specify the radius in metres of the circles. This means that we can set the radius of the circles to be 100, and this is 100m.

leaflet() %>%
    addCircles(data = york_crime, 
               color = "steelblue") %>%
    addCircles(data = york_existing, 
               radius = 100,
               color = "coral") %>%
    addTiles() %>%
    setView(lng = median(york$long),
            lat = median(york$lat),
            zoom = 15)

to make this a bit clearer we can remove the fill (fill = FALSE), and also change the map that is used with addProviderTiles("CartoDB.Positron").

leaflet() %>%
    addCircles(data = york_crime, 
               color = "steelblue") %>%
    addCircles(data = york_existing, 
               radius = 100,
               fill = FALSE,
               color = "coral",
               weight = 3,
               dashArray = "1,5") %>%
    addProviderTiles("CartoDB.Positron") %>%
    setView(lng = median(york$long),
            lat = median(york$lat),
            zoom = 15)

Applying this to the coverage data

So using this knowledge, we can visualise:

crime (york_crime)
existing facilities (york_existing)
newly placed facilities (mc_20$facility_selected[[1]])

leaflet() %>%
    addCircles(data = york_crime, 
               color = "steelblue") %>%
    addCircles(data = york_existing, 
               radius = 100,
               fill = FALSE,
               weight = 3,
               color = "coral",
               dashArray = "1,5") %>%
    addCircles(data = mc_20$facility_selected[[1]], 
               radius = 100,
               fill = FALSE,
               weight = 3,
               color = "green",
               dashArray = "1,5") %>%
    addProviderTiles("CartoDB.Positron") %>%
    setView(lng = median(york$long),
            lat = median(york$lat),
            zoom = 15)

Future work

In the future maxcovr will be able to talk to commercial solvers like CPLEX and Gurobi, and will have a mechanism for extensions.

njtierney/copertura documentation built on Nov. 27, 2024, 7:50 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

njtierney/copertura
A Set of Tools For Solving The Maximal Covering Location Problem

In njtierney/copertura: A Set of Tools For Solving The Maximal Covering Location Problem

The moviation: Why `maxcovr`

The problem

using max_coverage

Interpreting results

Other graphics options

Applying this to the coverage data

Future work

R Package Documentation

Browse R Packages

We want your feedback!

njtierney/copertura A Set of Tools For Solving The Maximal Covering Location Problem

In njtierney/copertura: A Set of Tools For Solving The Maximal Covering Location Problem

The moviation: Why maxcovr

The problem

using max_coverage

Interpreting results

Other graphics options

Applying this to the coverage data

Future work

R Package Documentation

Browse R Packages

We want your feedback!

njtierney/copertura
A Set of Tools For Solving The Maximal Covering Location Problem

The moviation: Why `maxcovr`