knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-figs/"
)

maxcovr

AppVeyor Build StatusTravis-CI Build StatusCoverage Status

maxcovr was created to make it easy for a non expert to correctly solve the maximum covering location problem described by Church. Implementations of this problem (such as optimimum AED placement) may use commercial software such as AMPL, Gurobi, or CPLEX, which require an expensive license, or use open source licenses but not provide source code to the analysis performed (e.g., Bonnet 2014 ) This builds a substantial barrier to implement and reproduce these analyses.

maxcovr was created to make results easy to implement, reproduce, and extend by using:

Please note that this project is released with a Contributor Code of Conduct. By participating in this project you agree to abide by its terms.

How to Install

# install.packages("devtools")
devtools::install_github("njtierney/maxcovr")

Using maxcovr

Disclaimer: The following is a fictitious example using real world data.

Consider the toy example where we are playing a tower defense game and we need to place crime surveillance towers to detect crime.

We have two datasets, york, and york_crime:

In this game we already have a few towers built, which are placed on top of the listed buildings with a grade of I. We will call this dataset york_selected, and the remaining building locations york_unselected

library(maxcovr)
library(dplyr)

# subset to be the places with towers built on them.
york_selected <- york %>% filter(grade == "I")

york_unselected <- york %>% filter(grade != "I")

The purpose of the game is to build towers in places so that they are within 100m of crime. We are going to use the crime data that we have to help us choose ideal locations to place towers.

This can be illustrated with the following graphic, where the red circles indicate the current coverage of the building locations, so those blue crimes within the circles are within the coverage.

library(leaflet)

leaflet() %>%
    addCircleMarkers(data = york, 
                     radius = 1,
                     color = "steelblue") %>%
    addCircles(data = york_selected, 
               radius = 100,
               stroke = TRUE,
               fill = NULL,
               opacity = 0.8,
               weight = 2,
               color = "coral") %>%
    addProviderTiles("CartoDB.Positron") %>%
    setView(lng = median(york$long),
            lat = median(york$lat),
            zoom = 15)

Currently the coverage looks alright, but let's verify the coverage using the nearest function. nearest takes two dataframes and returns the nearest lat/long coords from the first dataframe to the second dataframe, along with the distances between them and the appropriate columns from the building dataframe.

dat_dist <- york_selected %>% nearest(york_crime)

head(dat_dist)

You can instead return a dataframe which has every building in the rows, and the nearest crime to the building, by simply changing the order.

dat_dist_bldg <- york_crime %>% nearest(york_selected)
head(dat_dist_bldg)

To evaluate the coverage we can use coverage. This reads as find the coverage of the york buildings (below) to the crimes. Coverage of the first thing on the second thing. Or, how many of the second thing are covered by the first thing.

coverage(york_selected, york_crime)

This tells us that out of all the crime, 18.68% of it is within 100m, 339 crimes are covered, but the mean distance to the surveillance camera is 1400m.

Maximising coverage

Say then we want to add another 20 surveillance towers, but we want to use the best 20, we use max_coverage.

system.time(
# mc_20 <- max_coverage(A = dat_dist_indic,
mc_20 <- max_coverage(existing_facility = york_selected,
                      proposed_facility = york_unselected,
                      user = york_crime,
                      n_added = 20,
                      distance_cutoff = 100)
)

max_coverage actually returns a dataframe of lists.

mc_20

This is handy because it means that later when you want to explore multiple n_added, say you want to explore how coverage changes for 20, 40, 60, 80, 100 n_added, then these are added as rows in the dataframe, which makes it easier to do summaries and manipulate.

Important features here of this dataframe are:

One can also use map from purrr to fit many different configurations of n_added. (Future work will look into allowing n_added to take a vector of arguments).

library(purrr)
n_add_vec <- c(20, 40, 60, 80, 100)

system.time(
map_mc_model <- map_df(.x = n_add_vec,
                       .f = ~max_coverage(existing_facility = york_selected,
                                          proposed_facility = york_unselected,
                                          user = york_crime,
                                          distance_cutoff = 100,
                                          n_added = .))
)

This returns a list of dataframes, which we can bind together like so:

map_cov_results <- bind_rows(map_mc_model$model_coverage)

We can then visualise the effect on coverage:

library(ggplot2)
bind_rows(map_mc_model$existing_coverage[[1]],
          map_cov_results) %>%
    ggplot(aes(x = factor(n_added),
               y = pct_cov)) + 
    geom_point() +
    geom_line(group = 1) + 
    theme_minimal()

You can read more about the use of max_coverage, covering topics like cross validation in the vignette.

Known Issues

Future Work

I will be focussing on making maxcovr work well within the tidyverse. This includes providing sensible standard summaries using key function verbs from broom, adding exploratory plots, improving speed using Rcpp, and allowing users to select the solver they want to use.

# 
# n_add_vec <- c(20,40)
# 
# system.time(
#     mc_cv_fit_many <- 
#         map2_df(mc_cv$train, # training set goes here
#                 n_add_vec,
#                 .f = ~max_coverage,
#                 existing_facility = york_selected,
#                 proposed_facility = york_unselected,
#                 distance_cutoff = 100)
# )
# 
# 
# system.time(
#     mc_cv_fit_many <- 
#         pmap(.l = list(user = mc_cv$train, # training set goes here
#                        n_added = n_add_vec),
#              .f = ~max_coverage,
#              existing_facility = york_selected,
#              proposed_facility = york_unselected,
#              distance_cutoff = 100)
# )

If you have any suggestions, please file an issue and I will get to it as soon as I can.

Code of Conduct

Please note that this project is released with a Contributor Code of Conduct.

By participating in this project you agree to abide by its terms.

Acknowledgements

Oliver Czibula, for this initial help in helping me understand the Maximum Covering Location Problem. Angelo Auricchio and Antonietta Mira for their supervision. Alessio Quaglino and Jost Reinhold for their help in implementing the first implementation of this problem in lpSolve. Martin Weiser for his suggestion for the relocation process. Matt Sutton for his very thoughtful explanations on how to interact with high level solvers, which led to implementing maxcovr in lpsolve, glpk, and gurobi. And Alex Simmons, for teaching me how to write better C++ code.



njtierney/copertura documentation built on Nov. 13, 2019, 6:37 p.m.