In this article we will look at the Warehouse Location Problem. Given a set of customers and set of locations to build warehoses the task is to decide where to build warehouses and from what warehouses goods should be shipped to which customer.

Thus there are two decisions that need to made at once: where and if to build warehouses and the assignment of customers to warehouses. This simple setting also implies that at least one warehouse must be built and that any warehouse is big enough to serve all customers.

As a practical example: you run the logistics for an NGO and want to regularly distribute goods to people in need. You identified a set of possible locations to set up your distribution hubs, but you are not sure where to build them. Then such a model might help. In practice however you might need to incorporate additional constraints into the model.

The mathematical model

We start with a set of customers $C = {1 \ldots n}$ and a set of possible warehouses $W = {1 \ldots m}$ that could be built. In addition we have a cost function giving us the transportation cost from a warehouse to a customer. Furthermore there is a fixed cost associated with each warehouse if it will be built. This basic version of the warehouse location problem is adapted from the German Wikipedia page about the problem.

$$ \begin{equation} \begin{array}{[email protected]{}ll} \text{min} & \displaystyle\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{m}\operatorname{transportcost}{i,j} \cdot x{i, j} + \sum\limits_{j=1}^{m}\operatorname{fixedcost}{j} \cdot y{j}& &\ \text{subject to} & \displaystyle\sum\limits_{j=1}^{m} x_{i, j} = 1 & i=1 ,\ldots, n&\ & \displaystyle x_{i, j} \leq y_j, & i=1 ,\ldots, n & j=1 ,\ldots, m&\ & x_{i,j} \in {0,1} &i=1 ,\ldots, n, & j=1 ,\ldots, m \ & y_{j} \in {0,1} &j=1 ,\ldots, m& \end{array} \end{equation} $$

The model in R

The first thing we need is the data. In this article we will simply generate artificial data.

We assume the customers are located in a grid with euclidian distances:

grid_size <- 1000
n <- 100
customer_locations <- data.frame(
  id = 1:n,
  x = round(runif(n) * grid_size),
  y = round(runif(n) * grid_size)

The warehouses are also randomly placed on the grid. The fixed cost for the warehouses are randomly generated as well with mean cost of 10,000.

m <- 20
warehouse_locations <- data.frame(
  id = 1:m,
  x = round(runif(m) * grid_size),
  y = round(runif(m) * grid_size)
fixedcost <- round(rnorm(m, mean = grid_size * 10, sd = grid_size * 5))

The fixed costs to set up a warehouse are the following:


Next step is to build a functions that takes a customer and a warehouse and returns the transport cost.

transportcost <- function(i, j) {
  customer <- customer_locations[i, ]
  warehouse <- warehouse_locations[j, ]
  round(sqrt((customer$x - warehouse$x)^2 + (customer$y - warehouse$y)^2))
transportcost(1, 3)

Now let's plot everything. Black dots are customers and red dots are possible warehouse locations.

p <- ggplot(customer_locations, aes(x, y)) + 
  geom_point() + 
  geom_point(data = warehouse_locations, color = "red", alpha = 0.5, shape = 17) +
  scale_x_continuous(limits = c(0, grid_size)) +
  scale_y_continuous(limits = c(0, grid_size)) +
  theme(axis.title = element_blank(), 
        axis.ticks = element_blank(), 
        axis.text = element_blank(), panel.grid = element_blank())
p + ggtitle("Warehouse location problem", 
          "Black dots are customers. Light red triangles show potential warehouse locations.")

The model with rmpk then looks like this:

model <- MIPModel(ROI_solver("glpk"))

# 1 iff i gets assigned to warehouse j
x <- model$add_variable(i = 1:n, j = 1:m, type = "binary")

# 1 iff warehouse j is built
y <- model$add_variable(j = 1:m, type = "binary")

# maximize the preferences
model$set_objective(sum_expr(transportcost(i, j) * x[i, j], i = 1:n, j = 1:m) + 
                  sum_expr(fixedcost[j] * y[j], j = 1:m), "min") 
# every customer needs to be assigned to a warehouse
model$add_constraint(sum_expr(x[i, j], j = 1:m) == 1, i = 1:n)

# if a customer is assigned to a warehouse, then this warehouse must be built
model$add_constraint(x[i,j] <= y[j], i = 1:n, j = 1:m)

Solve the model

We will use glpk to solve the above model.


We solved the problem with an objective value of r sprintf("%d", model$objective_value()).

matching <- model$get_variable_value(x[i, j]) %>%
  filter(value > .9) %>%  
  select(i, j)

The last step is to add the assignments to the previous plot we generated.

plot_assignment <- matching %>% 
  inner_join(customer_locations, by = c("i" = "id")) %>% 
  inner_join(warehouse_locations, by = c("j" = "id"))
customer_count <- matching %>% group_by(j) %>% summarise(n = n()) %>% rename(id = j)
plot_warehouses <- warehouse_locations %>% 
  mutate(costs = fixedcost) %>% 
  inner_join(customer_count, by = "id") %>% 
  filter(id %in% unique(matching$j))
p + 
  geom_segment(data = plot_assignment, aes(x = x.y, y = y.y, xend = x.x, yend = y.x)) + 
  geom_point(data  = plot_warehouses, color = "red", size = 3, shape = 17) +
  ggrepel::geom_label_repel(data  = plot_warehouses, 
                            aes(label = paste0("fixed costs:", costs, "; customers: ", n)), 
                            size = 2, nudge_y = 20) + 
  ggtitle(paste0("Cost optimal warehouse locations and customer assignment"),
          "Big red triangles show warehouses that will be built, light red are unused warehouse locations. 
Dots represent customers served by the respective warehouses.")

The fixed costs for setting up the r n_distinct(matching$j) warehouses is:



Do you have any questions, ideas, comments? Or did you find a mistake? Let's discuss on Github.

dirkschumacher/rmpk documentation built on Jan. 13, 2020, 4:48 a.m.