The knapsack problem, in its simplest form, asks: given a sack with a maximum carrying capacity, and a set of items of varying weights and values, take a subset of the items that has the highest total value but that still fits in the sack. The puzzlr package includes functions to create instances of the knapsack problem:
library(puzzlr)

set.seed(97317)
ks <- weakly_correlated_instance(n = 50)
ks
Each instance consists of a capacity and a set of items:

capacity(ks)
items(ks)
By default, items in knapsack instances are ordered in decreasing order of value per unit weight, or density. The top item for a given knapsack instance can be accessed with the next_item function:

next_item(ks)
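As a quick illustration of what density ordering means, here is a sketch on plain made-up vectors; it does not use puzzlr's internals, and the variable names are mine:

values <- c(30, 14, 16)
weights <- c(6, 5, 4)

# value per unit weight, i.e. density
values / weights
#> 5.0 2.8 4.0

# items reordered by decreasing density: item 1, then item 3, then item 2
order(values / weights, decreasing = TRUE)
#> 1 3 2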
Given a knapsack instance, a new instance can be created by either taking or leaving the next item; either choice leaves a smaller sub-problem:
take_next(ks)
leave_next(ks)
sub_problems(ks)
In this way, one can exhaustively enumerate all subsets of items: with the original problem as a root node, create two branches based on the sub-problems take_next(ks) and leave_next(ks), then construct each of their two sub-problems, and so on until there are no more items left to consider. The optimal solution is found in one of the leaf nodes of the tree -- among those leaf nodes whose aggregate weight does not exceed capacity(ks), it is the one that maximizes the total value. Given n items, there are $2^n$ leaf nodes. So how can one find the optimal solution in a reasonable amount of time?
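To make the size of that search space concrete, here is a small brute-force enumeration written against plain value and weight vectors rather than the puzzlr API (brute_force and its arguments are illustrative names of my own); it visits all $2^n$ leaves, which is only practical for very small n:

brute_force <- function(values, weights, cap) {
  # no items left: nothing more to add
  if (length(values) == 0) return(0)

  # branch 1: leave the first item
  best <- brute_force(values[-1], weights[-1], cap)

  # branch 2: take the first item, if it fits
  if (weights[1] <= cap) {
    best <- max(best,
                values[1] + brute_force(values[-1], weights[-1], cap - weights[1]))
  }
  best
}

brute_force(values = c(30, 14, 16), weights = c(6, 5, 4), cap = 10)
#> 46 (take the first and third items)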
Branch and bound is a family of algorithms for finding the provably optimal solution of a problem that, like the knapsack, can be represented in terms of branching sub-problems. In the case of the knapsack problem, given a fast way to estimate upper and lower bounds on the true optimum for a sub-problem, the search tree can be pruned during its construction: whenever a sub-problem's upper bound falls below the value of the best complete solution found so far, nothing in that branch can improve on the current best, so the entire branch can be discarded without being explored.
The lazily evaluated lists from the lazylist package provide an appealing way to implement branch-and-bound, due to the ability to define data structures recursively. I'll start with the high-level implementation and then fill in the details. I use empty_stream() to prune or terminate a search path, and merge multiple search paths into one by using a search_strategy that prioritizes search nodes:
library(lazylist)

solution_tree <- function(root, branch, search_strategy, best_so_far) {
  branches <- branch(root)
  if (length(branches) == 0) return(empty_stream())

  explore_and_prune <- function(node) {
    if (upper_bound(node) < best_so_far) return(empty_stream())
    new_best <- max(best_so_far, lower_bound(node))
    cons_stream(node,
                solution_tree(node,
                              branch = branch,
                              search_strategy = search_strategy,
                              best_so_far = new_best))
  }

  branches <- purrr::map(branches, explore_and_prune)
  purrr::reduce(branches, search_strategy)
}
A search strategy should be a function that takes two lazy lists (or "streams") and merges them into one. Since it's being applied repeatedly at every branch, I like to visualize its effect as braiding or interleaving all of the branches together. The result is a single stream of search nodes, with ordering dependent on the search strategy. There are multiple search strategies I might implement this way: breadth-first, depth-first, etc. As a first example, I'll implement best-first search, in which, at each stage of the search, the "best" node is considered, with "best" defined as the node with the highest upper_bound. Again, the ability to define things recursively helps -- best_first is defined as taking the better of the two head elements of two streams, along with the best_first of the remaining elements of both streams:
best_first <- function(s1, s2) {
  if (is_emptystream(s1)) return(s2)
  if (is_emptystream(s2)) return(s1)

  node1 <- stream_car(s1)
  node2 <- stream_car(s2)

  if (upper_bound(node1) >= upper_bound(node2)) {
    return(cons_stream(node1, best_first(stream_cdr(s1), s2)))
  }
  cons_stream(node2, best_first(s1, stream_cdr(s2)))
}
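The prose above mentions depth-first search as another strategy that could be expressed the same way. As a rough sketch (depth_first is my own illustrative helper, not part of lazylist or puzzlr), a depth-first strategy exhausts the first stream before touching the second, which amounts to lazily appending one stream to the other:

depth_first <- function(s1, s2) {
  # exhaust the first branch entirely before considering the second
  if (is_emptystream(s1)) return(s2)
  cons_stream(stream_car(s1), depth_first(stream_cdr(s1), s2))
}

In principle it could be passed wherever best_first is used below, though with depth-first order the first leaf reached is no longer guaranteed to be the optimum, so the stopping rule used later would have to change.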
I still need a way to create search nodes that have upper_bound and lower_bound methods. I'll define a search node as a knapsack instance with some additional information tacked on.
search_node <- function(ks, bounds) {
  # if there are no items left, then we can't add any more value
  if (n_items(ks) == 0) return(
    list(problem = ks,
         upper_bound = total_value(ks),
         lower_bound = total_value(ks))
  )

  b <- bounds(ks)
  list(problem = ks,
       upper_bound = b$upper_bound,
       lower_bound = b$lower_bound)
}

upper_bound <- function(node) node$upper_bound
lower_bound <- function(node) node$lower_bound
I've so far been working under the assumption that there is a quick way to estimate the upper and lower bounds of a knapsack instance. There are various trivial bounds for a problem, for instance a lower bound of 0, or an upper bound equal to the sum of the values of all items. I'm looking for two qualities from a bounding function, and they are in conflict with one another: the bounds should be as tight as possible, and they should be cheap to compute.
Since items are already sorted in order of density, I can take items one at a time until the knapsack is full, giving a decent lower bound. To get an upper bound, I consider an easier-to-solve problem: what is the most value possible if I'm allowed to take fractional items? Taking items in order of density and then taking a fraction of the next item results in a total value that is no less than the true optimum. The greedy function calculates both bounds, and also returns the remaining sub-problem after achieving the lower bound:
greedy <- function(ks) {
  item <- next_item(ks)

  # if the only remaining item doesn't fit
  if (n_items(ks) == 1 && item$weight > capacity(ks)) return(
    list(lower_bound = total_value(ks),
         upper_bound = total_value(ks),
         remaining = leave_next(ks))
  )

  remaining <- ks
  while (!is.null(item) && item$weight <= capacity(remaining)) {
    remaining <- take_next(remaining)
    item <- next_item(remaining)
  }

  lower_bound <- total_value(remaining)
  upper_bound <- lower_bound

  if (capacity(remaining) > 0 && !is.null(item)) {
    partial_amount <- capacity(remaining) / item$weight
    upper_bound <- upper_bound + (partial_amount * item$value)
  }

  list(lower_bound = lower_bound,
       upper_bound = upper_bound,
       remaining = remaining)
}

greedy(ks)

# the root of the search tree for ks:
search_node(ks, bounds = greedy)
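To see the fractional-relaxation bound in isolation, here is a self-contained sketch of the same idea on plain vectors, outside of puzzlr (fractional_bound and its arguments are illustrative names of my own); it assumes the items are already sorted by decreasing density:

fractional_bound <- function(values, weights, cap) {
  bound <- 0
  for (i in seq_along(values)) {
    if (weights[i] <= cap) {
      # the whole item fits: take all of it
      bound <- bound + values[i]
      cap <- cap - weights[i]
    } else {
      # take just the fraction that fills the remaining capacity, then stop
      bound <- bound + values[i] * cap / weights[i]
      break
    }
  }
  bound
}

# items sorted by density (5, 4, 2.8), capacity 8
fractional_bound(values = c(30, 16, 14), weights = c(6, 4, 5), cap = 8)
#> 38: all of item 1 plus half of item 2, while the best all-or-nothing solution is worth 30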
I now have all of the necessary components of a knapsack solver:
solve_knapsack <- function(ks, bounds, search_strategy) {
  root <- search_node(ks, bounds = bounds)

  # branch function takes a search node and returns its children
  branch <- function(node) {
    subprobs <- sub_problems(node$problem)
    purrr::map(subprobs, search_node, bounds = bounds)
  }

  cons_stream(root,
              solution_tree(root,
                            branch = branch,
                            search_strategy = search_strategy,
                            best_so_far = root$lower_bound))
}

solution <- solve_knapsack(ks, bounds = greedy, search_strategy = best_first)
solution[50]
Recall that the leaf nodes collectively hold all possible solutions. Since I'm using best-first search, the first leaf node that is processed must be the optimal solution: if there were a better solution than the first leaf node, it would have to have a higher upper bound, so it would have been processed earlier.

A leaf node is one where the sub-problem has no remaining items to consider.
optimal <- stream_which(solution, function(x) upper_bound(x) == lower_bound(x))
optimal <- optimal[1]
optimal

solution[optimal]

# view which items were taken
taken_items(solution[optimal]$problem)