knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
The Knapsack Problem is a well known problem in Computing. This package, called "knapsack", implements four different solutions for this problem: the first two use brute-force methods - of these, the first makes use only of one processor core, whereas the second uses multiple cores via parallel programming techniques - the remaining two solutions apply dynamic programming and greedy approaches respectively. In each of these cases, the solution is implemented as a function (hereafter "package functions"), as described below.
library(knapsack)
All package functions expect the following two input arguments:
Below is an example of a set of suitable arguments.
set.seed(42) n <- 1000000 x <- data.frame( w=sample(1:4000, size = n, replace = TRUE), v=runif(n = n, 0, 10000) ) W <- 3500 print(x[1:8, ]) print(W)
If no errors are encountered, each package function outputs a list containing two elements:
Note: Brute-force and dynamic programming methods are guaranteed to generate a global maximum value, whereas greedy methods generate a local maximum value that is guaranteed, in the worst case, be half that of the global maximum value. See cited references for more details.
Given n items to choose from, the knapsack_brute_force function goes over each of the 2^n possible ways of filling the knapsack, whilst keeping track of the hitherto most optimal solution, and returns outputs as described above. Our implementation, however, does not allow for n to be any larger than 31; this is because of our reliance on the intToBits function, which requires arguments to be integers, which in turn can be no larger than 2^31 - 1 (2,147,483,647) in R.
knapsack_brute_force(x[1:8, ], 3500)
Given its O(2^n) time complexity, this function does not perform optimally in terms of time taken for execution. For n = 16 items and knapsack capacity W = 2000, we can see below the time taken for function execution.
system.time(knapsack_brute_force(x[1:16, ], 2000))
The knapsack_brute_force function described above also takes on an optional argument 'parallel' whose value is set to FALSE by default. Forcing it to TRUE however invokes the parallelised version of the above mentioned brute-force solution. If the machine running the function has suitable computing hardware and if there are a sufficiently large number of objects to choose from, then despite the function still having time complexity of order O(2^n), parallelisation will deliver significantly faster execution.
``` {r, eval = FALSE} system.time(knapsack_brute_force(x[1:21, ], 2000)) system.time(knapsack_brute_force(x[1:21, ], 2000, parallel = TRUE))
Note that the above two lines of code cannot be run as part of this vignette, as current CRAN principles do not allow for package vignettes to use more than two processor cores for computation. Our implementation, on the other hand, makes use of as many processor cores as are available. The above lines of code are being shown in this vignette merely as examples that can, in general, be run. To give you a sense of the performance gain involved, below are images of runs of the parallelised and non-parallelised versions of the brute force based algorithm on a device whose specs are as shown below. {width=50%} {width=25%} ## Function implementing a Dynamic Programming based Solution The __knapsack_dynamic__ function solves the knapsack problem using dynamic programming, as outlined in several texts (see references). ``` {r} knapsack_dynamic(x[1:8, ], 3500)
This function has time complexity of order O(n.W), where n is the number of items that can potentially be included in the knapsack and W is the knapsack capacity. One can therefore expect faster execution as compared to a single processor Brute Force implementation. For n = 500 items and knapsack capacity W = 2000, we can see below the time taken for function execution.
system.time(knapsack_dynamic(x[1:500, ], 2000))
The function greedy_knapsack uses a greedy approach which, as cited in the literature we reference below, consists of sorting available items in descending order of their value per unit weight and including in the knapsack as many of the top-most items as it can possibly hold. Although this approach may not provide the most optimal solution to the problem, its solution is guaranteed (as also cited in the literature referenced) to generate knapsack value that is, at worst, half of the maximum value that can possibly be generated with the given objects and knapsack capacity.
greedy_knapsack(x[1:8, ], 3500)
This function has time complexity of order O(n.log(n)), where n is the number of items that can potentially be included in the knapsack. This follows from the fact that the complexity of the sorting operation overwhelms the sum of complexities of all the other operations in the procedure. One can therefore expect faster execution even in comparison to Dynamic Programming based implementations. For n = 1,000,000 items and knapsack capacity W = 2000, we can see below the time taken for function execution.
system.time(greedy_knapsack(x[1:1000000, ], 2000))
Profiling using the profvis package helped optimise our greedy_knapsack function. Profiles revealed a bottleneck wherein the input data.frame of knapsack objects was being sorted. Thereafter, we created another version of the greedy function which used sorted row indices in place of the sorted data.frame. This brought in over 2x performance gain.
system.time(another_greedy_knapsack(x[1:1000000, ], 2000))
{length=50%, width=50%}
{length=50%, width=50%}
Among the multiple solutions we have made available through the "knapsack" package, the brute-force based function, "knapsack_brute_force" is the slowest. Parallelisation improves speed of execution, but the inherent high time complexity of the algorithm, makes it less efficient than the "knapsack_dynamic" function. Dynamic programming brings down algorithm time complexity, but can still take a large amount of time for execution if there a very large number of objects to choose from. In such a case, the function "another_greedy_knapsack" will deliver best performance in terms of speed of execution, but at the cost of some accuracy.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.