knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )
library(Group8knapsack)
This package is designed to tackle the classical knapsack problem with 3 different approaches: Brute Force Method: brute_force_knapsack() Dynamic Programming: knapsack_dynamic() * Greedy heuristic: greedy_knapsack()
But before we get into the algorithms used, we've written a data generation function that creates a set of objects to be used for the knapsack problem. The sample data can be created by the user providing the total number of objects 'n'
test_knapsack <- knapsack_objects(2000)
This returns a dataset with 2 columns 'w' which represents the weights of each object and 'v' which represents the corresponding value of each knapsack object.
The first approach is the 'naive' approach or as we call it the brute force method. It is so called because it iterates over every possible combination of all available knapsack objects and finds the most optimum solution. The function to be called for this algorithm is displayed as follows:
test_knapsack <- knapsack_objects(2000) brute_force_knapsack(test_knapsack[1:12,], 3500)
This method results in the computation of a problem whose complexity is of the order 2^n for n objects. Hence, the computation of the problem takes an incrementally longer amount of time as we keep increasing knapsack objects
start_time <- Sys.time() invisible(brute_force_knapsack(x = test_knapsack[1:12,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
start_time <- Sys.time() invisible(brute_force_knapsack(x = test_knapsack[1:16,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
We next improve the speed of solving by applying dynamic programming. In this approach we iterate over every element and check if it should be included in the knapsack or not. We achieve this by filling a matrix of the dimensions - 'n' x 'W'; where 'n' is the total number of knapsack objects and 'W' is the knapsack capacity. This approach follows that the most optimum solution will be found [n,W] cell of the resultant matrix. The function can be called as follows:
test_knapsack <- knapsack_objects(2000) knapsack_dynamic(test_knapsack[1:12,], 3500)
The order of complexity in this approach is O(nW) as the matrix used to derive the result is of the dimension n x W. Hence the loop runs at most 'nW' times.
While this solution is significantly faster than the brute force method, the time taken to solve this problem still increases significantly as we keep increasing the number of objects in the knapsack.
start_time <- Sys.time() invisible(knapsack_dynamic(x = test_knapsack[1:12,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
start_time <- Sys.time() invisible(knapsack_dynamic(x = test_knapsack[1:16,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
start_time <- Sys.time() invisible(knapsack_dynamic(x = test_knapsack[1:500,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
The final algorithm used in solving this problem is the greedy method. In this method, we assume that the object in the knapsack can be split and included in the knapsack by parts. Therefore, this algorithm ranks all the objects in the decreasing order of value per unit of weight of each object and start filling the knapsack with the objects with higher yield. When it reached the upper limit of the knapsack capacity, it then takes a fraction of the final object and returns the resulting value and the list of chosen elements. The usage of the function is as follows:
test_knapsack <- knapsack_objects(2000000) greedy_knapsack(test_knapsack[1:800,], 3500)
This particular code runs significantly faster and hance can be used to find a solution for much larger data sets in a limited amount of time.
start_time <- Sys.time() invisible(greedy_knapsack(x = test_knapsack[1:800,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
start_time <- Sys.time() invisible(greedy_knapsack(x = test_knapsack[1:1200,], W = 3500)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
start_time <- Sys.time() invisible(greedy_knapsack(x = test_knapsack[1:1000000,], W = 350000)) end_time <- Sys.time() print(difftime(end_time, start_time, units = "secs"))
While solving the greedy algorithm, we initially developed a better solution because even after filling the knapsack with objects ranked according to their benefit, we DID NOT break the loop immediately. Instead, we kept the loop going over all the objects to find any other object which may be able to fit into the knapsack. We were able to find such elements and include them into the knapsack. To demonstrate, we have created another function called greedy_knapsack_better()
test_knapsack <- knapsack_objects(2000) greedy_knapsack(test_knapsack[1:800,], 3500) greedy_knapsack_better(test_knapsack[1:800,], 3500)
As can be seen from the 2 results above, the better solution also takes the object #401 in the final solution!
In order to profile our code, we focused on the brute force method with 20 objects as a sample. We used the following method to profile our original brute_force_knapsack function
This led us to find that the bottleneck was the subset function that we were using while evaluating that particular combination. We modified the subsetting method as follows which led to a faster code
This change in the method used to subset led to a faster code with the following improvement:
suppressWarnings(RNGversion(min(as.character(getRversion()),"3.5.3"))) set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion") n <- 2000 knapsack_objects <- data.frame( w=sample(1:4000, size = n, replace = TRUE), v=runif(n = n, 0, 10000) ) x <- knapsack_objects[1:20,] W = 3500 new_bfk <- function(x, W) { n <- length(x$v) #to count number of combinations we will evaluate value_sum = 0 combinations_x <- lapply(1:(2^n-1), function(x) as.integer(intToBits(x))) for (i in 1:(2^n - 1)) { comb_elements <- which(combinations_x[[i]] > 0) sub_df <- x[as.integer(row.names(x)) %in% comb_elements, ] if (sum(sub_df$w) < W && sum(sub_df$v) > value_sum) { value_sum = sum(sub_df$v) value = sum(sub_df$v) elements = comb_elements } } } old_bfk <- function(x, W){ n <- length(x$v) #to count number of combinations we will evaluate value_sum = 0 combinations_x <- lapply(1:(2^n-1), function(x) as.integer(intToBits(x))) for (i in 1:(2^n - 1)) { comb_elements <- which(combinations_x[[i]] > 0) sub_df <- subset(x, as.integer(row.names(x)) %in% comb_elements) if (sum(sub_df$w) < W && sum(sub_df$v) > value_sum) { value_sum = sum(sub_df$v) value = sum(sub_df$v) elements = comb_elements } } } old_time <- system.time(old_bfk(x, W)) new_time <- system.time(new_bfk(x, W)) time_improvement <- old_time[3] - new_time[3] time_improvement
We attempted to parallelize the brute force algorithm and were successfully able to implement parallelization on the formation of all possible combinations of the knapsack elements. The results of implementing the two were as follows:
suppressWarnings(RNGversion(min(as.character(getRversion()),"3.5.3"))) set.seed(42, kind = "Mersenne-Twister", normal.kind = "Inversion") n <- 2000 knapsack_objects <- data.frame( w=sample(1:4000, size = n, replace = TRUE), v=runif(n = n, 0, 10000) ) x <- knapsack_objects[1:20,] W = 3500 par_bfk_time <- system.time(brute_force_knapsack(x, W, parallel = TRUE)) bfk_time <- system.time(brute_force_knapsack(x, W)) time_difference <- bfk_time - par_bfk_time time_difference
The results were very interesting in the sense that even though the parallel computing was resulting in faster execution of the user instructions, but the system processes were taking longer and thus resulting in no significant improvement to the overall code speed.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.