```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```
```r
library(knapsack)
library(microbenchmark)
```
knapsack is a package that solves the knapsack problem using three different approaches.
The knapsack problem or rucksack problem is a problem in combinatorial optimization: given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.
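Formally, with item values $v_i$, weights $w_i$, capacity $W$, and decision variables $x_i$ indicating whether item $i$ is packed, the 0/1 version of the problem reads:

$$\max \sum_{i=1}^{n} v_i x_i \quad \text{subject to} \quad \sum_{i=1}^{n} w_i x_i \leq W, \quad x_i \in \{0, 1\}.$$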
This problem is NP-hard, meaning that it is "at least as hard as the hardest problems in NP". NP is a fundamental class of problems for which no polynomial-time algorithms are currently known.
This package contains three different functions for solving the knapsack problem. The main objective of the package is to study the effects of algorithms with different computational complexity. These functions are:
* Brute force search
* Dynamic programming
* Greedy heuristic

The "knapsack_objects" dataset will be used as a running example to examine the performance of the three functions. It consists of 1,000,000 rows and two variables (w and v) and is included in the package as a dataset.
data("knapsack_objects")
The only approach that is guaranteed to give a correct answer in all situations for the knapsack problem is a brute-force search, i.e. going through all possible alternatives and returning the maximum value found. This approach is of complexity $O(2^n)$, since all $2^n$ possible combinations need to be evaluated.
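The enumeration idea can be illustrated with a small sketch (our own illustration, not the package's exact code): every integer $i$ in $1, \dots, 2^n - 1$ encodes one subset of the items through its binary representation.

```r
# Illustrative only: enumerate all subsets of n = 3 items via bit patterns.
n <- 3
for (i in 1:(2^n - 1)) {
  subset <- which(intToBits(i)[1:n] == 1)  # item indices selected by the bits of i
  cat(sprintf("i = %d selects items: %s\n", i, paste(subset, collapse = ", ")))
}
```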
**Question: How much time does it take to run the algorithm for n = 16 objects?**
```r
microbenchmark(brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000), times = 3)
#> Unit: seconds
#>                                                           expr      min       lq     mean  median      uq      max neval
#>  brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000) 1.380542 1.398651 1.405494 1.41676 1.41797 1.419179     3
```
As can be seen, the code took about 1.4 seconds to run on average.
If the weights are actually discrete values (as in our example), we can use this to create an algorithm that solves the knapsack problem exactly by iterating over all possible values of w. In this approach a matrix is created whose rows correspond to the items and whose columns run from 0 to W (the maximum capacity of the knapsack). Each cell [i, j] holds the maximum value achievable using the first i items with capacity j. This approach is of complexity $O(Wn)$.
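Concretely, with $m[i, j]$ denoting the maximum value achievable using the first $i$ items with capacity $j$, the table is filled with the standard recurrence:

$$
m[i, j] =
\begin{cases}
m[i-1, j] & \text{if } w_i > j, \\
\max\bigl(m[i-1, j],\; m[i-1, j - w_i] + v_i\bigr) & \text{otherwise.}
\end{cases}
$$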
**Question: How much time does it take to run the algorithm for n = 500 objects?**
```r
microbenchmark(knapsack_dynamic(x = knapsack_objects[1:500, ], W = 2000), times = 3)
#> Unit: seconds
#>                                                        expr      min       lq     mean   median      uq      max neval
#>  knapsack_dynamic(x = knapsack_objects[1:500, ], W = 2000) 1.960008 1.963645 1.967102 1.967282 1.97065 1.974017     3
```
The mean run time for the dynamic programming solution was about 2 seconds. Considering the value of n, this algorithm performed better than brute force, as expected.
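To put this in perspective (our own back-of-the-envelope comparison): brute force on $n = 500$ items would require evaluating $2^{500} \approx 3.3 \times 10^{150}$ subsets, which is infeasible, while the dynamic program only fills

$$W \cdot n = 2000 \cdot 500 = 10^{6}$$

table cells.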
George Dantzig proposed a greedy approximation algorithm to solve the knapsack problem. His version sorts the items in decreasing order of value per unit of weight, $\frac{v_i}{w_i}$. It then inserts them into the sack, starting with the first item, and continues until there is no longer space in the sack for more.
This algorithm will not give an exact result (but it can be shown that it will return at least $50\%$ of the true maximum value), yet it reduces the computational complexity considerably, to $O(n \log n)$, due to the sorting part of the algorithm.
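A minimal sketch of this idea (the function name and toy data below are our own illustration, not the package code):

```r
# Illustrative sketch of Dantzig's greedy heuristic (not the package code).
greedy_sketch <- function(w, v, W) {
  ord <- order(v / w, decreasing = TRUE)  # sort by value per unit of weight
  total_w <- 0
  value <- 0
  elements <- c()
  for (i in ord) {
    if (total_w + w[i] > W) break         # stop when the next item no longer fits
    total_w <- total_w + w[i]
    value <- value + v[i]
    elements <- c(elements, i)
  }
  list(value = value, elements = elements)
}

greedy_sketch(w = c(5, 4, 3), v = c(10, 40, 30), W = 7)
# items 2 and 3 (density 10 each) fill the sack exactly: value 70
```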
**Question: How much time does it take to run the algorithm for n = 1000000 objects?**
```r
microbenchmark(greedy_knapsack(x = knapsack_objects, W = 2000), times = 3)
#> Unit: milliseconds
#>                                              expr      min       lq     mean   median       uq      max neval
#>  greedy_knapsack(x = knapsack_objects, W = 2000) 145.4702 159.5973 167.5093 173.7244 178.5288 183.3332     3
```
As illustrated, the mean run time for the greedy approximation was around 170 milliseconds. This is by far the best performance of the three algorithms, especially considering that n = 1,000,000 here.
The package "lineprof" is not available (for R version 3.6.1). Thus, we implemented the package "profvis".
```r
library(profvis)
source("brute_force_knapsack.R")
bf <- profvis(brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000, parallel = FALSE))
bf
```
The result suggests that much of the running time is spent subsetting the data frame with the $\$$ operator. We therefore decided to convert the data frame to a matrix and work with the matrix instead.
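The effect can be checked in isolation; the micro-comparison below is our own illustration (data frame column access versus direct matrix indexing):

```r
# Our own micro-comparison: element access in a data frame vs. a matrix.
df <- knapsack_objects[1:16, ]
mat <- as.matrix(df)
microbenchmark(df$w[3], mat[3, 1])
```

With this in mind, the brute force function was reworked as follows.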
```r
brute_force_knapsack_mat <- function(x, W){
  stopifnot(is.data.frame(x))
  stopifnot(names(x) == c("w", "v"))
  stopifnot(x[] > 0)
  stopifnot(is.numeric(W), length(W) == 1, W > 0)
  n <- nrow(x)
  x <- as.matrix(x) ### The data frame changed to a matrix ###
  best_weight <- 0
  best_value <- 0
  elements <- NA
  i <- 1
  while(i <= 2^n - 1) {
    binary <- intToBits(i)
    index <- which(binary == 1)
    total_w <- 0
    total_v <- 0
    for (j in 1:length(index)) {
      total_w <- total_w + x[[index[j], 1]]
      total_v <- total_v + x[[index[j], 2]]
    }
    if(total_w <= W){
      if(total_v > best_value){
        best_weight <- total_w
        best_value <- total_v
        elements <- index
      }
    }
    i <- i + 1
  }
  return(list(value = best_value, elements = elements))
}
```
```r
microbenchmark(brute_force_knapsack_mat(x = knapsack_objects[1:16, ], W = 2000), times = 3)
#> Unit: milliseconds
#>                                                               expr      min       lq     mean   median      uq      max neval
#>  brute_force_knapsack_mat(x = knapsack_objects[1:16, ], W = 2000) 241.5265 242.8438 250.2282 244.1611 254.579 264.9969     3
```
*As shown, the mean run time decreased to about 250 milliseconds, which is far less than that of the original code (1.4 seconds, see 1-1).*
source("knapsack_dynamic.R") kd <- profvis(knapsack_dynamic(x = knapsack_objects[1:500,], W = 2000)) kd
The same bottleneck as in the brute force search can be seen in this result as well.
```r
knapsack_dynamic_mat <- function(x, W){
  stopifnot(is.data.frame(x))
  stopifnot(names(x) == c("w", "v"))
  stopifnot(x[] > 0)
  stopifnot(is.numeric(W), length(W) == 1, W > 0)
  x_new <- as.matrix(rbind(c(0, 0), x)) ### Changing the data frame to a matrix ###
  n <- nrow(x_new)
  m <- matrix(NA, nrow = n + 1, ncol = W + 1)
  m[1, ] <- 0
  m[, 1] <- 0
  for (i in 2:n) {
    for (j in 2:W) {
      if(j >= x_new[[i, 1]]){
        m[i, j] <- max(m[i-1, j], m[i-1, j - x_new[[i, 1]]] + x_new[[i, 2]])
      }else{
        m[i, j] <- m[i-1, j]
      }
    }
  }
  value <- m[n, W]
  N <- n
  c <- W
  elements <- c()
  currValue <- m[N, c]
  while(N > 1){
    if(!(currValue %in% m[N-1, ])){
      elements <- c(elements, N-1)
      currValue <- currValue - x_new[[N, 2]]
      N <- N - 1
    }else{
      N <- N - 1
    }
  }
  return(list(value = round(value), elements = rev(elements)))
}
```
```r
microbenchmark(knapsack_dynamic_mat(x = knapsack_objects[1:500, ], W = 2000), times = 3)
#> Unit: milliseconds
#>                                                            expr      min       lq     mean   median       uq      max neval
#>  knapsack_dynamic_mat(x = knapsack_objects[1:500, ], W = 2000) 306.7584 325.8744 338.8215 344.9904 354.8531 364.7158     3
```
*The result suggests that the run time declined from the original 2 seconds to about 340 milliseconds (see 1-2).*
source("greedy_knapsack.R") gd <- profvis(greedy_knapsack(x = knapsack_obj, W = 2000)) gd
For the greedy algorithm, the data was converted to a matrix as well. In addition, we eliminated some lines of code that sorted the weights and values separately, and instead used the order() function to sort the data as a whole.
```r
greedy_knapsack_mat <- function(x, W){
  stopifnot(is.data.frame(x))
  stopifnot(names(x) == c("w", "v"))
  stopifnot(x[] > 0)
  stopifnot(is.numeric(W), length(W) == 1, W > 0)
  n <- nrow(x)
  x$density <- x[[2]] / x[[1]] ### Adding the v/w fractions to the data ###
  x$ix <- 1:n                  ### Adding the indices to the data ###
  x <- as.matrix(x[order(x$density, decreasing = TRUE), ]) ### Sorting the data with order() and converting it to a matrix ###
  # ValPerW <- x$v / x$w ### These lines are eliminated ###
  # sortedValPerW <- sort(ValPerW, decreasing = TRUE, index.return = TRUE)
  # sorted_w <- x$w[sortedValPerW$ix]
  # sorted_v <- x$v[sortedValPerW$ix]
  total_w <- 0
  total_v <- 0
  elements <- c()
  i <- 1
  while(total_w <= W & i <= n){
    if((x[[i, 1]] + total_w) <= W){
      total_w <- x[[i, 1]] + total_w
      total_v <- x[[i, 2]] + total_v
      elements <- c(elements, x[[i, 4]])
      i <- i + 1
    }else{
      break
    }
  }
  return(list(value = round(total_v), elements = elements))
}
```
```r
microbenchmark(greedy_knapsack_mat(x = knapsack_objects, W = 2000), times = 3)
#> Unit: milliseconds
#>                                                  expr      min       lq     mean   median       uq      max neval
#>  greedy_knapsack_mat(x = knapsack_objects, W = 2000) 229.1516 232.4456 236.4506 235.7396 240.1001 244.4605     3
```
*These statistics show a mean run time of about 240 milliseconds, which is more than that of the original code (about 170 ms, see 1-3). Thus, these changes did not result in better performance.*
We used parLapply() to see whether a better performance could be gained. This implementation requires the "parallel" package.
```r
library(parallel)
```
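For reference, here is a minimal sketch of how such a parallel brute force search could be structured with parLapply(). The helper names and the overall structure are our own illustration, not necessarily the package's implementation:

```r
# Illustrative sketch only: score each candidate subset on a worker.
# eval_combination() and brute_force_parallel_sketch() are our own names.
eval_combination <- function(i, w, v, W) {
  index <- which(intToBits(i) == 1)          # items selected by the bits of i
  total_w <- sum(w[index])
  value <- if (total_w <= W) sum(v[index]) else -Inf
  list(value = value, elements = index)
}

brute_force_parallel_sketch <- function(x, W) {
  cl <- makeCluster(detectCores() - 1)
  on.exit(stopCluster(cl))                   # always release the workers
  n <- nrow(x)
  results <- parLapply(cl, seq_len(2^n - 1), eval_combination,
                       w = x$w, v = x$v, W = W)
  values <- vapply(results, function(r) r$value, numeric(1))
  best <- which.max(values)
  list(value = results[[best]]$value, elements = results[[best]]$elements)
}
```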
```r
microbenchmark(brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000, parallel = TRUE), times = 3)
#> Unit: seconds
#>                                                                            expr      min       lq     mean   median       uq     max neval
#>  brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000, parallel = TRUE) 3.149329 3.162349 3.210859 3.175369 3.241624 3.30788     3
```
Compared with the results in section 1-1, the mean run time actually increased, most likely because the overhead of setting up the cluster and distributing the work outweighs any gains for only n = 16 objects. So no performance improvement was obtained with this approach.