knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(knapsack)
library(microbenchmark)

knapsack is a package for solving the knapsack problem, implementing three different approaches.

knapsack problem

The knapsack problem (or rucksack problem) is a problem in combinatorial optimization: given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.

This problem is NP-hard, meaning that it is at least as hard as the hardest problems in NP, a fundamental class of problems for which there are (currently) no known polynomial-time algorithms.

1) The knapsack package

This package contains three different functions for solving what is called the knapsack problem. The main objective of this package is to study the effects of algorithms with different computational complexity. These functions are:

* Brute force search
* Dynamic programming
* Greedy heuristic

"knapsack_objects" dataset will be used as a dataset example to examine the performance of the three functions. This dataset consists of 1000000 rows on two variables, and has been introduced to the package as a dataset.

data("knapsack_objects")

1-1) Brute force search

The only approach that is guaranteed to give a correct answer in all situations for the knapsack problem is brute-force search, i.e. going through all possible alternatives and returning the maximum value found. This approach has complexity $O(2^n)$, since all $2^n$ possible combinations need to be evaluated.
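The enumeration can be expressed compactly by mapping every integer $1, \dots, 2^n - 1$ to a subset of items via its binary representation, which is what the package's brute force function does with intToBits(). A toy illustration for $n = 3$:

n <- 3                       # toy example with three items
for (i in 1:(2^n - 1)) {
  bits <- intToBits(i)[1:n]  # the i-th candidate subset as n bits
  print(which(bits == 1))    # indices of the items in this subset
}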

**Question: How much time does it take to run the algorithm for n = 16 objects?**

Runtime for n = 16, W = 2000
microbenchmark(brute_force_knapsack(x = knapsack_objects[1:16,], W = 2000), times = 3)
#> Unit: seconds
#>                                                          expr      min
#>  brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000) 1.380542
#>        lq     mean  median      uq      max neval
#>  1.398651 1.405494 1.41676 1.41797 1.419179     3

As can be seen, on average it took about 1.40 seconds to run the code.

1-2) Dynamic Programming

If the weights are actually discrete values (as in our example), we can use this to create an algorithm that solves the knapsack problem exactly by iterating over all possible capacities up to W. In this approach a matrix is created whose rows correspond to the items and whose columns run from 0 to W (the maximum capacity of the knapsack); each cell [i, j] holds the maximum value achievable using the first i items with capacity j. This approach has complexity $O(Wn)$.
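Concretely, with $m[i, j]$ denoting the maximum value achievable using the first $i$ items and capacity $j$, the matrix is filled row by row using the standard recurrence:

$$
m[i, j] = \begin{cases} m[i-1, j] & \text{if } w_i > j, \\ \max\bigl(m[i-1, j],\; m[i-1, j - w_i] + v_i\bigr) & \text{otherwise,} \end{cases}
$$

with $m[0, j] = 0$ for every capacity $j$; the answer is $m[n, W]$.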

**Question: How much time does it take to run the algorithm for n = 500 objects?**

Runtime for n = 500, W = 2000
microbenchmark(knapsack_dynamic(x = knapsack_objects[1:500,], W = 2000), times = 3)
#> Unit: seconds
#>                                                       expr      min
#>  knapsack_dynamic(x = knapsack_objects[1:500, ], W = 2000) 1.960008
#>        lq     mean   median      uq      max neval
#>  1.963645 1.967102 1.967282 1.97065 1.974017     3

The mean runtime for the dynamic programming solution was about 2 seconds. Considering the much larger value of n (500 objects versus 16), this algorithm performed better than brute force, as expected.

1-3) Greedy Approximation

George Dantzig proposed a greedy approximation algorithm to solve the knapsack problem. His version sorts the items in decreasing order of value per unit of weight, $\frac{v_i}{w_i}$. It then proceeds to insert them into the sack, starting with the first item and continuing until there is no longer space in the sack for more.

This algorithm will not give an exact result (though it can be shown to return at least $50\%$ of the true maximum value), but it reduces the computational complexity considerably, to $O(n \log {n})$ due to the sorting step of the algorithm.
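As a small illustration of the sorting idea (a sketch, not the package function itself):

small <- knapsack_objects[1:5, ]            # a small slice of the data
density <- small$v / small$w                # value per unit of weight
small[order(density, decreasing = TRUE), ]  # most valuable-per-weight items first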

**Question: How much time does it take to run the algorithm for n = 1000000 objects?**

Runtime for n = 1000000, W = 2000
microbenchmark(greedy_knapsack(x = knapsack_objects, W = 2000), times = 3)
#> Unit: milliseconds
#>                                             expr      min       lq
#>  greedy_knapsack(x = knapsack_objects, W = 2000) 145.4702 159.5973
#>      mean   median       uq      max neval
#>  167.5093 173.7244 178.5288 183.3332     3

As illustrated, the mean runtime for the greedy approximation was around 170 milliseconds, by far the best of the three algorithms, especially considering that n = 1000000 here.

2) Profiling

The package "lineprof" is not available (for R version 3.6.1). Thus, we implemented the package "profvis".

library(profvis)

2-1) Brute force

source("brute_force_knapsack.R")
bf <- profvis(brute_force_knapsack(x = knapsack_objects[1:16,], W = 2000, parallel = FALSE))
bf

From the result it seems that much of the running time is spent subsetting the data frame with the $ operator. So we decided to convert the data frame to a matrix and continue with the matrix.
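The effect is easy to reproduce in isolation; the following micro-benchmark (output not shown) contrasts repeated element access on a data frame with the same access on the equivalent matrix:

df <- knapsack_objects[1:16, ]
mat <- as.matrix(df)
microbenchmark(
  data_frame = for (i in 1:16) df$w[i],    # $ subsetting on a data frame
  matrix     = for (i in 1:16) mat[i, 1],  # direct matrix indexing
  times = 100
)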

Changing the data frame to a matrix
brute_force_knapsack_mat <- function(x, W){

  stopifnot(is.data.frame(x))
  stopifnot(names(x) == c("w", "v"))
  stopifnot(x[] > 0)
  stopifnot(is.numeric(W), length(W) == 1, W > 0)

  n <- nrow(x)
  x <- as.matrix(x)  ### The data frame changed to a matrix ###

  best_weight <- 0
  best_value <- 0
  elements <- NA
  i <- 1

  while (i <= 2^n - 1) {

    binary <- intToBits(i)       # binary representation of candidate i
    index <- which(binary == 1)  # indices of the items in this candidate

    total_w <- 0
    total_v <- 0

    for (j in seq_along(index)) {
      total_w <- total_w + x[[index[j], 1]]
      total_v <- total_v + x[[index[j], 2]]
    }
    if (total_w <= W && total_v > best_value) {
      # Keep the best feasible combination found so far
      best_weight <- total_w
      best_value <- total_v
      elements <- index
    }
    i <- i + 1
  }
  return(list(value = best_value, elements = elements))
}
Runtime for n = 16, W = 2000
microbenchmark(brute_force_knapsack_mat(x = knapsack_objects[1:16,], W = 2000), times = 3)
#> Unit: milliseconds
#>                                                              expr      min
#>  brute_force_knapsack_mat(x = knapsack_objects[1:16, ], W = 2000) 241.5265
#>        lq     mean   median      uq      max neval
#>  242.8438 250.2282 244.1611 254.579 264.9969     3

*As shown, the mean runtime decreased to about 250 milliseconds, which is much less than that of the original code (1.4 seconds, see 1-1).*

2-2) Dynamic Programming

source("knapsack_dynamic.R")
kd <- profvis(knapsack_dynamic(x = knapsack_objects[1:500,], W = 2000))
kd

The same issue as in the brute force search can be seen in this result.

Changing the data frame to a matrix
knapsack_dynamic_mat <- function(x, W){

  stopifnot(is.data.frame(x))
  stopifnot(names(x) == c("w", "v"))
  stopifnot(x[] > 0)
  stopifnot(is.numeric(W), length(W) == 1, W > 0)

  x_new <- as.matrix(rbind(c(0, 0), x))  ### Changing the data frame to a matrix ###

  n <- nrow(x_new)  # number of items plus the added zero row
  m <- matrix(NA, nrow = n, ncol = W + 1)  # column j corresponds to capacity j - 1

  m[1, ] <- 0
  m[, 1] <- 0

  for (i in 2:n) {
    for (j in 2:(W + 1)) {

      if (j > x_new[[i, 1]]) {
        # The item fits: take the better of skipping it or including it
        m[i, j] <- max(m[i - 1, j], m[i - 1, j - x_new[[i, 1]]] + x_new[[i, 2]])
      } else {
        m[i, j] <- m[i - 1, j]
      }
    }
  }

  value <- m[n, W + 1]

  # Backtrack through the matrix to recover which items were chosen
  N <- n
  elements <- c()
  currValue <- m[N, W + 1]

  while (N > 1) {

    if (!(currValue %in% m[N - 1, ])) {
      # currValue is unreachable without row N, so item N - 1 was included
      elements <- c(elements, N - 1)
      currValue <- currValue - x_new[[N, 2]]
    }
    N <- N - 1
  }
  return(list(value = round(value), elements = rev(elements)))
}
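As a quick sanity check (output not shown), the matrix version should agree with the exact brute force answer on a small instance:

round(brute_force_knapsack_mat(x = knapsack_objects[1:8, ], W = 2000)$value) ==
  knapsack_dynamic_mat(x = knapsack_objects[1:8, ], W = 2000)$value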
Runtime for n = 500, W = 2000
microbenchmark(knapsack_dynamic_mat(x = knapsack_objects[1:500,], W = 2000), times = 3)
#> Unit: milliseconds
#>                                                           expr      min
#>  knapsack_dynamic_mat(x = knapsack_objects[1:500, ], W = 2000) 306.7584
#>        lq     mean   median       uq      max neval
#>  325.8744 338.8215 344.9904 354.8531 364.7158     3

*The result suggests that the running time declined from the original 2 seconds to about 340 milliseconds (see 1-2).*

2-3) Greedy approximation

source("greedy_knapsack.R")
gd <- profvis(greedy_knapsack(x = knapsack_objects, W = 2000))
gd

For the greedy algorithm, the data is converted to a matrix as well. In addition, we eliminated the lines of code that sorted the weights and values separately; instead, we used the order() function to sort the data as a whole.

greedy_knapsack_mat <- function(x, W){

  stopifnot(is.data.frame(x))
  stopifnot(names(x) == c("w", "v"))
  stopifnot(x[] > 0)
  stopifnot(is.numeric(W), length(W) == 1, W > 0)


  n <- nrow(x)
  x$density <- x[[2]] / x[[1]]  ### Adding the v/w fractions to the data ###
  x$ix <- 1:n                   ### Adding the indices to the data ###
  x <- as.matrix(x[order(x$density, decreasing = TRUE), ])  ### Sorting the data with order() and converting it to a matrix ###


  #ValPerW <- x$v / x$w                    ### These lines are eliminated###

  #sortedValPerW <- sort(ValPerW, decreasing = TRUE, index.return = TRUE)

  #sorted_w <- x$w[sortedValPerW$ix]
  #sorted_v <- x$v[sortedValPerW$ix]

  total_w <- 0
  total_v <- 0
  elements <- c()

  i <- 1
  while (total_w <= W && i <= n) {

    if ((x[[i, 1]] + total_w) <= W) {

      total_w <- x[[i, 1]] + total_w
      total_v <- x[[i, 2]] + total_v
      elements <- c(elements, x[[i, 4]])  # column 4 holds the original index
      i <- i + 1
    } else {
      break
    }
  }

  return(list(value = round(total_v), elements = elements))
}
Runtime for n = 1000000, W = 2000
microbenchmark(greedy_knapsack_mat(x = knapsack_objects, W = 2000), times = 3)
#> Unit: milliseconds
#>                                                 expr      min       lq
#>  greedy_knapsack_mat(x = knapsack_objects, W = 2000) 229.1516 232.4456
#>      mean   median       uq      max neval
#>  236.4506 235.7396 240.1001 244.4605     3

*These statistics show a mean time of about 236 milliseconds, which is more than the runtime of the original code (about 170 ms, see 1-3). Thus, these changes did not result in better performance.*

3) Brute Force using parallelization

We used parLapply() to see whether better performance could be gained. This implementation requires the "parallel" package.

library(parallel)
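The parallel branch lives inside brute_force_knapsack() and is not reproduced here. Purely as an illustration of the idea, a minimal sketch using parLapply follows, where eval_value is a hypothetical helper (not the package's internal function) that scores one candidate subset:

eval_value <- function(i, x, W) {
  index <- which(intToBits(i) == 1)  # items in candidate subset i
  if (sum(x[index, 1]) <= W) sum(x[index, 2]) else 0
}
x <- as.matrix(knapsack_objects[1:16, ])
cl <- makeCluster(max(1, detectCores() - 1))  # leave one core free
values <- parLapply(cl, 1:(2^16 - 1), eval_value, x = x, W = 2000)
stopCluster(cl)
max(unlist(values))  # best total value found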
Runtime for n = 16, W = 2000
microbenchmark(brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000, parallel = TRUE), times = 3)
#> Unit: seconds
#>                                                                                expr
#>  brute_force_knapsack(x = knapsack_objects[1:16, ], W = 2000,      parallel = TRUE)
#>       min       lq     mean   median       uq     max neval
#>  3.149329 3.162349 3.210859 3.175369 3.241624 3.30788     3

In comparison to the results in section 1-1, this result shows a rise in the mean runtime; for a problem this small, the overhead of setting up and communicating with the parallel workers evidently outweighs any gains, so no performance improvement was obtained with this approach.


