In thozh912/Lab6: Performace code for knapsack problem

Package Info

The Knapsack package contains three different function to solve the knapsack problem. In the Knapsack problem we have a knapsack with a limited weight W and a number of items i, each with a weight $w_{i}$ and a value $v_{i}$. The goal of the problem is to fill the knapsack with the items so that the total weight is less than or equal to W and the total value is as large as possible.

The three functions take as input a Data frame with variables v and w, which have to be positive values.

Brute force search

This function solves the Knapsack probem by brute force.

For example. for n = 16 objects:

library(Lab6)

system.time(brute_force_knapsack(knapsack_generator()[1:16,],W =2000))

The brute force function takes around 27 seconds to run the algorithm for n = 16 objects.

Dynamic programming

This function solves the 0/1 Knapsack problem using dynamic programming.

For example, for n = 500 objects:

system.time(knapsack_dynamic(knapsack_generator()[1:500,],W =2000))

The dynamic function takes around 38 seconds to run the algorithm for n = 500 objects.

Greedy heuristic

This function solves the Knapsack problem using a greedy heuristic approach. It does not return an exact result but is computationally faster than the other approaches in this package.

For example, for n = 1000000 objects:

system.time(greedy_knapsack(knapsack_generator()[1:1000000,],W =2000))

The greedy heuristic function takes around 50 seconds to run the algorithm for n = 1000000 objects.

Profiling

We try to improve this appaling performance. Using the Rprof we discovered that the problem is the use of the data frame, in fact replacing the data frame with matrix the functions are faster.

For example for the brute force function:

knapsack_matr <- knapsack_gener_matr()
knapsack_df <- knapsack_generator()

tmp <- tempfile()

Rprof(tmp,interval = 0.01)
brute_force_knapsack(knapsack_matr[1:15,], W = 2000)
Rprof(NULL)
summaryRprof(tmp)$by.self

tmp <- tempfile()

Rprof(tmp,interval = 0.01)
brute_force_knapsack(knapsack_df[1:15,], W = 2000)
Rprof(NULL)
summaryRprof(tmp)$by.self

It can be seen that using the matrix the function is a lot faster. Further improvements which the profiling suggests are possible include replacing / division in the greedy knapsack algorithm with a division algorithm of some sort. Attempts at replacing the max() function in the dynamic programming algorithm with if block did not improve performance.

Parallelize Brute force search

We added an argument, parallel, in the brute force function that when it is set to TRUE the function parallelize over the multiple cores. We used a computer with 8 cores.

  tmp <- tempfile()
  Rprof(tmp,interval = 0.01)
  brute_force_knapsack(knapsack_matr[1:10,], W = 2000, parallel = TRUE)
  Rprof(NULL)
  summaryRprof(tmp)$by.self

This function is torturously slow due mainly to its huge expand.grid() function call that it creates a lot of times. Let us try to manually create the very same indexing array which expand.grid() creates. We shall try a matrix.

  tmp <- tempfile()
  Rprof(tmp,interval = 0.01)
  alt_brute_force_knapsack(knapsack_matr[1:10,], W = 2000, parallel = TRUE)
  Rprof(NULL)
  summaryRprof(tmp)$by.self

We see that the total time is approximately halved by switching to a matrix indexer. This is fairly nice. Maybe there are even better ways to speed this memory allocation. Maybe the code is so badly written it has to be totally rewritten. (it is a first attempt at parallelization)

thozh912/Lab6 documentation built on May 31, 2019, 11:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com