In PatrickSVM/knapsack: Algorithms to solve the knapsack problem

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
  #eval = FALSE
)

options(expressions = 50000)

Package description

The package "knapsack" is based on laboratory 6 of the course "Advanced R programming" at LiU. It implements 3 different algorithms to solve or at least approximate the optimal solution for the famous knapsack optimization problem:

| Algorithm | Description | Complexity |
|:---------------|:--------------|---------------:| | Brute Force | Tries out all $2^n$ possible item-combinations | $O(2^n)$ |
| Dynamic Programming | Uses dynamic programming for solving the problem | $O(Wn)$ |
| Greedy Heuristic | Heuristic approach to find a good approximation of the optimal solution | $O(n\ log(n))$ |

In the following paragraphs, we demonstrate the proper usage of the packages functions and also analyze the time complexity of the different algorithms.

Brute Force Algorithm

The following code shows two examples of the brute force algorithm.

library(knapsack)


# we create a dataframe with 2000 possible knapsack items 
knapsack_objects <- get_knapsack_objects(2000)


# Example 1:

# now the brute force algorithm is used to solve for the first 8 items with a 
# maximum weight W of 3500

brute_force_knapsack(x = knapsack_objects[1:8, ], W = 3500)


# Example 2:

# now the brute force algorithm is used to solve for the first 12 items with a 
# maximum weight W of 2000

brute_force_knapsack(x = knapsack_objects[1:12,], W = 2000)

Computation time - Brute Force

# number of items = 16, W = 3500
time <- system.time(brute_force_knapsack(x = knapsack_objects[1:16, ], W = 3500))
time

cat(sprintf(" Computation time for Brute Force (16 items): %#.4f seconds\n\n", time[3]), 
    sprintf("Computation time for Brute Force (18 items): %#.4f seconds\n\n",  system.time(brute_force_knapsack(x = knapsack_objects[1:18, ], W = 3500))[3]),
    sprintf("Computation time for Brute Force (20 items): %#.4f seconds\n\n", system.time(brute_force_knapsack(x = knapsack_objects[1:20, ], W = 3500))[3]))

Dynamic Programming

The following code shows the same two examples as before but uses the dynamic programming algorithm.

# Example 1:

# now the dynamic programming algorithm is used to solve for the first 8 items with a 
# maximum weight W of 3500

knapsack_dynamic(x = knapsack_objects[1:8, ], W = 3500)


# Example 2:

# now the dynamic programming algorithm is used to solve for the first 12 items with a 
# maximum weight W of 2000

knapsack_dynamic(x = knapsack_objects[1:12,], W = 2000)

Computation time - Dynamic Programming

# number of items = 125, W = 3500
time <- system.time(knapsack_dynamic(x = knapsack_objects[1:125, ], W = 3500))
time

cat(sprintf(" Computation time for Dynamic Programming (125 items): %#.4f seconds\n\n", time[3]), 
    sprintf("Computation time for Dynamic Programming (250 items): %#.4f seconds\n\n",  system.time(knapsack_dynamic(x = knapsack_objects[1:250, ], W = 3500))[3]),
    sprintf("Computation time for Dynamic Programming (500 items): %#.4f seconds\n\n", system.time(knapsack_dynamic(x = knapsack_objects[1:500, ], W = 3500))[3]))

Greedy Heuristic

The following code shows two examples for the usage of the greedy heuristic, with a higher amount of items.

# Example 1:

# now the greedy heuristic is used to solve for the first 800 items with a 
# maximum weight W of 3500

greedy_knapsack(x = knapsack_objects[1:800,], W = 3500)


# Example 2:

# now the greedy heuristic is used to solve for the first 1200 items with a 
# maximum weight W of 2000

greedy_knapsack(x = knapsack_objects[1:1200,], W = 2000)

Computation time - Greedy Heuristic

# we create a dataframe with 1,000,000 possible knapsack items 
knapsack_objects <- get_knapsack_objects(1000000)

# number of items = 10000, W = 3500
time <- system.time(greedy_knapsack(x = knapsack_objects[1:10000, ], W = 3500))
print(time)

cat(sprintf(" Computation time for Greedy Heuristic (10,000 items): %#.4f seconds\n\n", time[3]), 
    sprintf("Computation time for Greedy Heuristic (100,000 items): %#.4f seconds\n\n",  system.time(greedy_knapsack(x = knapsack_objects[1:100000, ], W = 3500))[3]),
    sprintf("Computation time for Greedy Heuristic (1,000,000 items): %#.4f seconds\n\n", system.time(greedy_knapsack(x = knapsack_objects, W = 3500))[3]))

Optimization of Computational Time

After profiling the two algorithms, Brute Force and Dynamic Programming, an optimized version of both of them was created, to reduce bottlenecks and thus computational time.

Optimized Brute Force Algorithm

Profiling showed, that transposig the "combi-vector" takes a lot of time. But it is just a necessary operation in math but not in programming, just multiplying both vectors yields the same result. So we simply deleted this operation, which lead to reduced compuitational time:

# number of items = 20, W = 3500
time_1 <- system.time(brute_force_knapsack(x = knapsack_objects[1:20, ], W = 3500))
print(time_1)

time_2 <- system.time(optimized_brute_force(x = knapsack_objects[1:20, ], W = 3500))
print(time_2)

cat(sprintf(" Computation time for Normal Brute Force Algorithm (20 items): %#.4f seconds\n\n", time_1[3]), 
    sprintf("Computation time for Optimized Brute Force Algorithm (20 items): %#.4f seconds\n\n",  time_2[3]),
    sprintf("Improvement in computational time: %#.4f seconds\n\n", time_1[3]-time_2[3]))

Optimized Dynamic Programming Algorithm

Profiling showed, that the operation "$" takes a lot of time. In the for loop we use it twice to refer to the same value in the dataframe, that is why we can simply store it in a variable, to reduce the number of times the operation is used:

# number of items = 1500, W = 3500
time_1 <- system.time(knapsack_dynamic(x = knapsack_objects[1:1500, ], W = 3500))
print(time_1)

time_2 <- system.time(optimized_dynamic_programming(x = knapsack_objects[1:1500, ], W = 3500))
print(time_2)

cat(sprintf(" Computation time for Normal Dynamic Programming Algorithm (1500 items): %#.4f seconds\n\n", time_1[3]), 
    sprintf("Computation time for Optimized Dynamic Programming Algorithm (1500 items): %#.4f seconds\n\n",  time_2[3]),
    sprintf("Improvement in computational time: %#.4f seconds\n\n", time_1[3]-time_2[3]))

Parallelization of Brute Force Algorithm

To analyze the impact of parallelization on computation time, we integrated an optional parallelized solving approach to the Brute Force Algorithm that uses the mcapply function. By default, brute_force_knapsack still uses non-parallelized processes, but by setting setting the argument parallel to r TRUE, the parallelized approach is used.

The following code contains two examples, which show the time efficiency gain due to parallelization:

# NORMAL, number of items = 18, W = 3500
system.time(brute_force_knapsack(x = knapsack_objects[1:18, ], W = 3500, parallel = FALSE))


# PARALLEL, number of items = 18, W = 3500
system.time(brute_force_knapsack(x = knapsack_objects[1:18, ], W = 3500, parallel = TRUE))

# NORMAL, number of items = 23 W = 30000
system.time(brute_force_knapsack(x = knapsack_objects[1:23, ], W = 30000, parallel = FALSE))


# PARALLEL, number of items = 23, W = 30000
system.time(brute_force_knapsack(x = knapsack_objects[1:23, ], W = 30000, parallel = TRUE))