```r
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(lab6G3)
library(microbenchmark)
```

# The knapsack problem with 3 different solutions

The knapsack problem is a classic problem in combinatorial optimisation. Each item has a weight and a value, and the backpack (knapsack) has a weight limit. The goal is to maximise the total value of the items you can fit in the backpack.

## Generated data

The function below generates the data used in this lab, which makes it easy to create data sets of different sizes throughout.

```r
knapsack_data <- function(n = 2000, seed = 42) {
  set.seed(seed)
  knapsack_objects <- data.frame(
    w = sample(1:4000, size = n, replace = TRUE),
    v = runif(n = n, 0, 10000)
  )
  return(knapsack_objects)
}
```

## Brute force search

This is the only solution that is guaranteed to give a correct answer, but it comes at a cost in time and memory. The approach evaluates all $2^n$ possible combinations, so the runtime is $O(2^n)$.
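To illustrate the idea, here is a minimal sketch of a brute-force 0/1 knapsack search. The helper name `brute_force_sketch` is hypothetical; the package's `brute_force_knapsack` may differ in detail. It enumerates every subset via the bit pattern of an integer counter.

```r
# Hypothetical sketch of a brute-force 0/1 knapsack search (not the
# package's brute_force_knapsack). Enumerates all 2^n - 1 non-empty
# subsets using the bits of the loop counter.
brute_force_sketch <- function(x, W) {
  n <- nrow(x)
  best_value <- 0
  best_set <- integer(0)
  for (i in 1:(2^n - 1)) {
    # intToBits(i) gives the binary representation; the first n bits
    # select which items are included in this candidate subset.
    include <- which(as.logical(intToBits(i)[1:n]))
    if (sum(x$w[include]) <= W) {
      v <- sum(x$v[include])
      if (v > best_value) {
        best_value <- v
        best_set <- include
      }
    }
  }
  list(value = best_value, elements = best_set)
}
```

Because every one of the $2^n - 1$ subsets is checked, the answer is exact, but the loop count doubles with each added item.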

## Parallelized brute force search

Question: What performance gain could you get by parallelizing brute force search? Parallel programming can make the code run faster and save a lot of time. Comparing the benchmark results, the parallelized function is roughly twice as fast.
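One common way to parallelize this search is to spread the candidate subsets over several cores. The sketch below uses `parallel::mclapply` and is only an assumed approach; the package's `brute_force_knapsack(x, W, TRUE)` may parallelize differently. Note that `mclapply` forks processes and therefore only runs in parallel on Unix-like systems.

```r
library(parallel)

# Hypothetical sketch: evaluate candidate subsets in parallel.
# mclapply forks worker processes (Unix-like systems only; on Windows
# it falls back to serial execution when mc.cores = 1).
parallel_brute_force_sketch <- function(x, W, cores = 2) {
  n <- nrow(x)
  values <- mclapply(1:(2^n - 1), function(i) {
    include <- which(as.logical(intToBits(i)[1:n]))
    # Infeasible subsets get -Inf so they never win the max below.
    if (sum(x$w[include]) <= W) sum(x$v[include]) else -Inf
  }, mc.cores = cores)
  max(unlist(values))
}
```

The speed-up is bounded by the number of cores, and forking has fixed overhead, so for very small `n` the serial version can still win.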

```r
# 16 observations, see ?knapsack_data
x <- knapsack_data(16)
print(brute_force_knapsack(x, 3500))
# System time, original
print(microbenchmark(brute_force_knapsack(x, 3500), times = 5))

# System time, parallelized
print(microbenchmark(brute_force_knapsack(x, 3500, TRUE), times = 5))
```

## Dynamic programming

The dynamic approach stores values that have already been calculated, so the code never computes the same subproblem more than once. The runtime is $O(nW)$, where $n$ is the number of observations and $W$ is the maximum weight limit.
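A minimal sketch of the standard recurrence, assuming integer weights: cell $(i+1, j+1)$ of the table holds the best value achievable with the first $i$ items and capacity $j$. The helper name is hypothetical and `knapsack_dynamic` in the package may be organized differently.

```r
# Hypothetical sketch of the O(nW) dynamic-programming table.
# m[i + 1, j + 1] = best value using the first i items within capacity j.
knapsack_dynamic_sketch <- function(x, W) {
  n <- nrow(x)
  m <- matrix(0, nrow = n + 1, ncol = W + 1)
  for (i in 1:n) {
    for (j in 0:W) {
      if (x$w[i] > j) {
        # Item i does not fit: inherit the value without it.
        m[i + 1, j + 1] <- m[i, j + 1]
      } else {
        # Best of skipping item i vs. taking it and paying its weight.
        m[i + 1, j + 1] <- max(m[i, j + 1],
                               m[i, j + 1 - x$w[i]] + x$v[i])
      }
    }
  }
  m[n + 1, W + 1]
}
```

Each of the $n \times (W + 1)$ cells is filled exactly once, which is where the $O(nW)$ bound comes from.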

```r
# 500 observations, see ?knapsack_data
x <- knapsack_data(500)
print(knapsack_dynamic(x, 3500))
# System time
print(microbenchmark(knapsack_dynamic(x, 3500), times = 5))
```

## Greedy heuristic

The greedy approximation is a heuristic for the unbounded knapsack problem. It does not give an exact result, but it is guaranteed to return at least $50\%$ of the true maximum value. The estimated runtime is $O(n \log n)$.
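The usual greedy strategy, sketched below with a hypothetical helper name, is to sort the items by value-to-weight ratio and fill the knapsack in that order. The $O(n \log n)$ cost comes from the sort; the fill loop itself is linear.

```r
# Hypothetical sketch of the greedy heuristic: take items in order of
# decreasing value-to-weight ratio while they still fit.
greedy_sketch <- function(x, W) {
  ord <- order(x$v / x$w, decreasing = TRUE)
  total_w <- 0
  total_v <- 0
  elements <- integer(0)
  for (i in ord) {
    if (total_w + x$w[i] <= W) {
      total_w <- total_w + x$w[i]
      total_v <- total_v + x$v[i]
      elements <- c(elements, i)
    }
  }
  list(value = total_v, elements = elements)
}
```

Items with a high ratio but a large weight can block better combinations, which is why the result is only an approximation.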

Below is the runtime for $1\,000\,000$ observations.

```r
# 1 000 000 observations, see ?knapsack_data
x <- knapsack_data(1000000)
# Do not show the elements, as there are 1 million observations
print(greedy_knapsack(x, 3500, FALSE))
# System time
print(microbenchmark(greedy_knapsack(x, 3500), times = 5))
```

## Profile and optimize your code

The question is: what performance gain could you get by trying to improve your code?

Improved code demands less of the computer, both in memory and in execution time. Using the lineprof package I was able to improve my functions.

I used profiling to identify which parts of the brute_force_knapsack function could be improved; the part that took the longest in my case was the call to colSums. I then applied parallel programming, and the resulting difference can be seen in the benchmarks under Parallelized brute force search above.

I also used it to find and remove duplicate calculations in the knapsack_dynamic function, which did not visibly improve the time. For the most demanding parts I could not find any improvement, because they are critical for the function to work; specifically the matrix and pmax functions. With parallel programming or an implementation in Rcpp, the time could have been reduced further.

In the greedy_knapsack function I found that one of the time-consuming parts was the ifelse function. By replacing it with direct assignment to the rows I managed to shorten the runtime, as the benchmarks below show.
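The optimization described above follows a general pattern, illustrated here with hypothetical data rather than the package's internals: `ifelse()` evaluates both branch vectors in full and allocates an intermediate result, whereas direct logical subsetting only touches the rows that need changing.

```r
# Illustration (hypothetical data): replacing a vectorised ifelse()
# with direct logical-index assignment.
ratio <- c(2.5, 0.8, 3.1, 1.0)

# Slower pattern: ifelse evaluates both branches for every element.
flag_ifelse <- ifelse(ratio > 1, "keep", "drop")

# Faster pattern: initialise once, then overwrite only matching rows.
flag_direct <- rep("drop", length(ratio))
flag_direct[ratio > 1] <- "keep"

stopifnot(identical(flag_ifelse, flag_direct))
```

On a million-row data frame, avoiding the extra allocations that `ifelse()` performs is typically what produces the measurable speed-up.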

```r
x <- knapsack_data(1000000)

# Updated function
print(microbenchmark(greedy_knapsack(x, 3500, show_elements = FALSE, old = FALSE), times = 5))

# Old function
print(microbenchmark(greedy_knapsack(x, 3500, show_elements = FALSE, old = TRUE), times = 5))
```


harjew/lab6G3 documentation built on Nov. 4, 2019, 1:28 p.m.