Optimal Project Prioritization
In oppr: Optimal Project Prioritization

# define fig sizes
fw <- 7.0
fh <- 4.5

# define dummy data sets
s1 <- data.frame()
s2 <- data.frame()
s3 <- data.frame(a = rep(1, 100))
s4 <- data.frame()
s5 <- data.frame()
s6 <- data.frame()
s7 <- data.frame()
s8 <- data.frame()
s9 <- data.frame()
mean_percent <- 0

# disable running vignette code during R CMD check
is_check <- ("CheckExEnv" %in% search()) || any(c("_R_CHECK_TIMINGS_",
             "_R_CHECK_LICENSE_") %in% names(Sys.getenv()))
knitr::opts_chunk$set(fig.align = "center", eval = !is_check)

options(crayon.enabled = TRUE)
options(pillar.bold = TRUE)

knitr::opts_chunk$set(collapse = TRUE, comment = pillar::style_subtle("#"))

colourise_chunk <- function(type) {
  function(x, options) {
    # lines <- strsplit(x, "\\n")[[1]]
    lines <- x
    if (type %in% c("error", "warning")) {
      lines <- crayon::red(lines)
    }
    paste0(
      '<div class="sourceCode"><pre class="sourceCode"><code class="sourceCode">',
      paste0(
        sgr_to_html(htmltools::htmlEscape(lines)),
        collapse = "\n"
      ),
      "</code></pre></div>"
    )
  }
}
knitr::knit_hooks$set(
  output = colourise_chunk("output"),
  message = colourise_chunk("message"),
  warning = colourise_chunk("warning"),
  error = colourise_chunk("error")
)

# set function for colorizing html
sgr_to_html <- identity
if (requireNamespace("fansi", quietly = TRUE)) {
  sgr_to_html <- fansi::sgr_to_html
}

# load pkg
devtools::load_all()

Overview

The oppr R package is decision support tool for prioritizing conservation projects. Prioritizations can be developed by maximizing expected feature richness, expected phylogenetic diversity, the number of features that meet persistence targets, or identifying a set of projects that meet persistence targets for minimal cost. Constraints (e.g. lock in specific actions) and feature weights can also be specified to further customize prioritizations. After defining a project prioritization problem, solutions can be obtained using exact algorithms, heuristic algorithms, or random processes. In particular, it is recommended to install the 'Gurobi' optimizer because it can identify optimal solutions very quickly. Finally, methods are provided for comparing different prioritizations and evaluating their benefits.

Tutorial

Introduction

Here we will provide a short tutorial showing how the oppr R package can be used to prioritize funding for conservation projects. This package is a general purpose project prioritization decision support tool. It can generate solutions for species-based project prioritization problems [@r3; @r4] and priority threat management problems [@r18]. To develop a project prioritization, this package requires data for (i) conservation projects, (ii) management actions data, and (iii) biodiversity features.

Briefly, biodiversity features are the biological entities that we wish would persist into the future (e.g. threatened populations, native species, eco-systems). These biodiversity features can (and ideally should) include non-threatened species, but should not include threatening processes that we wish to eradicate (e.g. populations of invasive species). Management actions are acts that can be undertaken to enhance biodiversity (e.g. planting native species, reintroducing native species, trapping pest species). Each action should pertain to a specific location (e.g. a national park) or area (e.g. an entire country), and should be associated with a cost estimate. To guide the prioritization, the management actions are grouped into conservation projects (also termed "strategies"). Typically, management actions are grouped into conservation projects based on spatial (e.g. management actions that pertain to the same area), taxonomic (e.g. management actions that pertain to the same pest species or threatened species), or thematic criteria (e.g. management actions that pertain to pest eradication are grouped into a "pest project", and actions that pertain to habitat restoration are grouped into a "habitat project"). Additionally, some conservation projects can be combinations of other projects (e.g. a "pest and habitat project"). Each conservation project should be associated with (i) a probability of succeeding if it is implemented (also termed "feasibility"), (ii) information about which management actions are associated with it, and (iii) an estimate of how likely each conservation feature affected by the project is to persist into the future if the project is implemented (often derived using expert elicitation). The conservation projects should also include a "baseline project" that represents a "do nothing scenario" which has a 100% chance of succeeding and is associated with an action that costs nothing to implement. This is important because we can't find a cost-effective solution if we don't know how much each project improves a species' chance at persistence. For more background information on project prioritization, please refer to Carwardine et al. [-@r18].

To start off, we will initialize the random number generator to ensure reproducibility. Next, we will load the oppr R package. And then we will load the ggplot2, tibble, and tidyr R packages to help with data for visualizations and wrangling.

# set random number generated seed
set.seed(1000)
# load oppr package for project prioritization
library(oppr)
# load ggplot2 package for plotting
library(ggplot2)
# load tibble package for working with tabular data
library(tibble)
# load tidyr package for reshaping tabular data
library(tidyr)

Data simulation

Now we will simulate a dataset to practice developing project prioritizations. Specifically, we will simulate a dataset for a priority threat management exercise that contains 40 features, 70 projects, and 30 actions.

# simulate data
sim_data <- simulate_ptm_data(number_projects = 70, number_actions = 30,
                              number_features = 40)

# extract project, action, feature data
projects <- sim_data$projects
actions <- sim_data$actions
features <- sim_data$features

# manually set feature weights for teaching purposes
features$weight <- exp(runif(nrow(features), 1, 15))

# print data
print(projects) # data for conservation projects
print(actions)  # data for management actions
print(features) # data for biodiversity features

Prioritizations with exact algorithms

Let's assume that we want to maximize the overall persistence of the conservation features, and that we can only spend $1,000 funding management actions. We will also assume that our decisions involve either funding management actions or not (in other words, our decisions are binary). For the moment, let's assume that we value the persistence of each feature equally. Given this information, we can generate a project prioritization problem object that matches our specifications.

# build problem
p1 <- problem(projects = projects, actions = actions, features = features,
              "name", "success", "name", "cost", "name") %>%
      add_max_richness_objective(budget = 1000) %>%
      add_binary_decisions()

# print problem
print(p1)

After building the problem, we can solve it. Although backwards heuristic algorithms have conventionally been used to solve project prioritization problems [e.g. @r3; @r4; @r11], here we will use exact algorithms to develop a prioritization. Exact algorithms are superior to heuristic algorithms because they can provide guarantees on solution quality [@r12; @r13]. In other words, if you specify that you want an optimal solution when using exact algorithms, then you are guaranteed to get an optimal solution. Heuristic algorithms---and even more advanced meta-heuristic algorithms such as simulated annealing that are commonly to guide conservation decision making [@r22]---provide no such guarantees and can actually deliver remarkably poor solutions [e.g. @r21]. Later on we will experiment with heuristic algorithms, but for now, we will use exact algorithms. Since we haven't explicitly stated which solver we would like to use, the oppr R package will identify the best exact algorithm solver currently installed on your system. This is typically the lpSolveAPI R package unless you have manually installed the gurobi R package and the Gurobi optimization suite. Although not shown here, when you try solving the problem you will see information printed to your screen as the solver tries to find an optimal solution. Please do not be alarmed, this is normal behavior and some of the information can actually be useful (e.g. to estimate how long it might take to solve the problem).

# solve problem
s1 <- solve(p1)

# print solution
print(s1)

The s1 table contains the solution (i.e. a prioritization) and also various statistics associated with the solution. Here, each row corresponds to a different solution. By default only one solution will be returned, and so the table has one row. The "solution" column contains an integer identifier for the solution (this is when multiple solutions are output), the "obj" column contains the objective value (i.e. the expected feature richness in this case), the "cost" column stores the cost of the solution, and the "status" column contains information from the solver about the solution (i.e. it found an optimal solution). Additionally, it contains columns for each action ("action_1", "action_2", "action_3", ..., "baseline_action") which indicate their status in the solution. In other words, these columns tell us if each action was prioritized for funding in the solution (i.e. a value of one) or not (i.e. a value of zero). Additionally, it contains columns for each project ("project_1", "project_2", "project_3", ..., "baseline_project") that indicate if the project was completely funded or not. Finally, this table contains columns for each feature ("F1, "F2", "F3, ...) which indicate the probability that each feature is expected to persist into the future if the solution was implemented (for information on how this is calculated see ?add_max_richness_objective).

Since the text printed to the screen doesn't show which projects were completely funded, let's extract this information from the table.

# extract names of funded projects
s1_projects <- s1[, c("solution", p1$project_names())] %>%
               gather(project, status, -solution)

# print project results
print(s1_projects)

# print names of completely funded projects
s1_projects$project[s1_projects$status > 0]

We could also calculate cost-effectiveness measures for each of the conservation projects.

# calculate cost-effectiveness of each project
p1_pce <- project_cost_effectiveness(p1)

# print output
print(p1_pce)

The p1_pce table contains the cost-effectiveness of each project. Specifically, the "project" column contains the name of each project. The "cost" column contains the total cost of all the management actions associated with the project. The "obj" column contains the objective value associated with a solution that contains all of the management actions for each project, and also the actions associated with a zero cost. The "benefit" column contains the difference between (i) the objective value associated with each project and also any zero cost actions (as per the "obj" column), and (ii) the objective value associated with the baseline project. The "ce" column contains the cost-effectiveness of each project. For a given project, this is calculated by dividing its benefit by its cost (as per the "benefit" and "cost" columns). The "rank" column contains the rank for each project, with the most cost-effective project having a rank of one.

By combining the project cost-effectiveness table with the project funding table, we can see that many of the projects selected for funding are not actually the most cost-effective projects. This is because simply selecting the most cost-effective projects might double up on the same features and reduce the overall effectiveness of the solution [see @r5 for more information].

# print the ranks of the priority projects
p1_pce$rank[s1_projects$status > 0]

We could also save these table to our computer (e.g. so we could view them in a spreadsheet program, such as Microsoft Excel).

# print folder where the file will be saved
cat(getwd(), "\n")

# save table to file
write.table(s1, "solutions.csv" , sep = ",", row.names = FALSE)
write.table(s1_projects, "project_statuses.csv" , sep = ",", row.names = FALSE)
write.table(p1_pce, "project_ce.csv" , sep = ",", row.names = FALSE)

Since visualizations are an effective way to understand data, let's create a bar plot to visualize how well this solution would conserve the features. In this plot, each bar corresponds to a different feature, the width of each bar corresponds to its expected probability of persistence, the color of each bar corresponds to the feature's weight (since we didn't specify any weights, all bars are the same color). Asterisks denote features that benefit from fully funded projects, and open circles (if any) denote features that do not benefit from fully funded projects but may indirectly benefit from partially funded projects.

# plot solution
plot(p1, s1)

mean_percent <- round(mean(c(as.matrix(s1[, p1$feature_names()]))) * 100)

Overall, we can see that most species have a fairly decent chance at persisting into the future (approx. r mean_percent%). But when making this prioritization, we assumed that we valued the persistence of each feature equally. It is often important to account for the fact that certain features are valued more highly than other features when making prioritizations (e.g. for cultural or taxonomic reasons). The features table that we created earlier contains a "weight" column, and features with larger values mean that they are more important. Let's quickly visualize the feature weight values.

# print features table
print(features)

# plot histogram of feature weights
ggplot(data = features, aes(weight)) +
geom_histogram(bins = 30) +
scale_x_continuous(labels = scales::comma) +
xlab("Feature weight") +
ylab("Frequency")

We can see that most features have a low weighting (less than 1,000), but a few features have very high weightings (greater than 500,000). Let's make a new problem based on the original problem, add the feature weights to the new problem, and then solve this new problem.

# build on existing problem and add feature weights
p2 <- p1 %>%
      add_feature_weights("weight")

# print problem
print(p2)

# solve problem
s2 <- solve(p2)

# print solution
print(s2)

assertthat::assert_that(mean(as.matrix(s2[, action_names(p2)])) < 1)

# plot solution
plot(p2, s2)

We can also examine how adding feature weights changed our solution.

# print actions prioritized for funding in first solution
actions$name[which(as.logical(as.matrix(s1[, action_names(p1)])))]

# print actions prioritized for funding in second solution
actions$name[which(as.logical(as.matrix(s2[, action_names(p2)])))]

# calculate number of actions funded in both solutions
sum((as.matrix(s1[, action_names(p1)]) == 1) &
    (as.matrix(s2[, action_names(p2)]) == 1))

a1 <- actions$name[which(as.logical(as.matrix(s1[, action_names(p1)])))]

a2 <- actions$name[which(as.logical(as.matrix(s2[, action_names(p2)])))]

assertthat::assert_that(!setequal(a1, a2))

Earlier, we talked about the oppr R package using the default exact algorithm solver. Although you should have the lpSolveAPI solver automatically installed, we strongly recommend installing the gurobi R package and the Gurobi optimization suite. This is because Gurobi can solve optimization problems much faster than any other software currently supported and it can also easily generate multiple solutions. Unfortunately, you will have to install Gurobi manually, but please see the documentation for the solver for instructions (i.e. enter the command ?add_gurobi_solver into the console). If you have Gurobi installed, let's try using the Gurobi solver to generate multiple solutions. Here, we will manually specify the Gurobi solver and request 1,000 solutions (though we will probably obtain less than 1,000 solutions, so if we actually wanted 1,000 solutions then we would need to specify a much larger number).

# create new problem, with Gurobi solver added and request multiple solutions
# note that we set the gap to 0.5 because we are not necessarily interested
# in the top 1,000 solutions (for more information on why read the Gurobi
# documentation on solution pools)
p3 <- p2 %>%
      add_gurobi_solver(gap = 0.5, number_solution = 1000)

# solve problem
s3 <- solve(p3)

# print solution
print(s3)

assertthat::assert_that(nrow(s3) > 10)

We obtained r nrow(s3) solutions. Let's briefly explore them.

# plot histogram of objective values, and add red dashed line to indicate the
# objective value for the optimal solution
ggplot(data = s3, aes(obj)) +
geom_histogram(bins = 10) +
geom_vline(xintercept = s2$obj, color = "red", linetype = "dashed") +
scale_x_continuous(labels = scales::comma) +
xlab("Expected richness (objective function)") +
ylab("Frequency") +
theme(plot.margin = unit(c(5.5, 10.5, 5.5, 5.5), "pt"))

We can see that some of our solutions performed much better than other solutions. And we can also see that there are several solutions that perform nearly as well as the optimal solution.

Assessing action importance

After obtaining a solution to a project prioritization problem, it is often important to understand which of priority actions are most "important" [or in other words: irreplaceable; @r27]. This is because it may not be possible to implement all actions immediately and simultaneously, and since some conservation projects may be less likely to succeed if their management actions are delayed, it may be useful to provide decision makers with a measure of importance for each priority action. For example, if a pest eradication project is delayed, then the success of the project may diminish as pest populations increase over time. One simple---and potentially inaccurate [but see @r26]---approach for assessing the importance of priority actions (i.e. actions selected for funding in a solution) is calculating the "selection frequency" of the actions. Given multiple solutions, this metric involves calculating the average number of times that each priority action is selected [@r19]. For illustrative purposes, we shall compute the selection frequency of the solutions we obtained previously (i.e. the s3 object).

# print the solution object to remind ourselves what it looks like
print(s3)

# calculate percentage of times each action was selected for
# funding in the solutions
actions$sel_freq <- apply(as.matrix(s3[, actions$name]), 2, mean) * 100

# print the actions table with the new column
print(actions)

# print top 10 most important actions based on selection frequency
head(actions[order(actions$sel_freq, decreasing = TRUE), ], n = 10)

# plot histogram showing solution frequency
ggplot(data = actions, aes(sel_freq)) +
geom_histogram(bins = 30) +
coord_cartesian(xlim = c(0, 100)) +
xlab("Selection frequency (%)") +
ylab("Frequency")

Although selection frequency might seem like an appealing metric, it suffers from an assumption that is unrealistic for most large-scale problems. Specifically, it assumes that our portfolio of solutions is a representative sample of the near-optimal solutions to the problem. And we do not currently have any method to verify this, except by enumerating all possible solutions---which is not feasible for reasonably sized project prioritization exercises. So now we shall examine a superior metric termed the "replacement cost" [@r20]. Given a set of priority actions and a project prioritization problem, the replacement cost can be calculated for a given priority action by re-solving the problem with the priority action locked out, and calculating the difference between the objective value based on the priority actions and the new objective value with the priority action locked out. So let's calculate the replacement cost for the priority actions in object s2 using the problem p2.

# print p2 to remind ourselves about the problem
print(p2)

# print s2 to remind ourselves about the solution
print(s2)

# calculate replacement costs for each priority action selected in s2
r2 <- replacement_costs(p2, s2)

# print output
print(r2)

The r1 table contains the replacement costs for each action and also various statistics associated with the solutions obtained when locking out each action. Here, each row corresponds to a different action. The "action" column contains the name of the actions (as per the actions table used when building the problem), the "cost" column contains the cost of the solutions obtained when each action was locked out, the "obj" column contains the objective value of the solutions when each action was locked out (i.e. the expected feature richness in this case), and the "rep_cost" column contains the replacement costs for the actions. Actions associated with larger replacement cost values are more irreplaceable, actions associated with missing (NA) replacement cost values were not selected for funding in the input solution (i.e. s2), and actions associated with infinite (Inf) replacement cost values are absolutely critical for meeting the constraints (though infinite values should only problems with minimum set objectives and potentially the baseline project).

# add replacement costs to action table
actions$rep_cost <- r2$rep_cost

# print actions, ordered by replacement cost
print(actions[order(actions$rep_cost, decreasing = TRUE), ])

# test correlation between selection frequencies and replacement costs
cor.test(x = actions$sel_freq, y = actions$rep_cost, method = "pearson")

# plot histogram of replacement costs,
ggplot(data = actions, aes(rep_cost)) +
geom_histogram(bins = 30, na.rm = TRUE) +
scale_x_continuous(labels = scales::comma) +
xlab("Replacement cost") +
ylab("Frequency")

Complementarity in project prioritizations

Broadly speaking, the principle of complementarity is that individual conservation actions should complement each other---in other words, they should not double up on the same biodiversity features---and when implemented together, conservation actions should conserve a comprehensive sample of biodiversity [@r23; @r27]. This principle was born from the profound realization that individual reserves need to provide habitat for different species in order to build a reserve network that provides habitat for many different species---even if this means selecting some individual reserves that do not provide habitat for as many species as other potential reserves [@r25]. In the context of project prioritization, this principle means that resources should be allocated in such a way that avoids doubling up on the same conservation features so that resources can be effectively allocated to as many conservation features as possible [@r5]. For instance, if decision makers consider it acceptable for features to have a 70% chance of persisting into the future, then we should avoid solutions which overly surpass this threshold (e.g. 99%) because we can allocate the limited resources to help other features reach this threshold. Such target thresholds can provide a transparent and effective method for establishing conservation priorities [@r24]. So, let's try developing a prioritization with conservation targets. Specifically, we will develop a prioritization that maximizes the number of features that have a 60% chance of persisting, subject to the same $1,000 budget as before.

# build problem
p4 <- problem(projects = projects, actions = actions, features = features,
              "name", "success", "name", "cost", "name") %>%
      add_max_targets_met_objective(budget = 1000) %>%
      add_absolute_targets(0.7) %>%
      add_binary_decisions()

# print problem
print(p4)

# solve problem
s4 <- solve(p4)

# print solution
print(s4)

# plot solution
plot(p4, s4)

But how would the number of features which meet the target change if we increased the budget? Or how would the number of features which meet the target change if we increased the target to 85%? These are common questions in project prioritization exercises [e.g. @r5; @r28; @r29]. So, let's try solving this problem with 70% and 85% targets under a range of different budgets and plot the relationships.

# specify budgets, ranging between zero and the total cost of all the budgets,
# with the total number of different budgets equaling 50
# (note that we would use a higher number for publications)
budgets <- seq(0, sum(actions$cost), length.out = 50)

# specify targets
targets <- c(0.7, 0.85)

# run prioritizations and compile results
comp_data <- lapply(targets, function(i) {
  o <- lapply(budgets, function(b) {
    problem(projects = projects, actions = actions, features = features,
            "name", "success", "name", "cost", "name") %>%
    add_max_targets_met_objective(budget = b) %>%
    add_absolute_targets(i) %>%
    add_binary_decisions() %>%
    add_default_solver(verbose = FALSE) %>%
    solve()
  })
  o <- as_tibble(do.call(rbind, o))
  o$budget <- budgets
  o$target <- paste0(i * 100, "%")
  o
})
comp_data <- as_tibble(do.call(rbind, comp_data))

# plot the relationship between the number of features that meet the target
# in a solution and the cost of a solution
ggplot(comp_data, aes(x = cost, y = obj, color = target)) +
geom_step() +
xlab("Solution cost ($)") +
ylab("Number of features with targets") +
labs(color = "Target")

We might also be interested in understanding how exactly how much it would cost to implement a set of management actions that would result in all of the features meeting a specific target. Let's see if we can find out how much it would cost to ensure that every feature has a 99% probability of persistence.

# build problem
p5 <- problem(projects = projects, actions = actions, features = features,
              "name", "success", "name", "cost", "name") %>%
      add_min_set_objective() %>%
      add_absolute_targets(0.99) %>%
      add_binary_decisions()

# print problem
print(p5)

# attempt to solve problem, but this will throw an error
s5 <- solve(p5)

assertthat::assert_that(inherits(try(solve(p5)), "try-error"))

We received an error instead of a solution. If we read the error message, then we can see that it is telling us---perhaps rather tersely---that there are no valid solutions to the problem (i.e. the problem is infeasible), because some features simply cannot obtain an 99% probability of persistence given the range of conservation projects that are available. So, let's see how much it would cost to ensure that every feature has a 60% chance of persistence.

# build problem
p6 <- problem(projects = projects, actions = actions, features = features,
              "name", "success", "name", "cost", "name") %>%
      add_min_set_objective() %>%
      add_absolute_targets(0.60) %>%
      add_binary_decisions()

# print problem
print(p6)

# solve problem
s6 <- solve(p6)

# print solution
print(s6)

# plot solution
plot(p6, s6)

Benchmarking conventional algorithms

Conventionally, heuristic algorithms have been used to develop project prioritizations [e.g. @r3; @r4]. Although solutions identified using these algorithms often perform better than solutions generated using using random processes [e.g. randomly selecting actions for funding until a budget is met; @r3], this is not an especially compelling benchmark. As talked about earlier, heuristic algorithms do not provide any guarantees on solution quality, and so should be avoided where possible [@r13]. To illustrate the pitfalls of relying on heuristic algorithms, let's generate a portfolio of solutions using a backwards heuristic algorithm.

# set budgets for which to create multiple solutions
budgets <- seq(0, sum(actions$cost), length.out = 100)

# generate solutions using heuristic algorithms
s7 <- lapply(budgets, function(b) {
  problem(projects = projects, actions = actions, features = features,
          "name", "success", "name", "cost", "name") %>%
  add_max_richness_objective(budget = b) %>%
  add_feature_weights("weight") %>%
  add_binary_decisions() %>%
  add_heuristic_solver(verbose = FALSE) %>%
  solve()
})
s7 <- as_tibble(do.call(rbind, s7))
s7$budget <- budgets

# print solutions
print(s7)

Now let's generate a portfolio of solutions using random processes.

# generate random solutions under the various budgets and store
# the objective value of the best and worst solutions
s8 <- lapply(budgets, function(b) {
  o <- problem(projects = projects, actions = actions, features = features,
              "name", "success", "name", "cost", "name") %>%
       add_max_richness_objective(budget = b) %>%
       add_feature_weights("weight") %>%
       add_binary_decisions() %>%
       add_random_solver(verbose = FALSE, number_solutions = 100) %>%
       solve()
  data.frame(budget = b, min_obj = min(o$obj), max_obj = max(o$obj))
})
s8 <- as_tibble(do.call(rbind, s8))

# print solutions
print(s8)

Now we can visualize how well the solutions identified using the heuristic algorithm compare to solutions generated using random processes. In the plot below, the orange line shows the performance of solutions generated using the heuristic algorithm, and the blue ribbon shows the performance of the best and worst solutions generated using random processes.

# make plot
ggplot() +
geom_ribbon(aes(x = budget, ymin = min_obj, ymax = max_obj), data = s8,
            color = "#3366FF26", fill = "#3366FF26") +
geom_step(aes(x = budget, y = obj), data = s7, color = "orange") +
scale_x_continuous(labels = scales::comma) +
scale_y_continuous(labels = scales::comma) +
xlab("Budget available ($)") +
ylab("Expected richness (objective function)") +
theme(axis.text.y = element_text(angle = 90, vjust = 1))

Here, we can see the that heuristic algorithm generally performs better than funding conservation projects using random processes. We can also see that for some budgets, the heuristic algorithm returns a worse solution (i.e. has a lower objective value) then solutions it found for lower budgets (i.e. where there is a step down in the orange line). Unfortunately, this behavior is normal because heuristic algorithms often deliver suboptimal solutions and the degree of suboptimality often varies depending on the budget. Despite these occasional drops in solution quality, you might be tempted to think that these results show that heuristic algorithms can perform pretty well on balance. But we can do better. Let's generate a series of solutions using exact algorithms.

# generate solutions
s9 <- lapply(budgets, function(b) {
  problem(projects = projects, actions = actions, features = features,
          "name", "success", "name", "cost", "name") %>%
  add_max_richness_objective(budget = b) %>%
  add_feature_weights("weight") %>%
  add_binary_decisions() %>%
  add_default_solver(verbose = FALSE) %>%
  solve()
})
s9 <- as_tibble(do.call(rbind, s9))
s9$budget <- budgets

# print solutions
print(s9)

Now let's redraw the previous graph and add a red line to the plot to represent the solutions generated using the exact algorithm solver.

# make plot
ggplot() +
geom_ribbon(aes(x = budget, ymin = min_obj, ymax = max_obj), data = s8,
            color = "#3366FF26", fill = "#3366FF26") +
geom_step(aes(x = budget, y = obj), data = s7, color = "orange") +
geom_step(aes(x = budget, y = obj), data = s9, color = "red") +
scale_x_continuous(labels = scales::comma) +
scale_y_continuous(labels = scales::comma) +
xlab("Budget available ($)") +
ylab("Expected richness (objective function)") +
theme(axis.text.y = element_text(angle = 90, hjust = 0.5, vjust = 1))

So, we can see that the exact algorithm solver performs much better than heuristic algorithms---even if heuristic algorithms perform better than random on average.

Conclusion

Hopefully, this tutorial has been useful. For more information and examples on using any of the functions presented in this tutorial, please refer to this package's documentation. For instance, you could learn about the mathematical formulations that underpin the objective functions (see ?add_max_richness_objective), how to lock in our lock out certain actions from the solutions (see ?constraints), or how to develop project prioritizations using phylogenies (see ?add_max_phylo_objective). But perhaps one of the best ways to learn how to use a new piece of software is to just try it out. Test it, try breaking it, make mistakes, and learn from them. We would recommend generating project prioritizations using simulated datasets (see ?simulate_ptm_data and ?simulate_ppp_data) and seeing if the solutions line up with what you expect. This way you can quickly verify that the problems you build actually mean what you think they mean. For instance, you can try playing around with the targets and see what effect they have on the solutions, or try playing around with weights and see what effect they have on the solutions.

Finally, if you have any questions about using the oppr R package or suggestions for improving its documentation or functionality (especially this tutorial), please post an issue on this package's online coding repository (https://github.com/prioritizr/oppr/issues).

Citation

citation("oppr")

References

Any scripts or data that you put into this service are public.

oppr documentation built on June 27, 2025, 9:06 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

oppr
Optimal Project Prioritization

Optimal Project Prioritization
In oppr: Optimal Project Prioritization

Overview

Tutorial

Introduction

Data simulation

Prioritizations with exact algorithms

Assessing action importance

Complementarity in project prioritizations

Benchmarking conventional algorithms

Conclusion

Citation

References

Try the oppr package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

oppr Optimal Project Prioritization

Optimal Project Prioritization In oppr: Optimal Project Prioritization

Overview

Tutorial

Introduction

Data simulation

Prioritizations with exact algorithms

Assessing action importance

Complementarity in project prioritizations

Benchmarking conventional algorithms

Conclusion

Citation

References

Try the oppr package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

oppr
Optimal Project Prioritization

Optimal Project Prioritization
In oppr: Optimal Project Prioritization