Processing the output of outbreaker2 with o2ools
In o2ools: Tools for 'outbreaker2'

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  message = FALSE,
  warning = FALSE,
  out.width = "100%",
  fig.height = 7,
  dpi = 300
)

library(outbreaker2)
library(o2ools)

Introduction

The o2ools package provides helper functions to process and summarise the results from outbreaker2.\

Below we use the default fake_outbreak example in the outbreaker2 vignette.

# The input data 
head(linelist)

# outbreaker2 result
out

Identify

By default outbreaker2 will return a dataframe where each case is referred to by an integer based on the order of the cases in the linelist. The columns of this dataframe are named according to the parameters of interest. For example, t_inf_1 refers to the infection time of case 1, alpha_1 refers to the ancestor of case 1, etc.

To swap-in our real case IDs, we use identify():

out_id <- identify(out, ids = linelist$name)
head(out_id)

Get individual trees

To extract the individual trees from the posterior sample, we use get_trees(). This function will return a list of trees, one for each sample in the posterior. You can add additional outbreaker2 results to the trees, such as the infection times (t_inf), the number of generations (kappa) and other attributes from the linelist.

trees <- get_trees(out_id, kappa = TRUE, t_inf = TRUE, group = linelist$group)
#100th sample
head(trees[[100]])

Performance

Accuracy

Accuracy is the proportion of cases that are correctly identified across the posterior sample. You can only use this function if you have the true ancestry information for each case.

true_tree <- data.frame(
  from = as.character(outbreaker2::fake_outbreak$ances),
  to   = linelist$id,
  stringsAsFactors = FALSE
)

accuracy <- get_accuracy(out, true_tree)
hist(accuracy)

Entropy

Entropy measures the uncertainty in a case's ancestry. The more potential ancestors a case has, the greater its entropy. By default, entropy is bounded between 0 and 1, where 0 indicates no uncertainty about a case's ancestry, and 1 indicates maximum uncertainty.

entropy <- get_entropy(out_id)
barplot(
  entropy,
  horiz = TRUE,
  las = 1
  )

Visualisation

Augment the linelist with outbreaker2's results

To add the results from outbreaker2 to the linelist, we can use augment_linelist(). This function will add summary statistics for each case, such as the infection times inferred by outbreaker2.

# augment linelist with summaries of t_inf and kappa
o2linelist <- augment_linelist(
  out, linelist,
  params  = c("t_inf","kappa"),
  summary_fns = list(
    mean = function(x) mean(x, na.rm = TRUE),
    q25  = function(x) quantile(x, .25, na.rm = TRUE),
    q75  = function(x) quantile(x, .75, na.rm = TRUE)
  )
)
head(o2linelist)

Consensus tree

A consensus tree represents, for each case, their most frequent ancestor across the posterior sample. It provides a useful summary of the results but may violate the strict topological constraints of a tree, as it can include cycles (e.g. A is the most frequent ancestor of B, but B is also the most frequent ancestor of A). Here, frequency refers to the proportion of posterior samples in which the infector–infectee pair appears.

consensus_tree <- get_consensus(out)
head(consensus_tree)

We can plot the consensus tree using the augmented linelist and the epicontacts package.

library(epicontacts)
epi <- make_epicontacts(
  linelist = o2linelist,
  contacts = subset(consensus_tree, !is.na(from)),
  #remove NA edges (i.e. imports)
  directed = TRUE
)
# plot(epi) basic plot

# colour setting
epi$linelist$group <- factor(epi$linelist$group, levels = c("hcw", "patient"))
my_pal <- function(n) {
  c("orange", "purple")[1:n]
}

vis_epicontacts(
  epi,
  thin = FALSE, #show imports
  label = "name",
  node_shape = "group",
  shapes = c("hcw" = "user-md", "patient" = "bed"), # https://fontawesome.com/v4/icons/
  node_color = "group",
  edge_arrow = "to",
  col_pal       = my_pal,
  edge_col_pal  = my_pal,
)

Or we can convert the tree to an igraph/tidygraph object and plot it using the ggraph package.

library(igraph)
library(tidygraph)
library(ggraph)

g <- epicontacts:::as.igraph.epicontacts(epi) |>
  as_tbl_graph()

layout_data <- create_layout(g, layout = 'kk')
layout_data$x <- layout_data$onset

p <- ggraph(layout_data)+
  geom_edge_link(
    aes(
      edge_width = frequency,
      color = .N()$group[from] # .N() to accesses node data
    ),
    arrow = arrow(length = unit(2.5, 'mm')),
    end_cap = circle(3, 'mm')
  ) +
  geom_node_point(
    aes(fill = group), 
    shape = 21,
    colour = "black",
    size = 5
  ) +
  geom_node_text(
    aes(label = epicontacts_name),
    repel = TRUE, size = 3, nudge_y = 0.1, bg.color = "white", bg.r = 0.1
  ) +
  scale_edge_width("Posterior support",
                   range = c(0.1, 1),
                   breaks = c(0.8, 0.9, 1)) +
  scale_fill_manual(
    "",
    values = c("hcw" = "orange", "patient" = "purple"),
    breaks = c("hcw", "patient")
  ) +
  scale_edge_colour_manual(
    "",
    values = c("hcw" = "orange", "patient" = "purple"),
    breaks = c("hcw", "patient")
  )+
  theme_bw() +
  labs(x = "Onset date",
       y = "")+
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    legend.position = "bottom"
  )

print(p)

Maximum posterior tree

The maximum posterior tree is the single most likely transmission tree, corresponding to the MCMC iteration with the highest posterior probability. You can extract it as follows:

max_post <- trees[[which.max(out$post)]]

#Same as before, create and plot an epicontacts object
epi <- make_epicontacts(
  linelist = o2linelist,
  contacts = subset(max_post, !is.na(from)), #remove NA edges (i.e. imports)
  directed = TRUE
)

You can also use filter_chain() to restrict the output to "trees" that meet a specific condition. For instance, to keep only direct transmissions (i.e. kappa == 1), run:

out_direct <- filter_chain(out, param = "kappa", thresh = 1, comparator = "==", target = "alpha")

This masks any transmission links involving intermediate cases, allowing you to focus on direct infector–infectee pairs.

Additional analyses

For additional analyses check the other vignettes on the package's website.

Any scripts or data that you put into this service are public.

o2ools documentation built on June 8, 2025, 10:18 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

o2ools
Tools for 'outbreaker2'

Processing the output of outbreaker2 with o2ools
In o2ools: Tools for 'outbreaker2'

Introduction

Identify

Get individual trees

Performance

Accuracy

Entropy

Visualisation

Augment the linelist with outbreaker2's results

Consensus tree

Maximum posterior tree

Additional analyses

Try the o2ools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

o2ools Tools for 'outbreaker2'

Processing the output of outbreaker2 with o2ools In o2ools: Tools for 'outbreaker2'

Introduction

Identify

Get individual trees

Performance

Accuracy

Entropy

Visualisation

Augment the linelist with outbreaker2's results

Consensus tree

Maximum posterior tree

Additional analyses

Try the o2ools package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

o2ools
Tools for 'outbreaker2'

Processing the output of outbreaker2 with o2ools
In o2ools: Tools for 'outbreaker2'