knitr::opts_chunk$set( collapse = TRUE, comment = "#>", message = FALSE, warning = FALSE, out.width = "100%", fig.height = 7, dpi = 300 )
library(outbreaker2) library(o2ools)
The o2ools package provides helper functions to process and summarise the results from outbreaker2.\
Below we use the default fake_outbreak
example in the outbreaker2 vignette.
# The input data head(linelist) # outbreaker2 result out
By default outbreaker2 will return a dataframe where each case is referred to by an integer based on the order of the cases in the linelist. The columns of this dataframe are named according to the parameters of interest. For example, t_inf_1
refers to the infection time of case 1, alpha_1
refers to the ancestor of case 1, etc.
To swap-in our real case IDs, we use identify()
:
out_id <- identify(out, ids = linelist$name) head(out_id)
To extract the individual trees from the posterior sample, we use get_trees()
. This function will return a list of trees, one for each sample in the posterior. You can add additional outbreaker2 results to the trees, such as the infection times (t_inf
), the number of generations (kappa
) and other attributes from the linelist.
trees <- get_trees(out_id, kappa = TRUE, t_inf = TRUE, group = linelist$group) #100th sample head(trees[[100]])
Accuracy is the proportion of cases that are correctly identified across the posterior sample. You can only use this function if you have the true ancestry information for each case.
true_tree <- data.frame( from = as.character(outbreaker2::fake_outbreak$ances), to = linelist$id, stringsAsFactors = FALSE ) accuracy <- get_accuracy(out, true_tree) hist(accuracy)
Entropy measures the uncertainty in a case's ancestry. The more potential ancestors a case has, the greater its entropy. By default, entropy is bounded between 0 and 1, where 0 indicates no uncertainty about a case's ancestry, and 1 indicates maximum uncertainty.
entropy <- get_entropy(out_id) barplot( entropy, horiz = TRUE, las = 1 )
To add the results from outbreaker2 to the linelist, we can use augment_linelist()
. This function will add summary statistics for each case, such as the infection times inferred by outbreaker2.
# augment linelist with summaries of t_inf and kappa o2linelist <- augment_linelist( out, linelist, params = c("t_inf","kappa"), summary_fns = list( mean = function(x) mean(x, na.rm = TRUE), q25 = function(x) quantile(x, .25, na.rm = TRUE), q75 = function(x) quantile(x, .75, na.rm = TRUE) ) ) head(o2linelist)
A consensus tree represents, for each case, their most frequent ancestor across the posterior sample. It provides a useful summary of the results but may violate the strict topological constraints of a tree, as it can include cycles (e.g. A is the most frequent ancestor of B, but B is also the most frequent ancestor of A). Here, frequency
refers to the proportion of posterior samples in which the infector–infectee pair appears.
consensus_tree <- get_consensus(out) head(consensus_tree)
We can plot the consensus tree using the augmented linelist and the epicontacts package.
library(epicontacts) epi <- make_epicontacts( linelist = o2linelist, contacts = subset(consensus_tree, !is.na(from)), #remove NA edges (i.e. imports) directed = TRUE ) # plot(epi) basic plot # colour setting epi$linelist$group <- factor(epi$linelist$group, levels = c("hcw", "patient")) my_pal <- function(n) { c("orange", "purple")[1:n] } vis_epicontacts( epi, thin = FALSE, #show imports label = "name", node_shape = "group", shapes = c("hcw" = "user-md", "patient" = "bed"), # https://fontawesome.com/v4/icons/ node_color = "group", edge_arrow = "to", col_pal = my_pal, edge_col_pal = my_pal, )
Or we can convert the tree to an igraph/tidygraph object and plot it using the ggraph package.
library(igraph) library(tidygraph) library(ggraph) g <- epicontacts:::as.igraph.epicontacts(epi) |> as_tbl_graph() layout_data <- create_layout(g, layout = 'kk') layout_data$x <- layout_data$onset p <- ggraph(layout_data)+ geom_edge_link( aes( edge_width = frequency, color = .N()$group[from] # .N() to accesses node data ), arrow = arrow(length = unit(2.5, 'mm')), end_cap = circle(3, 'mm') ) + geom_node_point( aes(fill = group), shape = 21, colour = "black", size = 5 ) + geom_node_text( aes(label = epicontacts_name), repel = TRUE, size = 3, nudge_y = 0.1, bg.color = "white", bg.r = 0.1 ) + scale_edge_width("Posterior support", range = c(0.1, 1), breaks = c(0.8, 0.9, 1)) + scale_fill_manual( "", values = c("hcw" = "orange", "patient" = "purple"), breaks = c("hcw", "patient") ) + scale_edge_colour_manual( "", values = c("hcw" = "orange", "patient" = "purple"), breaks = c("hcw", "patient") )+ theme_bw() + labs(x = "Onset date", y = "")+ theme( axis.line.y = element_blank(), axis.text.y = element_blank(), axis.ticks.y = element_blank(), panel.grid.major.y = element_blank(), panel.grid.minor.y = element_blank(), legend.position = "bottom" ) print(p)
The maximum posterior tree is the single most likely transmission tree, corresponding to the MCMC iteration with the highest posterior probability. You can extract it as follows:
max_post <- trees[[which.max(out$post)]] #Same as before, create and plot an epicontacts object epi <- make_epicontacts( linelist = o2linelist, contacts = subset(max_post, !is.na(from)), #remove NA edges (i.e. imports) directed = TRUE )
You can also use filter_chain()
to restrict the output to "trees" that meet a specific condition. For instance, to keep only direct transmissions (i.e. kappa == 1
), run:
out_direct <- filter_chain(out, param = "kappa", thresh = 1, comparator = "==", target = "alpha")
This masks any transmission links involving intermediate cases, allowing you to focus on direct infector–infectee pairs.
For additional analyses check the other vignettes on the package's website.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.