knitr::opts_chunk$set(echo = TRUE, cache = TRUE, warning = FALSE, message = FALSE)
devtools::load_all()
library(InfectionTrees)
library(tidyr)
library(dplyr)
library(kableExtra)
library(ggplot2)
theme_set(theme_bw() + theme(axis.title = element_text()))
In this vignette, we explore the data set provided in `data(tb_clean)`, which was originally collected and analyzed in Xie et al. (2018).
library(InfectionTrees)
library(tidyverse)
library(kableExtra)
The data `tb_clean` has `r nrow(tb_clean)` rows and `r ncol(tb_clean)` columns, where each row corresponds to an individual with a detected case of tuberculosis (TB).
The columns include demographic characteristics of the patients as well as two diagnostic tests of the disease, including an AFB sputum smear test, which we analyze here.
tb_sub <- tb_clean %>%
  mutate(cluster_id = group) %>%
  group_by(cluster_id) %>%
  mutate(cluster_size = n()) %>%
  ungroup()
This data set has `r length(unique(tb_sub$cluster_id))` clusters with the following distribution of cluster sizes:
sizes <- tb_sub %>%
  group_by(cluster_id) %>%
  summarize(cluster_size = n()) %>%
  ungroup() %>%
  group_by(cluster_size) %>%
  summarize(freq = n())

sizes %>%
  kable() %>%
  kable_styling(bootstrap_options = c("condensed", "hover", "striped", "responsive"),
                full_width = FALSE, position = "center")
This means that `r round(sizes$freq[sizes$cluster_size == 1] / sum(sizes$freq) * 100, 1)`% of the clusters are singletons and `r round(sum(sizes$freq[sizes$cluster_size %in% 1:3]) / sum(sizes$freq) * 100, 1)`% of the clusters have size at most 3.
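To make the two percentages above concrete, here is a minimal sketch of the same calculation on a small, made-up data frame. The object `toy` and its values are hypothetical and are not taken from `tb_clean`. The sketch uses `dplyr::count()` as a compact equivalent of the `group_by()` and `summarize()` steps.

```r
library(dplyr)

# Hypothetical toy data: 7 individuals in 4 clusters (not from tb_clean)
toy <- tibble(
  id    = 1:7,
  group = c("a", "a", "a", "b", "c", "c", "d")
)

# First count individuals per cluster, then count clusters of each size
toy_sizes <- toy %>%
  count(group, name = "cluster_size") %>%
  count(cluster_size, name = "freq")

# Share of clusters that are singletons (2 of 4 clusters here)
pct_singleton <- round(
  toy_sizes$freq[toy_sizes$cluster_size == 1] / sum(toy_sizes$freq) * 100, 1
)

# Share of clusters with at most 3 individuals (all 4 clusters here)
pct_at_most_3 <- round(
  sum(toy_sizes$freq[toy_sizes$cluster_size %in% 1:3]) / sum(toy_sizes$freq) * 100, 1
)
```

Note that both percentages are shares of clusters, not of individuals: the denominator is `sum(toy_sizes$freq)`, the number of clusters.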
A single cluster of individuals looks like the following:
tb_sub %>%
  filter(group == "27") %>%
  arrange(rel_time) %>%
  select(group, rel_time, everything()) %>%
  kable() %>%
  kable_styling(bootstrap_options = c("condensed", "hover", "striped", "responsive"),
                full_width = FALSE, position = "center")
In our data set of `r nrow(tb_sub)` individuals, `r sum(tb_sub$hivstatus == "Positive")` or `r round(sum(tb_sub$hivstatus == "Positive") / nrow(tb_sub) * 100)`% of individuals are HIV+. In the clusters with more than 3 individuals, `r sum(tb_sub$hivstatus == "Positive" & tb_sub$cluster_size > 3)` of `r sum(tb_sub$cluster_size > 3)` or `r round(sum(tb_sub$hivstatus == "Positive" & tb_sub$cluster_size > 3) / sum(tb_sub$cluster_size > 3) * 100)`% of individuals are HIV+.
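The counts above rely on `sum()` over a logical vector, which returns `NA` if any `hivstatus` value is missing. Whether `tb_clean` contains missing HIV statuses is not stated here, so the following base R sketch on hypothetical data only illustrates how the same calculation could be guarded with `na.rm = TRUE`.

```r
# Hypothetical toy data (not from tb_clean); one HIV status is missing
toy_hiv <- data.frame(
  hivstatus    = c("Positive", "Negative", NA, "Positive", "Negative", "Negative"),
  cluster_size = c(1, 4, 4, 4, 4, 2)
)

is_pos <- toy_hiv$hivstatus == "Positive"  # logical vector, NA where status is missing
large  <- toy_hiv$cluster_size > 3         # individuals in clusters of more than 3

sum(is_pos)                                # NA: the missing status propagates

# Dropping missing comparisons gives usable counts
n_pos       <- sum(is_pos, na.rm = TRUE)          # 2
n_pos_large <- sum(is_pos & large, na.rm = TRUE)  # 1

# Percentage of individuals in large clusters who are HIV+ (1 of 4)
pct_pos_large <- round(n_pos_large / sum(large) * 100)  # 25
```

In this sketch the denominator `sum(large)` still counts the individual with unknown status; excluding that individual is a separate analysis choice.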