knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 7, fig.align = "center" )
First load the package. We also load several other packages to help quickly explore the data.
library(getTBinR)
library(dplyr)
library(ggplot2)
library(knitr)
Get the TB burden data with a single function call. This downloads the data if it has never been accessed and saves a local copy to R's temporary directory (see tempdir()). If a local copy exists from the current session, that copy is loaded instead.
tb_burden <- get_tb_burden()
tb_burden
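The caching behaviour described above follows a common "download once, then reuse the local copy" pattern. The base R sketch below illustrates that idea only. cached_fetch() and fake_fetch() are hypothetical helpers written for this example, not getTBinR's internal code. Storing the copy as an .rds file in tempdir() is also an assumption.

```r
# Hypothetical sketch of a "download once, reuse the local copy" cache.
# This is NOT getTBinR's implementation; names are illustrative only.
cached_fetch <- function(name, fetch_fn, cache_dir = tempdir()) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    # A local copy exists from this session: load it instead of downloading
    return(readRDS(path))
  }
  data <- fetch_fn()   # stands in for a download from the WHO
  saveRDS(data, path)  # save a local copy for later calls in this session
  data
}

# Usage: a stand-in fetcher that counts how often it is called
n_calls <- 0
fake_fetch <- function() {
  n_calls <<- n_calls + 1
  data.frame(country = c("A", "B"), year = 2000L)
}

first  <- cached_fetch("demo_tb", fake_fetch)  # "downloads" and caches
second <- cached_fetch("demo_tb", fake_fetch)  # reads the cached copy
n_calls
#> [1] 1
```

Because tempdir() is specific to each R session, a cache like this disappears when the session ends. This matches the "current session" behaviour described above.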
On top of the core datasets provided by default, getTBinR also supports importing several other datasets. These include data on latent TB, HIV surveillance, intervention budgets, and outcomes. The currently supported datasets are listed below.
These datasets can be imported into R by supplying the name of the required dataset to the additional_datasets argument of get_tb_burden (or of any of the various plotting/summary functions). Alternatively, they can all be imported in one go using additional_datasets = "all", as below:
get_tb_burden(additional_datasets = "all")
Once imported, these datasets can be used in the plotting and summary functions provided by getTBinR, either by passing them to a function's df argument or by using that function's additional_datasets argument.
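Conceptually, adding an extra dataset means matching its rows to the core data on shared keys. The sketch below assumes those keys are country and year. It uses small invented data frames, and extra_metric is a hypothetical column name. It shows the idea with base R's merge() and does not reproduce how get_tb_burden actually combines the data.

```r
# Conceptual sketch: left-join a (toy) extra dataset onto (toy) core data.
# All values are invented; extra_metric is a hypothetical column.
core <- data.frame(
  country    = c("A", "A", "B"),
  year       = c(2016L, 2017L, 2017L),
  e_inc_100k = c(50, 48, 120)
)
extra <- data.frame(
  country      = c("A", "B"),
  year         = c(2017L, 2017L),
  extra_metric = c(0.3, 0.7)
)

# Keep every row of the core data; add extra columns where a match exists
combined <- merge(core, extra, by = c("country", "year"), all.x = TRUE)
combined
```

Rows with no match in the extra dataset (here country A in 2016) keep their core values and get NA in the new column.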
The WHO provides a large, detailed data dictionary for use with the TB burden data. However, searching through it can be tedious. To streamline the process, getTBinR provides a search function, search_data_dict, which finds the definitions of one or more variables. As with get_tb_burden, the first time this function is used it downloads the data dictionary to the temporary directory. Subsequent uses load the local copy.
vars_of_interest <- search_data_dict(var = c("country", "e_inc_100k", "e_inc_100k_lo", "e_inc_100k_hi"))
knitr::kable(vars_of_interest)
We might also want to search the variable definitions for key phrases, for example mortality.
defs_of_interest <- search_data_dict(def = c("mortality"))
knitr::kable(defs_of_interest)
Finally, we can search for a known variable and for key phrases in variable definitions at the same time.
vars_defs_of_interest <- search_data_dict(var = c("country"), def = c("mortality"))
knitr::kable(vars_defs_of_interest)
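This kind of search amounts to exact matching on variable names plus a case-insensitive text search of the definitions. The sketch below works on a small hypothetical dictionary whose definitions are paraphrased placeholders, not the WHO's wording. search_dict() is an illustrative helper, not search_data_dict(). Returning the union of both kinds of match when var and def are both given is an assumption about the real function's behaviour.

```r
# Toy data dictionary (placeholder definitions, not the WHO's text)
dict <- data.frame(
  variable_name = c("country", "e_inc_100k", "e_mort_100k", "e_pop_num"),
  definition    = c("Country or territory name",
                    "Estimated incidence (all forms) per 100 000 population",
                    "Estimated mortality of TB cases (all forms) per 100 000 population",
                    "Estimated total population number"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: rows whose name is in `var` OR whose definition
# contains any phrase in `def` (case-insensitive)
search_dict <- function(dict, var = NULL, def = NULL) {
  keep <- rep(FALSE, nrow(dict))
  if (!is.null(var)) keep <- keep | dict$variable_name %in% var
  for (phrase in def) {
    keep <- keep | grepl(phrase, dict$definition, ignore.case = TRUE)
  }
  dict[keep, , drop = FALSE]
}

matches <- search_dict(dict, var = "country", def = "mortality")
matches$variable_name
#> [1] "country"     "e_mort_100k"
```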
search_data_dict can also be used to explore the variables included in each dataset. For example, we could explore all the variables included in the latent TB dataset:
dataset_of_interest <- search_data_dict(dataset = "Latent")
knitr::kable(dataset_of_interest)
To start exploring the WHO TB data, we map the most recently available global TB incidence rates. Mapping the data can help identify spatial patterns.
getTBinR::map_tb_burden(metric = "e_inc_100k")
To show how quickly we can go from no data to informative graphs, we next plot incidence rates for all countries in the WHO data.
getTBinR::plot_tb_burden_overview(metric = "e_inc_100k")
Another way to compare incidence rates across countries is to look at the annual percentage change. The plot below only shows countries whose minimum incidence rate is above 5 per 100,000, that is, countries with an incidence rate above 5 per 100,000 in every year of data.
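## Keep countries whose lowest recorded incidence rate is above 5 per 100,000
higher_burden_countries <- tb_burden %>%
  group_by(country) %>%
  summarise(e_inc_100k = min(e_inc_100k)) %>%
  filter(e_inc_100k > 5) %>%
  pull(country) %>%
  unique()

## Plot the annual percentage change in incidence for these countries
getTBinR::plot_tb_burden_overview(metric = "e_inc_100k",
                                   interactive = FALSE,
                                   annual_change = TRUE,
                                   countries = higher_burden_countries)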
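For reference, a year-on-year percentage change can be computed as 100 * (x[t] - x[t - 1]) / x[t - 1]. The sketch below assumes that definition and assumes the values are already sorted by year. The exact calculation inside plot_tb_burden_overview() may differ. annual_pct_change() is a hypothetical helper, and the rates are invented.

```r
# Year-on-year percentage change; the first year has no previous value.
# Assumes x is ordered by year.
annual_pct_change <- function(x) {
  c(NA, 100 * diff(x) / head(x, -1))
}

# Toy incidence rates per 100,000 for one country over four years
rates <- c(200, 190, 171, 171)
annual_pct_change(rates)
#> [1]  NA  -5 -10   0
```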
We might also be interested in getting a regional/global overview of TB incidence rates (hint: use search_data_dict to look up e_inc_100k and see what role it plays here). See ?plot_tb_burden_summary for more ways to summarise TB metrics.
getTBinR::plot_tb_burden_summary(conf = NULL, metric_label = "e_inc_100k")
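The hint above probably concerns aggregation. e_inc_100k is a rate, and rates from different countries cannot simply be averaged. A regional rate should be total cases divided by total population. The sketch below uses invented numbers with column names mirroring the WHO variables e_inc_num (incident cases) and e_pop_num (population). It is an assumption that plot_tb_burden_summary aggregates this way when metric_label is a per-100,000 rate; check ?plot_tb_burden_summary to confirm.

```r
# Two toy countries with very different populations (invented values)
region <- data.frame(
  country   = c("Small", "Large"),
  e_inc_num = c(500, 20000),
  e_pop_num = c(1e6, 1e8)
)

# Per-country incidence rates per 100,000
region$e_inc_100k <- region$e_inc_num * 1e5 / region$e_pop_num
region$e_inc_100k
#> [1] 50 20

# Naive unweighted mean of the rates: overweights the small country
mean(region$e_inc_100k)
#> [1] 35

# Population-weighted regional rate: about 20.3, close to the large country
regional_rate <- sum(region$e_inc_num) * 1e5 / sum(region$e_pop_num)
regional_rate
```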
We could also get a quick overview of TB in a given group of countries, compared with regional and global trends, by looking at the most recent data using summarise_metric. This function is used extensively in the supplied TB report (render_tb_report) to provide summary statistics.
## Get a summary of TB incidence rates for the United Kingdom and Germany
summarise_metric(metric = "e_inc_100k", countries = c("United Kingdom", "Germany")) %>%
  kable()
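One building block of a "most recent data" summary is selecting each country's latest year. The base R sketch below shows only that step. It uses an invented toy data frame with placeholder country names. It does not attempt the regional and global comparisons that summarise_metric provides.

```r
# Toy data (invented values, placeholder countries)
tb_toy <- data.frame(
  country    = c("Country A", "Country A", "Country B", "Country B"),
  year       = c(2016L, 2017L, 2016L, 2017L),
  e_inc_100k = c(10, 9, 8, 7),
  stringsAsFactors = FALSE
)

# For each country, keep the row with the most recent year
latest <- do.call(rbind, lapply(split(tb_toy, tb_toy$country), function(d) {
  d[which.max(d$year), ]
}))
latest
```

With dplyr loaded, as in this vignette, the equivalent is group_by(country) followed by filter(year == max(year)).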
Diving deeper into the data, let's plot a random sample of 9 countries using the built-in plot_tb_burden function. We again plot incidence rates, but this time with 95% confidence intervals. As you can see, this isn't a hugely informative graph. Let's improve it!
## Take a random sample of 9 countries (set a seed so the sample is reproducible)
set.seed(1234)
sample_countries <- sample(unique(tb_burden$country), 9)
plot_tb_burden(tb_burden, metric = "e_inc_100k", countries = sample_countries)
Here we facet by country so that we can more easily see what is going on. This lets us explore between-country variation; depending on the sample, there is likely to be a lot of it.
plot_tb_burden(tb_burden, metric = "e_inc_100k", countries = sample_countries, facet = "country")
To explore within-country variation, we need to let the y-axis scale vary between countries.
plot_tb_burden(tb_burden, metric = "e_inc_100k", countries = sample_countries, facet = "country", scales = "free_y")
We might also be interested in mortality among both HIV-negative and HIV-positive TB cases in our sample countries. We can look at this using plot_tb_burden as follows. Note that we can do this without supplying the TB burden data; the plotting function will automatically find it, either locally or remotely.
plot_tb_burden(metric = "e_mort_exc_tbhiv_100k", countries = sample_countries, facet = "country", scales = "free_y")
plot_tb_burden(metric = "e_mort_tbhiv_100k", countries = sample_countries, facet = "country", scales = "free_y")
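The two metrics plotted above can be combined. The sketch below assumes, as the variable names suggest, that e_mort_exc_tbhiv_100k (HIV-negative) and e_mort_tbhiv_100k (HIV-positive) together make up total TB mortality. The values are invented for illustration. Published rates are rounded, so with real data the sum may not exactly equal a reported total.

```r
# Toy mortality rates per 100,000 for one country-year (invented values)
mort <- data.frame(
  e_mort_exc_tbhiv_100k = 12,  # TB deaths among HIV-negative people
  e_mort_tbhiv_100k     = 4    # TB deaths among HIV-positive people
)

# Assumed total TB mortality and the share occurring in HIV-positive people
mort$total_100k <- mort$e_mort_exc_tbhiv_100k + mort$e_mort_tbhiv_100k
mort$hiv_share  <- mort$e_mort_tbhiv_100k / mort$total_100k
mort
```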