sars2pack

Launch Rstudio Binder R-CMD-check

knitr::opts_chunk$set(warning=FALSE,message=FALSE, cache=TRUE,
                      fig.width=9, fig.height=6, out.width = '100%'
                      )
knitr::opts_knit$set(upload.fun = knitr::imgur_upload)

Questions immediately addressed by sars2pack datasets

Installation

# If you do not have BiocManager installed:
install.packages('BiocManager')

# Then, if sars2pack is not already installed:
BiocManager::install('seandavi/sars2pack')

After the one-time installation, load the packge to get started.

library(sars2pack)

Available datasets

library(knitr)
library(kableExtra)
library(tibble)
library(dplyr)
library(purrr)
library(sars2pack)
library(yaml)
b = available_datasets()
b %>% dplyr::mutate(url=sprintf('[LINK](%s)',url)) %>%
    mutate_all(linebreak) %>%
    arrange(data_type) %>%
    kable(booktabs=TRUE, escape=FALSE) %>%
    kable_styling("striped")

Case tracking

Updated tracking of city, county, state, national, and international confirmed cases, deaths, and testing is critical to driving policy, implementing interventions, and measuring their effectiveness. Case tracking datasets include date, a count of cases, deaths, testing, hospitalizations, and usually numerous other pieces of information related to location of reporting, etc.

Accessing case-tracking datasets is typically done with one function per dataset. The example here is data from the European Centers for Disease Control, or ECDC.

ecdc = ecdc_data()

Get a quick overview of the dataset.

head(ecdc)

The ecdc dataset is just a data.frame (actually, a tibble), so applying standard R or tidyverse functionality can get answers to basic questions with little code. The next code block generates a top10 of countries with the most deaths recorded to date. Note that if you do this on your own computer, the data will be updated to today's data values.

library(dplyr)
top10 = ecdc %>% filter(subset=='deaths') %>% 
    group_by(location_name) %>%
    filter(count==max(count)) %>%
    arrange(desc(count)) %>%
    head(10) %>% select(-starts_with('iso'),-continent,-subset) %>%
    mutate(rate_per_100k = 1e5*count/population_2019)

Finally, present a nice table of those countries:

knitr::kable(
    top10,
    caption = "Reported COVID-19-related deaths in ten most affected countries.",
    format = 'pandoc')

Examine the spread of the pandemic throughout the world by examining cumulative deaths reported for the top 10 countries above.

ecdc_top10 = ecdc %>% filter(location_name %in% top10$location_name & subset=='deaths')
plot_epicurve(ecdc_top10,
              filter_expression = count > 10, 
              color='location_name')

Comparing the features of disease spread is easiest if all curves are shifted to "start" at the same absolute level of infection. In this case, shift the origin for all countries to start at the first time point when more than 100 cumulative cases had been observed. Note how some curves cross others which is evidence of less infection control at the same relative time in the pandemic for that country (eg., Brazil).

ecdc_top10 %>% align_to_baseline(count>100,group_vars=c('location_name')) %>%
    plot_epicurve(date_column = 'index',color='location_name')

Contributions

Pull requests are gladly accepted on Github.

Adding new datasets

See the Adding new datasets vignette.

Similar work



seandavi/sars2pack documentation built on May 13, 2022, 3:41 p.m.