knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message = FALSE,
  warning = FALSE
)

library(coronavirus)
The coronavirus package provides a tidy format for the COVID-19 dataset collected by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The dataset includes daily new confirmed and death cases between January 2020 and March 2023, and recovery cases until August 2022.
More details are available here, and a CSV format of the package dataset is available here.
Data source: https://github.com/CSSEGISandData/COVID-19
Additional documentation is available in the following vignettes:
Install the CRAN version:
install.packages("coronavirus")
Install the GitHub version (refreshed on a daily basis):
# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")
The package provides the following two datasets:

coronavirus - a tidy (long) format of the JHU CSSE COVID-19 dataset. This dataset includes the following columns:
date - The date of the observation, using the Date class
province - Name of province/state, for countries where data is provided split across multiple provinces/states
country - Name of country/region
lat - The latitude code
long - The longitude code
type - An indicator for the type of cases (confirmed, death, recovered)
cases - Number of cases on the given date
uid - Country code
province_state - Province or state, if applicable
iso2 - Officially assigned two-letter country code identifier
iso3 - Officially assigned three-letter country code identifier
code3 - UN country code
fips - Federal Information Processing Standards code that uniquely identifies counties within the USA
combined_key - Country and province (if applicable)
population - Country or province population
continent_name - Continent name
continent_code - Continent code

covid19_vaccine - a tidy (long) format of the Johns Hopkins Centers for Civic Impact global vaccination dataset by country. This dataset includes the following columns:
country_region - Country or region name
date - Data collection date in YYYY-MM-DD format
doses_admin - Cumulative number of doses administered. When a vaccine requires multiple doses, each one is counted independently
people_partially_vaccinated - Cumulative number of people who received at least one vaccine dose. When the person receives a prescribed second dose, it is not counted twice
people_fully_vaccinated - Cumulative number of people who received all prescribed doses necessary to be considered fully vaccinated
report_date_string - Data report date in YYYY-MM-DD format
uid - Country code
province_state - Province or state, if applicable
iso2 - Officially assigned two-letter country code identifier
iso3 - Officially assigned three-letter country code identifier
code3 - UN country code
fips - Federal Information Processing Standards code that uniquely identifies counties within the USA
lat - Latitude
long - Longitude
combined_key - Country and province (if applicable)
population - Country or province population
continent_name - Continent name
continent_code - Continent code

The refresh_coronavirus_jhu function loads the data directly from the package repository using the Covid19R project data standard format:
covid19_df <- refresh_coronavirus_jhu()
head(covid19_df)
data("coronavirus")
head(coronavirus)
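To make the tidy (long) layout concrete: each row holds one date, one location, and one case type, so the case types for a single day occupy separate rows. The sketch below builds a small made-up data frame that reuses four of the coronavirus column names (all values and the country name are invented for illustration) and totals the cases by type with base R only, so it runs without the package installed.

```r
# Toy data in the same long layout as the coronavirus dataset.
# All values, and the country name, are invented for illustration.
toy <- data.frame(
  date    = as.Date(rep(c("2020-03-01", "2020-03-02"), each = 2)),
  country = "Exampleland",
  type    = rep(c("confirmed", "death"), times = 2),
  cases   = c(10, 1, 15, 0)
)

# One row per date and case type: 2 dates x 2 types = 4 rows
nrow(toy)

# Total cases by type, using base R only
totals <- tapply(toy$cases, toy$type, sum)
totals
```

Because every observation is a row, filtering on type and summing cases is all that is needed to get a total for one case type.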
Summary of the total confirmed cases by country (top 20):
library(dplyr)

summary_df <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20)
Summary of new cases during the past 24 hours by country and type (as of the most recent date in the dataset, max(coronavirus$date)):
library(tidyr)

coronavirus %>%
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)
Plotting the distribution of active, recovered, and death cases worldwide:

library(plotly)

coronavirus %>%
  group_by(type, date) %>%
  summarise(total_cases = sum(cases), .groups = "drop") %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  # Daily change in active cases
  mutate(active = confirmed - death - recovery) %>%
  # Running totals for the stacked area chart
  mutate(active_total = cumsum(active),
         recovered_total = cumsum(recovery),
         death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
          y = ~ active_total,
          name = 'Active',
          fillcolor = '#1f77b4',
          type = 'scatter',
          mode = 'none',
          stackgroup = 'one') %>%
  add_trace(y = ~ death_total,
            name = "Death",
            fillcolor = '#E41317') %>%
  add_trace(y = ~ recovered_total,
            name = 'Recovered',
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))

Plotting daily confirmed and death cases in Brazil:

library(plotly)

# Country-level rows only (no province breakdown)
df <- coronavirus %>%
  filter(country == "Brazil",
         is.na(province))

p_1 <- plot_ly(data = df %>% filter(type == "confirmed"),
               x = ~ date,
               y = ~ cases,
               name = "Confirmed",
               type = "scatter",
               mode = "lines") %>%
  layout(yaxis = list(title = "Cases"),
         xaxis = list(title = ""))

p_2 <- plot_ly(data = df %>% filter(type == "death"),
               x = ~ date,
               y = ~ cases,
               name = "Death",
               line = list(color = "red"),
               type = "scatter",
               mode = "lines") %>%
  layout(yaxis = list(title = "Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))

p1 <- subplot(p_1, p_2, nrows = 2,
              titleX = TRUE,
              titleY = TRUE) %>%
  layout(title = "Brazil - Daily Confirmed and Death Cases",
         margin = list(t = 60, b = 60, l = 40, r = 40),
         legend = list(x = 0.05, y = 1))

# Export the plot to a static SVG file
orca(p1, "man/figures/brazil_cases.svg")
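The worldwide plot code derives the daily change in active cases as confirmed minus death minus recovery, and then accumulates the daily values with cumsum. The following is a minimal worked example of that arithmetic in base R; the daily counts are invented for illustration and do not come from the dataset.

```r
# Invented daily counts for three consecutive days
daily <- data.frame(
  confirmed = c(100, 50, 30),
  death     = c(2, 3, 1),
  recovery  = c(0, 20, 40)
)

# Daily change in active cases: new cases minus deaths and recoveries
daily$active <- daily$confirmed - daily$death - daily$recovery

# Running totals, the quantities shown in the stacked area chart
daily$active_total    <- cumsum(daily$active)
daily$recovered_total <- cumsum(daily$recovery)
daily$death_total     <- cumsum(daily$death)

daily
```

On the last day the three running totals add up to the cumulative confirmed count (114 + 60 + 6 = 180), which makes a quick sanity check for the stacked chart.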
Plot the confirmed cases distribution by country with a treemap plot:
conf_df <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup()

plot_ly(data = conf_df,
        type = "treemap",
        values = ~ total_cases,
        labels = ~ country,
        parents = ~ parents,
        domain = list(column = 0),
        name = "Confirmed",
        textinfo = "label+value+percent parent")
conf_df <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases), .groups = "drop") %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup()

p2 <- plot_ly(data = conf_df,
              type = "treemap",
              values = ~ total_cases,
              labels = ~ country,
              parents = ~ parents,
              domain = list(column = 0),
              name = "Confirmed",
              textinfo = "label+value+percent parent")

# Export the treemap to a static SVG file
orca(p2, "man/figures/treemap_conf.svg")
data(covid19_vaccine)
head(covid19_vaccine)
Take a snapshot of the data from the most recent available date and calculate the ratio between the total doses administered and the population size:
df_summary <- covid19_vaccine |>
  filter(date == max(date)) |>
  select(date, country_region, doses_admin,
         total = people_at_least_one_dose,
         population, continent_name) |>
  mutate(doses_pop_ratio = doses_admin / population,
         total_pop_ratio = total / population) |>
  filter(country_region != "World",
         !is.na(population),
         !is.na(total)) |>
  arrange(- total)

head(df_summary, 10)
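The ratio columns above are plain divisions by population, computed after rows without a population figure are dropped. A minimal sketch of the same calculation in base R, on a made-up three-row table (the region names and numbers are invented for illustration):

```r
# Invented example rows; region "C" has no population figure
toy_vaccine <- data.frame(
  country_region = c("A", "B", "C"),
  doses_admin    = c(1500, 400, 900),
  population     = c(1000, 2000, NA)
)

# Drop rows without a population figure, then compute doses per person
toy_vaccine <- toy_vaccine[!is.na(toy_vaccine$population), ]
toy_vaccine$doses_pop_ratio <- toy_vaccine$doses_admin / toy_vaccine$population

toy_vaccine
```

A ratio above 1, as for region "A" here, is expected whenever a vaccine requires multiple doses, since each administered dose is counted independently.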
Plot of the total doses and population ratio by country:
# Setting the diagonal lines range
line_start <- 10000
line_end <- 1500 * 10 ^ 6

# Filter the data
d <- df_summary |>
  filter(country_region != "World",
         !is.na(population),
         !is.na(total))

# Replot it
p3 <- plot_ly() |>
  add_markers(x = d$population,
              y = d$total,
              text = ~ paste("Country: ", d$country_region, "<br>",
                             "Population: ", d$population, "<br>",
                             "Total Doses: ", d$total, "<br>",
                             "Ratio: ", round(d$total_pop_ratio, 2),
                             sep = ""),
              color = d$continent_name,
              type = "scatter",
              mode = "markers") |>
  # Reference lines for the 1:1, 1:2, and 1:4 ratios
  add_lines(x = c(line_start, line_end),
            y = c(line_start, line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  add_lines(x = c(line_start, line_end),
            y = c(0.5 * line_start, 0.5 * line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  add_lines(x = c(line_start, line_end),
            y = c(0.25 * line_start, 0.25 * line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  # Labels for the reference lines
  add_annotations(text = "1:1",
                  x = log10(line_end * 1.25),
                  y = log10(line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "1:2",
                  x = log10(line_end * 1.25),
                  y = log10(0.5 * line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "1:4",
                  x = log10(line_end * 1.25),
                  y = log10(0.25 * line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "Source: Johns Hopkins University - Centers for Civic Impact",
                  showarrow = FALSE,
                  xref = "paper",
                  yref = "paper",
                  x = -0.05,
                  y = - 0.33) |>
  layout(title = "Covid19 Vaccine - Total Doses vs. Population Ratio (Log Scale)",
         margin = list(l = 50, r = 50, b = 90, t = 70),
         yaxis = list(title = "Number of Doses",
                      type = "log"),
         xaxis = list(title = "Population Size",
                      type = "log"),
         legend = list(x = 0.75, y = 0.05))
# Export the plot (p3) created above to a static SVG file
orca(p3, "man/figures/country_summary.svg")
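The annotation coordinates in the plot code are wrapped in log10 because, as the plotly axis reference describes it, positions on a log-type axis are given as the base-10 logarithm of the data value. The sketch below recomputes the label positions for the three reference lines in base R; the ratios vector and the label_x and label_y names are introduced here for illustration only.

```r
# Upper end of the diagonal reference lines, as in the plot code
line_end <- 1500 * 10^6

# Ratio of each reference line (doses : population)
ratios <- c("1:1" = 1, "1:2" = 0.5, "1:4" = 0.25)

# Label positions in log10 units, slightly past the end of each line
label_x <- log10(line_end * 1.25)
label_y <- log10(ratios * line_end * 1.25)

label_x
label_y

# Halving the ratio shifts a line down by log10(2) on the log axis,
# so the three lines appear parallel and evenly spaced
diff(label_y)
```

This is why the 1:1, 1:2, and 1:4 lines appear as equally spaced parallel diagonals on the log-log plot rather than fanning out from the origin.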
Note: Currently, the dashboard is under maintenance due to recent changes in the data structure. Please see this issue.
A supporting dashboard is available here.
The raw data is pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources: