In RamiKrispin/coronavirus: The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

knitr::opts_chunk$set(
  collapse = TRUE, 
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%",
  message=FALSE, 
  warning=FALSE
)

library(coronavirus)

coronavirus

The coronavirus package provides a tidy format for the COVID-19 dataset collected by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins University. The dataset includes daily new and death cases between January 2020 and March 2023 and recovery cases until August 2022.

More details available here, and a csv format of the package dataset available here

Data source: https://github.com/CSSEGISandData/COVID-19

Source: Centers for Disease Control and Prevention's Public Health Image Library

Important Notes

As of March 10th, 2023, JHU CCSE stopped collecting and tracking new cases
As of August 4th, 2022 JHU CCSE stopped track recovery cases, please see this issue for more details
Negative values and/or anomalies may occurred in the data for the following reasons:
- The calculation of the daily cases from the raw data which is in cumulative format is done by taking the daily difference. In some cases, some retro updates not tie to the day that they actually occurred such as removing false positive cases
- Anomalies or error in the raw data
- Please see this issue for more details

Vignettes

Additional documentation available on the following vignettes:

Installation

Install the CRAN version:

install.packages("coronavirus")

Install the Github version (refreshed on a daily bases):

# install.packages("devtools")
devtools::install_github("RamiKrispin/coronavirus")

Datasets

The package provides the following two datasets:

coronavirus - tidy (long) format of the JHU CCSE datasets. That includes the following columns:
- date - The date of the observation, using Date class
- province - Name of province/state, for countries where data is provided split across multiple provinces/states
- country - Name of country/region
- lat - The latitude code
- long - The longitude code
- type - An indicator for the type of cases (confirmed, death, recovered)
- cases - Number of cases on given date
- uid - Country code
- province_state - Province or state if applicable
- iso2 - Officially assigned country code identifiers with two-letter
- iso3 - Officially assigned country code identifiers with three-letter
- code3 - UN country code
- fips - Federal Information Processing Standards code that uniquely identifies counties within the USA
- combined_key - Country and province (if applicable)
- population - Country or province population
- continent_name - Continent name
- continent_code - Continent code
covid19_vaccine - a tidy (long) format of the the Johns Hopkins Centers for Civic Impact global vaccination dataset by country. This dataset includes the following columns:
- country_region - Country or region name
- date - Data collection date in YYYY-MM-DD format
- doses_admin - Cumulative number of doses administered. When a vaccine requires multiple doses, each one is counted independently
- people_partially_vaccinated - Cumulative number of people who received at least one vaccine dose. When the person receives a prescribed second dose, it is not counted twice
- people_fully_vaccinated - Cumulative number of people who received all prescribed doses necessary to be considered fully vaccinated
- report_date_string - Data report date in YYYY-MM-DD format
- uid - Country code
- province_state - Province or state if applicable
- iso2 - Officially assigned country code identifiers with two-letter
- iso3 - Officially assigned country code identifiers with three-letter
- code3 - UN country code
- fips - Federal Information Processing Standards code that uniquely identifies counties within the USA
- lat - Latitude
- long - Longitude
- combined_key - Country and province (if applicable)
- population - Country or province population
- continent_name - Continent name
- continent_code - Continent code

The refresh_coronavirus_jhu function enables to load of the data directly from the package repository using the Covid19R project data standard format:

covid19_df <- refresh_coronavirus_jhu()

head(covid19_df)

Usage

data("coronavirus")

head(coronavirus)

Summary of the total confrimed cases by country (top 20):

library(dplyr)

summary_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)

summary_df %>% head(20)

Summary of new cases during the past 24 hours by country and type (as of r max(coronavirus$date)):

library(tidyr)

coronavirus %>% 
  filter(date == max(date)) %>%
  select(country, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)

Plotting daily confirmed and death cases in Brazil:

library(plotly)

coronavirus %>% 
  group_by(type, date) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(date) %>%
  mutate(active = confirmed - death - recovery) %>%
  mutate(active_total = cumsum(active),
                recovered_total = cumsum(recovery),
                death_total = cumsum(death)) %>%
  plot_ly(x = ~ date,
                  y = ~ active_total,
                  name = 'Active', 
                  fillcolor = '#1f77b4',
                  type = 'scatter',
                  mode = 'none', 
                  stackgroup = 'one') %>%
  add_trace(y = ~ death_total, 
             name = "Death",
             fillcolor = '#E41317') %>%
  add_trace(y = ~recovered_total, 
            name = 'Recovered', 
            fillcolor = 'forestgreen') %>%
  layout(title = "Distribution of Covid19 Cases Worldwide",
         legend = list(x = 0.1, y = 0.9),
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))

library(plotly)

df <- coronavirus %>%
  filter(country == "Brazil",
         is.na(province))

p_1 <- plot_ly(data = df %>% filter(type == "confirmed"),
        x = ~ date,
        y = ~ cases,
        name = "Confirmed",
        type = "scatter",
        mode = "line") %>%
  layout(yaxis = list(title = "Cases"),
         xaxis = list(title = ""))

p_2 <- plot_ly(data = df %>% filter(type == "death"),
              x = ~ date,
              y = ~ cases,
              name = "Death",
              line = list(color = "red"),
              type = "scatter",
              mode = "line") %>%
  layout(yaxis = list(title = "Cases"),
         xaxis = list(title = "Source: Johns Hopkins University Center for Systems Science and Engineering"))

p1 <- subplot(p_1, p_2, nrows = 2, 
              titleX = TRUE,
              titleY = TRUE) %>%
  layout(title = "Brazil - Daily Confirmed and Death Cases",
         margin = list(t = 60, b = 60, l = 40, r = 40),
         legend = list(x = 0.05, y = 1))

orca(p1, "man/figures/brazil_cases.svg")

Plot the confirmed cases distribution by counrty with treemap plot:

conf_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup() 

  plot_ly(data = conf_df,
          type= "treemap",
          values = ~total_cases,
          labels= ~ country,
          parents=  ~parents,
          domain = list(column=0),
          name = "Confirmed",
          textinfo="label+value+percent parent")

conf_df <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(country) %>%
  summarise(total_cases = sum(cases), .groups = "drop") %>%
  arrange(-total_cases) %>%
  mutate(parents = "Confirmed") %>%
  ungroup() 

p2 <-   plot_ly(data = conf_df,
          type= "treemap",
          values = ~total_cases,
          labels= ~ country,
          parents=  ~parents,
          domain = list(column=0),
          name = "Confirmed",
          textinfo="label+value+percent parent")

orca(p2, "man/figures/treemap_conf.svg")

data(covid19_vaccine)

head(covid19_vaccine)

Taking a snapshot of the data from the most recent date available and calculate the ratio between total doses admin and the population size:

df_summary <- covid19_vaccine |>
  filter(date == max(date)) |>
  select(date, country_region, doses_admin, total = people_at_least_one_dose, population, continent_name) |>
  mutate(doses_pop_ratio = doses_admin / population,
         total_pop_ratio = total / population) |>
  filter(country_region != "World", 
         !is.na(population),
         !is.na(total)) |>
  arrange(- total)

head(df_summary, 10)

Plot of the total doses and population ratio by country:

# Setting the diagonal lines range
line_start <- 10000
line_end <- 1500 * 10 ^ 6

# Filter the data
d <- df_summary |> 
  filter(country_region != "World", 
         !is.na(population),
         !is.na(total)) 


# Replot it
p3 <- plot_ly() |>
  add_markers(x = d$population,
              y = d$total,
              text = ~ paste("Country: ", d$country_region, "<br>",
                             "Population: ", d$population, "<br>",
                             "Total Doses: ", d$total, "<br>",
                             "Ratio: ", round(d$total_pop_ratio, 2), 
                             sep = ""),
              color = d$continent_name,
              type = "scatter",
              mode = "markers") |>
  add_lines(x = c(line_start, line_end),
            y = c(line_start, line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  add_lines(x = c(line_start, line_end),
            y = c(0.5 * line_start, 0.5 * line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>

  add_lines(x = c(line_start, line_end),
            y = c(0.25 * line_start, 0.25 * line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  add_annotations(text = "1:1",
                  x = log10(line_end * 1.25),
                  y = log10(line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "1:2",
                  x = log10(line_end * 1.25),
                  y = log10(0.5 * line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "1:4",
                  x = log10(line_end * 1.25),
                  y = log10(0.25 * line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "Source: Johns Hopkins University - Centers for Civic Impact",
                  showarrow = FALSE,
                  xref = "paper",
                  yref = "paper",
                  x = -0.05, y = - 0.33) |>
  layout(title = "Covid19 Vaccine - Total Doses vs. Population Ratio (Log Scale)",
         margin = list(l = 50, r = 50, b = 90, t = 70),
         yaxis = list(title = "Number of Doses",
                      type = "log"),
         xaxis = list(title = "Population Size",
                      type = "log"),
         legend = list(x = 0.75, y = 0.05))

# Setting the diagonal lines range
line_start <- 10000
line_end <- 1500 * 10 ^ 6

# Filter the data
d <- df_summary |> 
  filter(country_region != "World", 
         !is.na(population),
         !is.na(total)) 


# Replot it
p3 <- plot_ly() |>
  add_markers(x = d$population,
              y = d$total,
              text = ~ paste("Country: ", d$country_region, "<br>",
                             "Population: ", d$population, "<br>",
                             "Total Doses: ", d$total, "<br>",
                             "Ratio: ", round(d$total_pop_ratio, 2), 
                             sep = ""),
              color = d$continent_name,
              type = "scatter",
              mode = "markers") |>
  add_lines(x = c(line_start, line_end),
            y = c(line_start, line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  add_lines(x = c(line_start, line_end),
            y = c(0.5 * line_start, 0.5 * line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>

  add_lines(x = c(line_start, line_end),
            y = c(0.25 * line_start, 0.25 * line_end),
            showlegend = FALSE,
            line = list(color = "gray", width = 0.5)) |>
  add_annotations(text = "1:1",
                  x = log10(line_end * 1.25),
                  y = log10(line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "1:2",
                  x = log10(line_end * 1.25),
                  y = log10(0.5 * line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "1:4",
                  x = log10(line_end * 1.25),
                  y = log10(0.25 * line_end * 1.25),
                  showarrow = FALSE,
                  textangle = -25,
                  font = list(size = 8),
                  xref = "x",
                  yref = "y") |>
  add_annotations(text = "Source: Johns Hopkins University - Centers for Civic Impact",
                  showarrow = FALSE,
                  xref = "paper",
                  yref = "paper",
                  x = -0.05, y = - 0.33) |>
  layout(title = "Covid19 Vaccine - Total Doses vs. Population Ratio (Log Scale)",
         margin = list(l = 50, r = 50, b = 90, t = 70),
         yaxis = list(title = "Number of Doses",
                      type = "log"),
         xaxis = list(title = "Population Size",
                      type = "log"),
         legend = list(x = 0.75, y = 0.05))

orca(p3, "man/figures/country_summary.svg")

Dashboard

Note: Currently, the dashboard is under maintenance due to recent changes in the data structure. Please see this issue

A supporting dashboard is available here

Data Sources

The raw data pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CCSE) from the following resources:

World Health Organization (WHO): https://www.who.int/
DXY.cn. Pneumonia. 2020. https://ncov.dxy.cn/ncovh5/view/pneumonia.
BNO News: https://bnonews.com/index.php/2020/04/the-latest-coronavirus-cases/
National Health Commission of the People’s Republic of China (NHC):
http:://www.nhc.gov.cn/xcs/yqtb/list_gzbd.shtml
China CDC (CCDC): http:://weekly.chinacdc.cn/news/TrackingtheEpidemic.htm
Hong Kong Department of Health: https://www.chp.gov.hk/en/features/102465.html
Macau Government: https://www.ssm.gov.mo/portal/
Taiwan CDC: https://sites.google.com/cdc.gov.tw/2019ncov/taiwan?authuser=0
US CDC: https://www.cdc.gov/coronavirus/2019-ncov/index.html
Government of Canada: https://www.canada.ca/en/public-health/services/diseases/2019-novel-coronavirus-infection/symptoms.html
Australia Government Department of Health:https://www.health.gov.au/health-alerts/covid-19
European Centre for Disease Prevention and Control (ECDC): https://www.ecdc.europa.eu/en/covid-19/country-overviews
Ministry of Health Singapore (MOH): https://www.moh.gov.sg/covid-19
Italy Ministry of Health: https://www.salute.gov.it/nuovocoronavirus
1Point3Arces: https://coronavirus.1point3acres.com/en
WorldoMeters: https://www.worldometers.info/coronavirus/
COVID Tracking Project: https://covidtracking.com/data/. (US Testing and Hospitalization Data. We use the maximum reported value from "Currently" and "Cumulative" Hospitalized for our hospitalization number reported for each state.)
French Government: https://dashboard.covid19.data.gouv.fr/
COVID Live (Australia): https://covidlive.com.au/
Washington State Department of Health:https://doh.wa.gov/emergencies/covid-19
Maryland Department of Health: https://coronavirus.maryland.gov/
New York State Department of Health: https://health.data.ny.gov/Health/New-York-State-Statewide-COVID-19-Testing/xdss-u53e/data
NYC Department of Health and Mental Hygiene: https://www.nyc.gov/site/doh/covid/covid-19-data.page and https://github.com/nychealth/coronavirus-data
Florida Department of Health Dashboard: https://services1.arcgis.com/CY1LXxl9zlJeBuRZ/arcgis/rest/services/Florida_COVID19_Cases/FeatureServer/0 and https://fdoh.maps.arcgis.com/apps/dashboards/index.html#/8d0de33f260d444c852a615dc7837c86
Palestine (West Bank and Gaza): https://corona.ps/details
Israel: https://govextra.gov.il/ministry-of-health/corona/corona-virus/
Colorado: https://covid19.colorado.gov/