In dmglandon/coronavirus_test: The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

knitr::opts_chunk$set(echo = TRUE, 
                      message=FALSE, 
                      warning=FALSE, 
                      fig.height=5, 
                      fig.width=8,
                      collapse = TRUE,
                      comment = "#>")

The coronavirus dataset

The coronavirus dataset provides a snapshot of the daily confirmed, recovered, and death cases of the Coronavirus (the 2019 Novel Coronavirus COVID-19) by geographic location (i.e., country/province). Let's load the dataset from the coronavirus package:

library(coronavirus)

data(coronavirus)

The dataset has the following fields:

date - The date of the summary
Province.State - The province or state, when applicable
Country.Region - The country or region name
Lat - Latitude point
Long- Longitude point
cases - the number of daily cases (corresponding to the case type)
type - the type of case (i.e., confirmed, death, and recovered)

We can use the head and str functions to see the structure of the dataset:

head(coronavirus)

str(coronavirus)

Querying and analyzing the coronavirus dataset

We will use the dplyr and tidyr packages to query, transform, reshape, and keep the data tidy, the plotly package to plot the data and the DT package to view it:

library(dplyr)
library(tidyr)
library(plotly)
library(DT)

Cases summary

Let's start with summarizing the total number of cases by type as of r max(coronavirus$date) and then plot it:

total_cases <- coronavirus %>% 
  group_by(type) %>%
  summarise(cases = sum(cases)) %>%
  mutate(type = factor(type, levels = c("confirmed", "recovered", "death")))

total_cases

plot_ly(data = total_cases, 
        x = ~ type, 
        y = ~cases, 
        type = 'bar',
        text = ~ paste(type, cases, sep = ": "),
    hoverinfo = 'text') %>%
  layout(title = "Coronavirus - Cases Distribution",
         yaxis = list(title = "Number of Cases"),
         xaxis = list(title = "Case Type"),
         hovermode = "compare")

We can learn from this table that, so far worldwide, the recovery rate is r paste(round(100 *total_cases$cases[3] / total_cases$cases[1], 2), "%", sep = "") and the death rate is r paste(round(100 *total_cases$cases[2] / total_cases$cases[1], 2), "%", sep = "").

Top effected countries

The next table provides an overview of the ten countries with the highest confirmed cases. We will use the datatable function from the DT package to view the table:

confirmed_country <- coronavirus %>% 
  filter(type == "confirmed") %>%
  group_by(Country.Region) %>%
  summarise(total_cases = sum(cases)) %>%
  mutate(perc = total_cases / sum(total_cases)) %>%
  arrange(-total_cases)

confirmed_country %>%
  head(10) %>%
  datatable(rownames = FALSE,
            colnames = c("Country", "Cases", "Perc of Total")) %>%
  formatPercentage("perc", 2)

As r paste(round(100 * confirmed_country$perc[which(confirmed_country$Country.Region == "Mainland China")],1), "%", sep = "") of the confirmed cases are in China, to get a better overview of the worldwide dist of the virus, let's exclude China and plot the rest of the world dist:

coronavirus %>% 
  filter(type == "confirmed", 
         Country.Region != "Mainland China") %>%
  group_by(Country.Region) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases) %>%
  mutate(country = factor(Country.Region, levels = Country.Region)) %>%
  ungroup() %>%
  plot_ly(labels = ~ country, 
          values = ~ total_cases,
          type = "pie",
          textposition = 'inside',
        textinfo = 'label+percent',
        insidetextfont = list(color = '#FFFFFF'),
        hoverinfo = 'text',
        text = ~ paste(country, "<br />",
                     "Number of confirmed cases: ", total_cases, sep = "")) %>%
  layout(title = "Coronavirus - Confirmed Cases (Excluding China)")

Recovery and death rates

Similarly, we can use the pivot_wider function from the tidyr package (in addition to the dplyr functions we used above) to get an overview of the three types of cases (confirmed, recovered, and death). We then will use it to derive the recovery and death rate by country. As for most of the countries, there is not enough information about the results of the confirmed cases, we will filter the data for countries with at least 25 confirmed cases and above:

coronavirus %>% 
  filter(Country.Region != "Others") %>%
  group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(- confirmed) %>%
  filter(confirmed >= 25) %>%
  mutate(recover_rate = recovered / confirmed,
         death_rate = death / confirmed)  %>%
  datatable(rownames = FALSE,
            colnames = c("Country", "Confirmed", "Recovered", "Death", "Recovery Rate", "Death Rate")) %>%
   formatPercentage("recover_rate", 2) %>%
   formatPercentage("death_rate", 2)

Note that it will be misleading to make any conclusion about the recovery and death rate. As there is no detail information about:

There is no measurement between the time a case was confirmed and recovery or death. This is not an apple to apple comparison, as the outbreak did not start at the same time in all the affected countries.
As age plays a critical role in the probability of survival from the virus, we cannot make a comparison between different cases without having more demographic information.

Diving into China

The following plot describes the overall distribution of the total confirmed cases in China by province:

coronavirus %>% 
  filter(Country.Region == "China",
         type == "confirmed") %>%
  group_by(Province.State, type) %>%
  summarise(total_cases = sum(cases)) %>%  
  pivot_wider(names_from = type, values_from = total_cases) %>%
  arrange(- confirmed) %>%
  plot_ly(labels = ~ Province.State, 
                  values = ~confirmed, 
                  type = 'pie',
                  textposition = 'inside',
                  textinfo = 'label+percent',
                  insidetextfont = list(color = '#FFFFFF'),
                  hoverinfo = 'text',
                  text = ~ paste(Province.State, "<br />",
                                 "Number of confirmed cases: ", confirmed, sep = "")) %>%
  layout(title = "Total China Confirmed Cases Dist. by Province")

dmglandon/coronavirus_test documentation built on March 23, 2020, 12:44 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

dmglandon/coronavirus_test
The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

In dmglandon/coronavirus_test: The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

The coronavirus dataset

Querying and analyzing the coronavirus dataset

Cases summary

Top effected countries

Recovery and death rates

Diving into China

R Package Documentation

Browse R Packages

We want your feedback!

dmglandon/coronavirus_test The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

In dmglandon/coronavirus_test: The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

The coronavirus dataset

Querying and analyzing the coronavirus dataset

Cases summary

Top effected countries

Recovery and death rates

Diving into China

R Package Documentation

Browse R Packages

We want your feedback!

dmglandon/coronavirus_test
The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset