The coronavirus package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic. The raw data are pulled from the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) Coronavirus repository.
More details are available here, and a CSV format of the package dataset is available here.
A summary dashboard is available here.
As this is an ongoing situation, frequent changes in the data format may occur. Please visit the package news to get updates about those changes.
Install the CRAN version:
install.packages("coronavirus")
Install the GitHub version (refreshed on a daily basis):
# install.packages("devtools")
devtools::install_github("covid19r/coronavirus")
The package contains a single dataset, coronavirus:
library(coronavirus)
data("coronavirus")
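The coronavirus dataset has the following fields: Province.State, Country.Region, Lat, Long, date, cases (the daily number of cases of the given type) and type (confirmed, death or recovered). The first and last rows look like this: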
head(coronavirus)
#> Province.State Country.Region Lat Long date cases type
#> 1 Afghanistan 33 65 2020-01-22 0 confirmed
#> 2 Afghanistan 33 65 2020-01-23 0 confirmed
#> 3 Afghanistan 33 65 2020-01-24 0 confirmed
#> 4 Afghanistan 33 65 2020-01-25 0 confirmed
#> 5 Afghanistan 33 65 2020-01-26 0 confirmed
#> 6 Afghanistan 33 65 2020-01-27 0 confirmed
tail(coronavirus)
#> Province.State Country.Region Lat Long date cases type
#> 56457 Zhejiang China 29.1832 120.0934 2020-03-30 0 recovered
#> 56458 Zhejiang China 29.1832 120.0934 2020-03-31 1 recovered
#> 56459 Zhejiang China 29.1832 120.0934 2020-04-01 0 recovered
#> 56460 Zhejiang China 29.1832 120.0934 2020-04-02 2 recovered
#> 56461 Zhejiang China 29.1832 120.0934 2020-04-03 0 recovered
#> 56462 Zhejiang China 29.1832 120.0934 2020-04-04 1 recovered
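Each row's cases value is a daily count. This is why filtering on the latest date gives the new cases of the past 24 hours. Cumulative totals can therefore be recovered with a running sum. The base R sketch below uses a small toy data frame with invented values in place of the real dataset. With the actual data you would probably also group by Province.State, or aggregate to the country level first.

```r
# Toy data in the same long format as `coronavirus` (values are invented).
toy <- data.frame(
  Country.Region = c("A", "A", "A", "B", "B", "B"),
  date = as.Date(c("2020-04-01", "2020-04-02", "2020-04-03",
                   "2020-04-01", "2020-04-02", "2020-04-03")),
  cases = c(2L, 0L, 5L, 1L, 3L, 1L),
  type = "confirmed",
  stringsAsFactors = FALSE
)

# Sort by group and date so the running sum follows time order.
toy <- toy[order(toy$Country.Region, toy$type, toy$date), ]

# Daily counts -> cumulative counts within each country/type group.
toy$cum_cases <- ave(toy$cases, toy$Country.Region, toy$type, FUN = cumsum)
toy
```

The last row of each group then holds that group's total to date. This matches what sum(cases) returns in the summary example.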
Here is an example of summarizing the total cases by country/region and type (top 20):
library(dplyr)
# Total cases per country/region and case type, largest first
summary_df <- coronavirus %>%
  group_by(Country.Region, type) %>%
  summarise(total_cases = sum(cases)) %>%
  arrange(-total_cases)
summary_df %>% head(20)
#> # A tibble: 20 x 3
#> # Groups: Country.Region [12]
#> Country.Region type total_cases
#> <chr> <chr> <int>
#> 1 US confirmed 308850
#> 2 Spain confirmed 126168
#> 3 Italy confirmed 124632
#> 4 Germany confirmed 96092
#> 5 France confirmed 90848
#> 6 China confirmed 82543
#> 7 China recovered 76946
#> 8 Iran confirmed 55743
#> 9 United Kingdom confirmed 42477
#> 10 Spain recovered 34219
#> 11 Germany recovered 26400
#> 12 Turkey confirmed 23934
#> 13 Italy recovered 20996
#> 14 Switzerland confirmed 20505
#> 15 Iran recovered 19736
#> 16 Belgium confirmed 18431
#> 17 Netherlands confirmed 16727
#> 18 France recovered 15572
#> 19 Italy death 15362
#> 20 US recovered 14652
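Readers who prefer not to depend on dplyr can build the same grouped summary in base R with aggregate(). The sketch below assumes a toy data frame that only mimics the relevant columns of coronavirus. Its values are invented for illustration.

```r
# Toy long-format data (invented values), same column names as `coronavirus`.
toy <- data.frame(
  Country.Region = c("A", "A", "A", "B", "B", "B"),
  type = c("confirmed", "confirmed", "death",
           "confirmed", "recovered", "recovered"),
  cases = c(10L, 5L, 1L, 7L, 2L, 3L),
  stringsAsFactors = FALSE
)

# Sum daily cases within each country/type pair (like group_by() + summarise()).
summary_base <- aggregate(cases ~ Country.Region + type, data = toy, FUN = sum)
names(summary_base)[names(summary_base) == "cases"] <- "total_cases"

# Largest totals first (like arrange(-total_cases)), then keep the top 20.
summary_base <- summary_base[order(-summary_base$total_cases), ]
head(summary_base, 20)
```

Unlike the dplyr version, the result is a plain data frame with no grouping attached.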
Summary of new cases during the past 24 hours by country and type (as of 2020-04-04):
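library(tidyr)
# Keep only the most recent day, then sum new cases per country and type
coronavirus %>%
  filter(date == max(date)) %>%
  select(country = Country.Region, type, cases) %>%
  group_by(country, type) %>%
  summarise(total_cases = sum(cases)) %>%
  # Spread the case types into columns: confirmed, death, recovered
  pivot_wider(names_from = type,
              values_from = total_cases) %>%
  arrange(-confirmed)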
#> # A tibble: 181 x 4
#> # Groups: country [181]
#> country confirmed death recovered
#> <chr> <int> <int> <int>
#> 1 US 33264 1320 4945
#> 2 France 25646 1054 1437
#> 3 Spain 6969 749 3706
#> 4 Germany 4933 169 1825
#> 5 Italy 4805 681 1238
#> 6 United Kingdom 3788 709 7
#> 7 Turkey 3013 76 302
#> 8 Iran 2560 158 1801
#> 9 Belgium 1661 140 375
#> 10 Brazil 1304 86 0
#> # … with 171 more rows
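The latest-day, wide-format summary can also be built in base R. This sketch uses xtabs() to cross-tabulate summed cases by country and type, which plays the role of group_by(), summarise() and pivot_wider() together. The toy data and its values are hypothetical. The extra row dated the day before shows that only the most recent date is kept.

```r
# Toy long-format data (invented values); one row is from an earlier day.
toy <- data.frame(
  Country.Region = c("A", "A", "A", "B", "B", "B", "A"),
  date = as.Date(c(rep("2020-04-04", 6), "2020-04-03")),
  type = c("confirmed", "death", "recovered",
           "confirmed", "death", "recovered", "confirmed"),
  cases = c(40L, 2L, 5L, 90L, 4L, 0L, 30L),
  stringsAsFactors = FALSE
)

# Keep only the most recent day (the "past 24 hours").
latest <- toy[toy$date == max(toy$date), ]

# Sum cases into a country x type table: one column per case type.
wide <- xtabs(cases ~ Country.Region + type, data = latest)

# Order rows by confirmed cases, largest first.
wide <- wide[order(-wide[, "confirmed"]), , drop = FALSE]
wide
```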
The raw data are pulled and arranged by the Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) from the following resources: