
COVID19analytics

This package curates (downloads, cleans, consolidates, smooths) data from Johns Hopkins University and Our World in Data for analysing the international outbreak of COVID-19.

It includes several visualizations of the COVID-19 international outbreak.

Package

| Release | Usage | Development |
|:--------|:------|:------------|
| | minimal R version | Travis |
| CRAN | | codecov |
| | | Project Status: Active – The project has reached a stable, usable state and is being actively developed. |

How to get started (Development version)

Install the R package using the following commands on the R console:

# install.packages("devtools")
devtools::install_github("rOpenStats/COVID19analytics", build_opts = NULL)

First, configure the environment variables with your preferred settings in ~/.Renviron. COVID19analytics_data_dir is mandatory. COVID19analytics_credits is optional: set it to a space-separated list of aliases if you want to publish your own research. Mention previous authors where appropriate.

COVID19analytics_data_dir = "~/.R/COVID19analytics"
# If you want to generate your own reports
COVID19analytics_credits = "@alias1 @alias2 @aliasn"
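
Before loading the package, it can help to confirm that the R session actually sees these variables. The sketch below uses only base R. `check_env()` is a hypothetical helper written for this example, not a function exported by COVID19analytics, and the `Sys.setenv()` call only simulates the `~/.Renviron` entry for the current session.

```r
# Hypothetical helper (not part of COVID19analytics): reports which of the
# variables described above are visible to the current R session.
check_env <- function(vars = c("COVID19analytics_data_dir",
                               "COVID19analytics_credits")) {
  values <- unname(Sys.getenv(vars, unset = NA))
  data.frame(variable = vars,
             is.set = !is.na(values) & nzchar(values),
             stringsAsFactors = FALSE)
}

# Simulate the mandatory ~/.Renviron entry for this session only
Sys.setenv(COVID19analytics_data_dir = "~/.R/COVID19analytics")
status <- check_env()
print(status)

# Credits are a space-separated list of aliases; split them for inspection
credits <- strsplit("@alias1 @alias2 @aliasn", " ", fixed = TRUE)[[1]]
print(credits)
```

Variables placed in `~/.Renviron` are read when R starts, so restart the session after editing that file.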

How to use it

library(COVID19analytics) 
#> Warning: replacing previous import 'ggplot2::Layout' by 'lgr::Layout' when
#> loading 'COVID19analytics'
#> Warning: replacing previous import 'readr::col_factor' by 'scales::col_factor'
#> when loading 'COVID19analytics'
#> Warning: replacing previous import 'readr::local_edition' by
#> 'testthat::local_edition' when loading 'COVID19analytics'
#> Warning: replacing previous import 'magrittr::is_less_than' by
#> 'testthat::is_less_than' when loading 'COVID19analytics'
#> Warning: replacing previous import 'readr::edition_get' by
#> 'testthat::edition_get' when loading 'COVID19analytics'
#> Warning: replacing previous import 'magrittr::not' by 'testthat::not' when
#> loading 'COVID19analytics'
#> Warning: replacing previous import 'magrittr::equals' by 'testthat::equals'
#> when loading 'COVID19analytics'
#> Warning: replacing previous import 'dplyr::matches' by 'testthat::matches' when
#> loading 'COVID19analytics'
#> Warning: replacing previous import 'magrittr::extract' by 'tidyr::extract' when
#> loading 'COVID19analytics'
#> Warning: replacing previous import 'testthat::matches' by 'tidyr::matches' when
#> loading 'COVID19analytics'
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(knitr)
library(lgr)
log.dir <- file.path(getEnv("data_dir"), "logs")
dir.create(log.dir, recursive = TRUE, showWarnings = FALSE)
log.file <- file.path(log.dir, "covid19analytics.log")
lgr::get_logger("root")$add_appender(AppenderFile$new(log.file))
lgr::threshold("info", lgr::get_logger("root"))
lgr::threshold("info", lgr::get_logger("COVID19ARCurator"))
data.processor <- COVID19DataProcessor$new(provider = "JohnsHopkingsUniversity", missing.values = "imputation")

# dummy <- data.processor$preprocess() is equivalent to setupData() followed by transform(), the preprocessing defined by the data provider
dummy <- data.processor$setupData()
#> INFO  [09:43:39.869]  {stage: `processor-setup`}
#> INFO  [09:43:41.100] Checking required downloaded  {downloaded.max.date: `2023-03-09`, daily.update.time: `21:00:00`, current.datetime: `2023-11-28 09:43:41.089794`, download.flag: `TRUE`}
#> INFO  [09:43:45.222] Checking required downloaded  {downloaded.max.date: `2023-03-09`, daily.update.time: `21:00:00`, current.datetime: `2023-11-28 09:43:44.95936`, download.flag: `TRUE`}
#> INFO  [09:43:48.290] Checking required downloaded  {downloaded.max.date: `2023-03-09`, daily.update.time: `21:00:00`, current.datetime: `2023-11-28 09:43:48.287334`, download.flag: `TRUE`}
#> INFO  [09:43:51.153]  {stage: `data loaded`}
#> INFO  [09:43:51.155]  {stage: `data-setup`}
dummy <- data.processor$transform()
#> INFO  [09:43:51.156] Executing transform
#> INFO  [09:43:51.157] Executing consolidate
#> INFO  [09:44:05.717]  {stage: `consolidated`}
#> INFO  [09:44:05.718] Executing standarize
#> INFO  [09:44:06.832] gathering DataModel
#> INFO  [09:44:06.836]  {stage: `datamodel-setup`}
# curate() applies the configured missing-values method
dummy <- data.processor$curate()
#> INFO  [09:44:06.851]  {stage: `loading-aggregated-data-model`}
#> Warning: Some values were not matched unambiguously: Antarctica
#> Warning: Some values were not matched unambiguously: Micronesia
#> Warning: Some values were not matched unambiguously: MS Zaandam
#> Warning: Some values were not matched unambiguously: Summer Olympics 2020
#> Warning: Some values were not matched unambiguously: Winter Olympics 2022
#> INFO  [09:44:09.642]  {stage: `calculating-rates`}
#> INFO  [09:44:09.819]  {stage: `making-data-comparison`}
#> INFO  [09:44:14.281]  {stage: `applying-missing-values-method`}
#> INFO  [09:44:14.283]  {stage: `Starting first imputation`}
#> INFO  [09:44:14.407]  {stage: `calculating-rates`}
#> INFO  [09:44:14.514]  {stage: `making-data-comparison-2`}
#> INFO  [09:44:17.923]  {stage: `calculating-top-countries`}
#> INFO  [09:44:17.957]  {stage: `curated`}
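
The log above mentions an imputation stage, selected with `missing.values = "imputation"`. The README does not describe how the package fills gaps, so the following is only a sketch of one common approach, linear interpolation with base R's `approx()`, applied to made-up numbers. It should not be read as the package's actual method.

```r
# Toy cumulative series with gaps (made-up numbers, not package data)
confirmed <- c(100, NA, 130, 150, NA, NA, 210)
day <- seq_along(confirmed)
observed <- !is.na(confirmed)

# Linear interpolation between the observed points; rule = 2 would carry the
# nearest observed value outwards if the gaps were at either end
imputed <- approx(x = day[observed], y = confirmed[observed],
                  xout = day, rule = 2)$y
print(imputed)
```

Interpolating a cumulative series keeps it non-decreasing between two observed values, which is one reason this approach is often chosen for case counts.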

current.date <- max(data.processor$getData()$date)

rg <- ReportGeneratorEnhanced$new(data.processor)
rc <- ReportGeneratorDataComparison$new(data.processor = data.processor)

top.countries <- data.processor$top.countries
international.countries <- unique(c(data.processor$top.countries,
                                    "China", "Japan", "Singapore", "Korea, South"))
latam.countries <- sort(c("Mexico",
                     data.processor$countries$getCountries(division = "sub.continent", name = "Caribbean"),
                     data.processor$countries$getCountries(division = "sub.continent", name = "Central America"),
                     data.processor$countries$getCountries(division = "sub.continent", name = "South America")))
# Top 10 countries by daily increment in confirmed cases
kable((data.processor$getData() %>%
  filter(date == current.date) %>%
  select(country, date, rate.inc.daily, confirmed.inc, confirmed, deaths, deaths.inc) %>%
  arrange(desc(confirmed.inc)) %>%
  filter(confirmed >=10))[1:10,])

| country        | date       | rate.inc.daily | confirmed.inc | confirmed |  deaths | deaths.inc |
|:---------------|:-----------|---------------:|--------------:|----------:|--------:|-----------:|
| US             | 2023-03-09 |         0.0005 |         46931 | 103802702 | 1123836 |        590 |
| United Kingdom | 2023-03-09 |         0.0012 |         28783 |  24658705 |  220721 |          0 |
| Australia      | 2023-03-09 |         0.0012 |         13926 |  11399460 |   19574 |        115 |
| Russia         | 2023-03-09 |         0.0006 |         12385 |  22075858 |  388478 |         38 |
| Belgium        | 2023-03-09 |         0.0024 |         11570 |   4739365 |   33814 |         39 |
| Korea, South   | 2023-03-09 |         0.0003 |         10335 |  30615522 |   34093 |         12 |
| Japan          | 2023-03-09 |         0.0003 |          9834 |  33320438 |   72997 |         80 |
| Germany        | 2023-03-09 |         0.0002 |          7829 |  38249060 |  168935 |        127 |
| France         | 2023-03-09 |         0.0002 |          6308 |  39866718 |  166176 |         11 |
| Austria        | 2023-03-09 |         0.0009 |          5283 |   5961143 |   21970 |         21 |
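
The rows above are consistent with `rate.inc.daily` being roughly `confirmed.inc / confirmed` (for the US, 46931 / 103802702 rounds to 0.0005). That reading is inferred from the table, not taken from the package source. Under that assumption, the derived columns can be sketched in base R on a made-up data frame:

```r
# Toy cumulative counts (made-up numbers: two countries, three days)
toy <- data.frame(
  country   = rep(c("A", "B"), each = 3),
  date      = rep(as.Date("2023-03-07") + 0:2, times = 2),
  confirmed = c(1000, 1100, 1210, 50, 50, 60),
  stringsAsFactors = FALSE
)
toy <- toy[order(toy$country, toy$date), ]

# Daily increment: difference from the previous day within each country
toy$confirmed.inc <- ave(toy$confirmed, toy$country,
                         FUN = function(x) c(NA, diff(x)))

# Assumed definition of the daily rate: increment over the cumulative total
toy$rate.inc.daily <- round(toy$confirmed.inc / toy$confirmed, 4)

# Latest date only, largest increment first (mirrors the pipeline above)
latest <- toy[toy$date == max(toy$date), ]
latest <- latest[order(-latest$confirmed.inc), ]
print(latest)
```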

# Top 10 countries by daily increment in deaths
kable((data.processor$getData() %>%
  filter(date == current.date) %>%
  select(country, date, rate.inc.daily, confirmed.inc, confirmed, deaths, deaths.inc) %>%
  arrange(desc(deaths.inc)))[1:10,])

| country   | date       | rate.inc.daily | confirmed.inc | confirmed |  deaths | deaths.inc |
|:----------|:-----------|---------------:|--------------:|----------:|--------:|-----------:|
| US        | 2023-03-09 |         0.0005 |         46931 | 103802702 | 1123836 |        590 |
| Germany   | 2023-03-09 |         0.0002 |          7829 |  38249060 |  168935 |        127 |
| Australia | 2023-03-09 |         0.0012 |         13926 |  11399460 |   19574 |        115 |
| Japan     | 2023-03-09 |         0.0003 |          9834 |  33320438 |   72997 |         80 |
| Sweden    | 2023-03-09 |         0.0003 |           804 |   2699339 |   23777 |         46 |
| Belgium   | 2023-03-09 |         0.0024 |         11570 |   4739365 |   33814 |         39 |
| Russia    | 2023-03-09 |         0.0006 |         12385 |  22075858 |  388478 |         38 |
| Finland   | 2023-03-09 |         0.0005 |           668 |   1463644 |    8967 |         31 |
| Austria   | 2023-03-09 |         0.0009 |          5283 |   5961143 |   21970 |         21 |
| Poland    | 2023-03-09 |         0.0005 |          3459 |   6444960 |  119010 |         21 |

rg$ggplotTopCountriesStackedBarDailyInc(included.countries = latam.countries, countries.text = "Latam countries")
#> Warning: Removed 144 rows containing missing values (`position_stack()`).

rc$ggplotComparisonExponentialGrowth(included.countries = latam.countries, countries.text = "Latam countries",   
                                     field = "confirmed", y.label = "Confirmed", min.cases = 100)
#> Warning: ggrepel: 7 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps
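
Growth comparisons like the one above are commonly drawn by aligning every country on the first day it reached a case threshold, which is presumably the role of `min.cases`. The sketch below illustrates that alignment in base R on made-up data. `align_by_threshold()` and the `days.since` column are hypothetical names for this example, and the package may implement the alignment differently.

```r
# Toy cumulative counts (made-up numbers)
toy <- data.frame(
  country   = rep(c("A", "B"), each = 5),
  date      = rep(as.Date("2020-03-01") + 0:4, times = 2),
  confirmed = c(40, 90, 120, 250, 500,       # A first reaches 100 on its 3rd day
                150, 300, 600, 1200, 2400),  # B is above 100 from its 1st day
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows at or above the threshold and count days
# elapsed since each country's first such row
align_by_threshold <- function(data, min.cases = 100) {
  above <- data[data$confirmed >= min.cases, ]
  above <- above[order(above$country, above$date), ]
  above$days.since <- ave(as.numeric(above$date), above$country,
                          FUN = function(d) d - min(d))
  above
}

aligned <- align_by_threshold(toy, min.cases = 100)
print(aligned)
```

Plotting `confirmed` against `days.since` on a log scale then lets countries with different outbreak start dates share one x-axis.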

rc$ggplotComparisonExponentialGrowth(included.countries = latam.countries, countries.text = "Latam countries",   
                                     field = "remaining.confirmed", y.label = "Active cases", min.cases = 100)
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: ggrepel: 5 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

rc$ggplotComparisonExponentialGrowth(included.countries = latam.countries, field = "deaths", y.label = "Deaths", min.cases = 1)


rg$ggplotCrossSection(included.countries = latam.countries,
                       field.x = "confirmed",
                       field.y = "fatality.rate.max",
                       plot.description  = "Cross section Confirmed vs  Death rate min",
                       log.scale.x = TRUE,
                       log.scale.y = FALSE)
#> Warning: Removed 144 rows containing missing values (`geom_line()`).


rg$ggplotCountriesLines(included.countries = latam.countries, countries.text = "Latam countries",
                        field = "confirmed.inc", log.scale = TRUE)
#> Warning: Removed 144 rows containing missing values (`geom_line()`).

rg$ggplotCountriesLines(included.countries = latam.countries, countries.text = "Latam countries",
                        field = "deaths.inc", log.scale = TRUE)
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 12 rows containing missing values (`geom_point()`).
#> Warning: Removed 144 rows containing missing values (`geom_line()`).
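
The `NaNs produced` and `infinite values` warnings above are what a logarithmic axis produces when a series contains zeros or negative numbers: the logarithm of zero is `-Inf` and the logarithm of a negative number is `NaN`. Daily increments can be zero on quiet days and, presumably, negative when a source revises its cumulative totals downward. A minimal base R illustration on made-up values:

```r
# Made-up daily increments including a zero and a negative correction
deaths.inc <- c(12, 0, -3, 7)

# log10(0) is -Inf and log10(-3) is NaN; the NaN warning is suppressed
# here only to keep the example output clean
logged <- suppressWarnings(log10(deaths.inc))
print(logged)

# One way to keep only the values a log axis can display
plottable <- ifelse(deaths.inc > 0, deaths.inc, NA)
print(plottable)
```

These warnings therefore indicate points that cannot be drawn on a log scale, not a failure of the plotting call.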

rg$ggplotCountriesLines(included.countries = latam.countries, countries.text = "Latam countries",
                        field = "rate.inc.daily", log.scale = TRUE)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 321 rows containing missing values (`geom_line()`).
#> Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
#> Warning: ggrepel: 23 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

rg$ggplotTopCountriesStackedBarDailyInc(top.countries)
#> Warning: There were 3 warnings in `mutate()`.
#> The first warning was:
#> ℹ In argument: `country = fct_reorder(country, desc(max.count))`.
#> ℹ In group 1: `country = "US"`.
#> Caused by warning:
#> ! `fct_reorder()` removing 1143 missing values.
#> ℹ Use `.na_rm = TRUE` to silence this message.
#> ℹ Use `.na_rm = FALSE` to preserve NAs.
#> ℹ Run `dplyr::last_dplyr_warnings()` to see the 2 remaining warnings.
#> Warning: Removed 69 rows containing missing values (`position_stack()`).

rc$ggplotComparisonExponentialGrowth(included.countries = international.countries, 
                                     field = "confirmed", y.label = "Confirmed", min.cases = 100)
#> Warning: Removed 2 rows containing missing values (`geom_line()`).
#> Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

rc$ggplotComparisonExponentialGrowth(included.countries = international.countries, 
                                     field = "remaining.confirmed", y.label = "Active cases", min.cases = 100)
#> Warning: Removed 2 rows containing missing values (`geom_line()`).
#> ggrepel: 4 unlabeled data points (too many overlaps). Consider increasing max.overlaps

rc$ggplotComparisonExponentialGrowth(included.countries = international.countries, field = "deaths", 
                                     y.label = "Deaths", min.cases = 1)
#> Warning: Removed 2 rows containing missing values (`geom_line()`).

rg$ggplotCrossSection(included.countries = international.countries,
                       field.x = "confirmed",
                       field.y = "fatality.rate.max",
                       plot.description  = "Cross section Confirmed vs Death rate min",
                       log.scale.x = TRUE,
                       log.scale.y = FALSE)
#> Warning: Removed 78 rows containing missing values (`geom_line()`).

rg$ggplotCountriesLines(field = "confirmed.inc", log.scale = TRUE)
#> Warning: Removed 66 rows containing missing values (`geom_line()`).

rg$ggplotCountriesLines(field = "deaths.inc", log.scale = TRUE)
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Transformation introduced infinite values in continuous y-axis
#> Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 17 rows containing missing values (`geom_point()`).
#> Warning: Removed 66 rows containing missing values (`geom_line()`).

rg$ggplotCountriesLines(field = "rate.inc.daily", log.scale = TRUE)
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning in self$trans$transform(x): NaNs produced
#> Warning: Transformation introduced infinite values in continuous y-axis
#> Warning: Removed 176 rows containing missing values (`geom_line()`).
#> Warning: Removed 1 rows containing missing values (`geom_text_repel()`).
#> Warning: ggrepel: 4 unlabeled data points (too many overlaps). Consider
#> increasing max.overlaps

rg$ggplotTopCountriesPie()

rg$ggplotTopCountriesBarPlots()

rg$ggplotCountriesBarGraphs(selected.country = "Argentina")

References

Yanchang Zhao, COVID-19 Data Analysis with Tidyverse and Ggplot2 - China. RDataMining.com, 2020. URL: http://www.rdatamining.com/docs/Coronavirus-data-analysis-china.pdf



rOpenStats/Covid19Analytics documentation built on Dec. 7, 2023, 9:28 p.m.