In RamiKrispin/coronavirus: The 2019 Novel Coronavirus COVID-19 (2019-nCoV) Dataset

options(width = 1000)
knitr::opts_chunk$set(
  echo = FALSE,
  collapse = TRUE,
  message=FALSE, 
  warning=FALSE,
  cache = TRUE,
  comment = "")

library(JuliaConnectoR)

The coronavirus R package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic and the vaccination efforts by country. In addition, the data is available in CSV format so that non-R users can pull and use it seamlessly.

The goal of this vignette is to demonstrate how to load the data into Julia. In the following example, we will load the coronavirus dataset, a tidy format of the Johns Hopkins COVID-19 dataset, which provides a daily summary of the COVID-19 cases by type (confirmed, recovered, death) and by country.

The Julia source code can be found here.

Required Julia libraries

To load the data from the coronavirus GitHub repository, summarize it, and plot it, we will use the following libraries:

CSV - to read the CSV files
DataFrames - to store the data as a data frame and manipulate it (filter, group by, etc.)
Chain - to pipe the data manipulation steps with the @chain macro
PlotlyJS - to visualize the data

using Pkg, CSV, DataFrames, Chain, PlotlyJS

juliaEval('using Pkg
          Pkg.activate("covid19_env")
          Pkg.add(PackageSpec(name = "PlotlyJS", version = "0.18.8"))
          Pkg.add(PackageSpec(name = "CSV", version = "0.9.6"))
          Pkg.add(PackageSpec(name = "Chain", version = "0.4.8"))
          Pkg.add(PackageSpec(name = "DataFrames", version = "1.2.2"))
          using  CSV, DataFrames, Chain, PlotlyJS')

We will use the following package versions:

Pkg.status()

Loading the data

The csv folder in the package repository contains the package data in CSV format. Due to size limitations, each year of data is stored in a separate file named coronavirus_ + year (for example, coronavirus_2020.csv). To load all the files, we will use the comprehension below to pull the files for 2020 through 2023. Since the files are loaded from a URL, we will fetch each one with the download function, parse it with CSV.File, convert it to a DataFrame with the DataFrame function, and stack the yearly data frames with reduce(vcat, ...):

url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/main/csv/coronavirus_"

df = reduce(vcat, [DataFrame(CSV.File(download(join([url, i, ".csv"])), missingstring= "NA")) for i in [2020:1:2023;]])

Note: Similarly, you can load the vaccine data from the following endpoint: https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/covid19_vaccine.csv
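
The loading step above builds one file URL per year before downloading it. As a minimal sketch that uses only Base Julia and makes no network calls, the following prints the URLs that the comprehension requests. The helper name year_urls is hypothetical and not part of the package; string(base, year, ".csv") yields the same text as join([url, i, ".csv"]), and 2020:2023 iterates over the same years as [2020:1:2023;].

```julia
# Base URL copied from the loading step above.
url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/main/csv/coronavirus_"

# Hypothetical helper: build the per-year file URLs without downloading anything.
year_urls(base, years) = [string(base, year, ".csv") for year in years]

urls = year_urls(url, 2020:2023)

# Print one URL per line to check them before running the download.
foreach(println, urls)
```

Printing the URLs first is a cheap way to confirm the file naming scheme before pulling several large files.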

Let's review the combined data frame:

df

We will use the describe function to review the data frame's columns, including their types, value ranges, and number of missing values:

describe(df)
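
How many missing values a column has depends on how the loading step treated the NA strings in the CSV files. As a minimal sketch, assuming the CSV and DataFrames packages are installed, the following parses a small made-up CSV (the rows are illustrative and not taken from the dataset) with the same missingstring = "NA" setting and counts the missing values per column:

```julia
using CSV, DataFrames

# Made-up rows for illustration only; not taken from the coronavirus dataset.
raw = """
country,type,cases,population
Atlantis,confirmed,10,1000
Atlantis,death,1,1000
Elbonia,confirmed,5,NA
"""

# missingstring = "NA" tells the parser to read the literal text NA as `missing`.
toy = CSV.read(IOBuffer(raw), DataFrame; missingstring = "NA")

# Count the missing values in each column.
n_missing = Dict(name => count(ismissing, toy[!, name]) for name in names(toy))
println(n_missing)
```

Without the missingstring setting, a column containing NA would be read as text rather than as numbers with missing values.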

The dataset has the following fields:

date - The date of the summary
province - The province or state, when applicable
country - The country or region name
Lat - Latitude point
Long - Longitude point
type - The type of case (e.g., confirmed, death)
cases - The number of daily cases (corresponding to the case type)
uid - Country code
iso2 - Officially assigned two-letter country code
iso3 - Officially assigned three-letter country code
code3 - UN country code
combined_key - Country and province (if applicable)
population - Country or province population
continent_name - Continent name
continent_code - Continent code

Data summary

Once the data is loaded, we can explore it with summary tables. In the following examples, we will summarize the cases (confirmed and death) by country. We will use the @chain macro from the Chain package to filter by case type, group by combined_key (country and, where applicable, province), sum the cases, and sort by the total in descending order. We will start with a confirmed cases summary:

@chain df begin
    filter(:type => ==("confirmed"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end

Similarly, we can summarize the total number of death cases by country:

@chain df begin
    filter(:type => ==("death"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end
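
To make the filter, group, sum, and sort steps easier to follow, here is a minimal sketch of the same pipeline on a tiny made-up data frame, written without the @chain macro and assuming only that DataFrames is installed. The toy rows are illustrative. The cases_sum column name is the default that DataFrames generates for :cases => sum.

```julia
using DataFrames

# Made-up rows for illustration only; not taken from the coronavirus dataset.
toy = DataFrame(
    combined_key = ["A", "A", "B", "B", "B"],
    type = ["confirmed", "death", "confirmed", "confirmed", "death"],
    cases = [10, 1, 5, 7, 2],
)

# Step 1: keep only the confirmed rows.
confirmed = filter(:type => ==("confirmed"), toy)

# Step 2: group by key and case type, then sum the cases within each group.
totals = combine(groupby(confirmed, [:combined_key, :type]), :cases => sum)

# Step 3: sort by the summed cases, largest first.
sort!(totals, :cases_sum, rev = true)

println(totals)
```

Here B ends up first with 12 confirmed cases (5 + 7), followed by A with 10.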

Visualize the data

Last but not least, we will plot the data using PlotlyJS, the Julia interface to Plotly. In the following example, we will plot Brazil's daily confirmed and death cases as two stacked panels that share the date axis. We will start by filtering the cases in Brazil by case type (confirmed and death):

df_brazil_confirmed = filter(row -> row.country == "Brazil" && row.type == "confirmed", df)
df_brazil_death = filter(row -> row.country == "Brazil" && row.type == "death", df)

To plot the confirmed and death cases in one figure, we will use the make_subplots function to set up a grid of two rows and one column, and add each series with the add_trace! function:

p = make_subplots(rows=2, 
                  cols=1, 
                  shared_xaxes=true, 
                  x_title = "Source: Johns Hopkins University Center for Systems Science and Engineering",
                  vertical_spacing=0.02)

add_trace!(p, scatter(df_brazil_confirmed,x=:date, y=:cases, name = "Confirmed"), row=1, col=1)
add_trace!(p, scatter(df_brazil_death,x=:date, y=:cases, name = "Death"), row=2, col=1)

relayout!(p, title_text="Brazil - Daily Confirmed and Death Cases")
p

juliaEval('p = make_subplots(rows=2, 
                  cols=1, 
                  shared_xaxes=true, 
                  x_title = "Source: Johns Hopkins University Center for Systems Science and Engineering",
                  vertical_spacing=0.02)

add_trace!(p, scatter(df_brazil_confirmed,x=:date, y=:cases, name = "Confirmed"), row=1, col=1)
add_trace!(p, scatter(df_brazil_death,x=:date, y=:cases, name = "Death"), row=2, col=1)

relayout!(p, title_text="Brazil - Daily Confirmed and Death Cases")
savefig(p, "brazil.png")')

Note: The current markdown format does not support interactive HTML objects. Therefore, the plot output above was saved as a PNG file and lost its interactivity.

RamiKrispin/coronavirus documentation built on March 22, 2023, 7:27 p.m.
