options(width = 1000)
knitr::opts_chunk$set(
  echo = FALSE,
  collapse = TRUE,
  message=FALSE, 
  warning=FALSE,
  cache = TRUE,
  comment = "")

library(JuliaConnectoR)

The coronavirus R package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic and the vaccination efforts by country. In addition, the data is available in a CSV format to enable none R users to pull and use it seamlessly.

The goal of this vignette is to demonstrate how to load the data into Julia. In the following example, we will load the coronavirus dataset, a tidy format of the John Hopkins COVID19 dataset, providing a daily summary of the COVID19 cases by type (confirmed, recovered, death), and by country.

The Julia source code can be find here.

Required Julia libraries

For loading the data from the coronavirus Github repository, summarized and plot it we will use the following libraries:

using Pkg, CSV, DataFrames, PlotlyJS
juliaEval('using Pkg
          Pkg.activate("covid19_env")
          Pkg.add(PackageSpec(name = "PlotlyJS", version = "0.18.8"))
          Pkg.add(PackageSpec(name = "CSV", version = "0.9.6"))
          Pkg.add(PackageSpec(name = "Chain", version = "0.4.8"))
          Pkg.add(PackageSpec(name = "DataFrames", version = "1.2.2"))
          using  CSV, DataFrames, Chain, PlotlyJS')

We will use the following packages versions:

Pkg.status()
juliaEval("Pkg.status()")

Loading the data

The .\csv\ folder on the package repository contains the package data in a CSV format. Due to size limitations, each year of data is stored on a separate file under the name coronavirus_ + year. To load all the files, we will use the below for loop to pull the corresponding files from 2020 to 2023. Since the files are being loaded from a URL we will use the download function and load the files with the CSV.File and transform it to a DataFrame with the DataFrame function:

url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/main/csv/coronavirus_"

df = reduce(vcat, [DataFrame(CSV.File(download(join([url, i, ".csv"])), missingstring= "NA")) for i in [2020:1:2023;]])
juliaEval('url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/main/csv/coronavirus_"

df = reduce(vcat, [DataFrame(CSV.File(download(join([url, i, ".csv"])), missingstring= "NA")) for i in [2020:1:2023;]])')

Note: similarly, you can load the vaccine data by using the following end point: https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/covid19_vaccine.csv

Let's review the appended file:

df
juliaEval('df')

We will use the describe function to review the dataframe attributes:

describe(df) 
juliaEval('describe(df)')

The dataset has the following fields:

Data summary

Once we load the data, it would be interesting to explore the data using summary tables. In the following examples, we will create a summary of cases (confirmed and death) by country. We will use the chain function to filter the case type, group by country, aggregated by cases, and sort by the number of cases. We will start with a confirmed cases summary:

@chain df begin
    filter(:type => ==("confirmed"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end
juliaEval('@chain df begin
    filter(:type => ==("confirmed"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort([:cases_sum], rev = true)
end')

Similarly, we can summarize the total number of death cases by country:

@chain df begin
    filter(:type => ==("death"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end
juliaEval('@chain df begin
    filter(:type => ==("death"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end')

Visualize the data

Last but not least, we will plot the data using PlotlyJS, Plotly version for Julia. In the following example, we will generate a side-by-side plot of Brazil's total daily confirmed and death cases. We will start by filtering the cases in Brazil by case type - confirmed and death:

df_brazil_confirmed = filter(row -> row.country == "Brazil" && row.type == "confirmed", df)
df_brazil_death = filter(row -> row.country == "Brazil" && row.type == "death", df)
juliaEval('df_brazil_confirmed = filter(row -> row.country == "Brazil" && row.type == "confirmed", df)
df_brazil_death = filter(row -> row.country == "Brazil" && row.type == "death", df)
')

To plot the confirmed and death cases side by side, we will use the make_subplots function to set the grid and add the plots with the add_trach function:

p = make_subplots(rows=2, 
                  cols=1, 
                  shared_xaxes=true, 
                  x_title = "Source: Johns Hopkins University Center for Systems Science and Engineering",
                  vertical_spacing=0.02)

add_trace!(p, scatter(df_brazil_confirmed,x=:date, y=:cases, name = "Confirmed"), row=1, col=1)
add_trace!(p, scatter(df_brazil_death,x=:date, y=:cases, name = "Death"), row=2, col=1)

relayout!(p, title_text="Brazil - Daily Confirmed and Death Cases")
p
juliaEval('p = make_subplots(rows=2, 
                  cols=1, 
                  shared_xaxes=true, 
                  x_title = "Source: Johns Hopkins University Center for Systems Science and Engineering",
                  vertical_spacing=0.02)

add_trace!(p, scatter(df_brazil_confirmed,x=:date, y=:cases, name = "Confirmed"), row=1, col=1)
add_trace!(p, scatter(df_brazil_death,x=:date, y=:cases, name = "Death"), row=2, col=1)

relayout!(p, title_text="Brazil - Daily Confirmed and Death Cases")
savefig(p, "brazil.png")')

Note: The current markdown format does not support interactive HTML objects. Therefore, the plot out above was saved as a png file and lost its interactivity attributes.



RamiKrispin/coronavirus documentation built on March 22, 2023, 7:27 p.m.