The coronavirus R package provides a tidy format dataset of the 2019 Novel Coronavirus COVID-19 (2019-nCoV) epidemic and the vaccination efforts by country. In addition, the data is available in CSV format so that non-R users can pull and use it seamlessly.
The goal of this vignette is to demonstrate how to load the data into Julia. In the following example, we will load the coronavirus dataset, a tidy format of the Johns Hopkins COVID-19 dataset, which provides a daily summary of the COVID-19 cases by type (confirmed, recovered, death) and by country.
The Julia source code can be found here.
To load the data from the coronavirus GitHub repository, summarize it, and plot it, we will use the following libraries:
using Pkg
# Activate a dedicated environment and pin the package versions used in this vignette
Pkg.activate("covid19_env")
Pkg.add(PackageSpec(name = "PlotlyJS", version = "0.18.8"))
Pkg.add(PackageSpec(name = "CSV", version = "0.9.6"))
Pkg.add(PackageSpec(name = "Chain", version = "0.4.8"))
Pkg.add(PackageSpec(name = "DataFrames", version = "1.2.2"))
# Chain provides the @chain macro used in the summary tables below
using CSV, DataFrames, Chain, PlotlyJS
We will use the following package versions:
Pkg.status()
The csv folder in the package repository contains the package data in CSV format. Due to size limitations, each year of data is stored in a separate file named coronavirus_ + year (for example, coronavirus_2020.csv). To load all the files, we will use the array comprehension below, which loops over the years 2020 to 2023 and pulls the corresponding files. Since the files are loaded from a URL, we will fetch each one with the download function, read it with CSV.File, convert it to a data frame with the DataFrame function, and append the yearly data frames with vcat:
url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/main/csv/coronavirus_"
# Download one file per year, parse "NA" as missing, and stack the yearly data frames
df = reduce(vcat, [DataFrame(CSV.File(download(join([url, i, ".csv"])), missingstring= "NA")) for i in [2020:1:2023;]])
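To make the comprehension above easier to follow, here is a minimal sketch of the URL-building step on its own, using only Base Julia. The helper name yearly_urls is hypothetical (it is not part of the coronavirus package or of CSV.jl); the base URL and the year range are the ones used in the loading step.

```julia
# Base URL of the yearly CSV files, as used in the loading step above
base_url = "https://raw.githubusercontent.com/RamiKrispin/coronavirus/main/csv/coronavirus_"

# Hypothetical helper (not part of any package): build one URL per year
yearly_urls(base, years) = [string(base, year, ".csv") for year in years]

urls = yearly_urls(base_url, 2020:2023)

# Print each URL so the file naming pattern is visible
foreach(println, urls)
```

Each element of urls is what the loading step passes to download before the file is parsed.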
Note: Similarly, you can load the vaccine data from the following endpoint:
https://raw.githubusercontent.com/RamiKrispin/coronavirus/master/csv/covid19_vaccine.csv
Let's review the combined data frame:
df
We will use the describe function to review the data frame attributes:
describe(df)
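As a small illustration of how missingstring = "NA" and describe work together, the sketch below parses an in-memory CSV instead of the real files. The rows are made up and only mimic a few of the dataset's columns; the variable names raw, toy, and stats are assumptions for this example.

```julia
using CSV, DataFrames

# Made-up rows (not real data) mimicking a few columns of the dataset;
# "NA" is how R writes missing values to CSV
raw = """
date,province,country,cases
2020-01-22,NA,Brazil,0
2020-01-22,Hubei,China,444
"""

# Parse the text and treat the literal string "NA" as missing
toy = CSV.read(IOBuffer(raw), DataFrame; missingstring = "NA")

# Restrict describe to the missing-value count and element type of each column
stats = describe(toy, :nmissing, :eltype)
println(stats)
```

Without the missingstring argument, the NA entry would be read as the ordinary string "NA" rather than as missing.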
The dataset has the following fields:
date - The date of the summary
province - The province or state, when applicable
country - The country or region name
Lat - Latitude point
Long - Longitude point
type - The type of case (e.g., confirmed, death)
cases - The number of daily cases (corresponding to the case type)
uid - Country code
iso2 - Officially assigned two-letter country code identifier
iso3 - Officially assigned three-letter country code identifier
code3 - UN country code
combined_key - Country and province (if applicable)
population - Country or province population
continent_name - Continent name
continent_code - Continent code

Once the data is loaded, we can explore it using summary tables. In the following examples, we will create a summary of cases (confirmed and death) by country. We will use the @chain macro from the Chain package to filter by case type, group by the combined_key field (country and province, if applicable), sum the cases, and sort by the total number of cases. We will start with a summary of the confirmed cases:
@chain df begin
    filter(:type => ==("confirmed"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end
Similarly, we can summarize the total number of death cases by country:
@chain df begin
    filter(:type => ==("death"),_)
    groupby([:combined_key, :type])
    combine([:cases] .=> sum)
    sort!([:cases_sum], rev = true)
end
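For readers new to @chain, the same steps can be written as ordinary DataFrames.jl calls. The sketch below runs them on a tiny made-up table (the values are illustrative, not real case counts); the helper name summarize_cases is hypothetical and not part of any package.

```julia
using DataFrames

# Made-up rows that mimic three columns of the coronavirus table
toy = DataFrame(
    combined_key = ["A", "A", "B", "B", "A"],
    type = ["confirmed", "confirmed", "confirmed", "death", "death"],
    cases = [10, 5, 30, 2, 1],
)

# Hypothetical helper: total cases of one type per combined_key, largest first
function summarize_cases(df, case_type)
    selected = filter(:type => ==(case_type), df)        # keep one case type
    grouped = groupby(selected, [:combined_key, :type])  # one group per key
    totals = combine(grouped, :cases => sum)             # creates column cases_sum
    return sort!(totals, :cases_sum, rev = true)         # sort descending
end

confirmed_summary = summarize_cases(toy, "confirmed")
death_summary = summarize_cases(toy, "death")
println(confirmed_summary)
println(death_summary)
```

Each line of the helper corresponds to one step of the @chain block, with the intermediate result passed explicitly instead of being threaded through by the macro.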
Last but not least, we will plot the data using PlotlyJS, the Plotly library for Julia. In the following example, we will generate a two-panel plot of Brazil's total daily confirmed and death cases. We will start by filtering the cases in Brazil by case type, confirmed and death:
df_brazil_confirmed = filter(row -> row.country == "Brazil" && row.type == "confirmed", df)
df_brazil_death = filter(row -> row.country == "Brazil" && row.type == "death", df)
To plot the confirmed and death cases together, we will use the make_subplots function to set up the grid (two rows that share the x-axis) and add the plots with the add_trace! function:
p = make_subplots(rows=2, cols=1, shared_xaxes=true, x_title = "Source: Johns Hopkins University Center for Systems Science and Engineering", vertical_spacing=0.02)
add_trace!(p, scatter(df_brazil_confirmed,x=:date, y=:cases, name = "Confirmed"), row=1, col=1)
add_trace!(p, scatter(df_brazil_death,x=:date, y=:cases, name = "Death"), row=2, col=1)
relayout!(p, title_text="Brazil - Daily Confirmed and Death Cases")
p
Note: The current markdown format does not support interactive HTML objects. Therefore, the plot output above was saved as a PNG file with savefig(p, "brazil.png") and lost its interactivity.