library(covid19sf) knitr::opts_chunk$set( fig.height=5, fig.width=8, message=FALSE, warning=FALSE, collapse = TRUE, comment = "#>" )
The covid19sf_population
dataset provides a daily summary of COVID-19 cases by different demographic types. That includes the following groups:
Note: Unfortunately, the groups are separated, and there is no option to run cross-group analysis (e.g., distribution by age and gender, etc.).
Below are some examples for exploring the dataset by a specific demographic group using the characteristic_type
variable to filter the dataset.
To get cases by age group we will filter the dataset to Age Group
:
library(dplyr) df_age <- covid19sf_population %>% filter(characteristic_type == "Age Group") head(df_age)
The age groups on this dataset are:
In the following example, we will use the plotly package to visualize the daily new cases distributions by age group. First, let's sort the age group and set the characteristic_group
variable as ordered variable:
library(plotly) age_order <- df_age %>% select(characteristic_group, characteristic_group_sort_order) %>% distinct() %>% arrange(characteristic_group_sort_order) df_age$characteristic_group <- factor(df_age$characteristic_group, levels = age_order$characteristic_group)
Next, we will use the box
plot option to create a box-plot by age group:
plot_ly(df_age, color = ~ characteristic_group, y = ~ new_cases, boxpoints = "all", jitter = 0.3, pointpos = -1.8, type = "box" ) %>% layout(title = "Distribution of Daily New COVID-19 Cases in San Francisco by Age Group", yaxis = list(title = "Number of Cases"), xaxis = list(title = "Source: San Francisco Department of Public Health"), legend = list(x = 0.9, y = 0.9), margin = list(t = 60, b = 60, l = 60, r = 60))
As shown in the box-plot above, the distribution of cases for age groups 20 to 29, 30 to 39, and 40 to 49 is relevantly wider with respect to other age groups. It is hard to conclude about age group distribution without some information about the overall population proportion of each age group. This information can be obtained from the population_estimate
variable.
The next plot describes the distribution of the cumulative cases by age group as of the most recent date in the data:
df_age %>% filter(specimen_collection_date == max(specimen_collection_date)) %>% plot_ly(values = ~ cumulative_cases, labels = ~ characteristic_group, type = "pie", textposition = 'inside', textinfo = 'label+percent', insidetextfont = list(color = '#FFFFFF'), hoverinfo = 'text', text = ~paste(" Age Group:", characteristic_group, "<br>", "Total:", cumulative_cases, "<br>", "Population Estimation:", population_estimate, paste("(",round(100* cumulative_cases/population_estimate, 1) ,"%)", sep = ""))) %>% layout(title = ~ paste("Total Cases Dist. by Age Group as of", max(specimen_collection_date)), margin = list(t = 60, b = 20, l = 30, r = 60))
Similarly, we can review the distribution of cases by gender group:
df_gender <- covid19sf_population %>% filter(characteristic_type == "Gender") head(df_gender)
The gender group has the following categories:
unique(df_gender$characteristic_group)
The following table provides the cases cumulative distribution of cases by gender. We will use the tidyr package to spread the data by gender group with the pivot_wider
function::
library(tidyr) df_gender %>% filter(specimen_collection_date == max(specimen_collection_date)) %>% select(specimen_collection_date, characteristic_group, cumulative_cases) %>% pivot_wider(names_from = characteristic_group, values_from = cumulative_cases)
In the next example, we will plot the cumulative confirmed cases by race and ethnicity group:
covid19sf_population %>% filter(characteristic_type == "Race/Ethnicity") %>% arrange(specimen_collection_date) %>% plot_ly(x = ~ specimen_collection_date, y = ~ cumulative_cases, type = 'scatter', mode = 'none', color = ~characteristic_group, stackgroup = 'one') %>% layout(title = "San Francisco Total COVID-19 Confirmed Cases Dist. by Race and Ethnicity", legend = list(x = 0.05, y = 0.9), yaxis = list(title = "Number of Cases", tickformat = ".0f"), xaxis = list(title = "Source: San Francisco Department of Public Health"), hovermode = "compare")
The Homelessness
group provides information about number of new and cumulative COVID-19 cases of homeless in San Francisco:
The following plot describe the daily number of new COVID-19 cases:
covid19sf_population %>% filter(characteristic_type == "Homelessness") %>% plot_ly(x = ~ specimen_collection_date, y = ~ new_cases, color = ~ characteristic_group, type = "scatter", mode = "lines") %>% layout(title = "Confirmed New COVID-19 Cases by Housing Status", yaxis = list(title = "New Cases"), xaxis = list(title = "Source: San Francisco Department of Public Health"), hovermode = "compare")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.