knitr::opts_chunk$set( dpi = 150, collapse = TRUE, comment = "#>", out.width = "100%", fig.width = 8, warning = FALSE, message = FALSE )
The rcaaqs
package strives to automate the calculation of CAAQS metrics outlined by the Canadian Council of Ministers of the Environment. All calculations are derived from the Guidance Document on Achievement Determination Canadian Ambient Air Quality Standards for Fine Particulate Matter and Ozone and Methods for SO~2~ and NO~2~ outlined by Environmental Reporting BC
See the section "Under the hood" for a better idea of how rcaaqs
performs the underlying calculations.
There are 7 CAAQS metrics for 4 different pollutants, in the rcaaqs
package each has a corresponding function:
pm_24h_caaqs()
- Fine Particulate Matter (PM2.5) over 24 hours (3-yr rolling average)pm_annual_caaqs()
- Fine Particulate Matter (PM2.5) over a year (3-yr rolling average)o3_caaqs()
- Ozone over 8 hours (3-yr rolling average)so2_3yr_caaqs()
- SO~2~ over 1 hour (3-yr rolling average)so2_1yr_caaqs()
- SO~2~ over a yearno2_3yr_caaqs()
- NO~2~ over 1 hour (3-yr rolling average)no2_1yr_caaqs()
- NO~2~ over a yearWhile the underlying calculations are different, in general the procedure to calculate any given metric is similar. Therefore we will focus on Annual PM2.5.
Let's start by loading some helpful packages
# For plotting library(ggplot2) # For data manipulation library(dplyr) library(tidyr) # For calculations library(rcaaqs)
The rcaaqs
package contains several sample data sets, let's take a look at the Ozone (O~3~) sample data:
#head(o3_sample_data) head(pm25_sample_data)
The clean_neg()
and date_fill()
are useful helper functions for ensuring all the pollutant values are positive or zero, and that you are starting with a complete dataframe (i.e. no missing hourly readings):
pm25_sample_data_clean <- pm25_sample_data %>% mutate(value = clean_neg(value, type = "pm25")) pm25_sample_data_clean <- pm25_sample_data_clean %>% group_by(ems_id, site) %>% do(., date_fill(., date_col = "date_time", fill_cols = c("ems_id", "site"), interval = "1 hour")) %>% ungroup() head(pm25_sample_data_clean)
Calculating a CAAQS metric is done using one of the 7 the caaqs_metrics
functions. Note you must include grouping variables (i.e. station information in our case here):
pm <- pm_annual_caaqs(pm25_sample_data_clean, by = c("ems_id", "site")) pm
The resulting object is of class caaqs
plus an additional class denoting the
parameter analyzed:
class(pm)
The object contains all of the intermediate analysis results as well as the final caaqs results.
There are accessor/extractor functions to get the different components. See
?get_caaqs
for the different functions. We can get the caaqs results
using the get_caaqs()
function:
get_caaqs(pm)
Besides the actual CAAQS metric values (metric_value
), there are several important columns in the caaqs results dataframe.
CAAQS details
caaqs_year
- The year for which the CAAQS metric is compiledmin_year
& max_year
- The first year and last years included in the rolling average (in this example)n_years
- The total number of years included in a metric. Note that metric_values
are NA for any year that has < 2 years includedCAAQS values
metric
- The name of the metric calculatedmetric_value
- The actual CAAQS metric, calculated and rounded according to the documentationcaaqs
- The 'achievement' level of the CAAQS metricData completeness
flag_daily_incomplete
- Whether any of the daily summaries were kept, but flagged as incompleteflag_yearly_incomplete
- Whether any of the yearly summaries were kept, but flagged as incompleteflag_two_of_three_years
- Whether the 3-yr rolling average is based on only 2 yearsNote that in all cases values of NA
indicate that flagging didn't apply to this metric (see CAAQS Documentation for more details)
Prior to determining management levels, jurisdictions have the option of adjusting their air zone metric values to remove external influences over which jurisdictions have little or no control, such as those related to transboundary flows and exceptional events. These arrangements aim to ensure that jurisdictions are responsible for managing only the emissions sources they can control.
Excluding Dates based on Exceptional Events or Transboundary Flows
First let's take a look at the daily averages. To do this, we need to get the intermediate daily analysis from the CAAQS results useing the extractor functions:
pm_daily_avg <- get_daily(pm) pm_daily_avg
ggplot(data = pm_daily_avg, aes(x = date, y = avg_24h)) + geom_line() + facet_wrap(~ site)
We can also plot the time series with CAAQS results using the built-in plot_ts
function, providing an id (and the column in which to look) of the station we
wish to plot:
plot_ts(pm, id = "0310172", id_col = "ems_id", rep_yr = 2013)
Let's say, for illustration purposes, that all high values in the summer of 2012 were due to forest fires and we want to exclude them.
We'll filter()
the daily data to just these dates and select()
only the columns that reflect the grouping of our data and the dates.
high_dates <- pm_daily_avg %>% filter(date >= as.Date("2012-05-01"), date < as.Date("2012-10-01"), avg_24h > 10) %>% select(ems_id, site, date) high_dates
Determine Air Quality Management System Management Levels
Now we can pass this excluded data on to the caaqs_management()
function via the exclude_df
argument, which specifies the data frame, and the exclude_df_dt
which
specifies the name of the column holding the dates.
pm_excl <- caaqs_management(pm, exclude_df = high_dates, exclude_df_dt = "date") pm_excl pm_excl_caaqs <- get_caaqs(pm_excl) pm_excl_caaqs
Note that there are some new columns in the caaqs plus managament results data frame:
excluded
- Whether any days were excluded due to Exceptional Events or Transboundary Flows (see below). The TRUE
values reflect the fact that some dates in that metric have been excluded. (Note that excluding data in this manner does not influence the evaluation of data completeness).mgmt_metric_value
- The recalculated "management" CAAQS metric, calculated and rounded with excluded days according to the documentationmgmt_level
- The 'management' level of the management CAAQS metricWe can also define excluded dates with date ranges, but here we must specify both start and end column names:
high_dates <- data.frame(ems_id = "0310162", site = "Port Moody Rocky Point Park", start = as.Date(c("2012-06-11", "2012-07-01")), end = as.Date(c("2012-06-25", "2012-08-14"))) high_dates pm_excl <- caaqs_management(pm, exclude_df = high_dates, exclude_df_dt = c("start", "end")) pm_excl_caaqs <- get_caaqs(pm_excl) pm_excl_caaqs
To understand what's happening within the CAAQS functions we can take a look at this function flow chart:
knitr::include_graphics(system.file('function_overview.png', package = 'rcaaqs'))
Each step from the documentation has a corresponding function (or functions) in the rcaaqs
package, but most of these are hidden. The green boxes reflect the wrapper CAAQS functions that perform all the underlying steps. For example, the o3_caaqs()
function performs all the steps outlined in the second green column.
We can match our data to an airzone by using the assign_airzones()
function but you must have the following:
For the following examples, we'll use our pm
CAAQS data, but only for 2013. To
get the management levels, we must run caaqs_management()
on the caaqs
object,
even if we are not adding a data frame of dates to exclude.
pm <- caaqs_management(pm) pm pm_caaqs <- get_caaqs(pm) %>% filter(caaqs_year == 2013) pm_caaqs
We will get our map file from the bcmaps
package (note that you must also install the bcmaps.rdata package
).
library(bcmaps) az <- bcmaps::airzones() class(az) az
For the purposes of this vignette, we'll get our stations coordinates from a file included in the package. You can obtain the most up to date stations data from the BC Data Catalogue.
stations_file <- system.file("air_stations_2018-11-07.csv", package = "rcaaqs") stations <- readr::read_csv(stations_file, na = "N/A") %>% # Read csv file rename_all(tolower) %>% # Rename columns to lower case filter(ems_id %in% unique(pm_caaqs$ems_id)) %>% # Filter to ids in our pm data select(ems_id, latitude, longitude) %>% # Select only the important columns mutate(latitude = as.numeric(latitude), # Transform lat and lon to numeric longitude = as.numeric(longitude)) %>% distinct() # Remove any duplicates stations
Now we'll merge them into our CAAQS data
pm_caaqs <- left_join(pm_caaqs, stations, by = "ems_id") pm_caaqs
Now we have our airzones map and our station coordinates, we can match stations to airzones:
pm_az <- assign_airzone(pm_caaqs, airzones = az, coords = c("longitude", "latitude")) pm_az
pm_az2 <- airzone_metric(pm_az) pm_az2
library(sf) library(gridExtra) az_plot <- st_transform(az, crs = "+proj=longlat") %>% left_join(pm_az2, by = c("Airzone" = "airzone")) g1 <- ggplot() + geom_sf(data = az_plot, aes(fill = caaqs_ambient)) + geom_sf(data = st_as_sf(pm_az, coords = c("lon", "lat"), crs = "+proj=longlat"), shape = 21, colour = "black", aes(fill = caaqs_ambient)) + scale_fill_manual(values = get_colours(type = "achievement"), drop = FALSE) + labs(title = "Achievement Status: PM2.5 Annual CAAQS") + theme(legend.position = c(0.85, 0.85)) g2 <- ggplot() + geom_sf(data = az_plot, aes(fill = mgmt_level)) + geom_sf(data = st_as_sf(pm_az, coords = c("lon", "lat"), crs = "+proj=longlat"), shape = 21, colour = "black", aes(fill = mgmt_level)) + scale_fill_manual(values = get_colours(type = "management"), drop = FALSE) + labs(title = "Management Status: PM2.5 Annual CAAQS") + theme(legend.position = c(0.85, 0.85)) grid.arrange(g1, g2, ncol = 1)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.